WEBVTT 00:00:01.520 --> 00:00:04.400 Hello everyone, my name is Tuấn-Anh. 00:00:04.400 --> 00:00:07.200 I've been using Emacs for about 10 years. 00:00:07.200 --> 00:00:09.280 Today I'm going to talk about tree-sitter, 00:00:09.280 --> 00:00:11.519 a new Emacs package that allows Emacs to 00:00:11.519 --> 00:00:13.759 parse multiple programming languages 00:00:13.759 --> 00:00:17.840 in real time. 00:00:17.840 --> 00:00:21.840 So what is the problem statement? 00:00:21.840 --> 00:00:23.359 In order to support programming 00:00:23.359 --> 00:00:24.960 functionalities for a particular 00:00:24.960 --> 00:00:25.760 language, 00:00:25.760 --> 00:00:27.680 a text editor needs to have some degree 00:00:27.680 --> 00:00:29.679 of language understanding. 00:00:29.679 --> 00:00:31.840 Traditionally, text editors have relied 00:00:31.840 --> 00:00:33.840 very heavily on regular expressions for 00:00:33.840 --> 00:00:34.960 this. 00:00:34.960 --> 00:00:38.320 Emacs is no different. Most language 00:00:38.320 --> 00:00:39.280 major modes use 00:00:39.280 --> 00:00:40.879 regular expressions for syntax 00:00:40.879 --> 00:00:42.960 highlighting, code navigation, 00:00:42.960 --> 00:00:46.239 folding, indexing, and so on. Regular 00:00:46.239 --> 00:00:47.440 expressions are 00:00:47.440 --> 00:00:50.559 problematic for a couple of reasons. 00:00:50.559 --> 00:00:53.600 They're slow and inaccurate. They also 00:00:53.600 --> 00:00:54.000 make 00:00:54.000 --> 00:00:56.800 the code hard to read and write. 00:00:56.800 --> 00:00:57.440 Sometimes 00:00:57.440 --> 00:00:59.199 it's because the regular expressions 00:00:59.199 --> 00:01:01.199 themselves are very hairy, 00:01:01.199 --> 00:01:04.000 and sometimes because they are just not 00:01:04.000 --> 00:01:05.199 powerful enough. 00:01:05.199 --> 00:01:07.840 Some helper code is usually needed to 00:01:07.840 --> 00:01:11.200 parse more intricate language features. 00:01:11.200 --> 00:01:13.280 That also illustrates the core problem 00:01:13.280 --> 00:01:16.159 
with regular expressions, 00:01:16.159 --> 00:01:18.400 in that they are not powerful enough to 00:01:18.400 --> 00:01:21.119 parse programming languages. 00:01:21.119 --> 00:01:22.640 An example feature that regular 00:01:22.640 --> 00:01:25.040 expressions cannot handle very well 00:01:25.040 --> 00:01:27.520 is string interpolation, which is a very 00:01:27.520 --> 00:01:28.320 common feature 00:01:28.320 --> 00:01:31.680 in many modern programming languages. 00:01:31.680 --> 00:01:34.079 It would be much nicer if Emacs somehow 00:01:34.079 --> 00:01:35.840 had structural understanding of source 00:01:35.840 --> 00:01:36.479 code, 00:01:36.479 --> 00:01:39.520 like IDEs do. 00:01:39.520 --> 00:01:41.119 There have been multiple efforts to 00:01:41.119 --> 00:01:42.960 bring this kind of programming language 00:01:42.960 --> 00:01:45.280 understanding into Emacs. 00:01:45.280 --> 00:01:47.119 There are language-specific parsers 00:01:47.119 --> 00:01:48.640 written in Elisp. 00:01:48.640 --> 00:01:50.240 They can be thought of as the next 00:01:50.240 --> 00:01:52.320 logical step of the glue code on top 00:01:52.320 --> 00:01:54.960 of regular expressions, moving from 00:01:54.960 --> 00:01:56.000 partial, local 00:01:56.000 --> 00:01:58.079 pattern recognition into a full-fledged 00:01:58.079 --> 00:01:59.840 parser. 00:01:59.840 --> 00:02:01.439 The most prominent example of this 00:02:01.439 --> 00:02:03.040 approach is probably the famous 00:02:03.040 --> 00:02:06.479 js2-mode. 00:02:06.479 --> 00:02:10.080 However, this approach has several issues. 00:02:10.080 --> 00:02:12.959 Parsing is computationally expensive, and 00:02:12.959 --> 00:02:13.680 Emacs Lisp 00:02:13.680 --> 00:02:16.800 is not good at that kind of stuff. 00:02:16.800 --> 00:02:18.400 Furthermore, maintenance is very 00:02:18.400 --> 00:02:20.840 troublesome. In order to work on these 00:02:20.840 --> 00:02:22.160 parsers, 00:02:22.160 --> 00:02:23.599 first you have to know Elisp well 00:02:23.599 --> 
00:02:25.599 enough, and then you have to be 00:02:25.599 --> 00:02:27.760 comfortable with writing a 00:02:27.760 --> 00:02:30.319 recursive descent parser while 00:02:30.319 --> 00:02:32.080 constantly keeping up with changes to 00:02:32.080 --> 00:02:34.000 the language itself, 00:02:34.000 --> 00:02:36.879 which can be evolving very quickly, like 00:02:36.879 --> 00:02:39.360 JavaScript, for example. 00:02:39.360 --> 00:02:41.599 Together, these constraints significantly 00:02:41.599 --> 00:02:45.680 reduce the pool of potential maintainers. 00:02:45.680 --> 00:02:47.760 The biggest issue, though, in my opinion, 00:02:47.760 --> 00:02:49.680 is the lack of a set of generic 00:02:49.680 --> 00:02:52.879 and reusable APIs. This makes them very 00:02:52.879 --> 00:02:54.319 hard to use 00:02:54.319 --> 00:02:55.920 for minor modes that want to deal with 00:02:55.920 --> 00:02:57.920 cross-cutting concerns across multiple 00:02:57.920 --> 00:02:59.920 languages. 00:02:59.920 --> 00:03:01.760 The other approach, which has been 00:03:01.760 --> 00:03:03.599 gaining a lot of momentum in recent 00:03:03.599 --> 00:03:04.319 years, 00:03:04.319 --> 00:03:06.560 is externalizing language understanding 00:03:06.560 --> 00:03:08.159 to another process, 00:03:08.159 --> 00:03:12.239 also known as the Language Server Protocol. 00:03:12.239 --> 00:03:14.480 This second approach is actually a very 00:03:14.480 --> 00:03:16.560 interesting one. 00:03:16.560 --> 00:03:18.400 By decoupling language understanding 00:03:18.400 --> 00:03:21.280 from the editing facility itself, 00:03:21.280 --> 00:03:23.760 the LSP servers can attract a lot more 00:03:23.760 --> 00:03:25.120 contributors, 00:03:25.120 --> 00:03:28.959 which makes maintenance easier. However, 00:03:28.959 --> 00:03:32.400 they also have several issues of their own. 00:03:32.400 --> 00:03:34.720 Being a separate process, they are 00:03:34.720 --> 00:03:36.000 usually more resource 00:03:36.000 --> 00:03:39.920 intensive, and depending on the 
language, 00:03:39.920 --> 00:03:42.159 the LSP server itself can bring with it 00:03:42.159 --> 00:03:44.640 a host of additional dependencies 00:03:44.640 --> 00:03:47.680 external to Emacs, which may be messy to 00:03:47.680 --> 00:03:50.640 install and manage. 00:03:50.640 --> 00:03:53.760 Furthermore, JSON over RPC has pretty 00:03:53.760 --> 00:03:55.120 high latency. 00:03:55.120 --> 00:03:57.840 For one-off tasks like jumping to source 00:03:57.840 --> 00:04:00.879 or on-demand completion, it's great, 00:04:00.879 --> 00:04:03.040 but for things like code highlighting, 00:04:03.040 --> 00:04:06.000 the latency is just too much. 00:04:06.000 --> 00:04:08.319 I was using Rust, and I was following the 00:04:08.319 --> 00:04:10.480 community effort to improve its IDE 00:04:10.480 --> 00:04:11.760 support, 00:04:11.760 --> 00:04:13.680 hoping to integrate some of that into 00:04:13.680 --> 00:04:15.760 Emacs itself. 00:04:15.760 --> 00:04:17.600 Then I heard someone from the community 00:04:17.600 --> 00:04:19.759 mention tree-sitter, 00:04:19.759 --> 00:04:23.360 and I decided to check it out. 00:04:23.360 --> 00:04:25.520 Basically, tree-sitter is an incremental 00:04:25.520 --> 00:04:28.720 parsing library and a parser generator. 00:04:28.720 --> 00:04:31.000 It was introduced by the Atom editor in 00:04:31.000 --> 00:04:33.040 2018. 00:04:33.040 --> 00:04:35.680 Besides Atom, it is also being integrated 00:04:35.680 --> 00:04:36.960 into the Neovim 00:04:36.960 --> 00:04:41.040 editor, and GitHub is using it to power 00:04:41.040 --> 00:04:42.479 their source code analysis and 00:04:42.479 --> 00:04:45.840 navigation features. 00:04:45.840 --> 00:04:48.639 It is written in C and can be compiled 00:04:48.639 --> 00:04:49.199 for all 00:04:49.199 --> 00:04:53.120 major platforms. It can even be compiled 00:04:53.120 --> 00:04:56.080 to WebAssembly to run on the web. That's 00:04:56.080 --> 00:04:57.600 how GitHub is using it 00:04:57.600 --> 00:05:00.800 on their website. 00:05:00.800 --> 
00:05:02.960 So why is tree-sitter an interesting 00:05:02.960 --> 00:05:05.840 solution to this problem? 00:05:05.840 --> 00:05:07.360 There are multiple features that make it 00:05:07.360 --> 00:05:10.000 an attractive option. 00:05:10.000 --> 00:05:12.400 It is designed to be fast by being 00:05:12.400 --> 00:05:13.680 incremental. 00:05:13.680 --> 00:05:15.680 The initial parse of a typical big file 00:05:15.680 --> 00:05:18.160 can take tens of milliseconds, 00:05:18.160 --> 00:05:20.240 while subsequent incremental parses 00:05:20.240 --> 00:05:22.560 are sub-millisecond. 00:05:22.560 --> 00:05:24.720 It achieves this by using structural 00:05:24.720 --> 00:05:26.240 sharing, 00:05:26.240 --> 00:05:29.360 meaning it replaces only the affected nodes 00:05:29.360 --> 00:05:32.960 in the old tree when it needs to. 00:05:32.960 --> 00:05:36.000 Also, unlike LSP, being in the same 00:05:36.000 --> 00:05:37.120 process, 00:05:37.120 --> 00:05:40.639 it has much lower latency. 00:05:40.639 --> 00:05:42.880 Secondly, it provides a uniform 00:05:42.880 --> 00:05:44.960 programming interface. 00:05:44.960 --> 00:05:47.039 The same data structures and functions 00:05:47.039 --> 00:05:48.720 work on parse trees of different 00:05:48.720 --> 00:05:50.400 languages. 00:05:50.400 --> 00:05:52.160 Syntax nodes of different languages 00:05:52.160 --> 00:05:54.160 differ only by their types 00:05:54.160 --> 00:05:57.360 and their possible child nodes. This 00:05:57.360 --> 00:05:58.960 is a big advantage over language- 00:05:58.960 --> 00:06:02.240 specific parsers. 00:06:02.240 --> 00:06:04.880 Thirdly, it's written in self-contained, 00:06:04.880 --> 00:06:06.880 embeddable C. 00:06:06.880 --> 00:06:09.680 As I mentioned previously, it can even be 00:06:09.680 --> 00:06:10.400 compiled 00:06:10.400 --> 00:06:13.759 to WebAssembly. This makes integrating it 00:06:13.759 --> 00:06:15.199 into various editors 00:06:15.199 --> 00:06:18.240 quite easy, without having to install 00:06:18.240 --> 00:06:22.880 
any external dependencies. 00:06:22.880 --> 00:06:24.639 One thing that is not mentioned here is 00:06:24.639 --> 00:06:28.000 that, being a parser generator, 00:06:28.000 --> 00:06:31.039 its grammars are declarative. 00:06:31.039 --> 00:06:34.880 Together with being editor-independent, 00:06:34.880 --> 00:06:36.720 this makes the pool of potential 00:06:36.720 --> 00:06:38.160 contributors 00:06:38.160 --> 00:06:42.400 much larger. So I was convinced 00:06:42.400 --> 00:06:45.520 that tree-sitter is a good fit for Emacs. 00:06:45.520 --> 00:06:48.000 Last year I started writing the bindings 00:06:48.000 --> 00:06:48.720 using 00:06:48.720 --> 00:06:50.960 dynamic module support, introduced in Emacs 00:06:50.960 --> 00:06:53.280 25. 00:06:53.280 --> 00:06:55.360 Dynamic module means there is platform- 00:06:55.360 --> 00:06:58.479 specific native code involved, 00:06:58.479 --> 00:07:00.560 but since there are pre-compiled binaries 00:07:00.560 --> 00:07:02.880 for the three major platforms, 00:07:02.880 --> 00:07:06.319 it should work in most places. Currently, 00:07:06.319 --> 00:07:08.319 the core functionalities are in pretty 00:07:08.319 --> 00:07:09.440 good shape. 00:07:09.440 --> 00:07:12.560 Syntax highlighting is working nicely. 00:07:12.560 --> 00:07:14.840 The whole thing is split into three 00:07:14.840 --> 00:07:16.080 packages. 00:07:16.080 --> 00:07:17.759 tree-sitter is the main package that 00:07:17.759 --> 00:07:20.319 other packages should depend on. 00:07:20.319 --> 00:07:22.800 tree-sitter-langs is the language bundle 00:07:22.800 --> 00:07:24.000 that includes support 00:07:24.000 --> 00:07:27.199 for most common languages. 00:07:27.199 --> 00:07:30.080 And finally, the core APIs are in the 00:07:30.080 --> 00:07:32.160 package tsc, 00:07:32.160 --> 00:07:36.160 which stands for tree-sitter-core. 00:07:36.160 --> 00:07:38.800 It is an implicit dependency of the 00:07:38.800 --> 00:07:43.520 tree-sitter package. 00:07:43.520 --> 00:07:46.000 The main package includes the 
minor mode 00:07:46.000 --> 00:07:47.520 tree-sitter-mode. 00:07:47.520 --> 00:07:49.840 This provides the base for other major 00:07:49.840 --> 00:07:52.560 or minor modes to build on. 00:07:52.560 --> 00:07:55.280 Using Emacs's change tracking hooks, it 00:07:55.280 --> 00:07:55.840 enables 00:07:55.840 --> 00:07:58.080 incremental parsing and provides a 00:07:58.080 --> 00:08:00.800 syntax tree that is always up to date 00:08:00.800 --> 00:08:04.080 after any edits in a buffer. 00:08:04.080 --> 00:08:06.560 There is also a basic debug mode that 00:08:06.560 --> 00:08:10.080 shows the parse tree in another buffer. 00:08:10.080 --> 00:08:13.360 Here is a quick demo. 00:08:13.360 --> 00:08:15.759 Here I'm in an empty Python buffer with 00:08:15.759 --> 00:08:17.520 tree-sitter enabled. 00:08:17.520 --> 00:08:19.440 I'm going to turn on the debug mode to 00:08:19.440 --> 00:08:26.560 see the parse tree. 00:08:26.560 --> 00:08:28.720 Since the buffer is empty, there is only 00:08:28.720 --> 00:08:30.639 one node in the syntax tree, the top- 00:08:30.639 --> 00:08:33.279 level module node. 00:08:33.279 --> 00:09:11.040 Let's try typing some code. 00:09:11.040 --> 00:09:13.600 As you can see, as I type into the Python 00:09:13.600 --> 00:09:14.640 buffer, 00:09:14.640 --> 00:09:19.120 the syntax tree updates in real time. 00:09:19.120 --> 00:09:21.120 The other minor mode included in the 00:09:21.120 --> 00:09:23.279 main package is tree-sitter-hl-mode. 00:09:23.279 --> 00:09:26.640 It overrides font-lock-mode and 00:09:26.640 --> 00:09:28.480 provides its own set of faces 00:09:28.480 --> 00:09:31.839 and customization options. It is query- 00:09:31.839 --> 00:09:32.800 driven. 00:09:32.800 --> 00:09:35.200 That means instead of regular 00:09:35.200 --> 00:09:36.240 expressions, 00:09:36.240 --> 00:09:38.720 it uses a Lisp-like query language to 00:09:38.720 --> 00:09:40.320 map syntax nodes 00:09:40.320 --> 00:09:43.760 to highlighting faces. I'm going to 00:09:43.760 --> 00:09:45.760 open a 
Python file with small snippets 00:09:45.760 --> 00:09:54.320 that showcase syntax highlighting. 00:09:54.320 --> 00:09:55.920 So this is the default highlighting 00:09:55.920 --> 00:10:00.880 provided by python-mode. 00:10:00.880 --> 00:10:02.839 This is the highlighting enabled by 00:10:02.839 --> 00:10:04.640 tree-sitter. 00:10:04.640 --> 00:10:07.680 As you can see, string interpolation 00:10:07.680 --> 00:10:11.680 and decorators are highlighted correctly. 00:10:11.680 --> 00:10:17.440 Function calls are also highlighted. 00:10:17.440 --> 00:10:20.240 You can also note that property 00:10:20.240 --> 00:10:21.839 accessors 00:10:21.839 --> 00:10:24.640 and property assignments are highlighted 00:10:24.640 --> 00:10:27.440 differently. 00:10:27.440 --> 00:10:29.360 What I like the most about this is that 00:10:29.360 --> 00:10:30.880 new bindings are consistently 00:10:30.880 --> 00:10:32.640 highlighted. 00:10:32.640 --> 00:10:36.320 This includes local variables, 00:10:36.320 --> 00:10:39.760 function parameters, and property 00:10:39.760 --> 00:10:45.760 mutations. 00:10:45.760 --> 00:10:48.000 Before going through the tree queries 00:10:48.000 --> 00:10:49.279 and the syntax highlighting 00:10:49.279 --> 00:10:51.680 customization options, 00:10:51.680 --> 00:10:53.760 let's take a brief look at the core data 00:10:53.760 --> 00:10:55.040 structures and functions 00:10:55.040 --> 00:10:58.079 that tree-sitter provides. 00:10:58.079 --> 00:10:59.839 So parsing is done with the help of a 00:10:59.839 --> 00:11:02.240 generic parser object. 00:11:02.240 --> 00:11:04.160 A single parser object can be used to 00:11:04.160 --> 00:11:06.000 parse different languages 00:11:06.000 --> 00:11:08.320 by sending different language objects to 00:11:08.320 --> 00:11:09.279 it. 00:11:09.279 --> 00:11:10.880 The language objects themselves are 00:11:10.880 --> 00:11:14.079 loaded from shared libraries. 00:11:14.079 --> 00:11:16.079 Since tree-sitter-mode already handles 00:11:16.079 --> 
00:11:17.360 the parsing part, 00:11:17.360 --> 00:11:19.440 we will instead focus on the functions 00:11:19.440 --> 00:11:20.800 that inspect nodes 00:11:20.800 --> 00:11:25.279 in the resulting parse tree. 00:11:25.279 --> 00:11:27.200 We can ask tree-sitter what is the 00:11:27.200 --> 00:11:44.240 syntax node at point. 00:11:44.240 --> 00:11:47.200 It is an opaque object, so this is not 00:11:47.200 --> 00:11:48.480 very useful. 00:11:48.480 --> 00:12:03.760 We can instead ask what is its type. 00:12:03.760 --> 00:12:06.560 So its type is the symbol 00:12:06.560 --> 00:12:08.959 comparison_operator. 00:12:08.959 --> 00:12:11.600 In tree-sitter, there are two kinds of nodes: 00:12:11.600 --> 00:12:13.680 anonymous nodes and named nodes. 00:12:13.680 --> 00:12:15.519 Anonymous nodes correspond to simple 00:12:15.519 --> 00:12:17.040 grammar elements 00:12:17.040 --> 00:12:19.839 like keywords, operators, punctuation, and 00:12:19.839 --> 00:12:21.279 so on. 00:12:21.279 --> 00:12:24.160 Named nodes, on the other hand, are grammar 00:12:24.160 --> 00:12:25.920 elements that are interesting enough on 00:12:25.920 --> 00:12:26.639 their own 00:12:26.639 --> 00:12:30.320 to have a name, like an identifier, an 00:12:30.320 --> 00:12:31.839 expression, 00:12:31.839 --> 00:12:35.440 or a function definition. 00:12:35.440 --> 00:12:37.760 Named node types are symbols, while 00:12:37.760 --> 00:12:42.639 anonymous node types are strings. 00:12:42.639 --> 00:12:46.320 For example, if we are on this 00:12:46.320 --> 00:12:49.760 comparison operator, 00:12:49.760 --> 00:12:55.920 the node type should be a string. 00:12:55.920 --> 00:12:57.920 We can also get other information about 00:12:57.920 --> 00:12:58.959 the node, 00:12:58.959 --> 00:13:09.680 for example, what is its text, 00:13:09.680 --> 00:13:20.800 or where it is in the buffer, 00:13:20.800 --> 00:13:43.199 or what is its parent. 00:13:43.199 --> 00:13:46.160 There are many other APIs to query 00:13:46.160 --> 00:13:46.839 node 00:13:46.839 --> 
00:13:52.639 properties. 00:13:52.639 --> 00:13:54.399 tree-sitter allows searching for 00:13:54.399 --> 00:13:58.240 structural patterns within a parse tree. 00:13:58.240 --> 00:14:01.440 It does so through a Lisp-like language. 00:14:01.440 --> 00:14:03.519 This language supports matching 00:14:03.519 --> 00:14:04.639 by node types, 00:14:04.639 --> 00:14:07.760 field names, and predicates. 00:14:07.760 --> 00:14:10.079 It also allows capturing nodes for 00:14:10.079 --> 00:14:12.639 further processing. 00:14:12.639 --> 00:14:37.680 Let's try to see some examples. 00:14:37.680 --> 00:14:41.040 So in this very simple query, we just 00:14:41.040 --> 00:14:43.839 try to highlight all the identifiers in 00:14:43.839 --> 00:14:49.040 the buffer. 00:14:49.040 --> 00:14:51.920 This @ sign tells tree-sitter to capture a 00:14:51.920 --> 00:14:53.120 node. 00:14:53.120 --> 00:14:55.839 In the context of the query builder, it's 00:14:55.839 --> 00:14:57.360 not very important, 00:14:57.360 --> 00:15:00.320 but in a normal highlighting query, this 00:15:00.320 --> 00:15:01.760 will determine 00:15:01.760 --> 00:15:06.639 the face used to highlight the node. 00:15:06.639 --> 00:15:08.800 Suppose we want to capture all the 00:15:08.800 --> 00:15:10.320 function names 00:15:10.320 --> 00:15:13.519 instead of just any identifier. 00:15:13.519 --> 00:15:29.440 You can improve the query like this. 00:15:29.440 --> 00:15:31.600 This will highlight the whole 00:15:31.600 --> 00:15:32.639 definition, 00:15:32.639 --> 00:15:35.519 but we only want to capture the function 00:15:35.519 --> 00:15:36.399 name, 00:15:36.399 --> 00:15:39.600 which means the identifier 00:15:39.600 --> 00:15:42.800 here. So we 00:15:42.800 --> 00:15:46.320 move the capture to after the identifier 00:15:46.320 --> 00:15:49.600 node. 00:15:49.600 --> 00:15:51.759 If we want to capture the class names as 00:15:51.759 --> 00:15:52.959 well, 00:15:52.959 --> 00:16:10.079 we just add another pattern. 00:16:10.079 --> 00:16:20.320 
Let's look at a more practical example. 00:16:20.320 --> 00:16:22.959 Here we can see that single-quoted 00:16:22.959 --> 00:16:23.759 strings and 00:16:23.759 --> 00:16:25.600 double-quoted strings are highlighted 00:16:25.600 --> 00:16:27.279 the same. 00:16:27.279 --> 00:16:30.399 But in some places, 00:16:30.399 --> 00:16:33.440 because of some coding conventions, 00:16:33.440 --> 00:16:35.440 it may be desirable to highlight them 00:16:35.440 --> 00:16:37.279 differently. For example, if 00:16:37.279 --> 00:16:39.680 the string is single-quoted, we may want 00:16:39.680 --> 00:16:40.880 to highlight it 00:16:40.880 --> 00:16:44.399 as a constant. 00:16:44.399 --> 00:16:46.160 Let's try to see whether we can 00:16:46.160 --> 00:16:47.600 distinguish these 00:16:47.600 --> 00:16:56.240 two cases. 00:16:56.240 --> 00:17:00.639 So here we get all the strings. 00:17:00.639 --> 00:17:04.079 If we want to see whether it's a single-quoted 00:17:04.079 --> 00:17:04.559 or 00:17:04.559 --> 00:17:08.799 double-quoted string, 00:17:08.799 --> 00:17:11.039 we can try looking at the first 00:17:11.039 --> 00:17:12.480 character 00:17:12.480 --> 00:17:15.280 of the string, I mean the first character 00:17:15.280 --> 00:17:16.720 of the node, 00:17:16.720 --> 00:17:19.360 to check whether it's a single quote or 00:17:19.360 --> 00:17:33.600 a double quote. 00:17:33.600 --> 00:17:36.080 So for that we use 00:17:36.080 --> 00:17:36.799 tree-sitter's 00:17:36.799 --> 00:17:40.160 support for predicates. In this case, 00:17:40.160 --> 00:17:43.360 we use a match predicate 00:17:43.360 --> 00:17:46.080 to check whether the string, well, the 00:17:46.080 --> 00:17:46.799 node, 00:17:46.799 --> 00:17:50.320 starts with a single quote. And with this 00:17:50.320 --> 00:17:51.280 pattern, 00:17:51.280 --> 00:17:58.840 we only capture the single-quoted 00:17:58.840 --> 00:18:00.400 strings. 00:18:00.400 --> 00:18:03.760 Let's try to give it a different face. 00:18:03.760 --> 00:18:13.039 So we copy the 
pattern, 00:18:13.039 --> 00:18:18.640 and we add this pattern 00:18:18.640 --> 00:18:25.120 for Python only. 00:18:25.120 --> 00:18:28.400 But we also want to give the 00:18:28.400 --> 00:18:31.440 capture a different name. 00:18:31.440 --> 00:18:40.840 Let's say we want to highlight it as a 00:18:40.840 --> 00:18:46.559 keyword. 00:18:46.559 --> 00:19:06.320 And now, if we refresh the buffer, 00:19:06.320 --> 00:19:08.799 we see that single-quoted strings are 00:19:08.799 --> 00:19:10.320 highlighted as 00:19:10.320 --> 00:19:14.400 keywords. 00:19:14.400 --> 00:19:16.400 The highlighting patterns can also be 00:19:16.400 --> 00:19:19.200 set for a single project 00:19:19.200 --> 00:19:23.440 using directory-local variables. 00:19:23.440 --> 00:19:26.880 For example, let's take a look at 00:19:26.880 --> 00:19:35.760 Emacs's source code. 00:19:35.760 --> 00:19:40.400 So in Emacs's C source, there are a lot of 00:19:40.400 --> 00:19:43.760 uses of the DEFUN macro 00:19:43.760 --> 00:19:47.679 to define functions. 00:19:47.679 --> 00:19:51.200 And you can see 00:19:51.200 --> 00:19:53.520 this is actually the function name, but 00:19:53.520 --> 00:19:55.760 it's highlighted as a 00:19:55.760 --> 00:19:59.120 string. So what we want 00:19:59.120 --> 00:20:03.679 is to somehow recognize this pattern 00:20:03.679 --> 00:20:07.600 and highlight 00:20:07.600 --> 00:20:11.280 this part 00:20:11.280 --> 00:20:14.559 with the function face instead. 00:20:14.559 --> 00:20:17.679 In order to do that, 00:20:17.679 --> 00:20:20.240 we put a pattern in this project's 00:20:20.240 --> 00:20:21.760 directory-local 00:20:21.760 --> 00:20:31.760 settings file. 00:20:31.760 --> 00:20:34.799 So we can put this pattern in the 00:20:34.799 --> 00:20:40.159 c-mode section. 00:20:40.159 --> 00:20:48.000 And now, if we enable tree-sitter, 00:20:48.000 --> 00:20:50.480 you can see that this is now highlighted 00:20:50.480 --> 00:20:53.200 00:20:53.200 --> 00:20:55.520 as a normal function 
definition. So this 00:20:55.520 --> 00:20:56.559 is the function 00:20:56.559 --> 00:21:01.200 face, like we wanted. 00:21:01.200 --> 00:21:03.760 The pattern for this is actually pretty 00:21:03.760 --> 00:21:07.200 simple. 00:21:07.200 --> 00:21:10.720 It's only 00:21:10.720 --> 00:21:14.720 this part. So 00:21:14.720 --> 00:21:17.440 if it's a function call where the name 00:21:17.440 --> 00:21:19.679 of the function is DEFUN, 00:21:19.679 --> 00:21:21.600 then we highlight the DEFUN as a 00:21:21.600 --> 00:21:24.240 keyword, 00:21:24.240 --> 00:21:27.360 and then the first string element we 00:21:27.360 --> 00:21:28.159 highlight 00:21:28.159 --> 00:21:35.360 as a function name. 00:21:35.360 --> 00:21:37.679 Since the language objects are actually 00:21:37.679 --> 00:21:39.280 native code, 00:21:39.280 --> 00:21:40.799 they have to be compiled for each 00:21:40.799 --> 00:21:43.440 platform that we want to support. 00:21:43.440 --> 00:21:45.600 This would become a big obstacle for 00:21:45.600 --> 00:21:48.159 tree-sitter adoption. 00:21:48.159 --> 00:21:50.240 Therefore, I've created a language bundle 00:21:50.240 --> 00:21:52.960 package, tree-sitter-langs, 00:21:52.960 --> 00:21:54.960 that takes care of pre-compiling the 00:21:54.960 --> 00:21:56.320 grammars, the 00:21:56.320 --> 00:21:59.679 most common grammars, for all three major 00:21:59.679 --> 00:22:01.600 platforms. 00:22:01.600 --> 00:22:04.080 It also takes care of distributing these 00:22:04.080 --> 00:22:05.360 binaries 00:22:05.360 --> 00:22:08.080 and provides some highlighting queries 00:22:08.080 --> 00:22:11.440 for some of the languages. 00:22:11.440 --> 00:22:13.760 It should be noted that this package 00:22:13.760 --> 00:22:15.919 should be treated as a temporary 00:22:15.919 --> 00:22:19.919 distribution mechanism only, 00:22:19.919 --> 00:22:22.240 to help with bootstrapping tree-sitter's 00:22:22.240 --> 00:22:24.720 adoption. 00:22:24.720 --> 00:22:27.760 The plan is that eventually these 
files 00:22:27.760 --> 00:22:29.760 should be provided by the language major 00:22:29.760 --> 00:22:32.480 modes themselves. 00:22:32.480 --> 00:22:35.120 But in order to do that, we need better 00:22:35.120 --> 00:22:36.320 tooling, 00:22:36.320 --> 00:22:40.240 so we're not there yet. 00:22:40.240 --> 00:22:42.559 Since the core already works reasonably 00:22:42.559 --> 00:22:43.280 well, 00:22:43.280 --> 00:22:44.640 there are several areas that would 00:22:44.640 --> 00:22:46.320 benefit from the community's 00:22:46.320 --> 00:22:49.120 contribution. 00:22:49.120 --> 00:22:51.520 So tree-sitter's upstream language 00:22:51.520 --> 00:22:52.640 repositories 00:22:52.640 --> 00:22:54.400 already contain highlighting queries on 00:22:54.400 --> 00:22:55.679 their own. 00:22:55.679 --> 00:22:58.480 However, they are pretty basic, and they 00:22:58.480 --> 00:23:00.480 may not fit well with existing Emacs 00:23:00.480 --> 00:23:02.559 conventions. 00:23:02.559 --> 00:23:04.320 Therefore, the language bundle has its 00:23:04.320 --> 00:23:07.120 own set of highlighting queries. 00:23:07.120 --> 00:23:10.559 This requires maintenance until language 00:23:10.559 --> 00:23:11.600 major modes adopt 00:23:11.600 --> 00:23:13.760 tree-sitter and maintain the queries on 00:23:13.760 --> 00:23:16.640 their own. 00:23:16.640 --> 00:23:18.480 The queries are actually quite easy to 00:23:18.480 --> 00:23:22.000 write, as you've already seen. 00:23:22.000 --> 00:23:24.240 You just need to be familiar with the 00:23:24.240 --> 00:23:25.360 language, 00:23:25.360 --> 00:23:30.000 familiar enough to come up with sensible 00:23:30.000 --> 00:23:35.200 highlighting patterns. 00:23:35.200 --> 00:23:37.600 And if you are a maintainer of a 00:23:37.600 --> 00:23:39.679 language major mode, 00:23:39.679 --> 00:23:42.320 you may want to consider integrating 00:23:42.320 --> 00:23:43.360 tree-sitter into 00:23:43.360 --> 00:23:46.960 your mode, initially maybe as an 00:23:46.960 --> 00:23:50.080 optional 
feature. The integration is 00:23:50.080 --> 00:23:53.279 actually pretty straightforward, 00:23:53.279 --> 00:23:56.640 especially for syntax highlighting. 00:23:56.640 --> 00:24:01.520 Or alternatively, 00:24:01.520 --> 00:24:03.760 you can also try writing a new major 00:24:03.760 --> 00:24:04.640 mode 00:24:04.640 --> 00:24:08.000 from scratch that relies on tree-sitter 00:24:08.000 --> 00:24:12.559 from the very beginning. 00:24:12.559 --> 00:24:16.320 The code for such a major mode is 00:24:16.320 --> 00:24:19.679 quite simple. For example, 00:24:19.679 --> 00:24:23.200 this is the proposed 00:24:23.200 --> 00:24:26.240 wat-mode for WebAssembly. 00:24:26.240 --> 00:24:31.039 The code is just 00:24:31.039 --> 00:24:34.559 about one page of code, 00:24:34.559 --> 00:24:39.520 not a lot. 00:24:39.520 --> 00:24:42.720 You can also try writing new minor modes 00:24:42.720 --> 00:24:46.559 or writing integration packages. 00:24:46.559 --> 00:24:50.080 For example, a lot of 00:24:50.080 --> 00:24:50.880 packages 00:24:50.880 --> 00:24:54.559 may benefit from tree-sitter integration, 00:24:54.559 --> 00:24:58.840 but no one has written the integration 00:24:58.840 --> 00:25:02.960 yet. 00:25:02.960 --> 00:25:05.039 If you are interested in tree-sitter, you 00:25:05.039 --> 00:25:06.720 can use these links to 00:25:06.720 --> 00:25:10.320 learn more about it. I think that's it 00:25:10.320 --> 00:25:11.440 for me today. 00:25:11.440 --> 00:25:18.159 I'm happy to answer any questions.