From b98df6fbe2a5c48013cfca81a95a5af41e202d07 Mon Sep 17 00:00:00 2001 From: Sacha Chua Date: Sun, 13 Dec 2020 00:06:32 -0500 Subject: Actually post subtitles, I think --- ...-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt | 1522 ++++++++++++++++++++ 1 file changed, 1522 insertions(+) create mode 100644 2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt (limited to '2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt') diff --git a/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt b/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt new file mode 100644 index 00000000..99133c78 --- /dev/null +++ b/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt @@ -0,0 +1,1522 @@ +WEBVTT + +00:00:01.520 --> 00:00:04.400 +hello everyone my name is toniang + +00:00:04.400 --> 00:00:07.200 +I've been using amax for about 10 years + +00:00:07.200 --> 00:00:09.280 +today I'm going to talk about 360 + +00:00:09.280 --> 00:00:11.519 +a new imax package that allows ems to + +00:00:11.519 --> 00:00:13.759 +pass multiple programming languages + +00:00:13.759 --> 00:00:17.840 +in real time + +00:00:17.840 --> 00:00:21.840 +so what is the problem statement + +00:00:21.840 --> 00:00:23.359 +in order to support programming + +00:00:23.359 --> 00:00:24.960 +functionalities for a particular + +00:00:24.960 --> 00:00:25.760 +language + +00:00:25.760 --> 00:00:27.680 +a text editor needs to have some degree + +00:00:27.680 --> 00:00:29.679 +of language understanding + +00:00:29.679 --> 00:00:31.840 +traditionally text editors have relied + +00:00:31.840 --> 00:00:33.840 +very heavily on regular expressions for + +00:00:33.840 --> 00:00:34.960 +this + +00:00:34.960 --> 00:00:38.320 +e-max is no different most language + +00:00:38.320 --> 00:00:39.280 +major modes use + +00:00:39.280 --> 00:00:40.879 +regular expressions for syntax + +00:00:40.879 --> 00:00:42.960 +highlighting code navigation + +00:00:42.960 --> 00:00:46.239 +folding indexing and so on regular + +00:00:46.239 --> 00:00:47.440 +expressions are + +00:00:47.440 --> 00:00:50.559 +problematic for a couple of reasons + +00:00:50.559 --> 00:00:53.600 +they're slow and inaccurate they also + +00:00:53.600 --> 00:00:54.000 +make + +00:00:54.000 --> 00:00:56.800 +the code hard to read and write + +00:00:56.800 --> 00:00:57.440 +sometimes + +00:00:57.440 --> 00:00:59.199 +it's because the regular expressions + +00:00:59.199 --> 00:01:01.199 +themselves are very hairy + +00:01:01.199 --> 00:01:04.000 +and sometimes because they are just not + +00:01:04.000 --> 00:01:05.199 +powerful enough + +00:01:05.199 --> 00:01:07.840 +some helper code is usually needed to + +00:01:07.840 --> 00:01:11.200 +pass more intricate language features + +00:01:11.200 --> 00:01:13.280 +that also illustrates the core problem + +00:01:13.280 --> 00:01:16.159 +with regular expressions + +00:01:16.159 --> 00:01:18.400 +in that they are not powerful enough to + +00:01:18.400 --> 00:01:21.119 +pass programming languages + +00:01:21.119 --> 00:01:22.640 +an example feature that regular + +00:01:22.640 --> 00:01:25.040 +expressions cannot handle very well + +00:01:25.040 --> 00:01:27.520 +is string interpolation which is a very + +00:01:27.520 --> 00:01:28.320 +common feature + +00:01:28.320 --> 00:01:31.680 +in many modern programming languages + +00:01:31.680 --> 00:01:34.079 +it would be much nicer if image somehow + +00:01:34.079 --> 00:01:35.840 +had structural understanding of source + +00:01:35.840 --> 00:01:36.479 +code + +00:01:36.479 --> 00:01:39.520 +like ides do + +00:01:39.520 --> 00:01:41.119 +there have been multiple efforts to + +00:01:41.119 --> 00:01:42.960 +bring this kind of programming language + +00:01:42.960 --> 00:01:45.280 +understanding into Emacs + +00:01:45.280 --> 00:01:47.119 +there are language specific persons + +00:01:47.119 --> 00:01:48.640 +written in elise + +00:01:48.640 --> 00:01:50.240 +they can be thought of as the next + +00:01:50.240 --> 00:01:52.320 +logical step of the glue code on top + +00:01:52.320 --> 00:01:54.960 +of tribal expressions moving from + +00:01:54.960 --> 00:01:56.000 +partial local + +00:01:56.000 --> 00:01:58.079 +pattern recognition into a full-fledged + +00:01:58.079 --> 00:01:59.840 +parser + +00:01:59.840 --> 00:02:01.439 +the most prominent example of this + +00:02:01.439 --> 00:02:03.040 +approach is probably the famous + +00:02:03.040 --> 00:02:06.479 +js2 mode + +00:02:06.479 --> 00:02:10.080 +however this approach has several issues + +00:02:10.080 --> 00:02:12.959 +parsing is computationally expensive and + +00:02:12.959 --> 00:02:13.680 +imagine + +00:02:13.680 --> 00:02:16.800 +is not good at that kind of stuff + +00:02:16.800 --> 00:02:18.400 +furthermore maintenance is very + +00:02:18.400 --> 00:02:20.840 +troublesome in order to work on these + +00:02:20.840 --> 00:02:22.160 +process + +00:02:22.160 --> 00:02:23.599 +first you have to know at least well + +00:02:23.599 --> 00:02:25.599 +enough and then you have to be + +00:02:25.599 --> 00:02:27.760 +comfortable with writing a + +00:02:27.760 --> 00:02:30.319 +recursive ascendant parser while + +00:02:30.319 --> 00:02:32.080 +constantly keeping up with changes to + +00:02:32.080 --> 00:02:34.000 +the language itself + +00:02:34.000 --> 00:02:36.879 +which can be evolving very quickly like + +00:02:36.879 --> 00:02:39.360 +javascript for example + +00:02:39.360 --> 00:02:41.599 +together these constraints significantly + +00:02:41.599 --> 00:02:45.680 +reduce the pull of potential maintenance + +00:02:45.680 --> 00:02:47.760 +the biggest issue though in my opinion + +00:02:47.760 --> 00:02:49.680 +is lack of the set of generic + +00:02:49.680 --> 00:02:52.879 +and reusable apis this makes them very + +00:02:52.879 --> 00:02:54.319 +hard to use + +00:02:54.319 --> 00:02:55.920 +for minor modes that want to deal with + +00:02:55.920 --> 00:02:57.920 +cross-cutting concerns across multiple + +00:02:57.920 --> 00:02:59.920 +languages + +00:02:59.920 --> 00:03:01.760 +the other approach which has been + +00:03:01.760 --> 00:03:03.599 +gaining a lot of momentum in recent + +00:03:03.599 --> 00:03:04.319 +years + +00:03:04.319 --> 00:03:06.560 +is externalizing language understanding + +00:03:06.560 --> 00:03:08.159 +to another process + +00:03:08.159 --> 00:03:12.239 +also known as language server protocol + +00:03:12.239 --> 00:03:14.480 +this second approach is actually a very + +00:03:14.480 --> 00:03:16.560 +interesting one + +00:03:16.560 --> 00:03:18.400 +my decoupling language understanding + +00:03:18.400 --> 00:03:21.280 +from the editing facility itself + +00:03:21.280 --> 00:03:23.760 +the usb servers can attract a lot more + +00:03:23.760 --> 00:03:25.120 +contributors + +00:03:25.120 --> 00:03:28.959 +which makes maintenance easier however + +00:03:28.959 --> 00:03:32.400 +they also have several issues available + +00:03:32.400 --> 00:03:34.720 +being a separate process they are + +00:03:34.720 --> 00:03:36.000 +usually more resource + +00:03:36.000 --> 00:03:39.920 +intensive and depending on the language + +00:03:39.920 --> 00:03:42.159 +the usb server itself can bring with it + +00:03:42.159 --> 00:03:44.640 +a host of additional dependencies + +00:03:44.640 --> 00:03:47.680 +external to Emacs which may message to + +00:03:47.680 --> 00:03:50.640 +install and manage + +00:03:50.640 --> 00:03:53.760 +furthermore json over rpc has pretty + +00:03:53.760 --> 00:03:55.120 +high latency + +00:03:55.120 --> 00:03:57.840 +for one-off tasks like jumping to source + +00:03:57.840 --> 00:04:00.879 +or on-demand completion is great + +00:04:00.879 --> 00:04:03.040 +but for things like code highlighting + +00:04:03.040 --> 00:04:06.000 +the latency is just too much + +00:04:06.000 --> 00:04:08.319 +I was using rust and I was following the + +00:04:08.319 --> 00:04:10.480 +community effort to improve its id + +00:04:10.480 --> 00:04:11.760 +support + +00:04:11.760 --> 00:04:13.680 +hoping to integrate some of that into + +00:04:13.680 --> 00:04:15.760 +Emacs itself + +00:04:15.760 --> 00:04:17.600 +then I heard someone from community + +00:04:17.600 --> 00:04:19.759 +mention tree sitter + +00:04:19.759 --> 00:04:23.360 +and I decided to check it out + +00:04:23.360 --> 00:04:25.520 +basically trisita is an incremental + +00:04:25.520 --> 00:04:28.720 +parsing library and a parser generator + +00:04:28.720 --> 00:04:31.000 +it was introduced by the item editor in + +00:04:31.000 --> 00:04:33.040 +2018 + +00:04:33.040 --> 00:04:35.680 +besides item is also being integrated + +00:04:35.680 --> 00:04:36.960 +into the neo-vim + +00:04:36.960 --> 00:04:41.040 +editor and github is using it to power + +00:04:41.040 --> 00:04:42.479 +their source code analysis and + +00:04:42.479 --> 00:04:45.840 +navigation features + +00:04:45.840 --> 00:04:48.639 +it is written in c and can be compiled + +00:04:48.639 --> 00:04:49.199 +for all + +00:04:49.199 --> 00:04:53.120 +major platforms it can even be compiled + +00:04:53.120 --> 00:04:56.080 +to web assembly to run on the web that's + +00:04:56.080 --> 00:04:57.600 +how github is using it + +00:04:57.600 --> 00:05:00.800 +on their website + +00:05:00.800 --> 00:05:02.960 +so why is trisita an interesting + +00:05:02.960 --> 00:05:05.840 +solution to this problem + +00:05:05.840 --> 00:05:07.360 +there are multiple features that make it + +00:05:07.360 --> 00:05:10.000 +an attractive option + +00:05:10.000 --> 00:05:12.400 +it is designed to be fast by being + +00:05:12.400 --> 00:05:13.680 +incremental + +00:05:13.680 --> 00:05:15.680 +the initial parts of a typical big fight + +00:05:15.680 --> 00:05:18.160 +can take tens of milliseconds + +00:05:18.160 --> 00:05:20.240 +while subsequent incremental processes + +00:05:20.240 --> 00:05:22.560 +are sub milliseconds + +00:05:22.560 --> 00:05:24.720 +it achieves this by using structural + +00:05:24.720 --> 00:05:26.240 +sharing + +00:05:26.240 --> 00:05:29.360 +meaning replacing only affected nodes + +00:05:29.360 --> 00:05:32.960 +in the old tree when it needs to + +00:05:32.960 --> 00:05:36.000 +also unlike lsp being in the same + +00:05:36.000 --> 00:05:37.120 +process + +00:05:37.120 --> 00:05:40.639 +it has much lower latency + +00:05:40.639 --> 00:05:42.880 +secondly it provides a uniform + +00:05:42.880 --> 00:05:44.960 +programming interface + +00:05:44.960 --> 00:05:47.039 +the same data structures and functions + +00:05:47.039 --> 00:05:48.720 +work on parse trees of different + +00:05:48.720 --> 00:05:50.400 +languages + +00:05:50.400 --> 00:05:52.160 +syntax knows of different languages + +00:05:52.160 --> 00:05:54.160 +differ only by their types + +00:05:54.160 --> 00:05:57.360 +and their possible child nodes this + +00:05:57.360 --> 00:05:58.960 +is a big advantage over language + +00:05:58.960 --> 00:06:02.240 +specific parcels + +00:06:02.240 --> 00:06:04.880 +thirdly it's written in self-contained + +00:06:04.880 --> 00:06:06.880 +embeddable c + +00:06:06.880 --> 00:06:09.680 +as I mentioned previously it can even be + +00:06:09.680 --> 00:06:10.400 +compiled + +00:06:10.400 --> 00:06:13.759 +to webassembly this makes integrating it + +00:06:13.759 --> 00:06:15.199 +into various editors + +00:06:15.199 --> 00:06:18.240 +quite easy without having to install + +00:06:18.240 --> 00:06:22.880 +any external dependencies + +00:06:22.880 --> 00:06:24.639 +one thing that is not mentioned here is + +00:06:24.639 --> 00:06:28.000 +that being a parcel generator + +00:06:28.000 --> 00:06:31.039 +scrummers are declarative + +00:06:31.039 --> 00:06:34.880 +together with being editor independent + +00:06:34.880 --> 00:06:36.720 +this makes the pool of potential + +00:06:36.720 --> 00:06:38.160 +contributors + +00:06:38.160 --> 00:06:42.400 +much larger so I was convinced + +00:06:42.400 --> 00:06:45.520 +that trisito is a good fit for Emacs + +00:06:45.520 --> 00:06:48.000 +last year I started writing the bindings + +00:06:48.000 --> 00:06:48.720 +using + +00:06:48.720 --> 00:06:50.960 +dynamic model support introduced in imax + +00:06:50.960 --> 00:06:53.280 +25. + +00:06:53.280 --> 00:06:55.360 +dynamic module means there is platform + +00:06:55.360 --> 00:06:58.479 +specific native code involved + +00:06:58.479 --> 00:07:00.560 +but since they are pre-compiled binaries + +00:07:00.560 --> 00:07:02.880 +for the three major platforms + +00:07:02.880 --> 00:07:06.319 +it should work in most places currently + +00:07:06.319 --> 00:07:08.319 +the core functionalities are in a pretty + +00:07:08.319 --> 00:07:09.440 +good shape + +00:07:09.440 --> 00:07:12.560 +syntax highlighting is working nicely + +00:07:12.560 --> 00:07:14.840 +the whole thing is split into three + +00:07:14.840 --> 00:07:16.080 +packages + +00:07:16.080 --> 00:07:17.759 +tree sitter is the main package that + +00:07:17.759 --> 00:07:20.319 +other packages should depend on + +00:07:20.319 --> 00:07:22.800 +tree system lens is the language bundle + +00:07:22.800 --> 00:07:24.000 +that includes support + +00:07:24.000 --> 00:07:27.199 +for most common languages + +00:07:27.199 --> 00:07:30.080 +and finally the core apis are in the + +00:07:30.080 --> 00:07:32.160 +package tsc + +00:07:32.160 --> 00:07:36.160 +which stands for trees the core + +00:07:36.160 --> 00:07:38.800 +it is the implicit dependency of the + +00:07:38.800 --> 00:07:43.520 +three-seater package + +00:07:43.520 --> 00:07:46.000 +the main package includes the miner mode + +00:07:46.000 --> 00:07:47.520 +3-seater mode + +00:07:47.520 --> 00:07:49.840 +this provides the base for other major + +00:07:49.840 --> 00:07:52.560 +or minor modes to build on + +00:07:52.560 --> 00:07:55.280 +using image change tracking hooks it + +00:07:55.280 --> 00:07:55.840 +enables + +00:07:55.840 --> 00:07:58.080 +incremental parsing and provides a + +00:07:58.080 --> 00:08:00.800 +syntax tree that is always up to date + +00:08:00.800 --> 00:08:04.080 +after any edits in a buffer + +00:08:04.080 --> 00:08:06.560 +there is also a basic debug mode that + +00:08:06.560 --> 00:08:10.080 +shows the parse tree in another buffer + +00:08:10.080 --> 00:08:13.360 +here is a quick demo + +00:08:13.360 --> 00:08:15.759 +here I mean an empty python buffer with + +00:08:15.759 --> 00:08:17.520 +three seater enabled + +00:08:17.520 --> 00:08:19.440 +I'm going to turn on the debug mode to + +00:08:19.440 --> 00:08:26.560 +see the parse tree + +00:08:26.560 --> 00:08:28.720 +since the buffer is empty there is only + +00:08:28.720 --> 00:08:30.639 +one node in the syntax tree the top + +00:08:30.639 --> 00:08:33.279 +level module node + +00:08:33.279 --> 00:09:11.040 +let's try typing some code + +00:09:11.040 --> 00:09:13.600 +as you can see as I type into the python + +00:09:13.600 --> 00:09:14.640 +buffer + +00:09:14.640 --> 00:09:19.120 +the syntax tree updates in real time + +00:09:19.120 --> 00:09:21.120 +the other minor mode included in the + +00:09:21.120 --> 00:09:23.279 +main package is 3-seater + +00:09:23.279 --> 00:09:26.640 +hl mode it overrides font-lock mode and + +00:09:26.640 --> 00:09:28.480 +provides its own set of phases + +00:09:28.480 --> 00:09:31.839 +and customization options it is query + +00:09:31.839 --> 00:09:32.800 +driven + +00:09:32.800 --> 00:09:35.200 +that means instead of regular + +00:09:35.200 --> 00:09:36.240 +expressions + +00:09:36.240 --> 00:09:38.720 +it uses a list like query language to + +00:09:38.720 --> 00:09:40.320 +map syntax notes + +00:09:40.320 --> 00:09:43.760 +to highlighting phrases I'm going to + +00:09:43.760 --> 00:09:45.760 +open a python file with small snippets + +00:09:45.760 --> 00:09:54.320 +that showcase syntax highlighting + +00:09:54.320 --> 00:09:55.920 +so this is the default highlighting + +00:09:55.920 --> 00:10:00.880 +provided by python mode + +00:10:00.880 --> 00:10:02.839 +this is the highlighting enabled by tree + +00:10:02.839 --> 00:10:04.640 +sitter + +00:10:04.640 --> 00:10:07.680 +as you can see string interpolation + +00:10:07.680 --> 00:10:11.680 +and decorators are highlighted correctly + +00:10:11.680 --> 00:10:17.440 +function calls are also highlighted + +00:10:17.440 --> 00:10:20.240 +you can also note that property + +00:10:20.240 --> 00:10:21.839 +assessors + +00:10:21.839 --> 00:10:24.640 +and property assignments are highlighted + +00:10:24.640 --> 00:10:27.440 +differently + +00:10:27.440 --> 00:10:29.360 +what I like the most about this is that + +00:10:29.360 --> 00:10:30.880 +new bindings are consistently + +00:10:30.880 --> 00:10:32.640 +highlighted + +00:10:32.640 --> 00:10:36.320 +this included local variable + +00:10:36.320 --> 00:10:39.760 +function parameters and property + +00:10:39.760 --> 00:10:45.760 +mutations + +00:10:45.760 --> 00:10:48.000 +before going through the three queries + +00:10:48.000 --> 00:10:49.279 +and the syntax highlighting + +00:10:49.279 --> 00:10:51.680 +customization options + +00:10:51.680 --> 00:10:53.760 +let's take a brief look at the core data + +00:10:53.760 --> 00:10:55.040 +structures and functions + +00:10:55.040 --> 00:10:58.079 +that tree sitter provides + +00:10:58.079 --> 00:10:59.839 +so parsing is done with the help of a + +00:10:59.839 --> 00:11:02.240 +generic parser object + +00:11:02.240 --> 00:11:04.160 +a single parser object can be used to + +00:11:04.160 --> 00:11:06.000 +pass different languages + +00:11:06.000 --> 00:11:08.320 +by sending different language objects to + +00:11:08.320 --> 00:11:09.279 +it + +00:11:09.279 --> 00:11:10.880 +the language objects themselves are + +00:11:10.880 --> 00:11:14.079 +loaded from shared libraries + +00:11:14.079 --> 00:11:16.079 +since three seater mode already handles + +00:11:16.079 --> 00:11:17.360 +the parsing part + +00:11:17.360 --> 00:11:19.440 +we will instead focus on the functions + +00:11:19.440 --> 00:11:20.800 +that inspect nodes + +00:11:20.800 --> 00:11:25.279 +and in the resulting path tree + +00:11:25.279 --> 00:11:27.200 +we can ask tree sitter what is the + +00:11:27.200 --> 00:11:44.240 +syntax node at point + +00:11:44.240 --> 00:11:47.200 +uh is it an opaque object so this is not + +00:11:47.200 --> 00:11:48.480 +very useful + +00:11:48.480 --> 00:12:03.760 +we can instead ask what is its type + +00:12:03.760 --> 00:12:06.560 +so his type is the symbol comparison + +00:12:06.560 --> 00:12:08.959 +operator + +00:12:08.959 --> 00:12:11.600 +trees there are two kinds of nodes + +00:12:11.600 --> 00:12:13.680 +anonymous nodes and named nodes + +00:12:13.680 --> 00:12:15.519 +anonymous nodes correspond to simple + +00:12:15.519 --> 00:12:17.040 +grammar elements + +00:12:17.040 --> 00:12:19.839 +like keywords operators punctuations and + +00:12:19.839 --> 00:12:21.279 +so on + +00:12:21.279 --> 00:12:24.160 +name nodes on the other hand grammar + +00:12:24.160 --> 00:12:25.920 +elements that are interesting enough for + +00:12:25.920 --> 00:12:26.639 +their own + +00:12:26.639 --> 00:12:30.320 +to have a name like an identifier an + +00:12:30.320 --> 00:12:31.839 +expression + +00:12:31.839 --> 00:12:35.440 +or a function definition + +00:12:35.440 --> 00:12:37.760 +name node types are symbols while + +00:12:37.760 --> 00:12:42.639 +anonymous node types are strings + +00:12:42.639 --> 00:12:46.320 +for example if we are on this + +00:12:46.320 --> 00:12:49.760 +comparison operator + +00:12:49.760 --> 00:12:55.920 +the node type should be a string + +00:12:55.920 --> 00:12:57.920 +we can also get other information about + +00:12:57.920 --> 00:12:58.959 +the node + +00:12:58.959 --> 00:13:09.680 +for example what is this text + +00:13:09.680 --> 00:13:20.800 +or where it is in the buffer + +00:13:20.800 --> 00:13:43.199 +or what is its parent + +00:13:43.199 --> 00:13:46.160 +there are many other apis to query or + +00:13:46.160 --> 00:13:46.839 +not + +00:13:46.839 --> 00:13:52.639 +properties + +00:13:52.639 --> 00:13:54.399 +tree sitter allows searching for + +00:13:54.399 --> 00:13:58.240 +structural patterns within a parse tree + +00:13:58.240 --> 00:14:01.440 +it does so through a list like language + +00:14:01.440 --> 00:14:03.519 +this language supports by the matching + +00:14:03.519 --> 00:14:04.639 +by node types + +00:14:04.639 --> 00:14:07.760 +field names and predicates + +00:14:07.760 --> 00:14:10.079 +it also allows capturing nodes for + +00:14:10.079 --> 00:14:12.639 +further processing + +00:14:12.639 --> 00:14:37.680 +let's try to see some examples + +00:14:37.680 --> 00:14:41.040 +so in this very simple query we just + +00:14:41.040 --> 00:14:43.839 +try to highlight all the identifiers in + +00:14:43.839 --> 00:14:49.040 +the buffer + +00:14:49.040 --> 00:14:51.920 +this s side tells trisito to capture a + +00:14:51.920 --> 00:14:53.120 +node + +00:14:53.120 --> 00:14:55.839 +in the context of the query builder it's + +00:14:55.839 --> 00:14:57.360 +not very important + +00:14:57.360 --> 00:15:00.320 +but in normal highlighting query this + +00:15:00.320 --> 00:15:01.760 +will determine + +00:15:01.760 --> 00:15:06.639 +the face used to highlight the note + +00:15:06.639 --> 00:15:08.800 +suppose we want to capture all the + +00:15:08.800 --> 00:15:10.320 +function names + +00:15:10.320 --> 00:15:13.519 +instead of just any identifier + +00:15:13.519 --> 00:15:29.440 +you can improve the query like this + +00:15:29.440 --> 00:15:31.600 +uh this will highlight the whole + +00:15:31.600 --> 00:15:32.639 +definition + +00:15:32.639 --> 00:15:35.519 +but we only want to capture the function + +00:15:35.519 --> 00:15:36.399 +name + +00:15:36.399 --> 00:15:39.600 +which means the identifier + +00:15:39.600 --> 00:15:42.800 +here so we + +00:15:42.800 --> 00:15:46.320 +move the capture to after the identifier + +00:15:46.320 --> 00:15:49.600 +node + +00:15:49.600 --> 00:15:51.759 +if we want to capture the class names as + +00:15:51.759 --> 00:15:52.959 +well + +00:15:52.959 --> 00:16:10.079 +we just add another pattern + +00:16:10.079 --> 00:16:20.320 +let's look at a more practical example + +00:16:20.320 --> 00:16:22.959 +here we can see that single quotes + +00:16:22.959 --> 00:16:23.759 +strings and + +00:16:23.759 --> 00:16:25.600 +double quotes screens are highlighted + +00:16:25.600 --> 00:16:27.279 +the same + +00:16:27.279 --> 00:16:30.399 +but in some places + +00:16:30.399 --> 00:16:33.440 +because of some coding conventions + +00:16:33.440 --> 00:16:35.440 +it may be desirable to highlight them + +00:16:35.440 --> 00:16:37.279 +differently for example if + +00:16:37.279 --> 00:16:39.680 +the string is single quoted we may want + +00:16:39.680 --> 00:16:40.880 +to highlight it + +00:16:40.880 --> 00:16:44.399 +as a constant + +00:16:44.399 --> 00:16:46.160 +let's try to see whether we can + +00:16:46.160 --> 00:16:47.600 +distinguish these + +00:16:47.600 --> 00:16:56.240 +two cases + +00:16:56.240 --> 00:17:00.639 +so here we get all the strings + +00:17:00.639 --> 00:17:04.079 +if we want to see if it's single quotes + +00:17:04.079 --> 00:17:04.559 +or + +00:17:04.559 --> 00:17:08.799 +double quote strings + +00:17:08.799 --> 00:17:11.039 +we can try looking at the first + +00:17:11.039 --> 00:17:12.480 +character + +00:17:12.480 --> 00:17:15.280 +of the string I mean the first character + +00:17:15.280 --> 00:17:16.720 +of the note + +00:17:16.720 --> 00:17:19.360 +to check whether it's a single quote or + +00:17:19.360 --> 00:17:33.600 +a double quote + +00:17:33.600 --> 00:17:36.080 +yeah so for that we use the three + +00:17:36.080 --> 00:17:36.799 +setters + +00:17:36.799 --> 00:17:40.160 +support for predicate in this case + +00:17:40.160 --> 00:17:43.360 +we use a match predicate + +00:17:43.360 --> 00:17:46.080 +to check whether the string where the + +00:17:46.080 --> 00:17:46.799 +note + +00:17:46.799 --> 00:17:50.320 +starts with a single quote and with this + +00:17:50.320 --> 00:17:51.280 +pattern + +00:17:51.280 --> 00:17:58.840 +we only capture the single quotes + +00:17:58.840 --> 00:18:00.400 +strings + +00:18:00.400 --> 00:18:03.760 +let's try to give it a different face + +00:18:03.760 --> 00:18:13.039 +so we copy the pattern + +00:18:13.039 --> 00:18:18.640 +and we add this pattern + +00:18:18.640 --> 00:18:25.120 +pop item only + +00:18:25.120 --> 00:18:28.400 +but we also want to give the + +00:18:28.400 --> 00:18:31.440 +capture a different name + +00:18:31.440 --> 00:18:40.840 +let's say we want to highlight it as a + +00:18:40.840 --> 00:18:46.559 +keyword + +00:18:46.559 --> 00:19:06.320 +and now if we refresh the buffer + +00:19:06.320 --> 00:19:08.799 +we see that single quote strings are + +00:19:08.799 --> 00:19:10.320 +highlighted as + +00:19:10.320 --> 00:19:14.400 +keywords + +00:19:14.400 --> 00:19:16.400 +the highlighting patterns can also be + +00:19:16.400 --> 00:19:19.200 +set for a single project + +00:19:19.200 --> 00:19:23.440 +using directory local variable + +00:19:23.440 --> 00:19:26.880 +for example let's take a look at + +00:19:26.880 --> 00:19:35.760 +ems source code + +00:19:35.760 --> 00:19:40.400 +so in image c source there are a lot of + +00:19:40.400 --> 00:19:43.760 +uses of these different macros + +00:19:43.760 --> 00:19:47.679 +to define functions + +00:19:47.679 --> 00:19:51.200 +and you can see + +00:19:51.200 --> 00:19:53.520 +this is actually the function name but + +00:19:53.520 --> 00:19:55.760 +it's highlighted as the + +00:19:55.760 --> 00:19:59.120 +string so what we want + +00:19:59.120 --> 00:20:03.679 +is to somehow recognize this pattern + +00:20:03.679 --> 00:20:07.600 +and highlight it + +00:20:07.600 --> 00:20:11.280 +as highlight this part + +00:20:11.280 --> 00:20:14.559 +with the function phase instead + +00:20:14.559 --> 00:20:17.679 +in order to do that + +00:20:17.679 --> 00:20:20.240 +we put a pattern in this project + +00:20:20.240 --> 00:20:21.760 +directory local + +00:20:21.760 --> 00:20:31.760 +settings file + +00:20:31.760 --> 00:20:34.799 +so we can put this button in the c + +00:20:34.799 --> 00:20:40.159 +mode section + +00:20:40.159 --> 00:20:48.000 +and now if we enable tree sitter + +00:20:48.000 --> 00:20:50.480 +you can see that this is the highlighted + +00:20:50.480 --> 00:20:53.200 +uh + +00:20:53.200 --> 00:20:55.520 +as a normal function definition so this + +00:20:55.520 --> 00:20:56.559 +is the function + +00:20:56.559 --> 00:21:01.200 +face like we wanted + +00:21:01.200 --> 00:21:03.760 +the pattern for this is actually pretty + +00:21:03.760 --> 00:21:07.200 +simple + +00:21:07.200 --> 00:21:10.720 +it's only + +00:21:10.720 --> 00:21:14.720 +only this part so + +00:21:14.720 --> 00:21:17.440 +if it's a function call where the name + +00:21:17.440 --> 00:21:19.679 +of the function is different + +00:21:19.679 --> 00:21:21.600 +then we highlight the different as a + +00:21:21.600 --> 00:21:24.240 +keyword + +00:21:24.240 --> 00:21:27.360 +and then the first string element we + +00:21:27.360 --> 00:21:28.159 +highlighted + +00:21:28.159 --> 00:21:35.360 +as a function name + +00:21:35.360 --> 00:21:37.679 +since the language objects are actually + +00:21:37.679 --> 00:21:39.280 +native code + +00:21:39.280 --> 00:21:40.799 +they have to be compiled for each + +00:21:40.799 --> 00:21:43.440 +platform that we want to support + +00:21:43.440 --> 00:21:45.600 +this will become a big obstacle for + +00:21:45.600 --> 00:21:48.159 +3-seater adoption + +00:21:48.159 --> 00:21:50.240 +therefore I've created a language window + +00:21:50.240 --> 00:21:52.960 +package 3-seater length + +00:21:52.960 --> 00:21:54.960 +that takes care of pre-compiling the + +00:21:54.960 --> 00:21:56.320 +grammars the + +00:21:56.320 --> 00:21:59.679 +most common grammars for all three major + +00:21:59.679 --> 00:22:01.600 +platforms + +00:22:01.600 --> 00:22:04.080 +it also takes care of distributing these + +00:22:04.080 --> 00:22:05.360 +binaries + +00:22:05.360 --> 00:22:08.080 +and provides some highlighting queries + +00:22:08.080 --> 00:22:11.440 +for some of the languages + +00:22:11.440 --> 00:22:13.760 +it should be noted that this package + +00:22:13.760 --> 00:22:15.919 +should be treated as a temporary + +00:22:15.919 --> 00:22:19.919 +distribution mechanism only + +00:22:19.919 --> 00:22:22.240 +to help with bootstrapping three-seaters + +00:22:22.240 --> 00:22:24.720 +adoption + +00:22:24.720 --> 00:22:27.760 +the plan is that eventually these files + +00:22:27.760 --> 00:22:29.760 +should be provided by the language major + +00:22:29.760 --> 00:22:32.480 +modes themselves + +00:22:32.480 --> 00:22:35.120 +but in order to do that we need better + +00:22:35.120 --> 00:22:36.320 +tooling + +00:22:36.320 --> 00:22:40.240 +so we're not there yet + +00:22:40.240 --> 00:22:42.559 +since the call already works reasonably + +00:22:42.559 --> 00:22:43.280 +well + +00:22:43.280 --> 00:22:44.640 +there are several areas that would + +00:22:44.640 --> 00:22:46.320 +benefit from the community's + +00:22:46.320 --> 00:22:49.120 +contribution + +00:22:49.120 --> 00:22:51.520 +so three seaters upstream language + +00:22:51.520 --> 00:22:52.640 +prepositories + +00:22:52.640 --> 00:22:54.400 +already contain highlighting queries on + +00:22:54.400 --> 00:22:55.679 +their own + +00:22:55.679 --> 00:22:58.480 +however they are pretty basic and they + +00:22:58.480 --> 00:23:00.480 +may not fit well with existing emax + +00:23:00.480 --> 00:23:02.559 +conventions + +00:23:02.559 --> 00:23:04.320 +therefore the language bundle has its + +00:23:04.320 --> 00:23:07.120 +own set of highlighting queries + +00:23:07.120 --> 00:23:10.559 +this requires maintenance until language + +00:23:10.559 --> 00:23:11.600 +measurements adopt + +00:23:11.600 --> 00:23:13.760 +three sitter and maintain the queries on + +00:23:13.760 --> 00:23:16.640 +their own + +00:23:16.640 --> 00:23:18.480 +the queries are actually quite easy to + +00:23:18.480 --> 00:23:22.000 +write as you've already seen + +00:23:22.000 --> 00:23:24.240 +you just need to be familiar with the + +00:23:24.240 --> 00:23:25.360 +language + +00:23:25.360 --> 00:23:30.000 +familiar enough to come up with sensible + +00:23:30.000 --> 00:23:35.200 +highlighting patterns + +00:23:35.200 --> 00:23:37.600 +and if you are a maintainer of a + +00:23:37.600 --> 00:23:39.679 +language major mode + +00:23:39.679 --> 00:23:42.320 +you may want to consider integrating + +00:23:42.320 --> 00:23:43.360 +tree sitter into + +00:23:43.360 --> 00:23:46.960 +your mode initially maybe as an + +00:23:46.960 --> 00:23:50.080 +optional feature the integration is + +00:23:50.080 --> 00:23:53.279 +actually pretty straightforward + +00:23:53.279 --> 00:23:56.640 +especially for syntax highlighting + +00:23:56.640 --> 00:24:01.520 +or alternatively + +00:24:01.520 --> 00:24:03.760 +you can also try writing a new major + +00:24:03.760 --> 00:24:04.640 +mode + +00:24:04.640 --> 00:24:08.000 +from scratch that relies on tree sitter + +00:24:08.000 --> 00:24:12.559 +from the very beginning + +00:24:12.559 --> 00:24:16.320 +the code for such a major mode is + +00:24:16.320 --> 00:24:19.679 +quite simple for example + +00:24:19.679 --> 00:24:23.200 +this is the proposed + +00:24:23.200 --> 00:24:26.240 +what mode for web assembly + +00:24:26.240 --> 00:24:31.039 +the code is just + +00:24:31.039 --> 00:24:34.559 +like one page of code not + +00:24:34.559 --> 00:24:39.520 +not a lot + +00:24:39.520 --> 00:24:42.720 +you can also try writing new minor modes + +00:24:42.720 --> 00:24:46.559 +or writing integration packages + +00:24:46.559 --> 00:24:50.080 +for example a lot of package a lot of + +00:24:50.080 --> 00:24:50.880 +packages + +00:24:50.880 --> 00:24:54.559 +may benefit from tree sitter integration + +00:24:54.559 --> 00:24:58.840 +but no one has written the integration + +00:24:58.840 --> 00:25:02.960 +yet + +00:25:02.960 --> 00:25:05.039 +if you are interested in 3-seater you + +00:25:05.039 --> 00:25:06.720 +can use these links to + +00:25:06.720 --> 00:25:10.320 +learn more about it I think that's it + +00:25:10.320 --> 00:25:11.440 +for me today + +00:25:11.440 --> 00:25:18.159 +I'm happy to answer any questions -- cgit v1.2.3