Diffstat (limited to '2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt')
-rw-r--r-- | 2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt | 1522 |
1 files changed, 0 insertions, 1522 deletions
diff --git a/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt b/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt deleted file mode 100644 index 99133c78..00000000 --- a/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt +++ /dev/null @@ -1,1522 +0,0 @@ -WEBVTT - -00:00:01.520 --> 00:00:04.400 -hello everyone my name is toniang - -00:00:04.400 --> 00:00:07.200 -I've been using amax for about 10 years - -00:00:07.200 --> 00:00:09.280 -today I'm going to talk about 360 - -00:00:09.280 --> 00:00:11.519 -a new imax package that allows ems to - -00:00:11.519 --> 00:00:13.759 -pass multiple programming languages - -00:00:13.759 --> 00:00:17.840 -in real time - -00:00:17.840 --> 00:00:21.840 -so what is the problem statement - -00:00:21.840 --> 00:00:23.359 -in order to support programming - -00:00:23.359 --> 00:00:24.960 -functionalities for a particular - -00:00:24.960 --> 00:00:25.760 -language - -00:00:25.760 --> 00:00:27.680 -a text editor needs to have some degree - -00:00:27.680 --> 00:00:29.679 -of language understanding - -00:00:29.679 --> 00:00:31.840 -traditionally text editors have relied - -00:00:31.840 --> 00:00:33.840 -very heavily on regular expressions for - -00:00:33.840 --> 00:00:34.960 -this - -00:00:34.960 --> 00:00:38.320 -e-max is no different most language - -00:00:38.320 --> 00:00:39.280 -major modes use - -00:00:39.280 --> 00:00:40.879 -regular expressions for syntax - -00:00:40.879 --> 00:00:42.960 -highlighting code navigation - -00:00:42.960 --> 00:00:46.239 -folding indexing and so on regular - -00:00:46.239 --> 00:00:47.440 -expressions are - -00:00:47.440 --> 00:00:50.559 -problematic for a couple of reasons - -00:00:50.559 --> 00:00:53.600 -they're slow and inaccurate they also - -00:00:53.600 --> 00:00:54.000 -make - -00:00:54.000 --> 00:00:56.800 -the code hard to read and 
write - -00:00:56.800 --> 00:00:57.440 -sometimes - -00:00:57.440 --> 00:00:59.199 -it's because the regular expressions - -00:00:59.199 --> 00:01:01.199 -themselves are very hairy - -00:01:01.199 --> 00:01:04.000 -and sometimes because they are just not - -00:01:04.000 --> 00:01:05.199 -powerful enough - -00:01:05.199 --> 00:01:07.840 -some helper code is usually needed to - -00:01:07.840 --> 00:01:11.200 -pass more intricate language features - -00:01:11.200 --> 00:01:13.280 -that also illustrates the core problem - -00:01:13.280 --> 00:01:16.159 -with regular expressions - -00:01:16.159 --> 00:01:18.400 -in that they are not powerful enough to - -00:01:18.400 --> 00:01:21.119 -pass programming languages - -00:01:21.119 --> 00:01:22.640 -an example feature that regular - -00:01:22.640 --> 00:01:25.040 -expressions cannot handle very well - -00:01:25.040 --> 00:01:27.520 -is string interpolation which is a very - -00:01:27.520 --> 00:01:28.320 -common feature - -00:01:28.320 --> 00:01:31.680 -in many modern programming languages - -00:01:31.680 --> 00:01:34.079 -it would be much nicer if image somehow - -00:01:34.079 --> 00:01:35.840 -had structural understanding of source - -00:01:35.840 --> 00:01:36.479 -code - -00:01:36.479 --> 00:01:39.520 -like ides do - -00:01:39.520 --> 00:01:41.119 -there have been multiple efforts to - -00:01:41.119 --> 00:01:42.960 -bring this kind of programming language - -00:01:42.960 --> 00:01:45.280 -understanding into Emacs - -00:01:45.280 --> 00:01:47.119 -there are language specific persons - -00:01:47.119 --> 00:01:48.640 -written in elise - -00:01:48.640 --> 00:01:50.240 -they can be thought of as the next - -00:01:50.240 --> 00:01:52.320 -logical step of the glue code on top - -00:01:52.320 --> 00:01:54.960 -of tribal expressions moving from - -00:01:54.960 --> 00:01:56.000 -partial local - -00:01:56.000 --> 00:01:58.079 -pattern recognition into a full-fledged - -00:01:58.079 --> 00:01:59.840 -parser - -00:01:59.840 --> 
00:02:01.439 -the most prominent example of this - -00:02:01.439 --> 00:02:03.040 -approach is probably the famous - -00:02:03.040 --> 00:02:06.479 -js2 mode - -00:02:06.479 --> 00:02:10.080 -however this approach has several issues - -00:02:10.080 --> 00:02:12.959 -parsing is computationally expensive and - -00:02:12.959 --> 00:02:13.680 -imagine - -00:02:13.680 --> 00:02:16.800 -is not good at that kind of stuff - -00:02:16.800 --> 00:02:18.400 -furthermore maintenance is very - -00:02:18.400 --> 00:02:20.840 -troublesome in order to work on these - -00:02:20.840 --> 00:02:22.160 -process - -00:02:22.160 --> 00:02:23.599 -first you have to know at least well - -00:02:23.599 --> 00:02:25.599 -enough and then you have to be - -00:02:25.599 --> 00:02:27.760 -comfortable with writing a - -00:02:27.760 --> 00:02:30.319 -recursive ascendant parser while - -00:02:30.319 --> 00:02:32.080 -constantly keeping up with changes to - -00:02:32.080 --> 00:02:34.000 -the language itself - -00:02:34.000 --> 00:02:36.879 -which can be evolving very quickly like - -00:02:36.879 --> 00:02:39.360 -javascript for example - -00:02:39.360 --> 00:02:41.599 -together these constraints significantly - -00:02:41.599 --> 00:02:45.680 -reduce the pull of potential maintenance - -00:02:45.680 --> 00:02:47.760 -the biggest issue though in my opinion - -00:02:47.760 --> 00:02:49.680 -is lack of the set of generic - -00:02:49.680 --> 00:02:52.879 -and reusable apis this makes them very - -00:02:52.879 --> 00:02:54.319 -hard to use - -00:02:54.319 --> 00:02:55.920 -for minor modes that want to deal with - -00:02:55.920 --> 00:02:57.920 -cross-cutting concerns across multiple - -00:02:57.920 --> 00:02:59.920 -languages - -00:02:59.920 --> 00:03:01.760 -the other approach which has been - -00:03:01.760 --> 00:03:03.599 -gaining a lot of momentum in recent - -00:03:03.599 --> 00:03:04.319 -years - -00:03:04.319 --> 00:03:06.560 -is externalizing language understanding - -00:03:06.560 --> 00:03:08.159 
-to another process - -00:03:08.159 --> 00:03:12.239 -also known as language server protocol - -00:03:12.239 --> 00:03:14.480 -this second approach is actually a very - -00:03:14.480 --> 00:03:16.560 -interesting one - -00:03:16.560 --> 00:03:18.400 -my decoupling language understanding - -00:03:18.400 --> 00:03:21.280 -from the editing facility itself - -00:03:21.280 --> 00:03:23.760 -the usb servers can attract a lot more - -00:03:23.760 --> 00:03:25.120 -contributors - -00:03:25.120 --> 00:03:28.959 -which makes maintenance easier however - -00:03:28.959 --> 00:03:32.400 -they also have several issues available - -00:03:32.400 --> 00:03:34.720 -being a separate process they are - -00:03:34.720 --> 00:03:36.000 -usually more resource - -00:03:36.000 --> 00:03:39.920 -intensive and depending on the language - -00:03:39.920 --> 00:03:42.159 -the usb server itself can bring with it - -00:03:42.159 --> 00:03:44.640 -a host of additional dependencies - -00:03:44.640 --> 00:03:47.680 -external to Emacs which may message to - -00:03:47.680 --> 00:03:50.640 -install and manage - -00:03:50.640 --> 00:03:53.760 -furthermore json over rpc has pretty - -00:03:53.760 --> 00:03:55.120 -high latency - -00:03:55.120 --> 00:03:57.840 -for one-off tasks like jumping to source - -00:03:57.840 --> 00:04:00.879 -or on-demand completion is great - -00:04:00.879 --> 00:04:03.040 -but for things like code highlighting - -00:04:03.040 --> 00:04:06.000 -the latency is just too much - -00:04:06.000 --> 00:04:08.319 -I was using rust and I was following the - -00:04:08.319 --> 00:04:10.480 -community effort to improve its id - -00:04:10.480 --> 00:04:11.760 -support - -00:04:11.760 --> 00:04:13.680 -hoping to integrate some of that into - -00:04:13.680 --> 00:04:15.760 -Emacs itself - -00:04:15.760 --> 00:04:17.600 -then I heard someone from community - -00:04:17.600 --> 00:04:19.759 -mention tree sitter - -00:04:19.759 --> 00:04:23.360 -and I decided to check it out - -00:04:23.360 --> 
00:04:25.520 -basically trisita is an incremental - -00:04:25.520 --> 00:04:28.720 -parsing library and a parser generator - -00:04:28.720 --> 00:04:31.000 -it was introduced by the item editor in - -00:04:31.000 --> 00:04:33.040 -2018 - -00:04:33.040 --> 00:04:35.680 -besides item is also being integrated - -00:04:35.680 --> 00:04:36.960 -into the neo-vim - -00:04:36.960 --> 00:04:41.040 -editor and github is using it to power - -00:04:41.040 --> 00:04:42.479 -their source code analysis and - -00:04:42.479 --> 00:04:45.840 -navigation features - -00:04:45.840 --> 00:04:48.639 -it is written in c and can be compiled - -00:04:48.639 --> 00:04:49.199 -for all - -00:04:49.199 --> 00:04:53.120 -major platforms it can even be compiled - -00:04:53.120 --> 00:04:56.080 -to web assembly to run on the web that's - -00:04:56.080 --> 00:04:57.600 -how github is using it - -00:04:57.600 --> 00:05:00.800 -on their website - -00:05:00.800 --> 00:05:02.960 -so why is trisita an interesting - -00:05:02.960 --> 00:05:05.840 -solution to this problem - -00:05:05.840 --> 00:05:07.360 -there are multiple features that make it - -00:05:07.360 --> 00:05:10.000 -an attractive option - -00:05:10.000 --> 00:05:12.400 -it is designed to be fast by being - -00:05:12.400 --> 00:05:13.680 -incremental - -00:05:13.680 --> 00:05:15.680 -the initial parts of a typical big fight - -00:05:15.680 --> 00:05:18.160 -can take tens of milliseconds - -00:05:18.160 --> 00:05:20.240 -while subsequent incremental processes - -00:05:20.240 --> 00:05:22.560 -are sub milliseconds - -00:05:22.560 --> 00:05:24.720 -it achieves this by using structural - -00:05:24.720 --> 00:05:26.240 -sharing - -00:05:26.240 --> 00:05:29.360 -meaning replacing only affected nodes - -00:05:29.360 --> 00:05:32.960 -in the old tree when it needs to - -00:05:32.960 --> 00:05:36.000 -also unlike lsp being in the same - -00:05:36.000 --> 00:05:37.120 -process - -00:05:37.120 --> 00:05:40.639 -it has much lower latency - -00:05:40.639 
--> 00:05:42.880 -secondly it provides a uniform - -00:05:42.880 --> 00:05:44.960 -programming interface - -00:05:44.960 --> 00:05:47.039 -the same data structures and functions - -00:05:47.039 --> 00:05:48.720 -work on parse trees of different - -00:05:48.720 --> 00:05:50.400 -languages - -00:05:50.400 --> 00:05:52.160 -syntax knows of different languages - -00:05:52.160 --> 00:05:54.160 -differ only by their types - -00:05:54.160 --> 00:05:57.360 -and their possible child nodes this - -00:05:57.360 --> 00:05:58.960 -is a big advantage over language - -00:05:58.960 --> 00:06:02.240 -specific parcels - -00:06:02.240 --> 00:06:04.880 -thirdly it's written in self-contained - -00:06:04.880 --> 00:06:06.880 -embeddable c - -00:06:06.880 --> 00:06:09.680 -as I mentioned previously it can even be - -00:06:09.680 --> 00:06:10.400 -compiled - -00:06:10.400 --> 00:06:13.759 -to webassembly this makes integrating it - -00:06:13.759 --> 00:06:15.199 -into various editors - -00:06:15.199 --> 00:06:18.240 -quite easy without having to install - -00:06:18.240 --> 00:06:22.880 -any external dependencies - -00:06:22.880 --> 00:06:24.639 -one thing that is not mentioned here is - -00:06:24.639 --> 00:06:28.000 -that being a parcel generator - -00:06:28.000 --> 00:06:31.039 -scrummers are declarative - -00:06:31.039 --> 00:06:34.880 -together with being editor independent - -00:06:34.880 --> 00:06:36.720 -this makes the pool of potential - -00:06:36.720 --> 00:06:38.160 -contributors - -00:06:38.160 --> 00:06:42.400 -much larger so I was convinced - -00:06:42.400 --> 00:06:45.520 -that trisito is a good fit for Emacs - -00:06:45.520 --> 00:06:48.000 -last year I started writing the bindings - -00:06:48.000 --> 00:06:48.720 -using - -00:06:48.720 --> 00:06:50.960 -dynamic model support introduced in imax - -00:06:50.960 --> 00:06:53.280 -25. 
- -00:06:53.280 --> 00:06:55.360 -dynamic module means there is platform - -00:06:55.360 --> 00:06:58.479 -specific native code involved - -00:06:58.479 --> 00:07:00.560 -but since they are pre-compiled binaries - -00:07:00.560 --> 00:07:02.880 -for the three major platforms - -00:07:02.880 --> 00:07:06.319 -it should work in most places currently - -00:07:06.319 --> 00:07:08.319 -the core functionalities are in a pretty - -00:07:08.319 --> 00:07:09.440 -good shape - -00:07:09.440 --> 00:07:12.560 -syntax highlighting is working nicely - -00:07:12.560 --> 00:07:14.840 -the whole thing is split into three - -00:07:14.840 --> 00:07:16.080 -packages - -00:07:16.080 --> 00:07:17.759 -tree sitter is the main package that - -00:07:17.759 --> 00:07:20.319 -other packages should depend on - -00:07:20.319 --> 00:07:22.800 -tree system lens is the language bundle - -00:07:22.800 --> 00:07:24.000 -that includes support - -00:07:24.000 --> 00:07:27.199 -for most common languages - -00:07:27.199 --> 00:07:30.080 -and finally the core apis are in the - -00:07:30.080 --> 00:07:32.160 -package tsc - -00:07:32.160 --> 00:07:36.160 -which stands for trees the core - -00:07:36.160 --> 00:07:38.800 -it is the implicit dependency of the - -00:07:38.800 --> 00:07:43.520 -three-seater package - -00:07:43.520 --> 00:07:46.000 -the main package includes the miner mode - -00:07:46.000 --> 00:07:47.520 -3-seater mode - -00:07:47.520 --> 00:07:49.840 -this provides the base for other major - -00:07:49.840 --> 00:07:52.560 -or minor modes to build on - -00:07:52.560 --> 00:07:55.280 -using image change tracking hooks it - -00:07:55.280 --> 00:07:55.840 -enables - -00:07:55.840 --> 00:07:58.080 -incremental parsing and provides a - -00:07:58.080 --> 00:08:00.800 -syntax tree that is always up to date - -00:08:00.800 --> 00:08:04.080 -after any edits in a buffer - -00:08:04.080 --> 00:08:06.560 -there is also a basic debug mode that - -00:08:06.560 --> 00:08:10.080 -shows the parse tree in 
another buffer - -00:08:10.080 --> 00:08:13.360 -here is a quick demo - -00:08:13.360 --> 00:08:15.759 -here I mean an empty python buffer with - -00:08:15.759 --> 00:08:17.520 -three seater enabled - -00:08:17.520 --> 00:08:19.440 -I'm going to turn on the debug mode to - -00:08:19.440 --> 00:08:26.560 -see the parse tree - -00:08:26.560 --> 00:08:28.720 -since the buffer is empty there is only - -00:08:28.720 --> 00:08:30.639 -one node in the syntax tree the top - -00:08:30.639 --> 00:08:33.279 -level module node - -00:08:33.279 --> 00:09:11.040 -let's try typing some code - -00:09:11.040 --> 00:09:13.600 -as you can see as I type into the python - -00:09:13.600 --> 00:09:14.640 -buffer - -00:09:14.640 --> 00:09:19.120 -the syntax tree updates in real time - -00:09:19.120 --> 00:09:21.120 -the other minor mode included in the - -00:09:21.120 --> 00:09:23.279 -main package is 3-seater - -00:09:23.279 --> 00:09:26.640 -hl mode it overrides font-lock mode and - -00:09:26.640 --> 00:09:28.480 -provides its own set of phases - -00:09:28.480 --> 00:09:31.839 -and customization options it is query - -00:09:31.839 --> 00:09:32.800 -driven - -00:09:32.800 --> 00:09:35.200 -that means instead of regular - -00:09:35.200 --> 00:09:36.240 -expressions - -00:09:36.240 --> 00:09:38.720 -it uses a list like query language to - -00:09:38.720 --> 00:09:40.320 -map syntax notes - -00:09:40.320 --> 00:09:43.760 -to highlighting phrases I'm going to - -00:09:43.760 --> 00:09:45.760 -open a python file with small snippets - -00:09:45.760 --> 00:09:54.320 -that showcase syntax highlighting - -00:09:54.320 --> 00:09:55.920 -so this is the default highlighting - -00:09:55.920 --> 00:10:00.880 -provided by python mode - -00:10:00.880 --> 00:10:02.839 -this is the highlighting enabled by tree - -00:10:02.839 --> 00:10:04.640 -sitter - -00:10:04.640 --> 00:10:07.680 -as you can see string interpolation - -00:10:07.680 --> 00:10:11.680 -and decorators are highlighted correctly - 
-00:10:11.680 --> 00:10:17.440 -function calls are also highlighted - -00:10:17.440 --> 00:10:20.240 -you can also note that property - -00:10:20.240 --> 00:10:21.839 -assessors - -00:10:21.839 --> 00:10:24.640 -and property assignments are highlighted - -00:10:24.640 --> 00:10:27.440 -differently - -00:10:27.440 --> 00:10:29.360 -what I like the most about this is that - -00:10:29.360 --> 00:10:30.880 -new bindings are consistently - -00:10:30.880 --> 00:10:32.640 -highlighted - -00:10:32.640 --> 00:10:36.320 -this included local variable - -00:10:36.320 --> 00:10:39.760 -function parameters and property - -00:10:39.760 --> 00:10:45.760 -mutations - -00:10:45.760 --> 00:10:48.000 -before going through the three queries - -00:10:48.000 --> 00:10:49.279 -and the syntax highlighting - -00:10:49.279 --> 00:10:51.680 -customization options - -00:10:51.680 --> 00:10:53.760 -let's take a brief look at the core data - -00:10:53.760 --> 00:10:55.040 -structures and functions - -00:10:55.040 --> 00:10:58.079 -that tree sitter provides - -00:10:58.079 --> 00:10:59.839 -so parsing is done with the help of a - -00:10:59.839 --> 00:11:02.240 -generic parser object - -00:11:02.240 --> 00:11:04.160 -a single parser object can be used to - -00:11:04.160 --> 00:11:06.000 -pass different languages - -00:11:06.000 --> 00:11:08.320 -by sending different language objects to - -00:11:08.320 --> 00:11:09.279 -it - -00:11:09.279 --> 00:11:10.880 -the language objects themselves are - -00:11:10.880 --> 00:11:14.079 -loaded from shared libraries - -00:11:14.079 --> 00:11:16.079 -since three seater mode already handles - -00:11:16.079 --> 00:11:17.360 -the parsing part - -00:11:17.360 --> 00:11:19.440 -we will instead focus on the functions - -00:11:19.440 --> 00:11:20.800 -that inspect nodes - -00:11:20.800 --> 00:11:25.279 -and in the resulting path tree - -00:11:25.279 --> 00:11:27.200 -we can ask tree sitter what is the - -00:11:27.200 --> 00:11:44.240 -syntax node at point - 
-00:11:44.240 --> 00:11:47.200 -uh is it an opaque object so this is not - -00:11:47.200 --> 00:11:48.480 -very useful - -00:11:48.480 --> 00:12:03.760 -we can instead ask what is its type - -00:12:03.760 --> 00:12:06.560 -so his type is the symbol comparison - -00:12:06.560 --> 00:12:08.959 -operator - -00:12:08.959 --> 00:12:11.600 -trees there are two kinds of nodes - -00:12:11.600 --> 00:12:13.680 -anonymous nodes and named nodes - -00:12:13.680 --> 00:12:15.519 -anonymous nodes correspond to simple - -00:12:15.519 --> 00:12:17.040 -grammar elements - -00:12:17.040 --> 00:12:19.839 -like keywords operators punctuations and - -00:12:19.839 --> 00:12:21.279 -so on - -00:12:21.279 --> 00:12:24.160 -name nodes on the other hand grammar - -00:12:24.160 --> 00:12:25.920 -elements that are interesting enough for - -00:12:25.920 --> 00:12:26.639 -their own - -00:12:26.639 --> 00:12:30.320 -to have a name like an identifier an - -00:12:30.320 --> 00:12:31.839 -expression - -00:12:31.839 --> 00:12:35.440 -or a function definition - -00:12:35.440 --> 00:12:37.760 -name node types are symbols while - -00:12:37.760 --> 00:12:42.639 -anonymous node types are strings - -00:12:42.639 --> 00:12:46.320 -for example if we are on this - -00:12:46.320 --> 00:12:49.760 -comparison operator - -00:12:49.760 --> 00:12:55.920 -the node type should be a string - -00:12:55.920 --> 00:12:57.920 -we can also get other information about - -00:12:57.920 --> 00:12:58.959 -the node - -00:12:58.959 --> 00:13:09.680 -for example what is this text - -00:13:09.680 --> 00:13:20.800 -or where it is in the buffer - -00:13:20.800 --> 00:13:43.199 -or what is its parent - -00:13:43.199 --> 00:13:46.160 -there are many other apis to query or - -00:13:46.160 --> 00:13:46.839 -not - -00:13:46.839 --> 00:13:52.639 -properties - -00:13:52.639 --> 00:13:54.399 -tree sitter allows searching for - -00:13:54.399 --> 00:13:58.240 -structural patterns within a parse tree - -00:13:58.240 --> 00:14:01.440 -it does 
so through a list like language - -00:14:01.440 --> 00:14:03.519 -this language supports by the matching - -00:14:03.519 --> 00:14:04.639 -by node types - -00:14:04.639 --> 00:14:07.760 -field names and predicates - -00:14:07.760 --> 00:14:10.079 -it also allows capturing nodes for - -00:14:10.079 --> 00:14:12.639 -further processing - -00:14:12.639 --> 00:14:37.680 -let's try to see some examples - -00:14:37.680 --> 00:14:41.040 -so in this very simple query we just - -00:14:41.040 --> 00:14:43.839 -try to highlight all the identifiers in - -00:14:43.839 --> 00:14:49.040 -the buffer - -00:14:49.040 --> 00:14:51.920 -this s side tells trisito to capture a - -00:14:51.920 --> 00:14:53.120 -node - -00:14:53.120 --> 00:14:55.839 -in the context of the query builder it's - -00:14:55.839 --> 00:14:57.360 -not very important - -00:14:57.360 --> 00:15:00.320 -but in normal highlighting query this - -00:15:00.320 --> 00:15:01.760 -will determine - -00:15:01.760 --> 00:15:06.639 -the face used to highlight the note - -00:15:06.639 --> 00:15:08.800 -suppose we want to capture all the - -00:15:08.800 --> 00:15:10.320 -function names - -00:15:10.320 --> 00:15:13.519 -instead of just any identifier - -00:15:13.519 --> 00:15:29.440 -you can improve the query like this - -00:15:29.440 --> 00:15:31.600 -uh this will highlight the whole - -00:15:31.600 --> 00:15:32.639 -definition - -00:15:32.639 --> 00:15:35.519 -but we only want to capture the function - -00:15:35.519 --> 00:15:36.399 -name - -00:15:36.399 --> 00:15:39.600 -which means the identifier - -00:15:39.600 --> 00:15:42.800 -here so we - -00:15:42.800 --> 00:15:46.320 -move the capture to after the identifier - -00:15:46.320 --> 00:15:49.600 -node - -00:15:49.600 --> 00:15:51.759 -if we want to capture the class names as - -00:15:51.759 --> 00:15:52.959 -well - -00:15:52.959 --> 00:16:10.079 -we just add another pattern - -00:16:10.079 --> 00:16:20.320 -let's look at a more practical example - -00:16:20.320 --> 
00:16:22.959 -here we can see that single quotes - -00:16:22.959 --> 00:16:23.759 -strings and - -00:16:23.759 --> 00:16:25.600 -double quotes screens are highlighted - -00:16:25.600 --> 00:16:27.279 -the same - -00:16:27.279 --> 00:16:30.399 -but in some places - -00:16:30.399 --> 00:16:33.440 -because of some coding conventions - -00:16:33.440 --> 00:16:35.440 -it may be desirable to highlight them - -00:16:35.440 --> 00:16:37.279 -differently for example if - -00:16:37.279 --> 00:16:39.680 -the string is single quoted we may want - -00:16:39.680 --> 00:16:40.880 -to highlight it - -00:16:40.880 --> 00:16:44.399 -as a constant - -00:16:44.399 --> 00:16:46.160 -let's try to see whether we can - -00:16:46.160 --> 00:16:47.600 -distinguish these - -00:16:47.600 --> 00:16:56.240 -two cases - -00:16:56.240 --> 00:17:00.639 -so here we get all the strings - -00:17:00.639 --> 00:17:04.079 -if we want to see if it's single quotes - -00:17:04.079 --> 00:17:04.559 -or - -00:17:04.559 --> 00:17:08.799 -double quote strings - -00:17:08.799 --> 00:17:11.039 -we can try looking at the first - -00:17:11.039 --> 00:17:12.480 -character - -00:17:12.480 --> 00:17:15.280 -of the string I mean the first character - -00:17:15.280 --> 00:17:16.720 -of the note - -00:17:16.720 --> 00:17:19.360 -to check whether it's a single quote or - -00:17:19.360 --> 00:17:33.600 -a double quote - -00:17:33.600 --> 00:17:36.080 -yeah so for that we use the three - -00:17:36.080 --> 00:17:36.799 -setters - -00:17:36.799 --> 00:17:40.160 -support for predicate in this case - -00:17:40.160 --> 00:17:43.360 -we use a match predicate - -00:17:43.360 --> 00:17:46.080 -to check whether the string where the - -00:17:46.080 --> 00:17:46.799 -note - -00:17:46.799 --> 00:17:50.320 -starts with a single quote and with this - -00:17:50.320 --> 00:17:51.280 -pattern - -00:17:51.280 --> 00:17:58.840 -we only capture the single quotes - -00:17:58.840 --> 00:18:00.400 -strings - -00:18:00.400 --> 00:18:03.760 -let's 
try to give it a different face - -00:18:03.760 --> 00:18:13.039 -so we copy the pattern - -00:18:13.039 --> 00:18:18.640 -and we add this pattern - -00:18:18.640 --> 00:18:25.120 -pop item only - -00:18:25.120 --> 00:18:28.400 -but we also want to give the - -00:18:28.400 --> 00:18:31.440 -capture a different name - -00:18:31.440 --> 00:18:40.840 -let's say we want to highlight it as a - -00:18:40.840 --> 00:18:46.559 -keyword - -00:18:46.559 --> 00:19:06.320 -and now if we refresh the buffer - -00:19:06.320 --> 00:19:08.799 -we see that single quote strings are - -00:19:08.799 --> 00:19:10.320 -highlighted as - -00:19:10.320 --> 00:19:14.400 -keywords - -00:19:14.400 --> 00:19:16.400 -the highlighting patterns can also be - -00:19:16.400 --> 00:19:19.200 -set for a single project - -00:19:19.200 --> 00:19:23.440 -using directory local variable - -00:19:23.440 --> 00:19:26.880 -for example let's take a look at - -00:19:26.880 --> 00:19:35.760 -ems source code - -00:19:35.760 --> 00:19:40.400 -so in image c source there are a lot of - -00:19:40.400 --> 00:19:43.760 -uses of these different macros - -00:19:43.760 --> 00:19:47.679 -to define functions - -00:19:47.679 --> 00:19:51.200 -and you can see - -00:19:51.200 --> 00:19:53.520 -this is actually the function name but - -00:19:53.520 --> 00:19:55.760 -it's highlighted as the - -00:19:55.760 --> 00:19:59.120 -string so what we want - -00:19:59.120 --> 00:20:03.679 -is to somehow recognize this pattern - -00:20:03.679 --> 00:20:07.600 -and highlight it - -00:20:07.600 --> 00:20:11.280 -as highlight this part - -00:20:11.280 --> 00:20:14.559 -with the function phase instead - -00:20:14.559 --> 00:20:17.679 -in order to do that - -00:20:17.679 --> 00:20:20.240 -we put a pattern in this project - -00:20:20.240 --> 00:20:21.760 -directory local - -00:20:21.760 --> 00:20:31.760 -settings file - -00:20:31.760 --> 00:20:34.799 -so we can put this button in the c - -00:20:34.799 --> 00:20:40.159 -mode section - 
-00:20:40.159 --> 00:20:48.000 -and now if we enable tree sitter - -00:20:48.000 --> 00:20:50.480 -you can see that this is the highlighted - -00:20:50.480 --> 00:20:53.200 -uh - -00:20:53.200 --> 00:20:55.520 -as a normal function definition so this - -00:20:55.520 --> 00:20:56.559 -is the function - -00:20:56.559 --> 00:21:01.200 -face like we wanted - -00:21:01.200 --> 00:21:03.760 -the pattern for this is actually pretty - -00:21:03.760 --> 00:21:07.200 -simple - -00:21:07.200 --> 00:21:10.720 -it's only - -00:21:10.720 --> 00:21:14.720 -only this part so - -00:21:14.720 --> 00:21:17.440 -if it's a function call where the name - -00:21:17.440 --> 00:21:19.679 -of the function is different - -00:21:19.679 --> 00:21:21.600 -then we highlight the different as a - -00:21:21.600 --> 00:21:24.240 -keyword - -00:21:24.240 --> 00:21:27.360 -and then the first string element we - -00:21:27.360 --> 00:21:28.159 -highlighted - -00:21:28.159 --> 00:21:35.360 -as a function name - -00:21:35.360 --> 00:21:37.679 -since the language objects are actually - -00:21:37.679 --> 00:21:39.280 -native code - -00:21:39.280 --> 00:21:40.799 -they have to be compiled for each - -00:21:40.799 --> 00:21:43.440 -platform that we want to support - -00:21:43.440 --> 00:21:45.600 -this will become a big obstacle for - -00:21:45.600 --> 00:21:48.159 -3-seater adoption - -00:21:48.159 --> 00:21:50.240 -therefore I've created a language window - -00:21:50.240 --> 00:21:52.960 -package 3-seater length - -00:21:52.960 --> 00:21:54.960 -that takes care of pre-compiling the - -00:21:54.960 --> 00:21:56.320 -grammars the - -00:21:56.320 --> 00:21:59.679 -most common grammars for all three major - -00:21:59.679 --> 00:22:01.600 -platforms - -00:22:01.600 --> 00:22:04.080 -it also takes care of distributing these - -00:22:04.080 --> 00:22:05.360 -binaries - -00:22:05.360 --> 00:22:08.080 -and provides some highlighting queries - -00:22:08.080 --> 00:22:11.440 -for some of the languages - -00:22:11.440 
--> 00:22:13.760 -it should be noted that this package - -00:22:13.760 --> 00:22:15.919 -should be treated as a temporary - -00:22:15.919 --> 00:22:19.919 -distribution mechanism only - -00:22:19.919 --> 00:22:22.240 -to help with bootstrapping three-seaters - -00:22:22.240 --> 00:22:24.720 -adoption - -00:22:24.720 --> 00:22:27.760 -the plan is that eventually these files - -00:22:27.760 --> 00:22:29.760 -should be provided by the language major - -00:22:29.760 --> 00:22:32.480 -modes themselves - -00:22:32.480 --> 00:22:35.120 -but in order to do that we need better - -00:22:35.120 --> 00:22:36.320 -tooling - -00:22:36.320 --> 00:22:40.240 -so we're not there yet - -00:22:40.240 --> 00:22:42.559 -since the call already works reasonably - -00:22:42.559 --> 00:22:43.280 -well - -00:22:43.280 --> 00:22:44.640 -there are several areas that would - -00:22:44.640 --> 00:22:46.320 -benefit from the community's - -00:22:46.320 --> 00:22:49.120 -contribution - -00:22:49.120 --> 00:22:51.520 -so three seaters upstream language - -00:22:51.520 --> 00:22:52.640 -prepositories - -00:22:52.640 --> 00:22:54.400 -already contain highlighting queries on - -00:22:54.400 --> 00:22:55.679 -their own - -00:22:55.679 --> 00:22:58.480 -however they are pretty basic and they - -00:22:58.480 --> 00:23:00.480 -may not fit well with existing emax - -00:23:00.480 --> 00:23:02.559 -conventions - -00:23:02.559 --> 00:23:04.320 -therefore the language bundle has its - -00:23:04.320 --> 00:23:07.120 -own set of highlighting queries - -00:23:07.120 --> 00:23:10.559 -this requires maintenance until language - -00:23:10.559 --> 00:23:11.600 -measurements adopt - -00:23:11.600 --> 00:23:13.760 -three sitter and maintain the queries on - -00:23:13.760 --> 00:23:16.640 -their own - -00:23:16.640 --> 00:23:18.480 -the queries are actually quite easy to - -00:23:18.480 --> 00:23:22.000 -write as you've already seen - -00:23:22.000 --> 00:23:24.240 -you just need to be familiar with the - -00:23:24.240 
--> 00:23:25.360 -language - -00:23:25.360 --> 00:23:30.000 -familiar enough to come up with sensible - -00:23:30.000 --> 00:23:35.200 -highlighting patterns - -00:23:35.200 --> 00:23:37.600 -and if you are a maintainer of a - -00:23:37.600 --> 00:23:39.679 -language major mode - -00:23:39.679 --> 00:23:42.320 -you may want to consider integrating - -00:23:42.320 --> 00:23:43.360 -tree sitter into - -00:23:43.360 --> 00:23:46.960 -your mode initially maybe as an - -00:23:46.960 --> 00:23:50.080 -optional feature the integration is - -00:23:50.080 --> 00:23:53.279 -actually pretty straightforward - -00:23:53.279 --> 00:23:56.640 -especially for syntax highlighting - -00:23:56.640 --> 00:24:01.520 -or alternatively - -00:24:01.520 --> 00:24:03.760 -you can also try writing a new major - -00:24:03.760 --> 00:24:04.640 -mode - -00:24:04.640 --> 00:24:08.000 -from scratch that relies on tree sitter - -00:24:08.000 --> 00:24:12.559 -from the very beginning - -00:24:12.559 --> 00:24:16.320 -the code for such a major mode is - -00:24:16.320 --> 00:24:19.679 -quite simple for example - -00:24:19.679 --> 00:24:23.200 -this is the proposed - -00:24:23.200 --> 00:24:26.240 -what mode for web assembly - -00:24:26.240 --> 00:24:31.039 -the code is just - -00:24:31.039 --> 00:24:34.559 -like one page of code not - -00:24:34.559 --> 00:24:39.520 -not a lot - -00:24:39.520 --> 00:24:42.720 -you can also try writing new minor modes - -00:24:42.720 --> 00:24:46.559 -or writing integration packages - -00:24:46.559 --> 00:24:50.080 -for example a lot of package a lot of - -00:24:50.080 --> 00:24:50.880 -packages - -00:24:50.880 --> 00:24:54.559 -may benefit from tree sitter integration - -00:24:54.559 --> 00:24:58.840 -but no one has written the integration - -00:24:58.840 --> 00:25:02.960 -yet - -00:25:02.960 --> 00:25:05.039 -if you are interested in 3-seater you - -00:25:05.039 --> 00:25:06.720 -can use these links to - -00:25:06.720 --> 00:25:10.320 -learn more about it I think 
that's it - -00:25:10.320 --> 00:25:11.440 -for me today - -00:25:11.440 --> 00:25:18.159 -I'm happy to answer any questions |