author | Sacha Chua <sacha@sachachua.com> | 2021-03-06 00:12:31 -0500 |
---|---|---|
committer | Sacha Chua <sacha@sachachua.com> | 2021-03-06 00:12:31 -0500 |
commit | c29b14845a8c5e0e9f530134e6f95a051cf697db (patch) | |
tree | d9c41bf74e3091535a9ba5b732dfaedc880ad9c9 | |
parent | 76f373b8d6eea83cdde5ed1305becfeddb13349f (diff) | |
Transcript for #23 main talk
-rw-r--r-- | 2020/info/23.md | 374 | ||||
-rw-r--r-- | 2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.vtt (renamed from 2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt) | 902 |
2 files changed, 737 insertions, 539 deletions
diff --git a/2020/info/23.md b/2020/info/23.md index f87ed299..7f20b2fd 100644 --- a/2020/info/23.md +++ b/2020/info/23.md @@ -1,8 +1,9 @@ # Incremental Parsing with emacs-tree-sitter Tuấn-Anh Nguyễn -[[!template id=vid src="https://mirror.csclub.uwaterloo.ca/emacsconf/2020/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.webm"]] -[Download compressed .webm video (21.8M)](https://mirror.csclub.uwaterloo.ca/emacsconf/2020/smaller/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen--vp9-q56-video-original-audio.webm) +[[!template vidid="mainVideo" id=vid src="https://mirror.csclub.uwaterloo.ca/emacsconf/2020/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.webm" subtitles="/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.vtt"]] +[Download compressed .webm video (21.8M)](https://mirror.csclub.uwaterloo.ca/emacsconf/2020/smaller/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen--vp9-q56-video-original-audio.webm) +[View transcript](#transcript) [[!template id=vid src="https://mirror.csclub.uwaterloo.ca/emacsconf/2020/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--questions--tuan-anh-nguyen.webm" download="Download Q&A video"]] [Download compressed Q&A .webm video (16.4M)](https://mirror.csclub.uwaterloo.ca/emacsconf/2020/smaller/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--questions--tuan-anh-nguyen--vp9-q56-video-original-audio.webm) @@ -175,3 +176,372 @@ Yes, it is just matter of paperwork. - LSP has high latency and is resource intensive, oft. - An updated video version was uploaded after the event, with the missing introduction to Tree-sitter added. + +<a name="transcript"></a> +# Transcript + +[[!template text="Hello, everyone! My name is Tuấn-Anh." start="00:00:01.520" video="mainVideo" id=subtitle]] +[[!template text="I've been using Emacs for about 10 years." 
start="00:00:04.400" video="mainVideo" id=subtitle]] +[[!template text="Today, I'm going to talk about tree-sitter," start="00:00:07.200" video="mainVideo" id=subtitle]] +[[!template text="a new Emacs package that allows Emacs" start="00:00:09.280" video="mainVideo" id=subtitle]] +[[!template text="to parse multiple programming languages in real-time." start="00:00:11.351" video="mainVideo" id=subtitle]] +[[!template new="1" text="So what is the problem statement?" start="00:00:17.840" video="mainVideo" id=subtitle]] +[[!template text="In order to support programming functionalities" start="00:00:21.840" video="mainVideo" id=subtitle]] +[[!template text="for a particular language," start="00:00:24.131" video="mainVideo" id=subtitle]] +[[!template text="a text editor needs to have some degree" start="00:00:25.760" video="mainVideo" id=subtitle]] +[[!template text="of language understanding." start="00:00:27.680" video="mainVideo" id=subtitle]] +[[!template text="Traditionally, text editors have relied" start="00:00:29.679" video="mainVideo" id=subtitle]] +[[!template text="very heavily on regular expressions for this." start="00:00:31.840" video="mainVideo" id=subtitle]] +[[!template text="Emacs is no different." start="00:00:34.960" video="mainVideo" id=subtitle]] +[[!template text="Most language major modes use regular expressions" start="00:00:37.013" video="mainVideo" id=subtitle]] +[[!template text="for syntax-highlighting, code navigation," start="00:00:40.170" video="mainVideo" id=subtitle]] +[[!template text="folding, indexing, and so on." start="00:00:42.960" video="mainVideo" id=subtitle]] +[[!template text="Regular expressions are problematic for a couple of reasons." start="00:00:46.618" video="mainVideo" id=subtitle]] +[[!template text="They're slow and inaccurate." start="00:00:50.559" video="mainVideo" id=subtitle]] +[[!template text="They also make the code hard to read and write." 
start="00:00:53.778" video="mainVideo" id=subtitle]] +[[!template text="Sometimes it's because the regular expressions themselves are very hairy," start="00:00:56.800" video="mainVideo" id=subtitle]] +[[!template text="and sometimes because they are just not powerful enough." start="00:01:01.199" video="mainVideo" id=subtitle]] +[[!template text="Some helper code is usually needed" start="00:01:05.199" video="mainVideo" id=subtitle]] +[[!template text="to parse more intricate language features." start="00:01:08.625" video="mainVideo" id=subtitle]] +[[!template text="That also illustrates the core problem with regular expressions," start="00:01:11.200" video="mainVideo" id=subtitle]] +[[!template text="in that they are not powerful enough to parse programming languages." start="00:01:16.159" video="mainVideo" id=subtitle]] +[[!template text="An example feature that regular expressions cannot handle very well" start="00:01:21.119" video="mainVideo" id=subtitle]] +[[!template text="is string interpolation, which is a very common feature" start="00:01:25.040" video="mainVideo" id=subtitle]] +[[!template text="in many modern programming languages." start="00:01:28.320" video="mainVideo" id=subtitle]] +[[!template new="1" text="It would be much nicer if Emacs somehow" start="00:01:31.680" video="mainVideo" id=subtitle]] +[[!template text="had structural understanding of source code, like IDEs do." start="00:01:34.079" video="mainVideo" id=subtitle]] +[[!template text="There have been multiple efforts" start="00:01:39.520" video="mainVideo" id=subtitle]] +[[!template text="to bring this kind of programming language understanding into Emacs." 
start="00:01:41.981" video="mainVideo" id=subtitle]] +[[!template text="There are language-specific parsers" start="00:01:45.280" video="mainVideo" id=subtitle]] +[[!template text="written in Elisp" start="00:01:47.119" video="mainVideo" id=subtitle]] +[[!template text="that can be thought of" start="00:01:48.640" video="mainVideo" id=subtitle]] +[[!template text="as the next logical step of the glue code" start="00:01:50.675" video="mainVideo" id=subtitle]] +[[!template text="on top of regular expressions," start="00:01:51.989" video="mainVideo" id=subtitle]] +[[!template text="moving from partial local pattern recognition" start="00:01:53.856" video="mainVideo" id=subtitle]] +[[!template text="into a full-fledged parser." start="00:01:57.356" video="mainVideo" id=subtitle]] +[[!template text="The most prominent example of this approach" start="00:01:59.840" video="mainVideo" id=subtitle]] +[[!template text="is probably the famous js2-mode." start="00:02:02.023" video="mainVideo" id=subtitle]] +[[!template new="1" text="However, this approach has several issues." start="00:02:06.479" video="mainVideo" id=subtitle]] +[[!template text="Parsing is computationally expensive," start="00:02:10.080" video="mainVideo" id=subtitle]] +[[!template text="and Emacs Lisp is not good at that kind of stuff." start="00:02:12.606" video="mainVideo" id=subtitle]] +[[!template new="1" text="Furthermore, maintenance is very troublesome." 
start="00:02:16.800" video="mainVideo" id=subtitle]] +[[!template text="In order to work on these parsers," start="00:02:19.156" video="mainVideo" id=subtitle]] +[[!template text="first, you have to know Elisp well enough," start="00:02:22.160" video="mainVideo" id=subtitle]] +[[!template text="and then you have to be comfortable with" start="00:02:24.239" video="mainVideo" id=subtitle]] +[[!template text="writing a recursive descent parser," start="00:02:26.606" video="mainVideo" id=subtitle]] +[[!template text="while constantly keeping up with changes to the language itself," start="00:02:29.739" video="mainVideo" id=subtitle]] +[[!template text="which can be evolving very quickly," start="00:02:34.000" video="mainVideo" id=subtitle]] +[[!template text="like JavaScript, for example." start="00:02:36.356" video="mainVideo" id=subtitle]] +[[!template new="1" text="Together, these constraints significantly reduce" start="00:02:39.360" video="mainVideo" id=subtitle]] +[[!template text="the pool of potential maintainers." start="00:02:42.373" video="mainVideo" id=subtitle]] +[[!template text="The biggest issue, though, in my opinion," start="00:02:45.680" video="mainVideo" id=subtitle]] +[[!template text="is the lack of a set of generic and reusable APIs." start="00:02:47.760" video="mainVideo" id=subtitle]] +[[!template text="This makes them very hard to use" start="00:02:52.139" video="mainVideo" id=subtitle]] +[[!template text="for minor modes that want to deal with" start="00:02:54.319" video="mainVideo" id=subtitle]] +[[!template text="cross-cutting concerns across multiple languages." 
start="00:02:55.920" video="mainVideo" id=subtitle]] +[[!template new="1" text="The other approach which has been" start="00:02:59.920" video="mainVideo" id=subtitle]] +[[!template text="gaining a lot of momentum in recent years" start="00:03:01.760" video="mainVideo" id=subtitle]] +[[!template text="is externalizing language understanding" start="00:03:04.319" video="mainVideo" id=subtitle]] +[[!template text="to another process," start="00:03:06.560" video="mainVideo" id=subtitle]] +[[!template text="also known as language server protocol." start="00:03:08.159" video="mainVideo" id=subtitle]] +[[!template new="1" text="This second approach is actually a very interesting one." start="00:03:12.239" video="mainVideo" id=subtitle]] +[[!template text="By decoupling language understanding" start="00:03:16.560" video="mainVideo" id=subtitle]] +[[!template text="from the editing facility itself," start="00:03:18.400" video="mainVideo" id=subtitle]] +[[!template text="the LSP servers can attract a lot more contributors," start="00:03:21.280" video="mainVideo" id=subtitle]] +[[!template text="which makes maintenance easier." start="00:03:25.120" video="mainVideo" id=subtitle]] +[[!template new="1" text="However, they also have several issues of their own." start="00:03:27.189" video="mainVideo" id=subtitle]] +[[!template text="Being a separate process," start="00:03:32.400" video="mainVideo" id=subtitle]] +[[!template text="they are usually more resource-intensive," start="00:03:34.089" video="mainVideo" id=subtitle]] +[[!template text="and depending on the language," start="00:03:37.073" video="mainVideo" id=subtitle]] +[[!template text="the LSP server itself can bring with it" start="00:03:39.920" video="mainVideo" id=subtitle]] +[[!template text="a host of additional dependencies" start="00:03:42.159" video="mainVideo" id=subtitle]] +[[!template text="external to Emacs, which may be messy to install and manage." 
start="00:03:50.640" video="mainVideo" id=subtitle]] +[[!template text="For one-off tasks like jumping to source" start="00:03:55.120" video="mainVideo" id=subtitle]] +[[!template text="or on-demand completion, it's great." start="00:03:57.840" video="mainVideo" id=subtitle]] +[[!template text="But for things like code highlighting," start="00:04:00.879" video="mainVideo" id=subtitle]] +[[!template text="the latency is just too much." start="00:04:03.040" video="mainVideo" id=subtitle]] +[[!template new="1" text="I was using Rust and I was following the" start="00:04:06.000" video="mainVideo" id=subtitle]] +[[!template text="community effort to improve its IDE support," start="00:04:08.319" video="mainVideo" id=subtitle]] +[[!template text="hoping to integrate some of that into Emacs itself." start="00:04:11.760" video="mainVideo" id=subtitle]] +[[!template text="Then I heard someone from the community mention tree-sitter," start="00:04:15.760" video="mainVideo" id=subtitle]] +[[!template text="and I decided to check it out." start="00:04:19.759" video="mainVideo" id=subtitle]] +[[!template text="Basically, tree-sitter is an incremental parsing library and a parser generator." start="00:04:23.360" video="mainVideo" id=subtitle]] +[[!template text="It was introduced by the Atom editor in 2018." start="00:04:28.720" video="mainVideo" id=subtitle]] +[[!template text="Besides Atom, it is also being integrated" start="00:04:33.040" video="mainVideo" id=subtitle]] +[[!template text="into the Neovim editor," start="00:04:35.923" video="mainVideo" id=subtitle]] +[[!template text="and GitHub is using it to power" start="00:04:37.623" video="mainVideo" id=subtitle]] +[[!template text="their source code analysis" start="00:04:41.040" video="mainVideo" id=subtitle]] +[[!template text="and navigation features." 
start="00:04:42.423" video="mainVideo" id=subtitle]] +[[!template text="It is written in C and can be compiled" start="00:04:45.840" video="mainVideo" id=subtitle]] +[[!template text="for all major platforms." start="00:04:48.639" video="mainVideo" id=subtitle]] +[[!template text="It can even be compiled" start="00:04:50.623" video="mainVideo" id=subtitle]] +[[!template text="to WebAssembly to run on the web." start="00:04:53.120" video="mainVideo" id=subtitle]] +[[!template text="That's how GitHub is using it on their website." start="00:04:55.323" video="mainVideo" id=subtitle]] +[[!template new="1" text="So why is tree-sitter an interesting solution to this problem?" start="00:05:00.800" video="mainVideo" id=subtitle]] +[[!template text="There are multiple features that make it an attractive option." start="00:05:05.840" video="mainVideo" id=subtitle]] +[[!template text="It is designed to be fast." start="00:05:10.000" video="mainVideo" id=subtitle]] +[[!template text="By being incremental," start="00:05:11.839" video="mainVideo" id=subtitle]] +[[!template text="the initial parse of a typical big file" start="00:05:13.680" video="mainVideo" id=subtitle]] +[[!template text="can take tens of milliseconds," start="00:05:15.680" video="mainVideo" id=subtitle]] +[[!template text="while subsequent incremental parses" start="00:05:18.160" video="mainVideo" id=subtitle]] +[[!template text="are sub-millisecond." start="00:05:20.240" video="mainVideo" id=subtitle]] +[[!template text="It achieves this by using structural sharing," start="00:05:22.560" video="mainVideo" id=subtitle]] +[[!template text="meaning replacing only affected nodes" start="00:05:26.240" video="mainVideo" id=subtitle]] +[[!template text="in the old tree when it needs to." start="00:05:29.360" video="mainVideo" id=subtitle]] +[[!template text="Also, unlike LSP, being in the same process," start="00:05:32.960" video="mainVideo" id=subtitle]] +[[!template text="it has much lower latency." 
start="00:05:37.120" video="mainVideo" id=subtitle]] +[[!template new="1" text="Secondly, it provides a uniform programming interface." start="00:05:40.639" video="mainVideo" id=subtitle]] +[[!template text="The same data structures and functions" start="00:05:44.960" video="mainVideo" id=subtitle]] +[[!template text="work on parse trees of different languages." start="00:05:47.039" video="mainVideo" id=subtitle]] +[[!template text="Syntax nodes of different languages" start="00:05:50.400" video="mainVideo" id=subtitle]] +[[!template text="differ only by their types" start="00:05:52.160" video="mainVideo" id=subtitle]] +[[!template text="and their possible child nodes." start="00:05:54.160" video="mainVideo" id=subtitle]] +[[!template text="This is a big advantage over language-specific parsers." start="00:05:55.723" video="mainVideo" id=subtitle]] +[[!template text="Thirdly, it's written in self-contained embeddable C." start="00:06:02.240" video="mainVideo" id=subtitle]] +[[!template text="As I mentioned previously, it can even be compiled to WebAssembly." start="00:06:06.880" video="mainVideo" id=subtitle]] +[[!template text="This makes integrating it into various editors quite easy" start="00:06:11.723" video="mainVideo" id=subtitle]] +[[!template text="without having to install any external dependencies." start="00:06:16.106" video="mainVideo" id=subtitle]] +[[!template new="1" text="One thing that is not mentioned here" start="00:06:22.880" video="mainVideo" id=subtitle]] +[[!template text="is that being a parser generator," start="00:06:25.503" video="mainVideo" id=subtitle]] +[[!template text="its grammars are declarative." start="00:06:28.000" video="mainVideo" id=subtitle]] +[[!template text="Together with being editor-independent," start="00:06:31.039" video="mainVideo" id=subtitle]] +[[!template text="this makes the pool of potential contributors much larger." 
start="00:06:34.880" video="mainVideo" id=subtitle]] +[[!template new="1" text="So I was convinced that tree-sitter is a good fit for Emacs." start="00:06:39.139" video="mainVideo" id=subtitle]] +[[!template text="Last year, I started writing the bindings" start="00:06:45.520" video="mainVideo" id=subtitle]] +[[!template text="using dynamic module support introduced in Emacs 25." start="00:06:48.000" video="mainVideo" id=subtitle]] +[[!template text="Dynamic module means there is platform-specific native code involved," start="00:06:53.280" video="mainVideo" id=subtitle]] +[[!template text="but since there are pre-compiled binaries" start="00:06:58.479" video="mainVideo" id=subtitle]] +[[!template text="for the three major platforms," start="00:07:00.560" video="mainVideo" id=subtitle]] +[[!template text="it should work in most places." start="00:07:02.880" video="mainVideo" id=subtitle]] +[[!template text="Currently, the core functionalities are in a pretty good shape." start="00:07:04.706" video="mainVideo" id=subtitle]] +[[!template text="Syntax highlighting is working nicely." start="00:07:09.440" video="mainVideo" id=subtitle]] +[[!template new="1" text="The whole thing is split into three packages." start="00:07:12.560" video="mainVideo" id=subtitle]] +[[!template text="tree-sitter is the main package that other packages should depend on." start="00:07:16.080" video="mainVideo" id=subtitle]] +[[!template text="tree-sitter-langs is the language bundle" start="00:07:20.319" video="mainVideo" id=subtitle]] +[[!template text="that includes support" start="00:07:22.800" video="mainVideo" id=subtitle]] +[[!template text="for most common languages." start="00:07:24.000" video="mainVideo" id=subtitle]] +[[!template text="And finally, the core APIs are in the package tsc," start="00:07:27.199" video="mainVideo" id=subtitle]] +[[!template text="which stands for tree-sitter-core." 
start="00:07:32.160" video="mainVideo" id=subtitle]] +[[!template text="It is the implicit dependency of the" start="00:07:36.160" video="mainVideo" id=subtitle]] +[[!template text="tree-sitter package." start="00:07:38.800" video="mainVideo" id=subtitle]] +[[!template text="The main package includes the minor mode tree-sitter-mode." start="00:07:43.520" video="mainVideo" id=subtitle]] +[[!template text="This provides the base for other major or minor modes to build on." start="00:07:47.520" video="mainVideo" id=subtitle]] +[[!template text="Using Emacs's change tracking hooks," start="00:07:52.560" video="mainVideo" id=subtitle]] +[[!template text="it enables incremental parsing" start="00:07:54.839" video="mainVideo" id=subtitle]] +[[!template text="and provides a syntax tree that is always up to date" start="00:07:57.073" video="mainVideo" id=subtitle]] +[[!template text="after any edits in a buffer." start="00:08:00.800" video="mainVideo" id=subtitle]] +[[!template text="There is also a basic debug mode" start="00:08:04.080" video="mainVideo" id=subtitle]] +[[!template text="that shows the parse tree in another buffer." start="00:08:06.223" video="mainVideo" id=subtitle]] +[[!template new="1" text="Here is a quick demo." start="00:08:10.080" video="mainVideo" id=subtitle]] +[[!template text="Here I'm in an empty Python buffer" start="00:08:13.360" video="mainVideo" id=subtitle]] +[[!template text="with tree-sitter enabled." start="00:08:15.673" video="mainVideo" id=subtitle]] +[[!template text="I'm going to turn on the debug mode to" start="00:08:17.520" video="mainVideo" id=subtitle]] +[[!template text="see the parse tree." start="00:08:19.440" video="mainVideo" id=subtitle]] +[[!template text="Since the buffer is empty," start="00:08:26.560" video="mainVideo" id=subtitle]] +[[!template text="there is only one node in the syntax tree:" start="00:08:28.106" video="mainVideo" id=subtitle]] +[[!template text="the top-level module node." 
start="00:08:30.423" video="mainVideo" id=subtitle]] +[[!template text="Let's try typing some code." start="00:08:33.279" video="mainVideo" id=subtitle]] +[[!template text="As you can see, as I type into the Python buffer," start="00:09:11.040" video="mainVideo" id=subtitle]] +[[!template text="the syntax tree updates in real time." start="00:09:14.640" video="mainVideo" id=subtitle]] +[[!template new="1" text="The other minor mode included in the main package" start="00:09:19.120" video="mainVideo" id=subtitle]] +[[!template text="is tree-sitter-hl-mode." start="00:09:22.039" video="mainVideo" id=subtitle]] +[[!template text="It overrides font-lock mode" start="00:09:24.389" video="mainVideo" id=subtitle]] +[[!template text="and provides its own set of faces" start="00:09:26.349" video="mainVideo" id=subtitle]] +[[!template text="and customization options." start="00:09:28.480" video="mainVideo" id=subtitle]] +[[!template text="It is query-driven." start="00:09:30.139" video="mainVideo" id=subtitle]] +[[!template text="That means instead of regular expressions," start="00:09:32.800" video="mainVideo" id=subtitle]] +[[!template text="it uses a Lisp-like query language" start="00:09:36.240" video="mainVideo" id=subtitle]] +[[!template text="to map syntax nodes" start="00:09:39.518" video="mainVideo" id=subtitle]] +[[!template text="to highlighting faces." start="00:09:40.320" video="mainVideo" id=subtitle]] +[[!template text="I'm going to open a Python file with small snippets" start="00:09:41.923" video="mainVideo" id=subtitle]] +[[!template text="that showcase syntax highlighting." start="00:09:45.760" video="mainVideo" id=subtitle]] +[[!template text="So this is the default highlighting" start="00:09:54.320" video="mainVideo" id=subtitle]] +[[!template text="provided by python-mode." start="00:09:55.920" video="mainVideo" id=subtitle]] +[[!template text="This is the highlighting enabled by tree-sitter." 
start="00:10:00.880" video="mainVideo" id=subtitle]] +[[!template text="As you can see, string interpolation" start="00:10:04.640" video="mainVideo" id=subtitle]] +[[!template text="and decorators are highlighted correctly." start="00:10:07.680" video="mainVideo" id=subtitle]] +[[!template text="Function calls are also highlighted." start="00:10:11.680" video="mainVideo" id=subtitle]] +[[!template text="You can also note that property accessors" start="00:10:17.440" video="mainVideo" id=subtitle]] +[[!template text="and property assignments are highlighted differently." start="00:10:21.839" video="mainVideo" id=subtitle]] +[[!template text="What I like the most about this is that" start="00:10:27.440" video="mainVideo" id=subtitle]] +[[!template text="new bindings are consistently highlighted." start="00:10:29.360" video="mainVideo" id=subtitle]] +[[!template text="This included local variables," start="00:10:32.640" video="mainVideo" id=subtitle]] +[[!template text="function parameters, and property mutations." start="00:10:36.320" video="mainVideo" id=subtitle]] +[[!template new="1" text="Before going through the tree queries" start="00:10:45.760" video="mainVideo" id=subtitle]] +[[!template text="and the syntax highlighting" start="00:10:48.000" video="mainVideo" id=subtitle]] +[[!template text="customization options," start="00:10:49.279" video="mainVideo" id=subtitle]] +[[!template text="let's take a brief look at" start="00:10:51.680" video="mainVideo" id=subtitle]] +[[!template text="the core data structures and functions" start="00:10:53.339" video="mainVideo" id=subtitle]] +[[!template text="that tree-sitter provides." start="00:10:55.040" video="mainVideo" id=subtitle]] +[[!template text="So parsing is done with the help of" start="00:10:58.079" video="mainVideo" id=subtitle]] +[[!template text="a generic parser object." 
start="00:11:00.743" video="mainVideo" id=subtitle]] +[[!template text="A single parser object can be used to" start="00:11:02.240" video="mainVideo" id=subtitle]] +[[!template text="parse different languages" start="00:11:04.160" video="mainVideo" id=subtitle]] +[[!template text="by sending different language objects to it." start="00:11:06.000" video="mainVideo" id=subtitle]] +[[!template text="The language objects themselves are" start="00:11:09.279" video="mainVideo" id=subtitle]] +[[!template text="loaded from shared libraries." start="00:11:10.880" video="mainVideo" id=subtitle]] +[[!template text="Since tree-sitter-mode already handles" start="00:11:14.079" video="mainVideo" id=subtitle]] +[[!template text="the parsing part," start="00:11:16.079" video="mainVideo" id=subtitle]] +[[!template text="we will instead focus on the functions" start="00:11:17.360" video="mainVideo" id=subtitle]] +[[!template text="that inspect nodes," start="00:11:19.440" video="mainVideo" id=subtitle]] +[[!template text="and in the resulting parse tree," start="00:11:20.800" video="mainVideo" id=subtitle]] +[[!template text="we can ask tree-sitter what is" start="00:11:25.279" video="mainVideo" id=subtitle]] +[[!template text="the syntax node at point." start="00:11:27.030" video="mainVideo" id=subtitle]] +[[!template text="This is an opaque object, so this is not very useful." start="00:11:44.240" video="mainVideo" id=subtitle]] +[[!template text="We can instead ask what is its type." start="00:11:48.480" video="mainVideo" id=subtitle]] +[[!template text="So its type is the symbol comparison operator." start="00:12:03.760" video="mainVideo" id=subtitle]] +[[!template new="1" text="In tree-sitter, there are two kinds of nodes," start="00:12:08.959" video="mainVideo" id=subtitle]] +[[!template text="anonymous nodes and named nodes." 
start="00:12:11.600" video="mainVideo" id=subtitle]] +[[!template text="Anonymous nodes correspond to simple grammar elements" start="00:12:13.680" video="mainVideo" id=subtitle]] +[[!template text="like keywords, operators, punctuation, and so on." start="00:12:17.040" video="mainVideo" id=subtitle]] +[[!template text="Named nodes, on the other hand, are grammar elements" start="00:12:21.279" video="mainVideo" id=subtitle]] +[[!template text="that are interesting enough on their own" start="00:12:24.656" video="mainVideo" id=subtitle]] +[[!template text="to have a name, like an identifier," start="00:12:26.639" video="mainVideo" id=subtitle]] +[[!template text="an expression, or a function definition." start="00:12:30.029" video="mainVideo" id=subtitle]] +[[!template text="Named node types are symbols," start="00:12:35.440" video="mainVideo" id=subtitle]] +[[!template text="while anonymous node types are strings." start="00:12:37.323" video="mainVideo" id=subtitle]] +[[!template text="For example, if we are on this comparison operator," start="00:12:42.639" video="mainVideo" id=subtitle]] +[[!template text="the node type should be a string." start="00:12:49.760" video="mainVideo" id=subtitle]] +[[!template text="We can also get other information about the node." start="00:12:55.920" video="mainVideo" id=subtitle]] +[[!template text="For example: what is its text," start="00:12:58.959" video="mainVideo" id=subtitle]] +[[!template text="or where it is in the buffer," start="00:13:09.680" video="mainVideo" id=subtitle]] +[[!template text="or what is its parent." start="00:13:20.800" video="mainVideo" id=subtitle]] +[[!template new="1" text="There are many other APIs to query" start="00:13:43.199" video="mainVideo" id=subtitle]] +[[!template text="our node's properties." 
start="00:13:46.106" video="mainVideo" id=subtitle]] +[[!template text="tree-sitter allows searching" start="00:13:52.639" video="mainVideo" id=subtitle]] +[[!template text="for structural patterns within a parse tree." start="00:13:54.234" video="mainVideo" id=subtitle]] +[[!template text="It does so through a Lisp-like language." start="00:13:58.240" video="mainVideo" id=subtitle]] +[[!template text="This language supports matching by node types," start="00:14:01.440" video="mainVideo" id=subtitle]] +[[!template text="field names, and predicates." start="00:14:04.639" video="mainVideo" id=subtitle]] +[[!template text="It also allows capturing nodes for further processing." start="00:14:07.760" video="mainVideo" id=subtitle]] +[[!template text="Let's try to see some examples." start="00:14:12.639" video="mainVideo" id=subtitle]] +[[!template text="So in this very simple query," start="00:14:37.680" video="mainVideo" id=subtitle]] +[[!template text="we just try to highlight all the identifiers in the buffer." start="00:14:40.206" video="mainVideo" id=subtitle]] +[[!template text="This s side tells tree-sitter to capture a node." start="00:14:49.040" video="mainVideo" id=subtitle]] +[[!template text="In the context of the query builder," start="00:14:53.120" video="mainVideo" id=subtitle]] +[[!template text="it's not very important," start="00:14:55.507" video="mainVideo" id=subtitle]] +[[!template text="but in a normal highlighting query," start="00:14:57.360" video="mainVideo" id=subtitle]] +[[!template text="this will determine" start="00:14:59.706" video="mainVideo" id=subtitle]] +[[!template text="the face used to highlight the node." start="00:15:01.760" video="mainVideo" id=subtitle]] +[[!template text="Suppose we want to capture" start="00:15:06.639" video="mainVideo" id=subtitle]] +[[!template text="all the function names," start="00:15:08.256" video="mainVideo" id=subtitle]] +[[!template text="instead of just any identifier." 
start="00:15:10.320" video="mainVideo" id=subtitle]] +[[!template text="You can improve the query like this." start="00:15:13.519" video="mainVideo" id=subtitle]] +[[!template text="This will highlight the whole definition." start="00:15:29.440" video="mainVideo" id=subtitle]] +[[!template text="But we only want to capture the function name," start="00:15:32.639" video="mainVideo" id=subtitle]] +[[!template text="which means the identifier here." start="00:15:36.399" video="mainVideo" id=subtitle]] +[[!template text="So we move the capture to after the identifier node." start="00:15:41.054" video="mainVideo" id=subtitle]] +[[!template text="If we want to capture the class names as well," start="00:15:49.600" video="mainVideo" id=subtitle]] +[[!template text="we just add another pattern." start="00:15:52.959" video="mainVideo" id=subtitle]] +[[!template new="1" text="Let's look at a more practical example." start="00:16:10.079" video="mainVideo" id=subtitle]] +[[!template text="Here we can see that single-quoted strings" start="00:16:20.320" video="mainVideo" id=subtitle]] +[[!template text="and double-quoted strings are highlighted the same." start="00:16:23.468" video="mainVideo" id=subtitle]] +[[!template text="But in some places," start="00:16:27.279" video="mainVideo" id=subtitle]] +[[!template text="because of some coding conventions," start="00:16:30.399" video="mainVideo" id=subtitle]] +[[!template text="it may be desirable to highlight them differently." start="00:16:33.440" video="mainVideo" id=subtitle]] +[[!template text="For example, if the string is single-quoted," start="00:16:36.373" video="mainVideo" id=subtitle]] +[[!template text="we may want to highlight it as a constant." start="00:16:39.073" video="mainVideo" id=subtitle]] +[[!template text="Let's try to see whether we can" start="00:16:44.399" video="mainVideo" id=subtitle]] +[[!template text="distinguish these two cases." 
start="00:16:46.160" video="mainVideo" id=subtitle]] +[[!template text="So here we get all the strings." start="00:16:56.240" video="mainVideo" id=subtitle]] +[[!template text="If we want to see if it's single quotes" start="00:17:00.639" video="mainVideo" id=subtitle]] +[[!template text="or double quote strings," start="00:17:04.079" video="mainVideo" id=subtitle]] +[[!template text="we can try looking at the first character of the string--" start="00:17:08.799" video="mainVideo" id=subtitle]] +[[!template text="I mean the first character of the node--" start="00:17:13.436" video="mainVideo" id=subtitle]] +[[!template text="to check whether it's a single quote or a double quote." start="00:17:16.720" video="mainVideo" id=subtitle]] +[[!template text="So for that, we use tree-sitter's support for predicates." start="00:17:33.600" video="mainVideo" id=subtitle]] +[[!template text="In this case, we use a match predicate" start="00:17:38.920" video="mainVideo" id=subtitle]] +[[!template text="to check whether the string-- whether the node starts" start="00:17:43.360" video="mainVideo" id=subtitle]] +[[!template text="with a single quote." start="00:17:47.339" video="mainVideo" id=subtitle]] +[[!template text="And with this pattern," start="00:17:49.556" video="mainVideo" id=subtitle]] +[[!template text="we only capture the single-quotes strings." start="00:17:51.280" video="mainVideo" id=subtitle]] +[[!template text="Let's try to give it a different face." start="00:18:00.400" video="mainVideo" id=subtitle]] +[[!template text="So we copy the pattern," start="00:18:03.760" video="mainVideo" id=subtitle]] +[[!template text="and we add this pattern for Python only." start="00:18:13.039" video="mainVideo" id=subtitle]] +[[!template text="But we also want to give the capture a different name." start="00:18:25.120" video="mainVideo" id=subtitle]] +[[!template text="Let's say we want to highlight it as a keyword." 
start="00:18:31.440" video="mainVideo" id=subtitle]] +[[!template text="And now, if we refresh the buffer," start="00:18:46.559" video="mainVideo" id=subtitle]] +[[!template text="we see that single quote strings" start="00:19:06.320" video="mainVideo" id=subtitle]] +[[!template text="are highlighted as keywords." start="00:19:08.523" video="mainVideo" id=subtitle]] +[[!template new="1" text="The highlighting patterns" start="00:19:14.400" video="mainVideo" id=subtitle]] +[[!template text="can also be set for a single project" start="00:19:15.751" video="mainVideo" id=subtitle]] +[[!template text="using directory-local variables." start="00:19:19.200" video="mainVideo" id=subtitle]] +[[!template text="For example, let's take a look at Emacs's source code." start="00:19:23.440" video="mainVideo" id=subtitle]] +[[!template text="So in Emacs's C source, there are a lot of uses" start="00:19:35.760" video="mainVideo" id=subtitle]] +[[!template text="of these different macros" start="00:19:41.123" video="mainVideo" id=subtitle]] +[[!template text="to define functions," start="00:19:43.760" video="mainVideo" id=subtitle]] +[[!template text="and you can see this is actually the function name," start="00:19:47.679" video="mainVideo" id=subtitle]] +[[!template text="but it's highlighted as the string." start="00:19:53.256" video="mainVideo" id=subtitle]] +[[!template text="So what we want is to somehow recognize this pattern" start="00:19:56.373" video="mainVideo" id=subtitle]] +[[!template text="and highlight it." start="00:20:03.679" video="mainVideo" id=subtitle]] +[[!template text="Highlight this part" start="00:20:07.600" video="mainVideo" id=subtitle]] +[[!template text="with the function face instead." start="00:20:11.280" video="mainVideo" id=subtitle]] +[[!template text="In order to do that," start="00:20:14.559" video="mainVideo" id=subtitle]] +[[!template text="we put a pattern in this project's directory-local settings file." 
start="00:20:17.679" video="mainVideo" id=subtitle]] +[[!template text="So we can put this button in the C mode section." start="00:20:31.760" video="mainVideo" id=subtitle]] +[[!template text="And now, if we enable tree-sitter," start="00:20:40.159" video="mainVideo" id=subtitle]] +[[!template text="you can see that this is highlighted" start="00:20:48.000" video="mainVideo" id=subtitle]] +[[!template text="as a normal function definition." start="00:20:53.200" video="mainVideo" id=subtitle]] +[[!template text="So this is the function face like we wanted." start="00:20:55.056" video="mainVideo" id=subtitle]] +[[!template text="The pattern for this is actually pretty simple." start="00:21:01.200" video="mainVideo" id=subtitle]] +[[!template text="It's only this part." start="00:21:07.200" video="mainVideo" id=subtitle]] +[[!template text="So if it's a function call" start="00:21:12.373" video="mainVideo" id=subtitle]] +[[!template text="where the name of the function is defun," start="00:21:16.456" video="mainVideo" id=subtitle]] +[[!template text="then we highlight the defun as a keyword," start="00:21:19.679" video="mainVideo" id=subtitle]] +[[!template text="and then the first string element," start="00:21:24.240" video="mainVideo" id=subtitle]] +[[!template text="we highlight it as a function name." start="00:21:26.923" video="mainVideo" id=subtitle]] +[[!template new="1" text="Since the language objects are actually native code," start="00:21:35.360" video="mainVideo" id=subtitle]] +[[!template text="they have to be compiled for each platform" start="00:21:39.280" video="mainVideo" id=subtitle]] +[[!template text="that we want to support." start="00:21:41.459" video="mainVideo" id=subtitle]] +[[!template text="This will become a big obstacle for tree-sitter adoption." 
start="00:21:43.440" video="mainVideo" id=subtitle]] +[[!template text="Therefore, I've created a language bundle package, tree-sitter-langs," start="00:21:48.159" video="mainVideo" id=subtitle]] +[[!template text="that takes care of pre-compiling the grammars," start="00:21:52.960" video="mainVideo" id=subtitle]] +[[!template text="the most common grammars for all three major platforms." start="00:21:55.773" video="mainVideo" id=subtitle]] +[[!template text="It also takes care of distributing these binaries" start="00:22:01.600" video="mainVideo" id=subtitle]] +[[!template text="and provides some highlighting queries" start="00:22:05.360" video="mainVideo" id=subtitle]] +[[!template text="for some of the languages." start="00:22:08.080" video="mainVideo" id=subtitle]] +[[!template text="It should be noted that this package" start="00:22:11.440" video="mainVideo" id=subtitle]] +[[!template text="should be treated as a temporary distribution mechanism only," start="00:22:13.760" video="mainVideo" id=subtitle]] +[[!template text="to help with bootstrapping tree-sitter adoption." start="00:22:19.919" video="mainVideo" id=subtitle]] +[[!template text="The plan is that eventually these files" start="00:22:24.720" video="mainVideo" id=subtitle]] +[[!template text="should be provided by" start="00:22:27.760" video="mainVideo" id=subtitle]] +[[!template text="the language major modes themselves." start="00:22:29.156" video="mainVideo" id=subtitle]] +[[!template text="But in order to do that, we need better tooling," start="00:22:32.480" video="mainVideo" id=subtitle]] +[[!template text="so we're not there yet." start="00:22:36.320" video="mainVideo" id=subtitle]] +[[!template new="1" text="Since the core already works reasonably well," start="00:22:40.240" video="mainVideo" id=subtitle]] +[[!template text="there are several areas that would benefit" start="00:22:43.280" video="mainVideo" id=subtitle]] +[[!template text="from the community's contribution." 
start="00:22:45.289" video="mainVideo" id=subtitle]] +[[!template text="So tree-sitter's upstream language repositories" start="00:22:49.120" video="mainVideo" id=subtitle]] +[[!template text="already contain highlighting queries on their own." start="00:22:52.640" video="mainVideo" id=subtitle]] +[[!template text="However, they are pretty basic," start="00:22:55.679" video="mainVideo" id=subtitle]] +[[!template text="and they may not fit well with existing Emacs conventions." start="00:22:57.573" video="mainVideo" id=subtitle]] +[[!template text="Therefore, the language bundle has its own set of highlighting queries." start="00:23:02.559" video="mainVideo" id=subtitle]] +[[!template text="This requires maintenance until language major modes adopt tree-sitter" start="00:23:07.120" video="mainVideo" id=subtitle]] +[[!template text="and maintain the queries on their own." start="00:23:12.556" video="mainVideo" id=subtitle]] +[[!template text="The queries are actually quite easy to write," start="00:23:16.640" video="mainVideo" id=subtitle]] +[[!template text="as you've already seen." start="00:23:19.056" video="mainVideo" id=subtitle]] +[[!template text="You just need to be familiar with the language," start="00:23:22.000" video="mainVideo" id=subtitle]] +[[!template text="familiar enough to come up with sensible highlighting patterns." start="00:23:25.360" video="mainVideo" id=subtitle]] +[[!template text="And if you are a maintainer of a language major mode," start="00:23:35.200" video="mainVideo" id=subtitle]] +[[!template text="you may want to consider integrating tree-sitter into your mode," start="00:23:39.679" video="mainVideo" id=subtitle]] +[[!template text="initially maybe as an optional feature." start="00:23:44.189" video="mainVideo" id=subtitle]] +[[!template text="The integration is actually pretty straightforward," start="00:23:48.573" video="mainVideo" id=subtitle]] +[[!template text="especially for syntax highlighting." 
start="00:23:53.279" video="mainVideo" id=subtitle]] +[[!template text="Or alternatively," start="00:23:56.640" video="mainVideo" id=subtitle]] +[[!template text="you can also try writing a new major mode from scratch" start="00:24:01.520" video="mainVideo" id=subtitle]] +[[!template text="that relies on tree-sitter" start="00:24:05.760" video="mainVideo" id=subtitle]] +[[!template text="from the very beginning." start="00:24:08.000" video="mainVideo" id=subtitle]] +[[!template text="The code for such a major mode is quite simple." start="00:24:12.559" video="mainVideo" id=subtitle]] +[[!template text="For example, this is the proposed" start="00:24:17.523" video="mainVideo" id=subtitle]] +[[!template text="wat-mode for web assembly." start="00:24:23.200" video="mainVideo" id=subtitle]] +[[!template text="The code is just one page of code, not a lot." start="00:24:26.240" video="mainVideo" id=subtitle]] +[[!template text="You can also try writing new minor modes" start="00:24:39.520" video="mainVideo" id=subtitle]] +[[!template text="or writing integration packages." start="00:24:42.720" video="mainVideo" id=subtitle]] +[[!template text="For example, a lot of packages" start="00:24:46.559" video="mainVideo" id=subtitle]] +[[!template text="may benefit from tree-sitter integration," start="00:24:50.880" video="mainVideo" id=subtitle]] +[[!template text="but no one has written the integration yet." start="00:24:54.559" video="mainVideo" id=subtitle]] +[[!template new="1" text="If you are interested in tree-sitter," start="00:25:02.960" video="mainVideo" id=subtitle]] +[[!template text="you can use these links to learn more about it." start="00:25:04.836" video="mainVideo" id=subtitle]] +[[!template text="I think that's it for me today." start="00:25:08.023" video="mainVideo" id=subtitle]] +[[!template text="I'm happy to answer any questions." 
start="00:25:11.440" video="mainVideo" id=subtitle]] diff --git a/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt b/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.vtt index 62ad5f65..276f3150 100644 --- a/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt +++ b/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.vtt @@ -606,802 +606,630 @@ This is the highlighting enabled by tree-sitter. 00:10:04.640 --> 00:10:07.680 -as you can see string interpolation +As you can see, string interpolation 00:10:07.680 --> 00:10:11.680 -and decorators are highlighted correctly +and decorators are highlighted correctly. 00:10:11.680 --> 00:10:17.440 -function calls are also highlighted +Function calls are also highlighted. -00:10:17.440 --> 00:10:20.240 -you can also note that property +00:10:17.440 --> 00:10:21.839 +You can also note that +property accessors -00:10:20.240 --> 00:10:21.839 -assessors - -00:10:21.839 --> 00:10:24.640 +00:10:21.839 --> 00:10:27.440 and property assignments are highlighted - -00:10:24.640 --> 00:10:27.440 -differently +differently. 00:10:27.440 --> 00:10:29.360 -what I like the most about this is that +What I like the most about this is that -00:10:29.360 --> 00:10:30.880 +00:10:29.360 --> 00:10:32.640 new bindings are consistently - -00:10:30.880 --> 00:10:32.640 -highlighted +highlighted. 00:10:32.640 --> 00:10:36.320 -this included local variable +This included local variables, -00:10:36.320 --> 00:10:39.760 -function parameters and property - -00:10:39.760 --> 00:10:45.760 -mutations +00:10:36.320 --> 00:10:45.760 +function parameters, and property +mutations. 
00:10:45.760 --> 00:10:48.000 -before going through the three queries +Before going through the tree queries 00:10:48.000 --> 00:10:49.279 and the syntax highlighting 00:10:49.279 --> 00:10:51.680 -customization options +customization options, -00:10:51.680 --> 00:10:53.760 -let's take a brief look at the core data +00:10:51.680 --> 00:10:53.339 +let's take a brief look at -00:10:53.760 --> 00:10:55.040 -structures and functions +00:10:53.339 --> 00:10:55.040 +the core data structures and functions 00:10:55.040 --> 00:10:58.079 -that tree sitter provides +that tree-sitter provides. -00:10:58.079 --> 00:10:59.839 -so parsing is done with the help of a +00:10:58.079 --> 00:11:00.743 +So parsing is done with the help of -00:10:59.839 --> 00:11:02.240 -generic parser object +00:11:00.743 --> 00:11:02.240 +a generic parser object. 00:11:02.240 --> 00:11:04.160 -a single parser object can be used to +A single parser object can be used to 00:11:04.160 --> 00:11:06.000 -pass different languages +parse different languages -00:11:06.000 --> 00:11:08.320 +00:11:06.000 --> 00:11:09.279 by sending different language objects to - -00:11:08.320 --> 00:11:09.279 -it +it. 00:11:09.279 --> 00:11:10.880 -the language objects themselves are +The language objects themselves are 00:11:10.880 --> 00:11:14.079 -loaded from shared libraries +loaded from shared libraries. 
00:11:14.079 --> 00:11:16.079 -since three seater mode already handles +Since tree-sitter-mmode already handles 00:11:16.079 --> 00:11:17.360 -the parsing part +the parsing part, 00:11:17.360 --> 00:11:19.440 we will instead focus on the functions 00:11:19.440 --> 00:11:20.800 -that inspect nodes +that inspect nodes, 00:11:20.800 --> 00:11:25.279 -and in the resulting path tree - -00:11:25.279 --> 00:11:27.200 -we can ask tree sitter what is the +and in the resulting path tree, -00:11:27.200 --> 00:11:44.240 -syntax node at point +00:11:25.279 --> 00:11:27.030 +we can ask tree-sitter what is -00:11:44.240 --> 00:11:47.200 -uh is it an opaque object so this is not +00:11:27.030 --> 00:11:44.240 +the syntax node at point. -00:11:47.200 --> 00:11:48.480 -very useful +00:11:44.240 --> 00:11:48.480 +This is an opaque object, so this is not +very useful. 00:11:48.480 --> 00:12:03.760 -we can instead ask what is its type +We can instead ask what is its type. -00:12:03.760 --> 00:12:06.560 -so his type is the symbol comparison - -00:12:06.560 --> 00:12:08.959 -operator +00:12:03.760 --> 00:12:08.959 +So its type is the symbol comparison +operator. 00:12:08.959 --> 00:12:11.600 -trees there are two kinds of nodes +In tree-sitter, there are two kinds of nodes, 00:12:11.600 --> 00:12:13.680 -anonymous nodes and named nodes - -00:12:13.680 --> 00:12:15.519 -anonymous nodes correspond to simple +anonymous nodes and named nodes. -00:12:15.519 --> 00:12:17.040 +00:12:13.680 --> 00:12:17.040 +Anonymous nodes correspond to simple grammar elements -00:12:17.040 --> 00:12:19.839 -like keywords operators punctuations and - -00:12:19.839 --> 00:12:21.279 -so on - -00:12:21.279 --> 00:12:24.160 -name nodes on the other hand grammar - -00:12:24.160 --> 00:12:25.920 -elements that are interesting enough for +00:12:17.040 --> 00:12:21.279 +like keywords, operators, punctuations, +and so on. 
-00:12:25.920 --> 00:12:26.639 -their own - -00:12:26.639 --> 00:12:30.320 -to have a name like an identifier an +00:12:21.279 --> 00:12:24.656 +Name nodes, on the other hand, are +grammar elements -00:12:30.320 --> 00:12:31.839 -expression +00:12:24.656 --> 00:12:26.639 +that are interesting enough +on their own -00:12:31.839 --> 00:12:35.440 -or a function definition +00:12:26.639 --> 00:12:30.029 +to have a name, like an identifier, -00:12:35.440 --> 00:12:37.760 -name node types are symbols while +00:12:30.029 --> 00:12:35.440 +an expression, or a function definition. -00:12:37.760 --> 00:12:42.639 -anonymous node types are strings +00:12:35.440 --> 00:12:37.323 +Name node types are symbols, -00:12:42.639 --> 00:12:46.320 -for example if we are on this +00:12:37.323 --> 00:12:42.639 +while anonymous node types are strings. -00:12:46.320 --> 00:12:49.760 -comparison operator +00:12:42.639 --> 00:12:49.760 +For example, if we are on this +comparison operator, 00:12:49.760 --> 00:12:55.920 -the node type should be a string - -00:12:55.920 --> 00:12:57.920 -we can also get other information about +the node type should be a string. -00:12:57.920 --> 00:12:58.959 -the node +00:12:55.920 --> 00:12:58.959 +We can also get other information about +the node. 00:12:58.959 --> 00:13:09.680 -for example what is this text +For example: what is this text, 00:13:09.680 --> 00:13:20.800 -or where it is in the buffer +or where it is in the buffer, 00:13:20.800 --> 00:13:43.199 -or what is its parent +or what is its parent. -00:13:43.199 --> 00:13:46.160 -there are many other apis to query or +00:13:43.199 --> 00:13:46.106 +There are many other APIs to query -00:13:46.160 --> 00:13:46.839 -not +00:13:46.106 --> 00:13:52.639 +our node's properties. 
-00:13:46.839 --> 00:13:52.639 -properties +00:13:52.639 --> 00:13:54.234 +tree-sitter allows searching -00:13:52.639 --> 00:13:54.399 -tree sitter allows searching for - -00:13:54.399 --> 00:13:58.240 -structural patterns within a parse tree +00:13:54.234 --> 00:13:58.240 +for structural patterns +within a parse tree. 00:13:58.240 --> 00:14:01.440 -it does so through a list like language - -00:14:01.440 --> 00:14:03.519 -this language supports by the matching +It does so through a Lisp-like language. -00:14:03.519 --> 00:14:04.639 -by node types +00:14:01.440 --> 00:14:04.639 +This language supports matching +by node types, 00:14:04.639 --> 00:14:07.760 -field names and predicates +field names, and predicates. -00:14:07.760 --> 00:14:10.079 -it also allows capturing nodes for - -00:14:10.079 --> 00:14:12.639 -further processing +00:14:07.760 --> 00:14:12.639 +It also allows capturing nodes for +further processing. 00:14:12.639 --> 00:14:37.680 -let's try to see some examples - -00:14:37.680 --> 00:14:41.040 -so in this very simple query we just - -00:14:41.040 --> 00:14:43.839 -try to highlight all the identifiers in +Let's try to see some examples. -00:14:43.839 --> 00:14:49.040 -the buffer +00:14:37.680 --> 00:14:40.206 +So in this very simple query, -00:14:49.040 --> 00:14:51.920 -this s side tells trisito to capture a +00:14:40.206 --> 00:14:49.040 +we just try to highlight all the +identifiers in the buffer. -00:14:51.920 --> 00:14:53.120 -node +00:14:49.040 --> 00:14:53.120 +This s side tells tree-sitter +to capture a node. 
-00:14:53.120 --> 00:14:55.839 -in the context of the query builder it's +00:14:53.120 --> 00:14:55.507 +In the context of the query builder, -00:14:55.839 --> 00:14:57.360 -not very important +00:14:55.507 --> 00:14:57.360 +it's not very important, -00:14:57.360 --> 00:15:00.320 -but in normal highlighting query this +00:14:57.360 --> 00:14:59.706 +but in normal highlighting query, -00:15:00.320 --> 00:15:01.760 -will determine +00:14:59.706 --> 00:15:01.760 +this will determine 00:15:01.760 --> 00:15:06.639 -the face used to highlight the note +the face used to highlight the note. -00:15:06.639 --> 00:15:08.800 -suppose we want to capture all the +00:15:06.639 --> 00:15:08.256 +Suppose we want to capture -00:15:08.800 --> 00:15:10.320 -function names +00:15:08.256 --> 00:15:10.320 +all the function names, 00:15:10.320 --> 00:15:13.519 -instead of just any identifier +instead of just any identifier. 00:15:13.519 --> 00:15:29.440 -you can improve the query like this - -00:15:29.440 --> 00:15:31.600 -uh this will highlight the whole - -00:15:31.600 --> 00:15:32.639 -definition +You can improve the query like this. -00:15:32.639 --> 00:15:35.519 -but we only want to capture the function +00:15:29.440 --> 00:15:32.639 +This will highlight the whole definition. -00:15:35.519 --> 00:15:36.399 -name +00:15:32.639 --> 00:15:36.399 +But we only want to capture +the function name, -00:15:36.399 --> 00:15:39.600 -which means the identifier +00:15:36.399 --> 00:15:41.054 +which means the identifier here. -00:15:39.600 --> 00:15:42.800 -here so we +00:15:41.054 --> 00:15:49.600 +So we move the capture to after the +identifier node. 
-00:15:42.800 --> 00:15:46.320 -move the capture to after the identifier - -00:15:46.320 --> 00:15:49.600 -node - -00:15:49.600 --> 00:15:51.759 -if we want to capture the class names as - -00:15:51.759 --> 00:15:52.959 -well +00:15:49.600 --> 00:15:52.959 +If we want to capture the +class names as well, 00:15:52.959 --> 00:16:10.079 -we just add another pattern +we just add another pattern. 00:16:10.079 --> 00:16:20.320 -let's look at a more practical example - -00:16:20.320 --> 00:16:22.959 -here we can see that single quotes - -00:16:22.959 --> 00:16:23.759 -strings and +Let's look at a more practical example. -00:16:23.759 --> 00:16:25.600 -double quotes screens are highlighted +00:16:20.320 --> 00:16:23.468 +Here we can see that +single-quoted strings -00:16:25.600 --> 00:16:27.279 -the same +00:16:23.468 --> 00:16:27.279 +and double-quoted strings are +highlighted the same. 00:16:27.279 --> 00:16:30.399 -but in some places +But in some places, 00:16:30.399 --> 00:16:33.440 -because of some coding conventions +because of some coding conventions, -00:16:33.440 --> 00:16:35.440 +00:16:33.440 --> 00:16:36.373 it may be desirable to highlight them +differently. -00:16:35.440 --> 00:16:37.279 -differently for example if +00:16:36.373 --> 00:16:39.073 +For example, if the string is +single-quoted, -00:16:37.279 --> 00:16:39.680 -the string is single quoted we may want - -00:16:39.680 --> 00:16:40.880 -to highlight it - -00:16:40.880 --> 00:16:44.399 -as a constant +00:16:39.073 --> 00:16:44.399 +we may want to highlight it as a +constant. 00:16:44.399 --> 00:16:46.160 -let's try to see whether we can - -00:16:46.160 --> 00:16:47.600 -distinguish these +Let's try to see whether we can -00:16:47.600 --> 00:16:56.240 -two cases +00:16:46.160 --> 00:16:56.240 +distinguish these two cases. 00:16:56.240 --> 00:17:00.639 -so here we get all the strings +So here we get all the strings. 
00:17:00.639 --> 00:17:04.079 -if we want to see if it's single quotes +If we want to see if it's single quotes -00:17:04.079 --> 00:17:04.559 -or +00:17:04.079 --> 00:17:08.799 +or double quote strings, -00:17:04.559 --> 00:17:08.799 -double quote strings - -00:17:08.799 --> 00:17:11.039 +00:17:08.799 --> 00:17:13.436 we can try looking at the first +character of the string-- -00:17:11.039 --> 00:17:12.480 -character - -00:17:12.480 --> 00:17:15.280 -of the string I mean the first character +00:17:13.436 --> 00:17:16.720 +I mean the first character of the node-- -00:17:15.280 --> 00:17:16.720 -of the note - -00:17:16.720 --> 00:17:19.360 +00:17:16.720 --> 00:17:33.600 to check whether it's a single quote or +a double quote. -00:17:19.360 --> 00:17:33.600 -a double quote - -00:17:33.600 --> 00:17:36.080 -yeah so for that we use the three - -00:17:36.080 --> 00:17:36.799 -setters - -00:17:36.799 --> 00:17:40.160 -support for predicate in this case - -00:17:40.160 --> 00:17:43.360 -we use a match predicate +00:17:33.600 --> 00:17:38.920 +So for that, we use tree-sitter's +support for predicates. -00:17:43.360 --> 00:17:46.080 -to check whether the string where the +00:17:38.920 --> 00:17:43.360 +In this case, we use a match predicate -00:17:46.080 --> 00:17:46.799 -note +00:17:43.360 --> 00:17:47.339 +to check whether the string-- +whether the node starts -00:17:46.799 --> 00:17:50.320 -starts with a single quote and with this +00:17:47.339 --> 00:17:49.556 +with a single quote. -00:17:50.320 --> 00:17:51.280 -pattern +00:17:49.556 --> 00:17:51.280 +And with this pattern, -00:17:51.280 --> 00:17:58.840 -we only capture the single quotes - -00:17:58.840 --> 00:18:00.400 -strings +00:17:51.280 --> 00:18:00.400 +we only capture the single-quotes +strings. 00:18:00.400 --> 00:18:03.760 -let's try to give it a different face +Let's try to give it a different face. 
00:18:03.760 --> 00:18:13.039 -so we copy the pattern - -00:18:13.039 --> 00:18:18.640 -and we add this pattern - -00:18:18.640 --> 00:18:25.120 -pop item only +So we copy the pattern, -00:18:25.120 --> 00:18:28.400 -but we also want to give the +00:18:13.039 --> 00:18:25.120 +and we add this pattern for Python only. -00:18:28.400 --> 00:18:31.440 -capture a different name +00:18:25.120 --> 00:18:31.440 +But we also want to give the capture +a different name. -00:18:31.440 --> 00:18:40.840 -let's say we want to highlight it as a - -00:18:40.840 --> 00:18:46.559 -keyword +00:18:31.440 --> 00:18:46.559 +Let's say we want to highlight it +as a keyword. 00:18:46.559 --> 00:19:06.320 -and now if we refresh the buffer - -00:19:06.320 --> 00:19:08.799 -we see that single quote strings are +And now, if we refresh the buffer, -00:19:08.799 --> 00:19:10.320 -highlighted as +00:19:06.320 --> 00:19:08.523 +we see that single quote strings -00:19:10.320 --> 00:19:14.400 -keywords +00:19:08.523 --> 00:19:14.400 +are highlighted as keywords. -00:19:14.400 --> 00:19:16.400 -the highlighting patterns can also be +00:19:14.400 --> 00:19:15.751 +The highlighting patterns -00:19:16.400 --> 00:19:19.200 -set for a single project +00:19:15.751 --> 00:19:19.200 +can also be set for a single project 00:19:19.200 --> 00:19:23.440 -using directory local variable +using directory-local variables. -00:19:23.440 --> 00:19:26.880 -for example let's take a look at +00:19:23.440 --> 00:19:35.760 +For example, let's take a look at +Emacs's source code. 
-00:19:26.880 --> 00:19:35.760 -ems source code +00:19:35.760 --> 00:19:41.123 +So in Emacs's C source, +there are a lot of uses -00:19:35.760 --> 00:19:40.400 -so in image c source there are a lot of - -00:19:40.400 --> 00:19:43.760 -uses of these different macros +00:19:41.123 --> 00:19:43.760 +of these different macros 00:19:43.760 --> 00:19:47.679 -to define functions - -00:19:47.679 --> 00:19:51.200 -and you can see - -00:19:51.200 --> 00:19:53.520 -this is actually the function name but +to define functions, -00:19:53.520 --> 00:19:55.760 -it's highlighted as the +00:19:47.679 --> 00:19:53.256 +and you can see this is actually +the function name, -00:19:55.760 --> 00:19:59.120 -string so what we want +00:19:53.256 --> 00:19:56.373 +but it's highlighted as the string. -00:19:59.120 --> 00:20:03.679 -is to somehow recognize this pattern +00:19:56.373 --> 00:20:03.679 +So what we want is to somehow +recognize this pattern 00:20:03.679 --> 00:20:07.600 -and highlight it +and highlight it. 00:20:07.600 --> 00:20:11.280 -as highlight this part +Highlight this part 00:20:11.280 --> 00:20:14.559 -with the function phase instead +with the function face instead. 00:20:14.559 --> 00:20:17.679 -in order to do that - -00:20:17.679 --> 00:20:20.240 -we put a pattern in this project - -00:20:20.240 --> 00:20:21.760 -directory local - -00:20:21.760 --> 00:20:31.760 -settings file +In order to do that, -00:20:31.760 --> 00:20:34.799 -so we can put this button in the c +00:20:17.679 --> 00:20:31.760 +we put a pattern in this project's +directory-local settings file. -00:20:34.799 --> 00:20:40.159 -mode section +00:20:31.760 --> 00:20:40.159 +So we can put this button in +the C mode section. 
00:20:40.159 --> 00:20:48.000 -and now if we enable tree sitter +And now, if we enable tree-sitter, 00:20:48.000 --> 00:20:50.480 -you can see that this is the highlighted - -00:20:50.480 --> 00:20:53.200 -uh - -00:20:53.200 --> 00:20:55.520 -as a normal function definition so this - -00:20:55.520 --> 00:20:56.559 -is the function - -00:20:56.559 --> 00:21:01.200 -face like we wanted - -00:21:01.200 --> 00:21:03.760 -the pattern for this is actually pretty +you can see that this is highlighted -00:21:03.760 --> 00:21:07.200 -simple +00:20:53.200 --> 00:20:55.056 +as a normal function definition. -00:21:07.200 --> 00:21:10.720 -it's only +00:20:55.056 --> 00:21:01.200 +So this is the function face +like we wanted. -00:21:10.720 --> 00:21:14.720 -only this part so +00:21:01.200 --> 00:21:07.200 +The pattern for this is +actually pretty simple. -00:21:14.720 --> 00:21:17.440 -if it's a function call where the name +00:21:07.200 --> 00:21:12.373 +It's only this part. -00:21:17.440 --> 00:21:19.679 -of the function is different +00:21:12.373 --> 00:21:16.456 +So if it's a function call -00:21:19.679 --> 00:21:21.600 -then we highlight the different as a +00:21:16.456 --> 00:21:19.679 +where the name of the function is +defun, -00:21:21.600 --> 00:21:24.240 -keyword +00:21:19.679 --> 00:21:24.240 +then we highlight the defun as a +keyword, -00:21:24.240 --> 00:21:27.360 -and then the first string element we +00:21:24.240 --> 00:21:26.923 +and then the first string element, -00:21:27.360 --> 00:21:28.159 -highlighted +00:21:26.923 --> 00:21:35.360 +we highlight it as a function name. -00:21:28.159 --> 00:21:35.360 -as a function name +00:21:35.360 --> 00:21:39.280 +Since the language objects are actually +native code, -00:21:35.360 --> 00:21:37.679 -since the language objects are actually +00:21:39.280 --> 00:21:41.459 +they have to be compiled +for each platform -00:21:37.679 --> 00:21:39.280 -native code +00:21:41.459 --> 00:21:43.440 +that we want to support. 
-00:21:39.280 --> 00:21:40.799 -they have to be compiled for each +00:21:43.440 --> 00:21:48.159 +This will become a big obstacle for +tree-sitter adoption. -00:21:40.799 --> 00:21:43.440 -platform that we want to support +00:21:48.159 --> 00:21:52.960 +Therefore, I've created a language bundle +package, tree-sitter-langs, -00:21:43.440 --> 00:21:45.600 -this will become a big obstacle for - -00:21:45.600 --> 00:21:48.159 -3-seater adoption - -00:21:48.159 --> 00:21:50.240 -therefore I've created a language window - -00:21:50.240 --> 00:21:52.960 -package 3-seater length - -00:21:52.960 --> 00:21:54.960 +00:21:52.960 --> 00:21:55.773 that takes care of pre-compiling the +grammars, -00:21:54.960 --> 00:21:56.320 -grammars the - -00:21:56.320 --> 00:21:59.679 -most common grammars for all three major +00:21:55.773 --> 00:22:01.600 +the most common grammars for all three +major platforms. -00:21:59.679 --> 00:22:01.600 -platforms - -00:22:01.600 --> 00:22:04.080 -it also takes care of distributing these - -00:22:04.080 --> 00:22:05.360 -binaries +00:22:01.600 --> 00:22:05.360 +It also takes care of distributing +these binaries 00:22:05.360 --> 00:22:08.080 and provides some highlighting queries 00:22:08.080 --> 00:22:11.440 -for some of the languages +for some of the languages. 00:22:11.440 --> 00:22:13.760 -it should be noted that this package +It should be noted that this package -00:22:13.760 --> 00:22:15.919 +00:22:13.760 --> 00:22:19.919 should be treated as a temporary +distribution mechanism only, -00:22:15.919 --> 00:22:19.919 -distribution mechanism only - -00:22:19.919 --> 00:22:22.240 -to help with bootstrapping three-seaters - -00:22:22.240 --> 00:22:24.720 -adoption +00:22:19.919 --> 00:22:24.720 +to help with bootstrapping +tree-sitter adoption. 
00:22:24.720 --> 00:22:27.760 -the plan is that eventually these files - -00:22:27.760 --> 00:22:29.760 -should be provided by the language major +The plan is that eventually these files -00:22:29.760 --> 00:22:32.480 -modes themselves +00:22:27.760 --> 00:22:29.156 +should be provided by -00:22:32.480 --> 00:22:35.120 -but in order to do that we need better +00:22:29.156 --> 00:22:32.480 +the language major modes themselves. -00:22:35.120 --> 00:22:36.320 -tooling +00:22:32.480 --> 00:22:36.320 +But in order to do that, we need better +tooling, 00:22:36.320 --> 00:22:40.240 -so we're not there yet +so we're not there yet. -00:22:40.240 --> 00:22:42.559 -since the call already works reasonably +00:22:40.240 --> 00:22:43.280 +Since the core already works +reasonably well, -00:22:42.559 --> 00:22:43.280 -well +00:22:43.280 --> 00:22:45.289 +there are several areas +that would benefit -00:22:43.280 --> 00:22:44.640 -there are several areas that would +00:22:45.289 --> 00:22:49.120 +from the community's contribution. -00:22:44.640 --> 00:22:46.320 -benefit from the community's +00:22:49.120 --> 00:22:52.640 +So tree-sitter's upstream language +repositories -00:22:46.320 --> 00:22:49.120 -contribution - -00:22:49.120 --> 00:22:51.520 -so three seaters upstream language - -00:22:51.520 --> 00:22:52.640 -prepositories - -00:22:52.640 --> 00:22:54.400 +00:22:52.640 --> 00:22:55.679 already contain highlighting queries on +their own. -00:22:54.400 --> 00:22:55.679 -their own - -00:22:55.679 --> 00:22:58.480 -however they are pretty basic and they - -00:22:58.480 --> 00:23:00.480 -may not fit well with existing emax +00:22:55.679 --> 00:22:57.573 +However, they are pretty basic, -00:23:00.480 --> 00:23:02.559 -conventions +00:22:57.573 --> 00:23:02.559 +and they may not fit well with existing +Emacs conventions. 
-00:23:02.559 --> 00:23:04.320 -therefore the language bundle has its +00:23:02.559 --> 00:23:07.120 +Therefore, the language bundle has its +own set of highlighting queries. -00:23:04.320 --> 00:23:07.120 -own set of highlighting queries +00:23:07.120 --> 00:23:12.556 +This requires maintenance until language +major modes adopt tree-sitter -00:23:07.120 --> 00:23:10.559 -this requires maintenance until language +00:23:12.556 --> 00:23:16.640 +and maintain the queries on their own. -00:23:10.559 --> 00:23:11.600 -measurements adopt +00:23:16.640 --> 00:23:19.056 +The queries are actually +quite easy to write, -00:23:11.600 --> 00:23:13.760 -three sitter and maintain the queries on +00:23:19.056 --> 00:23:22.000 +as you've already seen. -00:23:13.760 --> 00:23:16.640 -their own +00:23:22.000 --> 00:23:25.360 +You just need to be familiar +with the language, -00:23:16.640 --> 00:23:18.480 -the queries are actually quite easy to - -00:23:18.480 --> 00:23:22.000 -write as you've already seen - -00:23:22.000 --> 00:23:24.240 -you just need to be familiar with the - -00:23:24.240 --> 00:23:25.360 -language - -00:23:25.360 --> 00:23:30.000 +00:23:25.360 --> 00:23:35.200 familiar enough to come up with sensible +highlighting patterns. -00:23:30.000 --> 00:23:35.200 -highlighting patterns - -00:23:35.200 --> 00:23:37.600 -and if you are a maintainer of a +00:23:35.200 --> 00:23:39.679 +And if you are a maintainer of a +language major mode, -00:23:37.600 --> 00:23:39.679 -language major mode - -00:23:39.679 --> 00:23:42.320 +00:23:39.679 --> 00:23:44.189 you may want to consider integrating +tree-sitter into your mode, -00:23:42.320 --> 00:23:43.360 -tree sitter into - -00:23:43.360 --> 00:23:46.960 -your mode initially maybe as an - -00:23:46.960 --> 00:23:50.080 -optional feature the integration is +00:23:44.189 --> 00:23:48.573 +initially maybe as an optional feature. 
-00:23:50.080 --> 00:23:53.279 -actually pretty straightforward +00:23:48.573 --> 00:23:53.279 +The integration is actually pretty +straightforward, 00:23:53.279 --> 00:23:56.640 -especially for syntax highlighting +especially for syntax highlighting. 00:23:56.640 --> 00:24:01.520 -or alternatively +Or alternatively, -00:24:01.520 --> 00:24:03.760 +00:24:01.520 --> 00:24:05.760 you can also try writing a new major +mode from scratch -00:24:03.760 --> 00:24:04.640 -mode - -00:24:04.640 --> 00:24:08.000 -from scratch that relies on tree sitter +00:24:05.760 --> 00:24:08.000 +that relies on tree-sitter 00:24:08.000 --> 00:24:12.559 -from the very beginning - -00:24:12.559 --> 00:24:16.320 -the code for such a major mode is +from the very beginning. -00:24:16.320 --> 00:24:19.679 -quite simple for example +00:24:12.559 --> 00:24:17.523 +The code for such a major mode is +quite simple. -00:24:19.679 --> 00:24:23.200 -this is the proposed +00:24:17.523 --> 00:24:23.200 +For example, this is the proposed 00:24:23.200 --> 00:24:26.240 -what mode for web assembly +wat-mode for web assembly. -00:24:26.240 --> 00:24:31.039 -the code is just - -00:24:31.039 --> 00:24:34.559 -like one page of code not - -00:24:34.559 --> 00:24:39.520 -not a lot +00:24:26.240 --> 00:24:39.520 +The code is just one page of code, +not a lot. 00:24:39.520 --> 00:24:42.720 -you can also try writing new minor modes +You can also try writing new minor modes 00:24:42.720 --> 00:24:46.559 -or writing integration packages - -00:24:46.559 --> 00:24:50.080 -for example a lot of package a lot of +or writing integration packages. 
-00:24:50.080 --> 00:24:50.880 -packages +00:24:46.559 --> 00:24:50.880 +For example, a lot of packages 00:24:50.880 --> 00:24:54.559 -may benefit from tree sitter integration - -00:24:54.559 --> 00:24:58.840 -but no one has written the integration - -00:24:58.840 --> 00:25:02.960 -yet +may benefit from tree-sitter integration, -00:25:02.960 --> 00:25:05.039 -if you are interested in 3-seater you +00:24:54.559 --> 00:25:02.960 +but no one has written +the integration yet. -00:25:05.039 --> 00:25:06.720 -can use these links to +00:25:02.960 --> 00:25:04.836 +If you are interested in tree-sitter, -00:25:06.720 --> 00:25:10.320 -learn more about it I think that's it +00:25:04.836 --> 00:25:08.023 +you can use these links to learn more +about it. -00:25:10.320 --> 00:25:11.440 -for me today +00:25:08.023 --> 00:25:11.440 +I think that's it for me today. 00:25:11.440 --> 00:25:18.159 -I'm happy to answer any questions +I'm happy to answer any questions. |