# Incremental Parsing with emacs-tree-sitter
Tuấn-Anh Nguyễn
[[!template vidid="mainVideo" id=vid src="https://mirror.csclub.uwaterloo.ca/emacsconf/2020/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.webm" subtitles="/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.vtt"]]
[Download compressed .webm video (26.2M)](https://media.emacsconf.org/2020/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen--compressed32.webm)
[Download compressed .webm video (21.8M, highly compressed)](https://mirror.csclub.uwaterloo.ca/emacsconf/2020/smaller/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen--vp9-q56-video-original-audio.webm)
[View transcript](#transcript)
[[!template id=vid src="https://mirror.csclub.uwaterloo.ca/emacsconf/2020/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--questions--tuan-anh-nguyen.webm" download="Download Q&A video"]]
[Download compressed Q&A .webm video (35.8M)](https://media.emacsconf.org/2020/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--questions--tuan-anh-nguyen--compressed32.webm)
[Download compressed Q&A .webm video (16.4M, highly compressed)](https://mirror.csclub.uwaterloo.ca/emacsconf/2020/smaller/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--questions--tuan-anh-nguyen--vp9-q56-video-original-audio.webm)
Tree-sitter is a parser generator and an incremental parsing library.
emacs-tree-sitter is its most popular Emacs binding, which aims to be
the foundation of Emacs packages that understand source code's
structure. Examples include better code highlighting, folding,
indexing, and structural navigation.
In this talk, I will describe the current state of emacs-tree-sitter's
APIs and functionality. I will also discuss areas that need
improvement and contributions from the community.
- Slides: <https://ubolonton.org/slides/emacs-tree-sitter-emacsconf2020.pdf>
<!-- from the pad --->
- Actual start and end time (EST): Start: 2020-11-29T09.49.24; Q&A:
2020-11-29T10.13.56; End: 2020-11-29T10.31.44
# Questions
## Q20: Can we integrate it with the Spacemacs Python layer?
## Q19: The Python mode example was pretty good. Is that something that one can use already?
Yes, already using it at work right now.
## Q18: Regarding Emacs integration, will it always need to be a foreign library or can it be included / linked directly in compilation?
Building a parser from source needs Node.js
<https://tree-sitter.github.io/tree-sitter/creating-parsers#dependencies>
so I don't know if it'll be in-tree and included at compile time.
The core library is a dynamic module; it would be better to include it
in core Emacs eventually. Language definitions might be better
distributed separately.
## Q17: Is there a link to the slides?
Yes, will post in IRC later.
Slides: <https://ubolonton.org/slides/emacs-tree-sitter-emacsconf2020.pdf>
## Q16: Are there any language major modes that have integrated already?
Not yet (answered during talk).
TypeScript: discussing integration, not integrated yet.
## Q15: Is it possible to use tree-sitter for structural editing?
Covered by Q4 / Q8 / Q11.
## Q14: Is there a folding mode for tree-sitter?
Not yet. There are multiple code folding frameworks inside Emacs, and
it's better to integrate with these modes rather than writing
something new entirely.
+1 Would be nice if it worked with outshine mode or similar.
## Q13: MaxCity on IRC asks: "That pop up M-x window. How do you get that?"
ivy-posframe most likely
<https://github.com/tumashu/ivy-posframe/>. Or not. Cool!
Custom helm code.
## Q12: I'm new to the tree-sitter world. Is it easy to install/use it also on Windows? (I have to use winbloat at work)
The usual approach is to hope someone else has made a precompiled
version for you and download it. Otherwise you'll have to set up a
development environment with mingw-msys or whatever.
- No, both tree-sitter and tree-sitter-langs provide pre-compiled
binaries for macOS, Linux, and Windows.
Yes, it should work out-of-the-box on Windows, provided that Emacs was
compiled with module support turned on.
## Q11: Is it possible to use this for refactoring too?
For the kind of refactoring inside a buffer, it's very doable right
now with some glue code. For more extensive refactoring where you want
to touch all files in a project, there needs to be some kind of
understanding of the language's module system, how modules are laid out in
the filesystem… even files that are not yet loaded into
Emacs. That sounds like something a lot more extensive. Sounds like an
IDE in Emacs.
## Q10: Can language major-mode authors start taking advantage of this now? Or is it intended to be used as a minor-mode?
Minor mode depended on by the major modes.
## Q9: I'm completely new to tree-sitter, how do I use it as an end user? Is there an easy example config out there by the organizer or otherwise that shows standard usage with whatever programming language? Or are we not there yet?
Answering own question: Sounds like major mode maintainers need to
integrate.
Syntax highlighting is pretty easy to activate
<https://ubolonton.github.io/emacs-tree-sitter/getting-started/> -
nice, tree-sitter-hl-mode looks easy
Need to add more examples to the documentation.
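As a rough illustration of the setup the getting-started page describes, the init-file sketch below turns on parsing wherever a grammar is available and then enables highlighting. It assumes the `tree-sitter` and `tree-sitter-langs` packages are already installed and that Emacs was built with dynamic module support.

```elisp
;; Assumes `tree-sitter' and `tree-sitter-langs' are already installed.
(require 'tree-sitter)
(require 'tree-sitter-langs)

;; Enable `tree-sitter-mode' in every buffer whose major mode has a grammar.
(global-tree-sitter-mode)

;; Switch on tree-sitter based highlighting whenever parsing is turned on.
(add-hook 'tree-sitter-after-on-hook #'tree-sitter-hl-mode)
```

To try it in a single buffer first, `M-x tree-sitter-hl-mode` should be enough.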
## Q8: (Following on from Q4) Could there be a standardised approach to coding automatic refactorings in the future? e.g. so that whichever language mode you are using, you could see a menu of available refactoring operations?
Not sure about this. Most refactoring operations are highly specific
to a class of languages. Not one single approach for all the
languages, but maybe one for object-oriented languages, one for
Lisp-type languages, one for JavaScript and TypeScript…
I meant the Lisp and user interfaces being unified, not the
implementations of the refactorings. But maybe it belongs in a
separate mode on top. So you could have a defrefactor macro or
similar.
## Q7: How extensive will the compatibility be between highlighting grammars for Emacs and those for Vim/Neovim with Tree-sitter?
For the time being it looks like nvim-treesitter also uses the S-exp
syntax for queries so it shouldn't be too hard. See
<https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/rust/highlights.scm>.
- No effort has been spent on compatibility yet. Each editor has its
own existing conventions for highlighting. Having a common set of
basic "capture names" is possible, and will require efforts from
multiple editor communities. (Emacs and NeoVim for now. The editor
that introduced Tree-sitter, Atom, hasn't used these queries for
highlighting.)
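To give a feel for the S-expression query syntax discussed here, this is a small sketch of adding a highlighting pattern from Emacs Lisp. It assumes emacs-tree-sitter's `tree-sitter-hl-add-patterns` function and the Python grammar's `function_definition` node with its `name` field. The `@function` capture name follows emacs-tree-sitter's conventions, and nvim-treesitter may use a different name for the same thing.

```elisp
(require 'tree-sitter-hl)
(require 'tree-sitter-langs)

;; Capture the name of every Python function definition as @function.
;; The pattern uses the same S-expression syntax as a highlights.scm file;
;; only the capture name is an editor-specific convention.
(tree-sitter-hl-add-patterns 'python
  [(function_definition
    name: (identifier) @function)])
```

The bundled Python queries most likely cover this case already, so the snippet is only meant to show the shape of a pattern.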
## Q6: Will it ever be possible to write Tree-sitter grammars in a Lisp, or will JS be required?
The grammar part is written in JSON, you don't need to actually
understand JS to write it. Using Lisp would merely give you an
s-expression version, which wouldn't buy you much.
- Ah, so all that is needed is `(json-encode '(grammar …))`? Great!
## Q5: Could you show the source that was matched by the parser in the debug view in addition to the grammar part matched?
## Q4: Could this be used with packages like `smartparens` that aim to bring structural editing to non-s-expression based languages? AST-based refactoring?
It is one of the goals, but not yet achieved.
## Q3: Do you think Tree-sitter would be useful for Org buffers? I can imagine it being used to keep a parsed AST of an Org buffer (e.g. like org-element's output) updated in real time.
An obstacle here is Org not having anything anywhere close to a formal
grammar, so that would need to be corrected first.
- <https://orgmode.org/worg/dev/org-element-api.html>.
- <https://orgmode.org/worg/dev/org-syntax.html>.
- This is an informal description of it, not an actual
grammar. Nevertheless, there are a few projects trying to codify a
grammar. I'll dig up some links soonish.
- The element API is the formal grammar (the canonical
implementation). The Org-syntax document is a draft of the text
description of the grammar.
- Note: relevant mailing list discussion
<https://orgmode.org/list/68dc1ea1-52e8-7d9e-fb2d-bcf08c111eca@intrepidus.pl/>.
~~FIXME:~~ Add link to an emacs-tree-sitter project/snippet for org-mode.
- Not sure if it is what you have in mind, but there is <https://github.com/gagbo/tree-sitter-org>
- Yes, this is it.
## Q2: With GCCEmacs, will Elisp performance be competitive enough to make Tree-sitter in Elisp more attractive?
~~The point of this project is to reuse other people's efforts, not
rewriting them.~~
It's a possibility. In terms of probability, probably not. It's a huge
amount of work. The GC latency is also a fundamental issue.
## Q1: Do you think that this package can be included into Emacs/GNU ELPA?
Yes, it is just a matter of paperwork.
# Notes
- Project description: emacs-tree-sitter is an Emacs Lisp binding for
tree-sitter, an incremental parsing library.
- <https://github.com/ubolonton/emacs-tree-sitter> (<- bindings).
- <https://ubolonton.github.io/emacs-tree-sitter/> (<- documentation).
- <https://tree-sitter.github.io/tree-sitter/> (<- parser).
- Regular expressions are not powerful enough.
- LSP often has high latency and is resource-intensive.
- An updated video version was uploaded after the event, with the
missing introduction to Tree-sitter added.
# Related talks
[[!taglink CategoryTreeSitter]]
<a name="transcript"></a>
# Transcript
[[!template text="Hello, everyone! My name is Tuấn-Anh." start="00:00:01.520" video="mainVideo" id=subtitle]]
[[!template text="I've been using Emacs for about 10 years." start="00:00:04.400" video="mainVideo" id=subtitle]]
[[!template text="Today, I'm going to talk about tree-sitter," start="00:00:07.200" video="mainVideo" id=subtitle]]
[[!template text="a new Emacs package that allows Emacs" start="00:00:09.280" video="mainVideo" id=subtitle]]
[[!template text="to parse multiple programming languages in real-time." start="00:00:11.351" video="mainVideo" id=subtitle]]
[[!template new="1" text="So what is the problem statement?" start="00:00:17.840" video="mainVideo" id=subtitle]]
[[!template text="In order to support programming functionalities" start="00:00:21.840" video="mainVideo" id=subtitle]]
[[!template text="for a particular language," start="00:00:24.131" video="mainVideo" id=subtitle]]
[[!template text="a text editor needs to have some degree" start="00:00:25.760" video="mainVideo" id=subtitle]]
[[!template text="of language understanding." start="00:00:27.680" video="mainVideo" id=subtitle]]
[[!template text="Traditionally, text editors have relied" start="00:00:29.679" video="mainVideo" id=subtitle]]
[[!template text="very heavily on regular expressions for this." start="00:00:31.840" video="mainVideo" id=subtitle]]
[[!template text="Emacs is no different." start="00:00:34.960" video="mainVideo" id=subtitle]]
[[!template text="Most language major modes use regular expressions" start="00:00:37.013" video="mainVideo" id=subtitle]]
[[!template text="for syntax-highlighting, code navigation," start="00:00:40.170" video="mainVideo" id=subtitle]]
[[!template text="folding, indexing, and so on." start="00:00:42.960" video="mainVideo" id=subtitle]]
[[!template text="Regular expressions are problematic for a couple of reasons." start="00:00:46.618" video="mainVideo" id=subtitle]]
[[!template text="They're slow and inaccurate." start="00:00:50.559" video="mainVideo" id=subtitle]]
[[!template text="They also make the code hard to read and write." start="00:00:53.778" video="mainVideo" id=subtitle]]
[[!template text="Sometimes it's because the regular expressions themselves are very hairy," start="00:00:56.800" video="mainVideo" id=subtitle]]
[[!template text="and sometimes because they are just not powerful enough." start="00:01:01.199" video="mainVideo" id=subtitle]]
[[!template text="Some helper code is usually needed" start="00:01:05.199" video="mainVideo" id=subtitle]]
[[!template text="to parse more intricate language features." start="00:01:08.625" video="mainVideo" id=subtitle]]
[[!template text="That also illustrates the core problem with regular expressions," start="00:01:11.200" video="mainVideo" id=subtitle]]
[[!template text="in that they are not powerful enough to parse programming languages." start="00:01:16.159" video="mainVideo" id=subtitle]]
[[!template text="An example feature that regular expressions cannot handle very well" start="00:01:21.119" video="mainVideo" id=subtitle]]
[[!template text="is string interpolation, which is a very common feature" start="00:01:25.040" video="mainVideo" id=subtitle]]
[[!template text="in many modern programming languages." start="00:01:28.320" video="mainVideo" id=subtitle]]
[[!template new="1" text="It would be much nicer if Emacs somehow" start="00:01:31.680" video="mainVideo" id=subtitle]]
[[!template text="had structural understanding of source code, like IDEs do." start="00:01:34.079" video="mainVideo" id=subtitle]]
[[!template text="There have been multiple efforts" start="00:01:39.520" video="mainVideo" id=subtitle]]
[[!template text="to bring this kind of programming language understanding into Emacs." start="00:01:41.981" video="mainVideo" id=subtitle]]
[[!template text="There are language-specific parsers" start="00:01:45.280" video="mainVideo" id=subtitle]]
[[!template text="written in Elisp" start="00:01:47.119" video="mainVideo" id=subtitle]]
[[!template text="that can be thought of" start="00:01:48.640" video="mainVideo" id=subtitle]]
[[!template text="as the next logical step of the glue code" start="00:01:50.675" video="mainVideo" id=subtitle]]
[[!template text="on top of regular expressions," start="00:01:51.989" video="mainVideo" id=subtitle]]
[[!template text="moving from partial local pattern recognition" start="00:01:53.856" video="mainVideo" id=subtitle]]
[[!template text="into a full-fledged parser." start="00:01:57.356" video="mainVideo" id=subtitle]]
[[!template text="The most prominent example of this approach" start="00:01:59.840" video="mainVideo" id=subtitle]]
[[!template text="is probably the famous js2-mode." start="00:02:02.023" video="mainVideo" id=subtitle]]
[[!template new="1" text="However, this approach has several issues." start="00:02:06.479" video="mainVideo" id=subtitle]]
[[!template text="Parsing is computationally expensive," start="00:02:10.080" video="mainVideo" id=subtitle]]
[[!template text="and Emacs Lisp is not good at that kind of stuff." start="00:02:12.606" video="mainVideo" id=subtitle]]
[[!template new="1" text="Furthermore, maintenance is very troublesome." start="00:02:16.800" video="mainVideo" id=subtitle]]
[[!template text="In order to work on these parsers," start="00:02:19.156" video="mainVideo" id=subtitle]]
[[!template text="first, you have to know Elisp well enough," start="00:02:22.160" video="mainVideo" id=subtitle]]
[[!template text="and then you have to be comfortable with" start="00:02:24.239" video="mainVideo" id=subtitle]]
[[!template text="writing a recursive descent parser," start="00:02:26.606" video="mainVideo" id=subtitle]]
[[!template text="while constantly keeping up with changes to the language itself," start="00:02:29.739" video="mainVideo" id=subtitle]]
[[!template text="which can be evolving very quickly," start="00:02:34.000" video="mainVideo" id=subtitle]]
[[!template text="like JavaScript, for example." start="00:02:36.356" video="mainVideo" id=subtitle]]
[[!template new="1" text="Together, these constraints significantly reduce" start="00:02:39.360" video="mainVideo" id=subtitle]]
[[!template text="the pool of potential maintainers." start="00:02:42.373" video="mainVideo" id=subtitle]]
[[!template text="The biggest issue, though, in my opinion," start="00:02:45.680" video="mainVideo" id=subtitle]]
[[!template text="is the lack of a set of generic and reusable APIs." start="00:02:47.760" video="mainVideo" id=subtitle]]
[[!template text="This makes them very hard to use" start="00:02:52.139" video="mainVideo" id=subtitle]]
[[!template text="for minor modes that want to deal with" start="00:02:54.319" video="mainVideo" id=subtitle]]
[[!template text="cross-cutting concerns across multiple languages." start="00:02:55.920" video="mainVideo" id=subtitle]]
[[!template new="1" text="The other approach which has been" start="00:02:59.920" video="mainVideo" id=subtitle]]
[[!template text="gaining a lot of momentum in recent years" start="00:03:01.760" video="mainVideo" id=subtitle]]
[[!template text="is externalizing language understanding" start="00:03:04.319" video="mainVideo" id=subtitle]]
[[!template text="to another process," start="00:03:06.560" video="mainVideo" id=subtitle]]
[[!template text="also known as language server protocol." start="00:03:08.159" video="mainVideo" id=subtitle]]
[[!template new="1" text="This second approach is actually a very interesting one." start="00:03:12.239" video="mainVideo" id=subtitle]]
[[!template text="By decoupling language understanding" start="00:03:16.560" video="mainVideo" id=subtitle]]
[[!template text="from the editing facility itself," start="00:03:18.400" video="mainVideo" id=subtitle]]
[[!template text="the LSP servers can attract a lot more contributors," start="00:03:21.280" video="mainVideo" id=subtitle]]
[[!template text="which makes maintenance easier." start="00:03:25.120" video="mainVideo" id=subtitle]]
[[!template new="1" text="However, they also have several issues of their own." start="00:03:27.189" video="mainVideo" id=subtitle]]
[[!template text="Being a separate process," start="00:03:32.400" video="mainVideo" id=subtitle]]
[[!template text="they are usually more resource-intensive," start="00:03:34.089" video="mainVideo" id=subtitle]]
[[!template text="and depending on the language," start="00:03:37.073" video="mainVideo" id=subtitle]]
[[!template text="the LSP server itself can bring with it" start="00:03:39.920" video="mainVideo" id=subtitle]]
[[!template text="a host of additional dependencies" start="00:03:42.159" video="mainVideo" id=subtitle]]
[[!template text="external to Emacs, which may be messy to install and manage." start="00:03:44.640" video="mainVideo" id=subtitle]]
[[!template new="1" text="Furthermore, JSON over RPC has pretty high latency." start="00:03:50.640" video="mainVideo" id=subtitle]]
[[!template text="For one-off tasks like jumping to source" start="00:03:55.120" video="mainVideo" id=subtitle]]
[[!template text="or on-demand completion, it's great." start="00:03:57.840" video="mainVideo" id=subtitle]]
[[!template text="But for things like code highlighting," start="00:04:00.879" video="mainVideo" id=subtitle]]
[[!template text="the latency is just too much." start="00:04:03.040" video="mainVideo" id=subtitle]]
[[!template new="1" text="I was using Rust and I was following the" start="00:04:06.000" video="mainVideo" id=subtitle]]
[[!template text="community effort to improve its IDE support," start="00:04:08.319" video="mainVideo" id=subtitle]]
[[!template text="hoping to integrate some of that into Emacs itself." start="00:04:11.760" video="mainVideo" id=subtitle]]
[[!template text="Then I heard someone from the community mention tree-sitter," start="00:04:15.760" video="mainVideo" id=subtitle]]
[[!template text="and I decided to check it out." start="00:04:19.759" video="mainVideo" id=subtitle]]
[[!template text="Basically, tree-sitter is an incremental parsing library and a parser generator." start="00:04:23.360" video="mainVideo" id=subtitle]]
[[!template text="It was introduced by the Atom editor in 2018." start="00:04:28.720" video="mainVideo" id=subtitle]]
[[!template text="Besides Atom, it is also being integrated" start="00:04:33.040" video="mainVideo" id=subtitle]]
[[!template text="into the NeoVim editor," start="00:04:35.923" video="mainVideo" id=subtitle]]
[[!template text="and GitHub is using it to power" start="00:04:37.623" video="mainVideo" id=subtitle]]
[[!template text="their source code analysis" start="00:04:41.040" video="mainVideo" id=subtitle]]
[[!template text="and navigation features." start="00:04:42.423" video="mainVideo" id=subtitle]]
[[!template text="It is written in C and can be compiled" start="00:04:45.840" video="mainVideo" id=subtitle]]
[[!template text="for all major platforms." start="00:04:48.639" video="mainVideo" id=subtitle]]
[[!template text="It can even be compiled" start="00:04:50.623" video="mainVideo" id=subtitle]]
[[!template text="to WebAssembly to run on the web." start="00:04:53.120" video="mainVideo" id=subtitle]]
[[!template text="That's how GitHub is using it on their website." start="00:04:55.323" video="mainVideo" id=subtitle]]
[[!template new="1" text="So why is tree-sitter an interesting solution to this problem?" start="00:05:00.800" video="mainVideo" id=subtitle]]
[[!template text="There are multiple features that make it an attractive option." start="00:05:05.840" video="mainVideo" id=subtitle]]
[[!template text="It is designed to be fast." start="00:05:10.000" video="mainVideo" id=subtitle]]
[[!template text="By being incremental," start="00:05:11.839" video="mainVideo" id=subtitle]]
[[!template text="the initial parse of a typical big file" start="00:05:13.680" video="mainVideo" id=subtitle]]
[[!template text="can take tens of milliseconds," start="00:05:15.680" video="mainVideo" id=subtitle]]
[[!template text="while subsequent incremental parses" start="00:05:18.160" video="mainVideo" id=subtitle]]
[[!template text="are sub-millisecond." start="00:05:20.240" video="mainVideo" id=subtitle]]
[[!template text="It achieves this by using structural sharing," start="00:05:22.560" video="mainVideo" id=subtitle]]
[[!template text="meaning replacing only affected nodes" start="00:05:26.240" video="mainVideo" id=subtitle]]
[[!template text="in the old tree when it needs to." start="00:05:29.360" video="mainVideo" id=subtitle]]
[[!template text="Also, unlike LSP, being in the same process," start="00:05:32.960" video="mainVideo" id=subtitle]]
[[!template text="it has much lower latency." start="00:05:37.120" video="mainVideo" id=subtitle]]
[[!template new="1" text="Secondly, it provides a uniform programming interface." start="00:05:40.639" video="mainVideo" id=subtitle]]
[[!template text="The same data structures and functions" start="00:05:44.960" video="mainVideo" id=subtitle]]
[[!template text="work on parse trees of different languages." start="00:05:47.039" video="mainVideo" id=subtitle]]
[[!template text="Syntax nodes of different languages" start="00:05:50.400" video="mainVideo" id=subtitle]]
[[!template text="differ only by their types" start="00:05:52.160" video="mainVideo" id=subtitle]]
[[!template text="and their possible child nodes." start="00:05:54.160" video="mainVideo" id=subtitle]]
[[!template text="This is a big advantage over language-specific parsers." start="00:05:55.723" video="mainVideo" id=subtitle]]
[[!template text="Thirdly, it's written in self-contained embeddable C." start="00:06:02.240" video="mainVideo" id=subtitle]]
[[!template text="As I mentioned previously, it can even be compiled to WebAssembly." start="00:06:06.880" video="mainVideo" id=subtitle]]
[[!template text="This makes integrating it into various editors quite easy" start="00:06:11.723" video="mainVideo" id=subtitle]]
[[!template text="without having to install any external dependencies." start="00:06:16.106" video="mainVideo" id=subtitle]]
[[!template new="1" text="One thing that is not mentioned here" start="00:06:22.880" video="mainVideo" id=subtitle]]
[[!template text="is that being a parser generator," start="00:06:25.503" video="mainVideo" id=subtitle]]
[[!template text="its grammars are declarative." start="00:06:28.000" video="mainVideo" id=subtitle]]
[[!template text="Together with being editor-independent," start="00:06:31.039" video="mainVideo" id=subtitle]]
[[!template text="this makes the pool of potential contributors much larger." start="00:06:34.880" video="mainVideo" id=subtitle]]
[[!template new="1" text="So I was convinced that tree-sitter is a good fit for Emacs." start="00:06:39.139" video="mainVideo" id=subtitle]]
[[!template text="Last year, I started writing the bindings" start="00:06:45.520" video="mainVideo" id=subtitle]]
[[!template text="using dynamic module support introduced in Emacs 25." start="00:06:48.000" video="mainVideo" id=subtitle]]
[[!template text="Dynamic module means there is platform-specific native code involved," start="00:06:53.280" video="mainVideo" id=subtitle]]
[[!template text="but since there are pre-compiled binaries" start="00:06:58.479" video="mainVideo" id=subtitle]]
[[!template text="for the three major platforms," start="00:07:00.560" video="mainVideo" id=subtitle]]
[[!template text="it should work in most places." start="00:07:02.880" video="mainVideo" id=subtitle]]
[[!template text="Currently, the core functionalities are in a pretty good shape." start="00:07:04.706" video="mainVideo" id=subtitle]]
[[!template text="Syntax highlighting is working nicely." start="00:07:09.440" video="mainVideo" id=subtitle]]
[[!template new="1" text="The whole thing is split into three packages." start="00:07:12.560" video="mainVideo" id=subtitle]]
[[!template text="tree-sitter is the main package that other packages should depend on." start="00:07:16.080" video="mainVideo" id=subtitle]]
[[!template text="tree-sitter-langs is the language bundle" start="00:07:20.319" video="mainVideo" id=subtitle]]
[[!template text="that includes support" start="00:07:22.800" video="mainVideo" id=subtitle]]
[[!template text="for most common languages." start="00:07:24.000" video="mainVideo" id=subtitle]]
[[!template text="And finally, the core APIs are in the package tsc," start="00:07:27.199" video="mainVideo" id=subtitle]]
[[!template text="which stands for tree-sitter-core." start="00:07:32.160" video="mainVideo" id=subtitle]]
[[!template text="It is the implicit dependency of the" start="00:07:36.160" video="mainVideo" id=subtitle]]
[[!template text="tree-sitter package." start="00:07:38.800" video="mainVideo" id=subtitle]]
[[!template text="The main package includes the minor mode tree-sitter-mode." start="00:07:43.520" video="mainVideo" id=subtitle]]
[[!template text="This provides the base for other major or minor modes to build on." start="00:07:47.520" video="mainVideo" id=subtitle]]
[[!template text="Using Emacs's change tracking hooks," start="00:07:52.560" video="mainVideo" id=subtitle]]
[[!template text="it enables incremental parsing" start="00:07:54.839" video="mainVideo" id=subtitle]]
[[!template text="and provides a syntax tree that is always up to date" start="00:07:57.073" video="mainVideo" id=subtitle]]
[[!template text="after any edits in a buffer." start="00:08:00.800" video="mainVideo" id=subtitle]]
[[!template text="There is also a basic debug mode" start="00:08:04.080" video="mainVideo" id=subtitle]]
[[!template text="that shows the parse tree in another buffer." start="00:08:06.223" video="mainVideo" id=subtitle]]
[[!template new="1" text="Here is a quick demo." start="00:08:10.080" video="mainVideo" id=subtitle]]
[[!template text="Here I'm in an empty Python buffer" start="00:08:13.360" video="mainVideo" id=subtitle]]
[[!template text="with tree-sitter enabled." start="00:08:15.673" video="mainVideo" id=subtitle]]
[[!template text="I'm going to turn on the debug mode to" start="00:08:17.520" video="mainVideo" id=subtitle]]
[[!template text="see the parse tree." start="00:08:19.440" video="mainVideo" id=subtitle]]
[[!template text="Since the buffer is empty," start="00:08:26.560" video="mainVideo" id=subtitle]]
[[!template text="there is only one node in the syntax tree:" start="00:08:28.106" video="mainVideo" id=subtitle]]
[[!template text="the top-level module node." start="00:08:30.423" video="mainVideo" id=subtitle]]
[[!template text="Let's try typing some code." start="00:08:33.279" video="mainVideo" id=subtitle]]
[[!template text="As you can see, as I type into the Python buffer," start="00:09:11.040" video="mainVideo" id=subtitle]]
[[!template text="the syntax tree updates in real time." start="00:09:14.640" video="mainVideo" id=subtitle]]
[[!template new="1" text="The other minor mode included in the main package" start="00:09:19.120" video="mainVideo" id=subtitle]]
[[!template text="is tree-sitter-hl-mode." start="00:09:22.039" video="mainVideo" id=subtitle]]
[[!template text="It overrides font-lock mode" start="00:09:24.389" video="mainVideo" id=subtitle]]
[[!template text="and provides its own set of faces" start="00:09:26.349" video="mainVideo" id=subtitle]]
[[!template text="and customization options." start="00:09:28.480" video="mainVideo" id=subtitle]]
[[!template text="It is query-driven." start="00:09:30.139" video="mainVideo" id=subtitle]]
[[!template text="That means instead of regular expressions," start="00:09:32.800" video="mainVideo" id=subtitle]]
[[!template text="it uses a Lisp-like query language" start="00:09:36.240" video="mainVideo" id=subtitle]]
[[!template text="to map syntax nodes" start="00:09:39.518" video="mainVideo" id=subtitle]]
[[!template text="to highlighting faces." start="00:09:40.320" video="mainVideo" id=subtitle]]
[[!template text="I'm going to open a python file with small snippets" start="00:09:41.923" video="mainVideo" id=subtitle]]
[[!template text="that showcase syntax highlighting." start="00:09:45.760" video="mainVideo" id=subtitle]]
[[!template text="So this is the default highlighting" start="00:09:54.320" video="mainVideo" id=subtitle]]
[[!template text="provided by python-mode." start="00:09:55.920" video="mainVideo" id=subtitle]]
[[!template text="This is the highlighting enabled by tree-sitter." start="00:10:00.880" video="mainVideo" id=subtitle]]
[[!template text="As you can see, string interpolation" start="00:10:04.640" video="mainVideo" id=subtitle]]
[[!template text="and decorators are highlighted correctly." start="00:10:07.680" video="mainVideo" id=subtitle]]
[[!template text="Function calls are also highlighted." start="00:10:11.680" video="mainVideo" id=subtitle]]
[[!template text="You can also note that property accessors" start="00:10:17.440" video="mainVideo" id=subtitle]]
[[!template text="and property assignments are highlighted differently." start="00:10:21.839" video="mainVideo" id=subtitle]]
[[!template text="What I like the most about this is that" start="00:10:27.440" video="mainVideo" id=subtitle]]
[[!template text="new bindings are consistently highlighted." start="00:10:29.360" video="mainVideo" id=subtitle]]
[[!template text="This includes local variables," start="00:10:32.640" video="mainVideo" id=subtitle]]
[[!template text="function parameters, and property mutations." start="00:10:36.320" video="mainVideo" id=subtitle]]
[[!template new="1" text="Before going through the tree queries" start="00:10:45.760" video="mainVideo" id=subtitle]]
[[!template text="and the syntax highlighting" start="00:10:48.000" video="mainVideo" id=subtitle]]
[[!template text="customization options," start="00:10:49.279" video="mainVideo" id=subtitle]]
[[!template text="let's take a brief look at" start="00:10:51.680" video="mainVideo" id=subtitle]]
[[!template text="the core data structures and functions" start="00:10:53.339" video="mainVideo" id=subtitle]]
[[!template text="that tree-sitter provides." start="00:10:55.040" video="mainVideo" id=subtitle]]
[[!template text="So parsing is done with the help of" start="00:10:58.079" video="mainVideo" id=subtitle]]
[[!template text="a generic parser object." start="00:11:00.743" video="mainVideo" id=subtitle]]
[[!template text="A single parser object can be used to" start="00:11:02.240" video="mainVideo" id=subtitle]]
[[!template text="parse different languages" start="00:11:04.160" video="mainVideo" id=subtitle]]
[[!template text="by sending different language objects to it." start="00:11:06.000" video="mainVideo" id=subtitle]]
[[!template text="The language objects themselves are" start="00:11:09.279" video="mainVideo" id=subtitle]]
[[!template text="loaded from shared libraries." start="00:11:10.880" video="mainVideo" id=subtitle]]
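
The parser and language objects described above can be sketched as follows. This is a minimal sketch, not the speaker's demo code: it assumes the `tree-sitter` and `tree-sitter-langs` packages are installed, that `tsc-make-parser`, `tsc-set-language`, `tsc-parse-string` and `tree-sitter-require` behave as described in the emacs-tree-sitter documentation (names may differ between versions), and the two source snippets are made up for illustration.

```emacs-lisp
(require 'tsc)                ; core bindings (parser, tree, node)
(require 'tree-sitter)        ; assumed to provide `tree-sitter-require'
(require 'tree-sitter-langs)  ; assumed installed: pre-compiled grammars

;; One generic parser object, reused for two different languages.
(let ((parser (tsc-make-parser)))
  ;; Hand the parser the Python language object, loaded from a shared library.
  (tsc-set-language parser (tree-sitter-require 'python))
  (tsc-parse-string parser "x = 1 if a < b else 2")
  ;; Swap in the Rust language object; the same parser now parses Rust.
  (tsc-set-language parser (tree-sitter-require 'rust))
  (tsc-parse-string parser "fn main() {}"))
```

In everyday use none of this is written by hand, because `tree-sitter-mode` creates the parser and keeps the buffer's tree up to date.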
[[!template text="Since tree-sitter-mode already handles" start="00:11:14.079" video="mainVideo" id=subtitle]]
[[!template text="the parsing part," start="00:11:16.079" video="mainVideo" id=subtitle]]
[[!template text="we will instead focus on the functions" start="00:11:17.360" video="mainVideo" id=subtitle]]
[[!template text="that inspect nodes" start="00:11:19.440" video="mainVideo" id=subtitle]]
[[!template text="in the resulting parse tree." start="00:11:20.800" video="mainVideo" id=subtitle]]
[[!template text="We can ask tree-sitter what is" start="00:11:25.279" video="mainVideo" id=subtitle]]
[[!template text="the syntax node at point." start="00:11:27.030" video="mainVideo" id=subtitle]]
[[!template text="This is an opaque object, so this is not very useful." start="00:11:44.240" video="mainVideo" id=subtitle]]
[[!template text="We can instead ask what is its type." start="00:11:48.480" video="mainVideo" id=subtitle]]
[[!template text="So its type is the symbol comparison_operator." start="00:12:03.760" video="mainVideo" id=subtitle]]
[[!template new="1" text="In tree-sitter, there are two kinds of nodes," start="00:12:08.959" video="mainVideo" id=subtitle]]
[[!template text="anonymous nodes and named nodes." start="00:12:11.600" video="mainVideo" id=subtitle]]
[[!template text="Anonymous nodes correspond to simple grammar elements" start="00:12:13.680" video="mainVideo" id=subtitle]]
[[!template text="like keywords, operators, punctuation, and so on." start="00:12:17.040" video="mainVideo" id=subtitle]]
[[!template text="Named nodes, on the other hand, are grammar elements" start="00:12:21.279" video="mainVideo" id=subtitle]]
[[!template text="that are interesting enough on their own" start="00:12:24.656" video="mainVideo" id=subtitle]]
[[!template text="to have a name, like an identifier," start="00:12:26.639" video="mainVideo" id=subtitle]]
[[!template text="an expression, or a function definition." start="00:12:30.029" video="mainVideo" id=subtitle]]
[[!template text="Named node types are symbols," start="00:12:35.440" video="mainVideo" id=subtitle]]
[[!template text="while anonymous node types are strings." start="00:12:37.323" video="mainVideo" id=subtitle]]
[[!template text="For example, if we are on this comparison operator," start="00:12:42.639" video="mainVideo" id=subtitle]]
[[!template text="the node type should be a string." start="00:12:49.760" video="mainVideo" id=subtitle]]
[[!template text="We can also get other information about the node." start="00:12:55.920" video="mainVideo" id=subtitle]]
[[!template text="For example: what is its text," start="00:12:58.959" video="mainVideo" id=subtitle]]
[[!template text="or where it is in the buffer," start="00:13:09.680" video="mainVideo" id=subtitle]]
[[!template text="or what is its parent." start="00:13:20.800" video="mainVideo" id=subtitle]]
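
The expressions evaluated on screen during this part are not captured in the transcript. The following is a hedged reconstruction of that kind of inspection, assuming a buffer with `tree-sitter-mode` enabled and the accessor names from the emacs-tree-sitter documentation (`tree-sitter-node-at-point`, which newer releases may call `tree-sitter-node-at-pos`, plus `tsc-node-type`, `tsc-node-named-p`, `tsc-node-text`, `tsc-node-position-range` and `tsc-get-parent`).

```emacs-lisp
(require 'tree-sitter)

;; Evaluate with M-: in a buffer where `tree-sitter-mode' is on,
;; with point on the node of interest.
(let* ((node (tree-sitter-node-at-point))
       (parent (tsc-get-parent node)))
  (list :type   (tsc-node-type node)            ; symbol if named, string if anonymous
        :named  (tsc-node-named-p node)
        :text   (tsc-node-text node)            ; the node's source text
        :range  (tsc-node-position-range node)  ; (BEG . END) buffer positions
        ;; The root node has no parent, so guard against nil.
        :parent (and parent (tsc-node-type parent))))
```

Collecting the answers in one property list is only a convenience for reading the result in the echo area; each accessor can equally be called on its own.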
[[!template new="1" text="There are many other APIs to query" start="00:13:43.199" video="mainVideo" id=subtitle]]
[[!template text="a node's properties." start="00:13:46.106" video="mainVideo" id=subtitle]]
[[!template text="tree-sitter allows searching" start="00:13:52.639" video="mainVideo" id=subtitle]]
[[!template text="for structural patterns within a parse tree." start="00:13:54.234" video="mainVideo" id=subtitle]]
[[!template text="It does so through a Lisp-like language." start="00:13:58.240" video="mainVideo" id=subtitle]]
[[!template text="This language supports matching by node types," start="00:14:01.440" video="mainVideo" id=subtitle]]
[[!template text="field names, and predicates." start="00:14:04.639" video="mainVideo" id=subtitle]]
[[!template text="It also allows capturing nodes for further processing." start="00:14:07.760" video="mainVideo" id=subtitle]]
[[!template text="Let's try to see some examples." start="00:14:12.639" video="mainVideo" id=subtitle]]
[[!template text="So in this very simple query," start="00:14:37.680" video="mainVideo" id=subtitle]]
[[!template text="we just try to highlight all the identifiers in the buffer." start="00:14:40.206" video="mainVideo" id=subtitle]]
[[!template text="This @ sign tells tree-sitter to capture a node." start="00:14:49.040" video="mainVideo" id=subtitle]]
[[!template text="In the context of the query builder," start="00:14:53.120" video="mainVideo" id=subtitle]]
[[!template text="it's not very important," start="00:14:55.507" video="mainVideo" id=subtitle]]
[[!template text="but in a normal highlighting query," start="00:14:57.360" video="mainVideo" id=subtitle]]
[[!template text="this will determine" start="00:14:59.706" video="mainVideo" id=subtitle]]
[[!template text="the face used to highlight the node." start="00:15:01.760" video="mainVideo" id=subtitle]]
[[!template text="Suppose we want to capture" start="00:15:06.639" video="mainVideo" id=subtitle]]
[[!template text="all the function names," start="00:15:08.256" video="mainVideo" id=subtitle]]
[[!template text="instead of just any identifier." start="00:15:10.320" video="mainVideo" id=subtitle]]
[[!template text="You can improve the query like this." start="00:15:13.519" video="mainVideo" id=subtitle]]
[[!template text="This will highlight the whole definition." start="00:15:29.440" video="mainVideo" id=subtitle]]
[[!template text="But we only want to capture the function name," start="00:15:32.639" video="mainVideo" id=subtitle]]
[[!template text="which means the identifier here." start="00:15:36.399" video="mainVideo" id=subtitle]]
[[!template text="So we move the capture to after the identifier node." start="00:15:41.054" video="mainVideo" id=subtitle]]
[[!template text="If we want to capture the class names as well," start="00:15:49.600" video="mainVideo" id=subtitle]]
[[!template text="we just add another pattern." start="00:15:52.959" video="mainVideo" id=subtitle]]
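
The queries typed into the query builder are not part of the transcript. As an illustration of the steps just described, the sketch below registers two comparable patterns as highlighting patterns. It assumes the Python grammar's node and field names (`function_definition`, `class_definition`, `name:`) and the `tree-sitter-hl-add-patterns` function from `tree-sitter-hl`; the capture names `@function` and `@type` are an assumption about which faces are wanted.

```emacs-lisp
(require 'tree-sitter-hl)

;; Node and field names follow the Python grammar (an assumption;
;; other grammars use different names).
(tree-sitter-hl-add-patterns 'python
  [;; The capture sits after the identifier, so only the function's
   ;; name is captured, not the whole definition.
   (function_definition name: (identifier) @function)
   ;; A second pattern does the same for class names.
   (class_definition name: (identifier) @type)])
```

In the query builder, the two patterns inside the vector would be typed directly, without the surrounding Emacs Lisp call.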
[[!template new="1" text="Let's look at a more practical example." start="00:16:10.079" video="mainVideo" id=subtitle]]
[[!template text="Here we can see that single-quoted strings" start="00:16:20.320" video="mainVideo" id=subtitle]]
[[!template text="and double-quoted strings are highlighted the same." start="00:16:23.468" video="mainVideo" id=subtitle]]
[[!template text="But in some places," start="00:16:27.279" video="mainVideo" id=subtitle]]
[[!template text="because of some coding conventions," start="00:16:30.399" video="mainVideo" id=subtitle]]
[[!template text="it may be desirable to highlight them differently." start="00:16:33.440" video="mainVideo" id=subtitle]]
[[!template text="For example, if the string is single-quoted," start="00:16:36.373" video="mainVideo" id=subtitle]]
[[!template text="we may want to highlight it as a constant." start="00:16:39.073" video="mainVideo" id=subtitle]]
[[!template text="Let's try to see whether we can" start="00:16:44.399" video="mainVideo" id=subtitle]]
[[!template text="distinguish these two cases." start="00:16:46.160" video="mainVideo" id=subtitle]]
[[!template text="So here we get all the strings." start="00:16:56.240" video="mainVideo" id=subtitle]]
[[!template text="If we want to see if it's a single-quoted" start="00:17:00.639" video="mainVideo" id=subtitle]]
[[!template text="or a double-quoted string," start="00:17:04.079" video="mainVideo" id=subtitle]]
[[!template text="we can try looking at the first character of the string--" start="00:17:08.799" video="mainVideo" id=subtitle]]
[[!template text="I mean the first character of the node--" start="00:17:13.436" video="mainVideo" id=subtitle]]
[[!template text="to check whether it's a single quote or a double quote." start="00:17:16.720" video="mainVideo" id=subtitle]]
[[!template text="So for that, we use tree-sitter's support for predicates." start="00:17:33.600" video="mainVideo" id=subtitle]]
[[!template text="In this case, we use a match predicate" start="00:17:38.920" video="mainVideo" id=subtitle]]
[[!template text="to check whether the string-- whether the node starts" start="00:17:43.360" video="mainVideo" id=subtitle]]
[[!template text="with a single quote." start="00:17:47.339" video="mainVideo" id=subtitle]]
[[!template text="And with this pattern," start="00:17:49.556" video="mainVideo" id=subtitle]]
[[!template text="we only capture the single-quoted strings." start="00:17:51.280" video="mainVideo" id=subtitle]]
[[!template text="Let's try to give it a different face." start="00:18:00.400" video="mainVideo" id=subtitle]]
[[!template text="So we copy the pattern," start="00:18:03.760" video="mainVideo" id=subtitle]]
[[!template text="and we add this pattern for Python only." start="00:18:13.039" video="mainVideo" id=subtitle]]
[[!template text="But we also want to give the capture a different name." start="00:18:25.120" video="mainVideo" id=subtitle]]
[[!template text="Let's say we want to highlight it as a keyword." start="00:18:31.440" video="mainVideo" id=subtitle]]
[[!template text="And now, if we refresh the buffer," start="00:18:46.559" video="mainVideo" id=subtitle]]
[[!template text="we see that single-quoted strings" start="00:19:06.320" video="mainVideo" id=subtitle]]
[[!template text="are highlighted as keywords." start="00:19:08.523" video="mainVideo" id=subtitle]]
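
A pattern along the lines of the one described here might look like the sketch below. It is a reconstruction rather than the code shown in the video, and it assumes that `tree-sitter-hl-add-patterns` accepts patterns in vector form and that the match predicate is spelled `.match?` there, as in the queries shipped with tree-sitter-langs.

```emacs-lisp
(require 'tree-sitter-hl)

;; Python only: a string node whose text starts with a single quote
;; is captured as @keyword, so it gets the keyword face.
(tree-sitter-hl-add-patterns 'python
  [((string) @keyword
    (.match? @keyword "^'"))])
```

Renaming the capture (for example to `@constant`) is all it takes to pick a different face, since the capture name determines the face.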
[[!template new="1" text="The highlighting patterns" start="00:19:14.400" video="mainVideo" id=subtitle]]
[[!template text="can also be set for a single project" start="00:19:15.751" video="mainVideo" id=subtitle]]
[[!template text="using directory-local variables." start="00:19:19.200" video="mainVideo" id=subtitle]]
[[!template text="For example, let's take a look at Emacs's source code." start="00:19:23.440" video="mainVideo" id=subtitle]]
[[!template text="So in Emacs's C source, there are a lot of uses" start="00:19:35.760" video="mainVideo" id=subtitle]]
[[!template text="of these different macros" start="00:19:41.123" video="mainVideo" id=subtitle]]
[[!template text="to define functions," start="00:19:43.760" video="mainVideo" id=subtitle]]
[[!template text="and you can see this is actually the function name," start="00:19:47.679" video="mainVideo" id=subtitle]]
[[!template text="but it's highlighted as a string." start="00:19:53.256" video="mainVideo" id=subtitle]]
[[!template text="So what we want is to somehow recognize this pattern" start="00:19:56.373" video="mainVideo" id=subtitle]]
[[!template text="and highlight it." start="00:20:03.679" video="mainVideo" id=subtitle]]
[[!template text="Highlight this part" start="00:20:07.600" video="mainVideo" id=subtitle]]
[[!template text="with the function face instead." start="00:20:11.280" video="mainVideo" id=subtitle]]
[[!template text="In order to do that," start="00:20:14.559" video="mainVideo" id=subtitle]]
[[!template text="we put a pattern in this project's directory-local settings file." start="00:20:17.679" video="mainVideo" id=subtitle]]
[[!template text="So we can put this pattern in the C mode section." start="00:20:31.760" video="mainVideo" id=subtitle]]
[[!template text="And now, if we enable tree-sitter," start="00:20:40.159" video="mainVideo" id=subtitle]]
[[!template text="you can see that this is highlighted" start="00:20:48.000" video="mainVideo" id=subtitle]]
[[!template text="as a normal function definition." start="00:20:53.200" video="mainVideo" id=subtitle]]
[[!template text="So this is the function face like we wanted." start="00:20:55.056" video="mainVideo" id=subtitle]]
[[!template text="The pattern for this is actually pretty simple." start="00:21:01.200" video="mainVideo" id=subtitle]]
[[!template text="It's only this part." start="00:21:07.200" video="mainVideo" id=subtitle]]
[[!template text="So if it's a function call" start="00:21:12.373" video="mainVideo" id=subtitle]]
[[!template text="where the name of the function is DEFUN," start="00:21:16.456" video="mainVideo" id=subtitle]]
[[!template text="then we highlight the DEFUN as a keyword," start="00:21:19.679" video="mainVideo" id=subtitle]]
[[!template text="and then the first string element," start="00:21:24.240" video="mainVideo" id=subtitle]]
[[!template text="we highlight it as a function name." start="00:21:26.923" video="mainVideo" id=subtitle]]
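
The directory-local entry itself is not in the transcript. A sketch of what such an entry could look like follows, assuming the C grammar's node names (`call_expression`, `argument_list`, `string_literal`) and assuming that the variable read by `tree-sitter-hl` is `tree-sitter-hl--extra-patterns-list`, as the emacs-tree-sitter documentation suggests; check the variable name against the installed version before relying on it.

```emacs-lisp
;;; .dir-locals.el at the project root (sketch, see assumptions above)
((c-mode . ((tree-sitter-hl--extra-patterns-list
             [((call_expression
                ;; The macro name itself is highlighted as a keyword...
                function: (identifier) @keyword
                ;; ...and the string argument as a function name.
                arguments: (argument_list (string_literal) @function))
               ;; Only apply when the called "function" is DEFUN.
               (.eq? @keyword "DEFUN"))]))))
```

This sketch captures every string literal in the argument list. That suffices when the call has only one, but a stricter pattern would anchor the capture to the first argument.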
[[!template new="1" text="Since the language objects are actually native code," start="00:21:35.360" video="mainVideo" id=subtitle]]
[[!template text="they have to be compiled for each platform" start="00:21:39.280" video="mainVideo" id=subtitle]]
[[!template text="that we want to support." start="00:21:41.459" video="mainVideo" id=subtitle]]
[[!template text="This will become a big obstacle for tree-sitter adoption." start="00:21:43.440" video="mainVideo" id=subtitle]]
[[!template text="Therefore, I've created a language bundle package, tree-sitter-langs," start="00:21:48.159" video="mainVideo" id=subtitle]]
[[!template text="that takes care of pre-compiling the grammars," start="00:21:52.960" video="mainVideo" id=subtitle]]
[[!template text="the most common grammars for all three major platforms." start="00:21:55.773" video="mainVideo" id=subtitle]]
[[!template text="It also takes care of distributing these binaries" start="00:22:01.600" video="mainVideo" id=subtitle]]
[[!template text="and provides some highlighting queries" start="00:22:05.360" video="mainVideo" id=subtitle]]
[[!template text="for some of the languages." start="00:22:08.080" video="mainVideo" id=subtitle]]
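
For reference, a typical init-file setup that uses the bundle looks roughly like this, assuming both `tree-sitter` and `tree-sitter-langs` are installed from a package archive.

```emacs-lisp
(require 'tree-sitter)
(require 'tree-sitter-langs)  ; pre-compiled grammars plus highlighting queries

;; Turn on parsing in every buffer whose major mode has a known grammar.
(global-tree-sitter-mode)
;; Use tree-sitter-based highlighting wherever parsing is enabled.
(add-hook 'tree-sitter-after-on-hook #'tree-sitter-hl-mode)
```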
[[!template text="It should be noted that this package" start="00:22:11.440" video="mainVideo" id=subtitle]]
[[!template text="should be treated as a temporary distribution mechanism only," start="00:22:13.760" video="mainVideo" id=subtitle]]
[[!template text="to help with bootstrapping tree-sitter adoption." start="00:22:19.919" video="mainVideo" id=subtitle]]
[[!template text="The plan is that eventually these files" start="00:22:24.720" video="mainVideo" id=subtitle]]
[[!template text="should be provided by" start="00:22:27.760" video="mainVideo" id=subtitle]]
[[!template text="the language major modes themselves." start="00:22:29.156" video="mainVideo" id=subtitle]]
[[!template text="But in order to do that, we need better tooling," start="00:22:32.480" video="mainVideo" id=subtitle]]
[[!template text="so we're not there yet." start="00:22:36.320" video="mainVideo" id=subtitle]]
[[!template new="1" text="Since the core already works reasonably well," start="00:22:40.240" video="mainVideo" id=subtitle]]
[[!template text="there are several areas that would benefit" start="00:22:43.280" video="mainVideo" id=subtitle]]
[[!template text="from the community's contribution." start="00:22:45.289" video="mainVideo" id=subtitle]]
[[!template text="So tree-sitter's upstream language repositories" start="00:22:49.120" video="mainVideo" id=subtitle]]
[[!template text="already contain highlighting queries on their own." start="00:22:52.640" video="mainVideo" id=subtitle]]
[[!template text="However, they are pretty basic," start="00:22:55.679" video="mainVideo" id=subtitle]]
[[!template text="and they may not fit well with existing Emacs conventions." start="00:22:57.573" video="mainVideo" id=subtitle]]
[[!template text="Therefore, the language bundle has its own set of highlighting queries." start="00:23:02.559" video="mainVideo" id=subtitle]]
[[!template text="This requires maintenance until language major modes adopt tree-sitter" start="00:23:07.120" video="mainVideo" id=subtitle]]
[[!template text="and maintain the queries on their own." start="00:23:12.556" video="mainVideo" id=subtitle]]
[[!template text="The queries are actually quite easy to write," start="00:23:16.640" video="mainVideo" id=subtitle]]
[[!template text="as you've already seen." start="00:23:19.056" video="mainVideo" id=subtitle]]
[[!template text="You just need to be familiar with the language," start="00:23:22.000" video="mainVideo" id=subtitle]]
[[!template text="familiar enough to come up with sensible highlighting patterns." start="00:23:25.360" video="mainVideo" id=subtitle]]
[[!template text="And if you are a maintainer of a language major mode," start="00:23:35.200" video="mainVideo" id=subtitle]]
[[!template text="you may want to consider integrating tree-sitter into your mode," start="00:23:39.679" video="mainVideo" id=subtitle]]
[[!template text="initially maybe as an optional feature." start="00:23:44.189" video="mainVideo" id=subtitle]]
[[!template text="The integration is actually pretty straightforward," start="00:23:48.573" video="mainVideo" id=subtitle]]
[[!template text="especially for syntax highlighting." start="00:23:53.279" video="mainVideo" id=subtitle]]
[[!template text="Or alternatively," start="00:23:56.640" video="mainVideo" id=subtitle]]
[[!template text="you can also try writing a new major mode from scratch" start="00:24:01.520" video="mainVideo" id=subtitle]]
[[!template text="that relies on tree-sitter" start="00:24:05.760" video="mainVideo" id=subtitle]]
[[!template text="from the very beginning." start="00:24:08.000" video="mainVideo" id=subtitle]]
[[!template text="The code for such a major mode is quite simple." start="00:24:12.559" video="mainVideo" id=subtitle]]
[[!template text="For example, this is the proposed" start="00:24:17.523" video="mainVideo" id=subtitle]]
[[!template text="wat-mode for WebAssembly." start="00:24:23.200" video="mainVideo" id=subtitle]]
[[!template text="The code is just one page of code, not a lot." start="00:24:26.240" video="mainVideo" id=subtitle]]
[[!template text="You can also try writing new minor modes" start="00:24:39.520" video="mainVideo" id=subtitle]]
[[!template text="or writing integration packages." start="00:24:42.720" video="mainVideo" id=subtitle]]
[[!template text="For example, a lot of packages" start="00:24:46.559" video="mainVideo" id=subtitle]]
[[!template text="may benefit from tree-sitter integration," start="00:24:50.880" video="mainVideo" id=subtitle]]
[[!template text="but no one has written the integration yet." start="00:24:54.559" video="mainVideo" id=subtitle]]
[[!template new="1" text="If you are interested in tree-sitter," start="00:25:02.960" video="mainVideo" id=subtitle]]
[[!template text="you can use these links to learn more about it." start="00:25:04.836" video="mainVideo" id=subtitle]]
[[!template text="I think that's it for me today." start="00:25:08.023" video="mainVideo" id=subtitle]]
[[!template text="I'm happy to answer any questions." start="00:25:11.440" video="mainVideo" id=subtitle]]