Diffstat (limited to '2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.vtt')
-rw-r--r-- 2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.vtt 1235
1 files changed, 1235 insertions, 0 deletions
diff --git a/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.vtt b/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.vtt
new file mode 100644
index 0000000..276f315
--- /dev/null
+++ b/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen.vtt
@@ -0,0 +1,1235 @@
+WEBVTT
+
+00:00:01.520 --> 00:00:04.400
+Hello, everyone! My name is Tuấn-Anh.
+
+00:00:04.400 --> 00:00:07.200
+I've been using Emacs for about 10 years.
+
+00:00:07.200 --> 00:00:09.280
+Today, I'm going to talk about tree-sitter,
+
+00:00:09.280 --> 00:00:11.351
+a new Emacs package that allows Emacs
+
+00:00:11.351 --> 00:00:17.840
+to parse multiple programming languages
+in real-time.
+
+00:00:17.840 --> 00:00:21.840
+So what is the problem statement?
+
+00:00:21.840 --> 00:00:24.131
+In order to support programming
+functionalities
+
+00:00:24.131 --> 00:00:25.760
+for a particular language,
+
+00:00:25.760 --> 00:00:27.680
+a text editor needs to have some degree
+
+00:00:27.680 --> 00:00:29.679
+of language understanding.
+
+00:00:29.679 --> 00:00:31.840
+Traditionally, text editors have relied
+
+00:00:31.840 --> 00:00:34.960
+very heavily on regular expressions for
+this.
+
+00:00:34.960 --> 00:00:37.013
+Emacs is no different.
+
+00:00:37.013 --> 00:00:40.170
+Most language major modes use regular
+expressions
+
+00:00:40.170 --> 00:00:42.960
+for syntax-highlighting, code navigation,
+
+00:00:42.960 --> 00:00:46.618
+folding, indexing, and so on.
+
+00:00:46.618 --> 00:00:50.559
+Regular expressions are problematic for
+a couple of reasons.
+
+00:00:50.559 --> 00:00:53.778
+They're slow and inaccurate.
+
+00:00:53.778 --> 00:00:56.800
+They also make the code hard to read and
+write.
+
+00:00:56.800 --> 00:01:01.199
+Sometimes it's because the regular
+expressions themselves are very hairy,
+
+00:01:01.199 --> 00:01:05.199
+and sometimes because they are just not
+powerful enough.
+
+00:01:05.199 --> 00:01:08.625
+Some helper code is usually needed
+
+00:01:08.625 --> 00:01:11.200
+to parse more intricate language
+features.
+
+00:01:11.200 --> 00:01:16.159
+That also illustrates the core problem
+with regular expressions,
+
+00:01:16.159 --> 00:01:21.119
+in that they are not powerful enough to
+parse programming languages.
+
+00:01:21.119 --> 00:01:25.040
+An example feature that regular
+expressions cannot handle very well
+
+00:01:25.040 --> 00:01:28.320
+is string interpolation, which is a very
+common feature
+
+00:01:28.320 --> 00:01:31.680
+in many modern programming languages.
+
+00:01:31.680 --> 00:01:34.079
+It would be much nicer if Emacs somehow
+
+00:01:34.079 --> 00:01:39.520
+had structural understanding of source
+code, like IDEs do.
+
+00:01:39.520 --> 00:01:41.981
+There have been multiple efforts
+
+00:01:41.981 --> 00:01:45.280
+to bring this kind of programming
+language understanding into Emacs.
+
+00:01:45.280 --> 00:01:47.119
+There are language-specific parsers
+
+00:01:47.119 --> 00:01:48.640
+written in Elisp
+
+00:01:48.640 --> 00:01:50.675
+that can be thought of
+
+00:01:50.675 --> 00:01:51.989
+as the next logical step
+of the glue code
+
+00:01:51.989 --> 00:01:53.856
+on top of regular expressions,
+
+00:01:53.856 --> 00:01:57.356
+moving from partial local pattern
+recognition
+
+00:01:57.356 --> 00:01:59.840
+into a full-fledged parser.
+
+00:01:59.840 --> 00:02:02.023
+The most prominent example of this
+approach
+
+00:02:02.023 --> 00:02:06.479
+is probably the famous js2-mode.
+
+00:02:06.479 --> 00:02:10.080
+However, this approach has several issues.
+
+00:02:10.080 --> 00:02:12.606
+Parsing is computationally expensive,
+
+00:02:12.606 --> 00:02:16.800
+and Emacs Lisp is not good at that kind
+of stuff.
+
+00:02:16.800 --> 00:02:19.156
+Furthermore, maintenance is very
+troublesome.
+
+00:02:19.156 --> 00:02:22.160
+In order to work on these parsers,
+
+00:02:22.160 --> 00:02:24.239
+first, you have to know Elisp
+well enough,
+
+00:02:24.239 --> 00:02:26.606
+and then you have to be comfortable with
+
+00:02:26.606 --> 00:02:29.739
+writing a recursive descent parser,
+
+00:02:29.739 --> 00:02:34.000
+while constantly keeping up with changes
+to the language itself,
+
+00:02:34.000 --> 00:02:36.356
+which can be evolving very quickly,
+
+00:02:36.356 --> 00:02:39.360
+like JavaScript, for example.
+
+00:02:39.360 --> 00:02:42.373
+Together, these constraints
+significantly reduce
+
+00:02:42.373 --> 00:02:45.680
+the pool of potential maintainers.
+
+00:02:45.680 --> 00:02:47.760
+The biggest issue, though, in my opinion,
+
+00:02:47.760 --> 00:02:52.139
+is the lack of a set of generic and
+reusable APIs.
+
+00:02:52.139 --> 00:02:54.319
+This makes them very hard to use
+
+00:02:54.319 --> 00:02:55.920
+for minor modes that want to deal with
+
+00:02:55.920 --> 00:02:59.920
+cross-cutting concerns across multiple
+languages.
+
+00:02:59.920 --> 00:03:01.760
+The other approach which has been
+
+00:03:01.760 --> 00:03:04.319
+gaining a lot of momentum
+in recent years
+
+00:03:04.319 --> 00:03:06.560
+is externalizing language understanding
+
+00:03:06.560 --> 00:03:08.159
+to another process,
+
+00:03:08.159 --> 00:03:12.239
+as with the Language Server Protocol (LSP).
+
+00:03:12.239 --> 00:03:16.560
+This second approach is actually a very
+interesting one.
+
+00:03:16.560 --> 00:03:18.400
+By decoupling language understanding
+
+00:03:18.400 --> 00:03:21.280
+from the editing facility itself,
+
+00:03:21.280 --> 00:03:25.120
+the LSP servers can attract a lot more
+contributors,
+
+00:03:25.120 --> 00:03:27.189
+which makes maintenance easier.
+
+00:03:27.189 --> 00:03:32.400
+However, they also have several issues
+of their own.
+
+00:03:32.400 --> 00:03:34.089
+Being a separate process,
+
+00:03:34.089 --> 00:03:37.073
+they are usually more
+resource-intensive,
+
+00:03:37.073 --> 00:03:39.920
+and depending on the language,
+
+00:03:39.920 --> 00:03:42.159
+the LSP server itself can bring with it
+
+00:03:42.159 --> 00:03:44.640
+a host of additional dependencies
+
+00:03:44.640 --> 00:03:50.640
+external to Emacs, which may be messy to
+install and manage.
+
+00:03:50.640 --> 00:03:55.120
+Furthermore, JSON-RPC has pretty
+high latency.
+
+00:03:55.120 --> 00:03:57.840
+For one-off tasks like jumping to source
+
+00:03:57.840 --> 00:04:00.879
+or on-demand completion, it's great.
+
+00:04:00.879 --> 00:04:03.040
+But for things like code highlighting,
+
+00:04:03.040 --> 00:04:06.000
+the latency is just too much.
+
+00:04:06.000 --> 00:04:08.319
+I was using Rust and I was following the
+
+00:04:08.319 --> 00:04:11.760
+community effort to improve its
+IDE support,
+
+00:04:11.760 --> 00:04:15.760
+hoping to integrate some of that into
+Emacs itself.
+
+00:04:15.760 --> 00:04:19.759
+Then I heard someone from the community
+mention tree-sitter,
+
+00:04:19.759 --> 00:04:23.360
+and I decided to check it out.
+
+00:04:23.360 --> 00:04:28.720
+Basically, tree-sitter is an incremental
+parsing library and a parser generator.
+
+00:04:28.720 --> 00:04:33.040
+It was introduced by the Atom editor in
+2018.
+
+00:04:33.040 --> 00:04:35.923
+Besides Atom, it is also being
+integrated
+
+00:04:35.923 --> 00:04:37.623
+into the Neovim editor,
+
+00:04:37.623 --> 00:04:41.040
+and GitHub is using it to power
+
+00:04:41.040 --> 00:04:42.423
+their source code analysis
+
+00:04:42.423 --> 00:04:45.840
+and navigation features.
+
+00:04:45.840 --> 00:04:48.639
+It is written in C and can be compiled
+
+00:04:48.639 --> 00:04:50.623
+for all major platforms.
+
+00:04:50.623 --> 00:04:53.120
+It can even be compiled
+
+00:04:53.120 --> 00:04:55.323
+to WebAssembly to run on the web.
+
+00:04:55.323 --> 00:05:00.800
+That's how GitHub is using it
+on their website.
+
+00:05:00.800 --> 00:05:05.840
+So why is tree-sitter an interesting
+solution to this problem?
+
+00:05:05.840 --> 00:05:10.000
+There are multiple features that make it
+an attractive option.
+
+00:05:10.000 --> 00:05:11.839
+It is designed to be fast.
+
+00:05:11.839 --> 00:05:13.680
+By being incremental,
+
+00:05:13.680 --> 00:05:15.680
+the initial parse of a typical big file
+
+00:05:15.680 --> 00:05:18.160
+can take tens of milliseconds,
+
+00:05:18.160 --> 00:05:20.240
+while subsequent incremental parses
+
+00:05:20.240 --> 00:05:22.560
+are sub-millisecond.
+
+00:05:22.560 --> 00:05:26.240
+It achieves this by using
+structural sharing,
+
+00:05:26.240 --> 00:05:29.360
+meaning replacing only affected nodes
+
+00:05:29.360 --> 00:05:32.960
+in the old tree when it needs to.
+
+00:05:32.960 --> 00:05:37.120
+Also, unlike LSP, being in
+the same process,
+
+00:05:37.120 --> 00:05:40.639
+it has much lower latency.
+
+00:05:40.639 --> 00:05:44.960
+Secondly, it provides a uniform
+programming interface.
+
+00:05:44.960 --> 00:05:47.039
+The same data structures and functions
+
+00:05:47.039 --> 00:05:50.400
+work on parse trees of different
+languages.
+
+00:05:50.400 --> 00:05:52.160
+Syntax nodes of different languages
+
+00:05:52.160 --> 00:05:54.160
+differ only by their types
+
+00:05:54.160 --> 00:05:55.723
+and their possible child nodes.
+
+00:05:55.723 --> 00:06:02.240
+This is a big advantage over
+language-specific parsers.
+
+00:06:02.240 --> 00:06:06.880
+Thirdly, it's written in self-contained
+embeddable C.
+
+00:06:06.880 --> 00:06:11.723
+As I mentioned previously, it can even
+be compiled to WebAssembly.
+
+00:06:11.723 --> 00:06:16.106
+This makes integrating it into various
+editors quite easy
+
+00:06:16.106 --> 00:06:22.880
+without having to install any external
+dependencies.
+
+00:06:22.880 --> 00:06:25.503
+One thing that is not mentioned here
+
+00:06:25.503 --> 00:06:28.000
+is that being a parser generator,
+
+00:06:28.000 --> 00:06:31.039
+its grammars are declarative.
+
+00:06:31.039 --> 00:06:34.880
+Together with being editor-independent,
+
+00:06:34.880 --> 00:06:39.139
+this makes the pool of potential
+contributors much larger.
+
+00:06:39.139 --> 00:06:45.520
+So I was convinced that tree-sitter is a
+good fit for Emacs.
+
+00:06:45.520 --> 00:06:48.000
+Last year, I started writing the bindings
+
+00:06:48.000 --> 00:06:53.280
+using dynamic module support introduced
+in Emacs 25.
+
+00:06:53.280 --> 00:06:58.479
+Dynamic module means there is
+platform-specific native code involved,
+
+00:06:58.479 --> 00:07:00.560
+but since there are pre-compiled binaries
+
+00:07:00.560 --> 00:07:02.880
+for the three major platforms,
+
+00:07:02.880 --> 00:07:04.706
+it should work in most places.
+
+00:07:04.706 --> 00:07:09.440
+Currently, the core functionalities are
+in pretty good shape.
+
+00:07:09.440 --> 00:07:12.560
+Syntax highlighting is working nicely.
+
+00:07:12.560 --> 00:07:16.080
+The whole thing is split into three
+packages.
+
+00:07:16.080 --> 00:07:20.319
+tree-sitter is the main package that
+other packages should depend on.
+
+00:07:20.319 --> 00:07:22.800
+tree-sitter-langs is the language bundle
+
+00:07:22.800 --> 00:07:24.000
+that includes support
+
+00:07:24.000 --> 00:07:27.199
+for most common languages.
+
+00:07:27.199 --> 00:07:32.160
+And finally, the core APIs are in the
+package tsc,
+
+00:07:32.160 --> 00:07:36.160
+which stands for tree-sitter-core.
+
+00:07:36.160 --> 00:07:38.800
+It is the implicit dependency of the
+
+00:07:38.800 --> 00:07:43.520
+tree-sitter package.
+
+00:07:43.520 --> 00:07:47.520
+The main package includes the minor mode
+tree-sitter-mode.
+
+00:07:47.520 --> 00:07:52.560
+This provides the base for other major
+or minor modes to build on.
+
+00:07:52.560 --> 00:07:54.839
+Using Emacs's change tracking hooks,
+
+00:07:54.839 --> 00:07:57.073
+it enables incremental parsing
+
+00:07:57.073 --> 00:08:00.800
+and provides a syntax tree that is
+always up to date
+
+00:08:00.800 --> 00:08:04.080
+after any edits in a buffer.
+
+00:08:04.080 --> 00:08:06.223
+There is also a basic debug mode
+
+00:08:06.223 --> 00:08:10.080
+that shows the parse tree in
+another buffer.
+
+00:08:10.080 --> 00:08:13.360
+Here is a quick demo.
+
+00:08:13.360 --> 00:08:15.673
+Here I'm in an empty Python buffer
+
+00:08:15.673 --> 00:08:17.520
+with tree-sitter enabled.
+
+00:08:17.520 --> 00:08:19.440
+I'm going to turn on the debug mode to
+
+00:08:19.440 --> 00:08:26.560
+see the parse tree.
+
+00:08:26.560 --> 00:08:28.106
+Since the buffer is empty,
+
+00:08:28.106 --> 00:08:30.423
+there is only one node in the
+syntax tree:
+
+00:08:30.423 --> 00:08:33.279
+the top-level module node.
+
+00:08:33.279 --> 00:09:11.040
+Let's try typing some code.
+
+00:09:11.040 --> 00:09:14.640
+As you can see, as I type into the
+Python buffer,
+
+00:09:14.640 --> 00:09:19.120
+the syntax tree updates in real time.
+
+00:09:19.120 --> 00:09:22.039
+The other minor mode included in the
+main package
+
+00:09:22.039 --> 00:09:24.389
+is tree-sitter-hl-mode.
+
+00:09:24.389 --> 00:09:26.349
+It overrides font-lock mode
+
+00:09:26.349 --> 00:09:28.480
+and provides its own set of faces
+
+00:09:28.480 --> 00:09:30.139
+and customization options.
+
+00:09:30.139 --> 00:09:32.800
+It is query-driven.
+
+00:09:32.800 --> 00:09:36.240
+That means instead of regular
+expressions,
+
+00:09:36.240 --> 00:09:39.518
+it uses a Lisp-like query language
+
+00:09:39.518 --> 00:09:40.320
+to map syntax nodes
+
+00:09:40.320 --> 00:09:41.923
+to highlighting faces.
+
+00:09:41.923 --> 00:09:45.760
+I'm going to open a Python file with
+small snippets
+
+00:09:45.760 --> 00:09:54.320
+that showcase syntax highlighting.
+
+00:09:54.320 --> 00:09:55.920
+So this is the default highlighting
+
+00:09:55.920 --> 00:10:00.880
+provided by python-mode.
+
+00:10:00.880 --> 00:10:04.640
+This is the highlighting enabled
+by tree-sitter.
+
+00:10:04.640 --> 00:10:07.680
+As you can see, string interpolation
+
+00:10:07.680 --> 00:10:11.680
+and decorators are highlighted correctly.
+
+00:10:11.680 --> 00:10:17.440
+Function calls are also highlighted.
+
+00:10:17.440 --> 00:10:21.839
+You can also note that
+property accessors
+
+00:10:21.839 --> 00:10:27.440
+and property assignments are highlighted
+differently.
+
+00:10:27.440 --> 00:10:29.360
+What I like the most about this is that
+
+00:10:29.360 --> 00:10:32.640
+new bindings are consistently
+highlighted.
+
+00:10:32.640 --> 00:10:36.320
+This includes local variables,
+
+00:10:36.320 --> 00:10:45.760
+function parameters, and property
+mutations.
+
+00:10:45.760 --> 00:10:48.000
+Before going through the tree queries
+
+00:10:48.000 --> 00:10:49.279
+and the syntax highlighting
+
+00:10:49.279 --> 00:10:51.680
+customization options,
+
+00:10:51.680 --> 00:10:53.339
+let's take a brief look at
+
+00:10:53.339 --> 00:10:55.040
+the core data structures and functions
+
+00:10:55.040 --> 00:10:58.079
+that tree-sitter provides.
+
+00:10:58.079 --> 00:11:00.743
+So parsing is done with the help of
+
+00:11:00.743 --> 00:11:02.240
+a generic parser object.
+
+00:11:02.240 --> 00:11:04.160
+A single parser object can be used to
+
+00:11:04.160 --> 00:11:06.000
+parse different languages
+
+00:11:06.000 --> 00:11:09.279
+by sending different language objects to
+it.
+
+00:11:09.279 --> 00:11:10.880
+The language objects themselves are
+
+00:11:10.880 --> 00:11:14.079
+loaded from shared libraries.
+
+00:11:14.079 --> 00:11:16.079
+Since tree-sitter-mode already handles
+
+00:11:16.079 --> 00:11:17.360
+the parsing part,
+
+00:11:17.360 --> 00:11:19.440
+we will instead focus on the functions
+
+00:11:19.440 --> 00:11:20.800
+that inspect nodes
+
+00:11:20.800 --> 00:11:25.279
+in the resulting parse tree.
+
+00:11:25.279 --> 00:11:27.030
+We can ask tree-sitter what is
+
+00:11:27.030 --> 00:11:44.240
+the syntax node at point.
+
+00:11:44.240 --> 00:11:48.480
+This is an opaque object, so this is not
+very useful.
+
+00:11:48.480 --> 00:12:03.760
+We can instead ask what is its type.
+
+00:12:03.760 --> 00:12:08.959
+So its type is the symbol
+comparison_operator.
+
+00:12:08.959 --> 00:12:11.600
+In tree-sitter, there are two kinds of nodes,
+
+00:12:11.600 --> 00:12:13.680
+anonymous nodes and named nodes.
+
+00:12:13.680 --> 00:12:17.040
+Anonymous nodes correspond to simple
+grammar elements
+
+00:12:17.040 --> 00:12:21.279
+like keywords, operators, punctuation,
+and so on.
+
+00:12:21.279 --> 00:12:24.656
+Named nodes, on the other hand, are
+grammar elements
+
+00:12:24.656 --> 00:12:26.639
+that are interesting enough
+on their own
+
+00:12:26.639 --> 00:12:30.029
+to have a name, like an identifier,
+
+00:12:30.029 --> 00:12:35.440
+an expression, or a function definition.
+
+00:12:35.440 --> 00:12:37.323
+Named node types are symbols,
+
+00:12:37.323 --> 00:12:42.639
+while anonymous node types are strings.
+
+00:12:42.639 --> 00:12:49.760
+For example, if we are on this
+comparison operator,
+
+00:12:49.760 --> 00:12:55.920
+the node type should be a string.
+
+00:12:55.920 --> 00:12:58.959
+We can also get other information about
+the node.
+
+00:12:58.959 --> 00:13:09.680
+For example: what is its text,
+
+00:13:09.680 --> 00:13:20.800
+or where it is in the buffer,
+
+00:13:20.800 --> 00:13:43.199
+or what is its parent.
+
+00:13:43.199 --> 00:13:46.106
+There are many other APIs to query
+
+00:13:46.106 --> 00:13:52.639
+a node's properties.
+
+00:13:52.639 --> 00:13:54.234
+tree-sitter allows searching
+
+00:13:54.234 --> 00:13:58.240
+for structural patterns
+within a parse tree.
+
+00:13:58.240 --> 00:14:01.440
+It does so through a Lisp-like language.
+
+00:14:01.440 --> 00:14:04.639
+This language supports matching
+by node types,
+
+00:14:04.639 --> 00:14:07.760
+field names, and predicates.
+
+00:14:07.760 --> 00:14:12.639
+It also allows capturing nodes for
+further processing.
+
+00:14:12.639 --> 00:14:37.680
+Let's try to see some examples.
+
+00:14:37.680 --> 00:14:40.206
+So in this very simple query,
+
+00:14:40.206 --> 00:14:49.040
+we just try to highlight all the
+identifiers in the buffer.
+
+00:14:49.040 --> 00:14:53.120
+This @ sign tells tree-sitter
+to capture a node.
+
+00:14:53.120 --> 00:14:55.507
+In the context of the query builder,
+
+00:14:55.507 --> 00:14:57.360
+it's not very important,
+
+00:14:57.360 --> 00:14:59.706
+but in a normal highlighting query,
+
+00:14:59.706 --> 00:15:01.760
+this will determine
+
+00:15:01.760 --> 00:15:06.639
+the face used to highlight the node.
+
+00:15:06.639 --> 00:15:08.256
+Suppose we want to capture
+
+00:15:08.256 --> 00:15:10.320
+all the function names,
+
+00:15:10.320 --> 00:15:13.519
+instead of just any identifier.
+
+00:15:13.519 --> 00:15:29.440
+You can improve the query like this.
+
+00:15:29.440 --> 00:15:32.639
+This will highlight the whole definition.
+
+00:15:32.639 --> 00:15:36.399
+But we only want to capture
+the function name,
+
+00:15:36.399 --> 00:15:41.054
+which means the identifier here.
+
+00:15:41.054 --> 00:15:49.600
+So we move the capture to after the
+identifier node.
+
+00:15:49.600 --> 00:15:52.959
+If we want to capture the
+class names as well,
+
+00:15:52.959 --> 00:16:10.079
+we just add another pattern.
+
+00:16:10.079 --> 00:16:20.320
+Let's look at a more practical example.
+
+00:16:20.320 --> 00:16:23.468
+Here we can see that
+single-quoted strings
+
+00:16:23.468 --> 00:16:27.279
+and double-quoted strings are
+highlighted the same.
+
+00:16:27.279 --> 00:16:30.399
+But in some places,
+
+00:16:30.399 --> 00:16:33.440
+because of some coding conventions,
+
+00:16:33.440 --> 00:16:36.373
+it may be desirable to highlight them
+differently.
+
+00:16:36.373 --> 00:16:39.073
+For example, if the string is
+single-quoted,
+
+00:16:39.073 --> 00:16:44.399
+we may want to highlight it as a
+constant.
+
+00:16:44.399 --> 00:16:46.160
+Let's try to see whether we can
+
+00:16:46.160 --> 00:16:56.240
+distinguish these two cases.
+
+00:16:56.240 --> 00:17:00.639
+So here we get all the strings.
+
+00:17:00.639 --> 00:17:04.079
+If we want to see if it's a single-quoted
+
+00:17:04.079 --> 00:17:08.799
+or double-quoted string,
+
+00:17:08.799 --> 00:17:13.436
+we can try looking at the first
+character of the string--
+
+00:17:13.436 --> 00:17:16.720
+I mean the first character of the node--
+
+00:17:16.720 --> 00:17:33.600
+to check whether it's a single quote or
+a double quote.
+
+00:17:33.600 --> 00:17:38.920
+So for that, we use tree-sitter's
+support for predicates.
+
+00:17:38.920 --> 00:17:43.360
+In this case, we use a match predicate
+
+00:17:43.360 --> 00:17:47.339
+to check whether the string--
+whether the node starts
+
+00:17:47.339 --> 00:17:49.556
+with a single quote.
+
+00:17:49.556 --> 00:17:51.280
+And with this pattern,
+
+00:17:51.280 --> 00:18:00.400
+we only capture the single-quoted
+strings.
+
+00:18:00.400 --> 00:18:03.760
+Let's try to give it a different face.
+
+00:18:03.760 --> 00:18:13.039
+So we copy the pattern,
+
+00:18:13.039 --> 00:18:25.120
+and we add this pattern for Python only.
+
+00:18:25.120 --> 00:18:31.440
+But we also want to give the capture
+a different name.
+
+00:18:31.440 --> 00:18:46.559
+Let's say we want to highlight it
+as a keyword.
+
+00:18:46.559 --> 00:19:06.320
+And now, if we refresh the buffer,
+
+00:19:06.320 --> 00:19:08.523
+we see that single-quoted strings
+
+00:19:08.523 --> 00:19:14.400
+are highlighted as keywords.
+
+00:19:14.400 --> 00:19:15.751
+The highlighting patterns
+
+00:19:15.751 --> 00:19:19.200
+can also be set for a single project
+
+00:19:19.200 --> 00:19:23.440
+using directory-local variables.
+
+00:19:23.440 --> 00:19:35.760
+For example, let's take a look at
+Emacs's source code.
+
+00:19:35.760 --> 00:19:41.123
+So in Emacs's C source,
+there are a lot of uses
+
+00:19:41.123 --> 00:19:43.760
+of the DEFUN macro
+
+00:19:43.760 --> 00:19:47.679
+to define functions,
+
+00:19:47.679 --> 00:19:53.256
+and you can see this is actually
+the function name,
+
+00:19:53.256 --> 00:19:56.373
+but it's highlighted as a string.
+
+00:19:56.373 --> 00:20:03.679
+So what we want is to somehow
+recognize this pattern
+
+00:20:03.679 --> 00:20:07.600
+and highlight it.
+
+00:20:07.600 --> 00:20:11.280
+Highlight this part
+
+00:20:11.280 --> 00:20:14.559
+with the function face instead.
+
+00:20:14.559 --> 00:20:17.679
+In order to do that,
+
+00:20:17.679 --> 00:20:31.760
+we put a pattern in this project's
+directory-local settings file.
+
+00:20:31.760 --> 00:20:40.159
+So we can put this pattern in
+the C mode section.
+
+00:20:40.159 --> 00:20:48.000
+And now, if we enable tree-sitter,
+
+00:20:48.000 --> 00:20:50.480
+you can see that this is highlighted
+
+00:20:53.200 --> 00:20:55.056
+as a normal function definition.
+
+00:20:55.056 --> 00:21:01.200
+So this is the function face
+like we wanted.
+
+00:21:01.200 --> 00:21:07.200
+The pattern for this is
+actually pretty simple.
+
+00:21:07.200 --> 00:21:12.373
+It's only this part.
+
+00:21:12.373 --> 00:21:16.456
+So if it's a function call
+
+00:21:16.456 --> 00:21:19.679
+where the name of the function is
+DEFUN,
+
+00:21:19.679 --> 00:21:24.240
+then we highlight the DEFUN as a
+keyword,
+
+00:21:24.240 --> 00:21:26.923
+and then the first string element,
+
+00:21:26.923 --> 00:21:35.360
+we highlight it as a function name.
+
+00:21:35.360 --> 00:21:39.280
+Since the language objects are actually
+native code,
+
+00:21:39.280 --> 00:21:41.459
+they have to be compiled
+for each platform
+
+00:21:41.459 --> 00:21:43.440
+that we want to support.
+
+00:21:43.440 --> 00:21:48.159
+This will become a big obstacle for
+tree-sitter adoption.
+
+00:21:48.159 --> 00:21:52.960
+Therefore, I've created a language bundle
+package, tree-sitter-langs,
+
+00:21:52.960 --> 00:21:55.773
+that takes care of pre-compiling the
+grammars,
+
+00:21:55.773 --> 00:22:01.600
+the most common grammars for all three
+major platforms.
+
+00:22:01.600 --> 00:22:05.360
+It also takes care of distributing
+these binaries
+
+00:22:05.360 --> 00:22:08.080
+and provides some highlighting queries
+
+00:22:08.080 --> 00:22:11.440
+for some of the languages.
+
+00:22:11.440 --> 00:22:13.760
+It should be noted that this package
+
+00:22:13.760 --> 00:22:19.919
+should be treated as a temporary
+distribution mechanism only,
+
+00:22:19.919 --> 00:22:24.720
+to help with bootstrapping
+tree-sitter adoption.
+
+00:22:24.720 --> 00:22:27.760
+The plan is that eventually these files
+
+00:22:27.760 --> 00:22:29.156
+should be provided by
+
+00:22:29.156 --> 00:22:32.480
+the language major modes themselves.
+
+00:22:32.480 --> 00:22:36.320
+But in order to do that, we need better
+tooling,
+
+00:22:36.320 --> 00:22:40.240
+so we're not there yet.
+
+00:22:40.240 --> 00:22:43.280
+Since the core already works
+reasonably well,
+
+00:22:43.280 --> 00:22:45.289
+there are several areas
+that would benefit
+
+00:22:45.289 --> 00:22:49.120
+from the community's contribution.
+
+00:22:49.120 --> 00:22:52.640
+So tree-sitter's upstream language
+repositories
+
+00:22:52.640 --> 00:22:55.679
+already contain highlighting queries on
+their own.
+
+00:22:55.679 --> 00:22:57.573
+However, they are pretty basic,
+
+00:22:57.573 --> 00:23:02.559
+and they may not fit well with existing
+Emacs conventions.
+
+00:23:02.559 --> 00:23:07.120
+Therefore, the language bundle has its
+own set of highlighting queries.
+
+00:23:07.120 --> 00:23:12.556
+This requires maintenance until language
+major modes adopt tree-sitter
+
+00:23:12.556 --> 00:23:16.640
+and maintain the queries on their own.
+
+00:23:16.640 --> 00:23:19.056
+The queries are actually
+quite easy to write,
+
+00:23:19.056 --> 00:23:22.000
+as you've already seen.
+
+00:23:22.000 --> 00:23:25.360
+You just need to be familiar
+with the language,
+
+00:23:25.360 --> 00:23:35.200
+familiar enough to come up with sensible
+highlighting patterns.
+
+00:23:35.200 --> 00:23:39.679
+And if you are a maintainer of a
+language major mode,
+
+00:23:39.679 --> 00:23:44.189
+you may want to consider integrating
+tree-sitter into your mode,
+
+00:23:44.189 --> 00:23:48.573
+initially maybe as an optional feature.
+
+00:23:48.573 --> 00:23:53.279
+The integration is actually pretty
+straightforward,
+
+00:23:53.279 --> 00:23:56.640
+especially for syntax highlighting.
+
+00:23:56.640 --> 00:24:01.520
+Or alternatively,
+
+00:24:01.520 --> 00:24:05.760
+you can also try writing a new major
+mode from scratch
+
+00:24:05.760 --> 00:24:08.000
+that relies on tree-sitter
+
+00:24:08.000 --> 00:24:12.559
+from the very beginning.
+
+00:24:12.559 --> 00:24:17.523
+The code for such a major mode is
+quite simple.
+
+00:24:17.523 --> 00:24:23.200
+For example, this is the proposed
+
+00:24:23.200 --> 00:24:26.240
+wat-mode for WebAssembly.
+
+00:24:26.240 --> 00:24:39.520
+It is just one page of code,
+not a lot.
+
+00:24:39.520 --> 00:24:42.720
+You can also try writing new minor modes
+
+00:24:42.720 --> 00:24:46.559
+or writing integration packages.
+
+00:24:46.559 --> 00:24:50.880
+For example, a lot of packages
+
+00:24:50.880 --> 00:24:54.559
+may benefit from tree-sitter integration,
+
+00:24:54.559 --> 00:25:02.960
+but no one has written
+the integration yet.
+
+00:25:02.960 --> 00:25:04.836
+If you are interested in tree-sitter,
+
+00:25:04.836 --> 00:25:08.023
+you can use these links to learn more
+about it.
+
+00:25:08.023 --> 00:25:11.440
+I think that's it for me today.
+
+00:25:11.440 --> 00:25:18.159
+I'm happy to answer any questions.