Diffstat (limited to '2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt')
-rw-r--r-- 2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt | 1407
1 files changed, 0 insertions, 1407 deletions
diff --git a/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt b/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt
deleted file mode 100644
index 62ad5f65..00000000
--- a/2020/subtitles/emacsconf-2020--23-incremental-parsing-with-emacs-tree-sitter--tuan-anh-nguyen-autogen.vtt
+++ /dev/null
@@ -1,1407 +0,0 @@
-WEBVTT
-
-00:00:01.520 --> 00:00:04.400
-Hello, everyone! My name is Tuấn-Anh.
-
-00:00:04.400 --> 00:00:07.200
-I've been using Emacs for about 10 years.
-
-00:00:07.200 --> 00:00:09.280
-Today, I'm going to talk about tree-sitter,
-
-00:00:09.280 --> 00:00:11.351
-a new Emacs package that allows Emacs
-
-00:00:11.351 --> 00:00:17.840
-to parse multiple programming languages
-in real time.
-
-00:00:17.840 --> 00:00:21.840
-So what is the problem statement?
-
-00:00:21.840 --> 00:00:24.131
-In order to support programming
-functionalities
-
-00:00:24.131 --> 00:00:25.760
-for a particular language,
-
-00:00:25.760 --> 00:00:27.680
-a text editor needs to have some degree
-
-00:00:27.680 --> 00:00:29.679
-of language understanding.
-
-00:00:29.679 --> 00:00:31.840
-Traditionally, text editors have relied
-
-00:00:31.840 --> 00:00:34.960
-very heavily on regular expressions for
-this.
-
-00:00:34.960 --> 00:00:37.013
-Emacs is no different.
-
-00:00:37.013 --> 00:00:40.170
-Most language major modes use regular
-expressions
-
-00:00:40.170 --> 00:00:42.960
-for syntax highlighting, code navigation,
-
-00:00:42.960 --> 00:00:46.618
-folding, indexing, and so on.
-
-00:00:46.618 --> 00:00:50.559
-Regular expressions are problematic for
-a couple of reasons.
-
-00:00:50.559 --> 00:00:53.778
-They're slow and inaccurate.
-
-00:00:53.778 --> 00:00:56.800
-They also make the code hard to read and
-write.
-
-00:00:56.800 --> 00:01:01.199
-Sometimes it's because the regular
-expressions themselves are very hairy,
-
-00:01:01.199 --> 00:01:05.199
-and sometimes because they are just not
-powerful enough.
-
-00:01:05.199 --> 00:01:08.625
-Some helper code is usually needed
-
-00:01:08.625 --> 00:01:11.200
-to parse more intricate language
-features.
-
-00:01:11.200 --> 00:01:16.159
-That also illustrates the core problem
-with regular expressions,
-
-00:01:16.159 --> 00:01:21.119
-in that they are not powerful enough to
-parse programming languages.
-
-00:01:21.119 --> 00:01:25.040
-An example feature that regular
-expressions cannot handle very well
-
-00:01:25.040 --> 00:01:28.320
-is string interpolation, which is a very
-common feature
-
-00:01:28.320 --> 00:01:31.680
-in many modern programming languages.
-
-00:01:31.680 --> 00:01:34.079
-It would be much nicer if Emacs somehow
-
-00:01:34.079 --> 00:01:39.520
-had structural understanding of source
-code, like IDEs do.
-
-00:01:39.520 --> 00:01:41.981
-There have been multiple efforts
-
-00:01:41.981 --> 00:01:45.280
-to bring this kind of programming
-language understanding into Emacs.
-
-00:01:45.280 --> 00:01:47.119
-There are language-specific parsers
-
-00:01:47.119 --> 00:01:48.640
-written in Elisp
-
-00:01:48.640 --> 00:01:50.675
-that can be thought of
-
-00:01:50.675 --> 00:01:51.989
-as the next logical step
-of the glue code
-
-00:01:51.989 --> 00:01:53.856
-on top of regular expressions,
-
-00:01:53.856 --> 00:01:57.356
-moving from partial local pattern
-recognition
-
-00:01:57.356 --> 00:01:59.840
-into a full-fledged parser.
-
-00:01:59.840 --> 00:02:02.023
-The most prominent example of this
-approach
-
-00:02:02.023 --> 00:02:06.479
-is probably the famous js2-mode.
-
-00:02:06.479 --> 00:02:10.080
-However, this approach has several issues.
-
-00:02:10.080 --> 00:02:12.606
-Parsing is computationally expensive,
-
-00:02:12.606 --> 00:02:16.800
-and Emacs Lisp is not good at that kind
-of stuff.
-
-00:02:16.800 --> 00:02:19.156
-Furthermore, maintenance is very
-troublesome.
-
-00:02:19.156 --> 00:02:22.160
-In order to work on these parsers,
-
-00:02:22.160 --> 00:02:24.239
-first, you have to know Elisp
-well enough,
-
-00:02:24.239 --> 00:02:26.606
-and then you have to be comfortable with
-
-00:02:26.606 --> 00:02:29.739
-writing a recursive descent parser,
-
-00:02:29.739 --> 00:02:34.000
-while constantly keeping up with changes
-to the language itself,
-
-00:02:34.000 --> 00:02:36.356
-which can be evolving very quickly,
-
-00:02:36.356 --> 00:02:39.360
-like JavaScript, for example.
-
-00:02:39.360 --> 00:02:42.373
-Together, these constraints
-significantly reduce
-
-00:02:42.373 --> 00:02:45.680
-the pool of potential maintainers.
-
-00:02:45.680 --> 00:02:47.760
-The biggest issue, though, in my opinion,
-
-00:02:47.760 --> 00:02:52.139
-is the lack of a set of generic and
-reusable APIs.
-
-00:02:52.139 --> 00:02:54.319
-This makes them very hard to use
-
-00:02:54.319 --> 00:02:55.920
-for minor modes that want to deal with
-
-00:02:55.920 --> 00:02:59.920
-cross-cutting concerns across multiple
-languages.
-
-00:02:59.920 --> 00:03:01.760
-The other approach which has been
-
-00:03:01.760 --> 00:03:04.319
-gaining a lot of momentum
-in recent years
-
-00:03:04.319 --> 00:03:06.560
-is externalizing language understanding
-
-00:03:06.560 --> 00:03:08.159
-to another process,
-
-00:03:08.159 --> 00:03:12.239
-also known as the Language Server Protocol (LSP).
-
-00:03:12.239 --> 00:03:16.560
-This second approach is actually a very
-interesting one.
-
-00:03:16.560 --> 00:03:18.400
-By decoupling language understanding
-
-00:03:18.400 --> 00:03:21.280
-from the editing facility itself,
-
-00:03:21.280 --> 00:03:25.120
-the LSP servers can attract a lot more
-contributors,
-
-00:03:25.120 --> 00:03:27.189
-which makes maintenance easier.
-
-00:03:27.189 --> 00:03:32.400
-However, they also have several issues
-of their own.
-
-00:03:32.400 --> 00:03:34.089
-Being a separate process,
-
-00:03:34.089 --> 00:03:37.073
-they are usually more
-resource-intensive,
-
-00:03:37.073 --> 00:03:39.920
-and depending on the language,
-
-00:03:39.920 --> 00:03:42.159
-the LSP server itself can bring with it
-
-00:03:42.159 --> 00:03:44.640
-a host of additional dependencies
-
-00:03:44.640 --> 00:03:50.640
-external to Emacs, which may be messy to
-install and manage.
-
-00:03:50.640 --> 00:03:55.120
-Furthermore, JSON-RPC has pretty
-high latency.
-
-00:03:55.120 --> 00:03:57.840
-For one-off tasks like jumping to source
-
-00:03:57.840 --> 00:04:00.879
-or on-demand completion, it's great.
-
-00:04:00.879 --> 00:04:03.040
-But for things like code highlighting,
-
-00:04:03.040 --> 00:04:06.000
-the latency is just too much.
-
-00:04:06.000 --> 00:04:08.319
-I was using Rust and I was following the
-
-00:04:08.319 --> 00:04:11.760
-community effort to improve its
-IDE support,
-
-00:04:11.760 --> 00:04:15.760
-hoping to integrate some of that into
-Emacs itself.
-
-00:04:15.760 --> 00:04:19.759
-Then I heard someone from the community
-mention tree-sitter,
-
-00:04:19.759 --> 00:04:23.360
-and I decided to check it out.
-
-00:04:23.360 --> 00:04:28.720
-Basically, tree-sitter is an incremental
-parsing library and a parser generator.
-
-00:04:28.720 --> 00:04:33.040
-It was introduced by the Atom editor in
-2018.
-
-00:04:33.040 --> 00:04:35.923
-Besides Atom, it is also being
-integrated
-
-00:04:35.923 --> 00:04:37.623
-into the Neovim editor,
-
-00:04:37.623 --> 00:04:41.040
-and GitHub is using it to power
-
-00:04:41.040 --> 00:04:42.423
-their source code analysis
-
-00:04:42.423 --> 00:04:45.840
-and navigation features.
-
-00:04:45.840 --> 00:04:48.639
-It is written in C and can be compiled
-
-00:04:48.639 --> 00:04:50.623
-for all major platforms.
-
-00:04:50.623 --> 00:04:53.120
-It can even be compiled
-
-00:04:53.120 --> 00:04:55.323
-to WebAssembly to run on the web.
-
-00:04:55.323 --> 00:05:00.800
-That's how GitHub is using it
-on their website.
-
-00:05:00.800 --> 00:05:05.840
-So why is tree-sitter an interesting
-solution to this problem?
-
-00:05:05.840 --> 00:05:10.000
-There are multiple features that make it
-an attractive option.
-
-00:05:10.000 --> 00:05:11.839
-It is designed to be fast.
-
-00:05:11.839 --> 00:05:13.680
-It is incremental:
-
-00:05:13.680 --> 00:05:15.680
-the initial parse of a typical big file
-
-00:05:15.680 --> 00:05:18.160
-can take tens of milliseconds,
-
-00:05:18.160 --> 00:05:20.240
-while subsequent incremental parses
-
-00:05:20.240 --> 00:05:22.560
-are sub-millisecond.
-
-00:05:22.560 --> 00:05:26.240
-It achieves this by using
-structural sharing,
-
-00:05:26.240 --> 00:05:29.360
-meaning replacing only affected nodes
-
-00:05:29.360 --> 00:05:32.960
-in the old tree when it needs to.
-
-00:05:32.960 --> 00:05:37.120
-Also, unlike LSP, it runs in
-the same process,
-
-00:05:37.120 --> 00:05:40.639
-so it has much lower latency.
-
-00:05:40.639 --> 00:05:44.960
-Secondly, it provides a uniform
-programming interface.
-
-00:05:44.960 --> 00:05:47.039
-The same data structures and functions
-
-00:05:47.039 --> 00:05:50.400
-work on parse trees of different
-languages.
-
-00:05:50.400 --> 00:05:52.160
-Syntax nodes of different languages
-
-00:05:52.160 --> 00:05:54.160
-differ only by their types
-
-00:05:54.160 --> 00:05:55.723
-and their possible child nodes.
-
-00:05:55.723 --> 00:06:02.240
-This is a big advantage over
-language-specific parsers.
-
-00:06:02.240 --> 00:06:06.880
-Thirdly, it's written in self-contained
-embeddable C.
-
-00:06:06.880 --> 00:06:11.723
-As I mentioned previously, it can even
-be compiled to WebAssembly.
-
-00:06:11.723 --> 00:06:16.106
-This makes integrating it into various
-editors quite easy
-
-00:06:16.106 --> 00:06:22.880
-without having to install any external
-dependencies.
-
-00:06:22.880 --> 00:06:25.503
-One thing that is not mentioned here
-
-00:06:25.503 --> 00:06:28.000
-is that, since it is a parser generator,
-
-00:06:28.000 --> 00:06:31.039
-its grammars are declarative.
-
-00:06:31.039 --> 00:06:34.880
-Together with being editor-independent,
-
-00:06:34.880 --> 00:06:39.139
-this makes the pool of potential
-contributors much larger.
-
-00:06:39.139 --> 00:06:45.520
-So I was convinced that tree-sitter is a
-good fit for Emacs.
-
-00:06:45.520 --> 00:06:48.000
-Last year, I started writing the bindings
-
-00:06:48.000 --> 00:06:53.280
-using dynamic module support introduced
-in Emacs 25.
-
-00:06:53.280 --> 00:06:58.479
-Dynamic module means there is
-platform-specific native code involved,
-
-00:06:58.479 --> 00:07:00.560
-but since there are pre-compiled binaries
-
-00:07:00.560 --> 00:07:02.880
-for the three major platforms,
-
-00:07:02.880 --> 00:07:04.706
-it should work in most places.
-
-00:07:04.706 --> 00:07:09.440
-Currently, the core functionalities are
-in pretty good shape.
-
-00:07:09.440 --> 00:07:12.560
-Syntax highlighting is working nicely.
-
-00:07:12.560 --> 00:07:16.080
-The whole thing is split into three
-packages.
-
-00:07:16.080 --> 00:07:20.319
-tree-sitter is the main package that
-other packages should depend on.
-
-00:07:20.319 --> 00:07:22.800
-tree-sitter-langs is the language bundle
-
-00:07:22.800 --> 00:07:24.000
-that includes support
-
-00:07:24.000 --> 00:07:27.199
-for most common languages.
-
-00:07:27.199 --> 00:07:32.160
-And finally, the core APIs are in the
-package tsc,
-
-00:07:32.160 --> 00:07:36.160
-which stands for tree-sitter-core.
-
-00:07:36.160 --> 00:07:38.800
-It is an implicit dependency of the
-
-00:07:38.800 --> 00:07:43.520
-tree-sitter package.
-
-00:07:43.520 --> 00:07:47.520
-The main package includes the minor mode
-tree-sitter-mode.
-
-00:07:47.520 --> 00:07:52.560
-This provides the base for other major
-or minor modes to build on.
-
-00:07:52.560 --> 00:07:54.839
-Using Emacs's change tracking hooks,
-
-00:07:54.839 --> 00:07:57.073
-it enables incremental parsing
-
-00:07:57.073 --> 00:08:00.800
-and provides a syntax tree that is
-always up to date
-
-00:08:00.800 --> 00:08:04.080
-after any edits in a buffer.
-
-00:08:04.080 --> 00:08:06.223
-There is also a basic debug mode
-
-00:08:06.223 --> 00:08:10.080
-that shows the parse tree in
-another buffer.
-
-00:08:10.080 --> 00:08:13.360
-Here is a quick demo.
-
-00:08:13.360 --> 00:08:15.673
-Here I'm in an empty Python buffer
-
-00:08:15.673 --> 00:08:17.520
-with tree-sitter enabled.
-
-00:08:17.520 --> 00:08:19.440
-I'm going to turn on the debug mode to
-
-00:08:19.440 --> 00:08:26.560
-see the parse tree.
-
-00:08:26.560 --> 00:08:28.106
-Since the buffer is empty,
-
-00:08:28.106 --> 00:08:30.423
-there is only one node in the
-syntax tree:
-
-00:08:30.423 --> 00:08:33.279
-the top-level module node.
-
-00:08:33.279 --> 00:09:11.040
-Let's try typing some code.
-
-00:09:11.040 --> 00:09:14.640
-As you can see, as I type into the
-Python buffer,
-
-00:09:14.640 --> 00:09:19.120
-the syntax tree updates in real time.
-
-00:09:19.120 --> 00:09:22.039
-The other minor mode included in the
-main package
-
-00:09:22.039 --> 00:09:24.389
-is tree-sitter-hl-mode.
-
-00:09:24.389 --> 00:09:26.349
-It overrides font-lock mode
-
-00:09:26.349 --> 00:09:28.480
-and provides its own set of faces
-
-00:09:28.480 --> 00:09:30.139
-and customization options.
-
-00:09:30.139 --> 00:09:32.800
-It is query-driven.
-
-00:09:32.800 --> 00:09:36.240
-That means instead of regular
-expressions,
-
-00:09:36.240 --> 00:09:39.518
-it uses a Lisp-like query language
-
-00:09:39.518 --> 00:09:40.320
-to map syntax nodes
-
-00:09:40.320 --> 00:09:41.923
-to highlighting faces.
-
-00:09:41.923 --> 00:09:45.760
-I'm going to open a Python file with
-small snippets
-
-00:09:45.760 --> 00:09:54.320
-that showcase syntax highlighting.
-
-00:09:54.320 --> 00:09:55.920
-So this is the default highlighting
-
-00:09:55.920 --> 00:10:00.880
-provided by python-mode.
-
-00:10:00.880 --> 00:10:04.640
-This is the highlighting enabled
-by tree-sitter.
-
-00:10:04.640 --> 00:10:07.680
-As you can see, string interpolation
-
-00:10:07.680 --> 00:10:11.680
-and decorators are highlighted correctly.
-
-00:10:11.680 --> 00:10:17.440
-Function calls are also highlighted.
-
-00:10:17.440 --> 00:10:20.240
-You can also note that property
-
-00:10:20.240 --> 00:10:21.839
-accessors
-
-00:10:21.839 --> 00:10:24.640
-and property assignments are highlighted
-
-00:10:24.640 --> 00:10:27.440
-differently.
-
-00:10:27.440 --> 00:10:29.360
-What I like the most about this is that
-
-00:10:29.360 --> 00:10:30.880
-new bindings are consistently
-
-00:10:30.880 --> 00:10:32.640
-highlighted.
-
-00:10:32.640 --> 00:10:36.320
-This includes local variables,
-
-00:10:36.320 --> 00:10:39.760
-function parameters, and property
-
-00:10:39.760 --> 00:10:45.760
-mutations.
-
-00:10:45.760 --> 00:10:48.000
-Before going through tree-sitter queries
-
-00:10:48.000 --> 00:10:49.279
-and the syntax highlighting
-
-00:10:49.279 --> 00:10:51.680
-customization options,
-
-00:10:51.680 --> 00:10:53.760
-let's take a brief look at the core data
-
-00:10:53.760 --> 00:10:55.040
-structures and functions
-
-00:10:55.040 --> 00:10:58.079
-that tree-sitter provides.
-
-00:10:58.079 --> 00:10:59.839
-Parsing is done with the help of a
-
-00:10:59.839 --> 00:11:02.240
-generic parser object.
-
-00:11:02.240 --> 00:11:04.160
-A single parser object can be used to
-
-00:11:04.160 --> 00:11:06.000
-parse different languages
-
-00:11:06.000 --> 00:11:08.320
-by sending different language objects to
-
-00:11:08.320 --> 00:11:09.279
-it.
-
-00:11:09.279 --> 00:11:10.880
-The language objects themselves are
-
-00:11:10.880 --> 00:11:14.079
-loaded from shared libraries.
-
-00:11:14.079 --> 00:11:16.079
-Since tree-sitter-mode already handles
-
-00:11:16.079 --> 00:11:17.360
-the parsing part,
-
-00:11:17.360 --> 00:11:19.440
-we will instead focus on the functions
-
-00:11:19.440 --> 00:11:20.800
-that inspect nodes
-
-00:11:20.800 --> 00:11:25.279
-in the resulting parse tree.
-
-00:11:25.279 --> 00:11:27.200
-We can ask tree-sitter what the
-
-00:11:27.200 --> 00:11:44.240
-syntax node at point is.
-
-00:11:44.240 --> 00:11:47.200
-It is an opaque object, so this is not
-
-00:11:47.200 --> 00:11:48.480
-very useful.
-
-00:11:48.480 --> 00:12:03.760
-We can instead ask what its type is.
-
-00:12:03.760 --> 00:12:06.560
-Its type is the symbol
-
-00:12:06.560 --> 00:12:08.959
-comparison_operator.
-
-00:12:08.959 --> 00:12:11.600
-In tree-sitter, there are two kinds of nodes:
-
-00:12:11.600 --> 00:12:13.680
-anonymous nodes and named nodes.
-
-00:12:13.680 --> 00:12:15.519
-Anonymous nodes correspond to simple
-
-00:12:15.519 --> 00:12:17.040
-grammar elements
-
-00:12:17.040 --> 00:12:19.839
-like keywords, operators, punctuation, and
-
-00:12:19.839 --> 00:12:21.279
-so on.
-
-00:12:21.279 --> 00:12:24.160
-Named nodes, on the other hand, are grammar
-
-00:12:24.160 --> 00:12:25.920
-elements that are interesting enough on
-
-00:12:25.920 --> 00:12:26.639
-their own
-
-00:12:26.639 --> 00:12:30.320
-to have a name, like an identifier, an
-
-00:12:30.320 --> 00:12:31.839
-expression,
-
-00:12:31.839 --> 00:12:35.440
-or a function definition.
-
-00:12:35.440 --> 00:12:37.760
-Named node types are symbols, while
-
-00:12:37.760 --> 00:12:42.639
-anonymous node types are strings.
-
-00:12:42.639 --> 00:12:46.320
-For example, if we are on this
-
-00:12:46.320 --> 00:12:49.760
-comparison operator,
-
-00:12:49.760 --> 00:12:55.920
-the node type should be a string.
-
-00:12:55.920 --> 00:12:57.920
-We can also get other information about
-
-00:12:57.920 --> 00:12:58.959
-the node,
-
-00:12:58.959 --> 00:13:09.680
-for example, what its text is,
-
-00:13:09.680 --> 00:13:20.800
-or where it is in the buffer,
-
-00:13:20.800 --> 00:13:43.199
-or what its parent is.
-
-00:13:43.199 --> 00:13:46.160
-There are many other APIs to query
-
-00:13:46.160 --> 00:13:46.839
-node
-
-00:13:46.839 --> 00:13:52.639
-properties.
-
-00:13:52.639 --> 00:13:54.399
-Tree-sitter allows searching for
-
-00:13:54.399 --> 00:13:58.240
-structural patterns within a parse tree.
-
-00:13:58.240 --> 00:14:01.440
-It does so through a Lisp-like language.
-
-00:14:01.440 --> 00:14:03.519
-This language supports pattern matching
-
-00:14:03.519 --> 00:14:04.639
-by node types,
-
-00:14:04.639 --> 00:14:07.760
-field names, and predicates.
-
-00:14:07.760 --> 00:14:10.079
-It also allows capturing nodes for
-
-00:14:10.079 --> 00:14:12.639
-further processing.
-
-00:14:12.639 --> 00:14:37.680
-Let's look at some examples.
-
-00:14:37.680 --> 00:14:41.040
-In this very simple query, we just
-
-00:14:41.040 --> 00:14:43.839
-try to highlight all the identifiers in
-
-00:14:43.839 --> 00:14:49.040
-the buffer.
-
-00:14:49.040 --> 00:14:51.920
-This @ sign tells tree-sitter to capture a
-
-00:14:51.920 --> 00:14:53.120
-node.
-
-00:14:53.120 --> 00:14:55.839
-In the context of the query builder, it's
-
-00:14:55.839 --> 00:14:57.360
-not very important,
-
-00:14:57.360 --> 00:15:00.320
-but in a normal highlighting query, this
-
-00:15:00.320 --> 00:15:01.760
-will determine
-
-00:15:01.760 --> 00:15:06.639
-the face used to highlight the node.
-
-00:15:06.639 --> 00:15:08.800
-Suppose we want to capture all the
-
-00:15:08.800 --> 00:15:10.320
-function names
-
-00:15:10.320 --> 00:15:13.519
-instead of just any identifier.
-
-00:15:13.519 --> 00:15:29.440
-You can improve the query like this.
-
-00:15:29.440 --> 00:15:31.600
-This will highlight the whole
-
-00:15:31.600 --> 00:15:32.639
-definition,
-
-00:15:32.639 --> 00:15:35.519
-but we only want to capture the function
-
-00:15:35.519 --> 00:15:36.399
-name,
-
-00:15:36.399 --> 00:15:39.600
-which means the identifier
-
-00:15:39.600 --> 00:15:42.800
-here. So we
-
-00:15:42.800 --> 00:15:46.320
-move the capture to after the identifier
-
-00:15:46.320 --> 00:15:49.600
-node.
-
-00:15:49.600 --> 00:15:51.759
-If we want to capture the class names as
-
-00:15:51.759 --> 00:15:52.959
-well,
-
-00:15:52.959 --> 00:16:10.079
-we just add another pattern.
-
-00:16:10.079 --> 00:16:20.320
-Let's look at a more practical example.
-
-00:16:20.320 --> 00:16:22.959
-Here we can see that single-quoted
-
-00:16:22.959 --> 00:16:23.759
-strings and
-
-00:16:23.759 --> 00:16:25.600
-double-quoted strings are highlighted
-
-00:16:25.600 --> 00:16:27.279
-the same.
-
-00:16:27.279 --> 00:16:30.399
-But in some places,
-
-00:16:30.399 --> 00:16:33.440
-because of some coding conventions,
-
-00:16:33.440 --> 00:16:35.440
-it may be desirable to highlight them
-
-00:16:35.440 --> 00:16:37.279
-differently. For example, if
-
-00:16:37.279 --> 00:16:39.680
-the string is single-quoted, we may want
-
-00:16:39.680 --> 00:16:40.880
-to highlight it
-
-00:16:40.880 --> 00:16:44.399
-as a constant.
-
-00:16:44.399 --> 00:16:46.160
-Let's see whether we can
-
-00:16:46.160 --> 00:16:47.600
-distinguish these
-
-00:16:47.600 --> 00:16:56.240
-two cases.
-
-00:16:56.240 --> 00:17:00.639
-Here we get all the strings.
-
-00:17:00.639 --> 00:17:04.079
-If we want to see whether it's a single-quoted
-
-00:17:04.079 --> 00:17:04.559
-or
-
-00:17:04.559 --> 00:17:08.799
-double-quoted string,
-
-00:17:08.799 --> 00:17:11.039
-we can try looking at the first
-
-00:17:11.039 --> 00:17:12.480
-character
-
-00:17:12.480 --> 00:17:15.280
-of the string, I mean the first character
-
-00:17:15.280 --> 00:17:16.720
-of the node,
-
-00:17:16.720 --> 00:17:19.360
-to check whether it's a single quote or
-
-00:17:19.360 --> 00:17:33.600
-a double quote.
-
-00:17:33.600 --> 00:17:36.080
-For that, we use
-
-00:17:36.080 --> 00:17:36.799
-tree-sitter's
-
-00:17:36.799 --> 00:17:40.160
-support for predicates. In this case,
-
-00:17:40.160 --> 00:17:43.360
-we use a match predicate
-
-00:17:43.360 --> 00:17:46.080
-to check whether the text of the
-
-00:17:46.080 --> 00:17:46.799
-node
-
-00:17:46.799 --> 00:17:50.320
-starts with a single quote. With this
-
-00:17:50.320 --> 00:17:51.280
-pattern,
-
-00:17:51.280 --> 00:17:58.840
-we only capture the single-quoted
-
-00:17:58.840 --> 00:18:00.400
-strings.
-
-00:18:00.400 --> 00:18:03.760
-Let's try to give it a different face.
-
-00:18:03.760 --> 00:18:13.039
-So we copy the pattern
-
-00:18:13.039 --> 00:18:18.640
-and add this pattern
-
-00:18:18.640 --> 00:18:25.120
-for Python only.
-
-00:18:25.120 --> 00:18:28.400
-But we also want to give the
-
-00:18:28.400 --> 00:18:31.440
-capture a different name.
-
-00:18:31.440 --> 00:18:40.840
-Let's say we want to highlight it as a
-
-00:18:40.840 --> 00:18:46.559
-keyword.
-
-00:18:46.559 --> 00:19:06.320
-And now, if we refresh the buffer,
-
-00:19:06.320 --> 00:19:08.799
-we see that single-quoted strings are
-
-00:19:08.799 --> 00:19:10.320
-highlighted as
-
-00:19:10.320 --> 00:19:14.400
-keywords.
-
-00:19:14.400 --> 00:19:16.400
-The highlighting patterns can also be
-
-00:19:16.400 --> 00:19:19.200
-set for a single project
-
-00:19:19.200 --> 00:19:23.440
-using directory-local variables.
-
-00:19:23.440 --> 00:19:26.880
-For example, let's take a look at
-
-00:19:26.880 --> 00:19:35.760
-the Emacs source code.
-
-00:19:35.760 --> 00:19:40.400
-In the Emacs C source, there are a lot of
-
-00:19:40.400 --> 00:19:43.760
-uses of this DEFUN macro
-
-00:19:43.760 --> 00:19:47.679
-to define functions,
-
-00:19:47.679 --> 00:19:51.200
-and you can see
-
-00:19:51.200 --> 00:19:53.520
-this is actually the function name, but
-
-00:19:53.520 --> 00:19:55.760
-it's highlighted as a
-
-00:19:55.760 --> 00:19:59.120
-string. So what we want
-
-00:19:59.120 --> 00:20:03.679
-is to somehow recognize this pattern
-
-00:20:03.679 --> 00:20:07.600
-and highlight it,
-
-00:20:07.600 --> 00:20:11.280
-or rather highlight this part,
-
-00:20:11.280 --> 00:20:14.559
-with the function face instead.
-
-00:20:14.559 --> 00:20:17.679
-In order to do that,
-
-00:20:17.679 --> 00:20:20.240
-we put a pattern in this project's
-
-00:20:20.240 --> 00:20:21.760
-directory-local
-
-00:20:21.760 --> 00:20:31.760
-settings file.
-
-00:20:31.760 --> 00:20:34.799
-We can put this pattern in the
-
-00:20:34.799 --> 00:20:40.159
-c-mode section.
-
-00:20:40.159 --> 00:20:48.000
-And now, if we enable tree-sitter,
-
-00:20:48.000 --> 00:20:50.480
-you can see that this is highlighted
-
-00:20:50.480 --> 00:20:53.200
-uh
-
-00:20:53.200 --> 00:20:55.520
-as a normal function definition. This
-
-00:20:55.520 --> 00:20:56.559
-is the function
-
-00:20:56.559 --> 00:21:01.200
-face, like we wanted.
-
-00:21:01.200 --> 00:21:03.760
-The pattern for this is actually pretty
-
-00:21:03.760 --> 00:21:07.200
-simple.
-
-00:21:07.200 --> 00:21:10.720
-It's only
-
-00:21:10.720 --> 00:21:14.720
-this part. So
-
-00:21:14.720 --> 00:21:17.440
-if it's a function call where the name
-
-00:21:17.440 --> 00:21:19.679
-of the function is DEFUN,
-
-00:21:19.679 --> 00:21:21.600
-then we highlight the DEFUN as a
-
-00:21:21.600 --> 00:21:24.240
-keyword,
-
-00:21:24.240 --> 00:21:27.360
-and the first string argument, we
-
-00:21:27.360 --> 00:21:28.159
-highlight
-
-00:21:28.159 --> 00:21:35.360
-as a function name.
-
-00:21:35.360 --> 00:21:37.679
-Since the language objects are actually
-
-00:21:37.679 --> 00:21:39.280
-native code,
-
-00:21:39.280 --> 00:21:40.799
-they have to be compiled for each
-
-00:21:40.799 --> 00:21:43.440
-platform that we want to support.
-
-00:21:43.440 --> 00:21:45.600
-This will become a big obstacle for
-
-00:21:45.600 --> 00:21:48.159
-tree-sitter adoption.
-
-00:21:48.159 --> 00:21:50.240
-Therefore, I've created a language bundle
-
-00:21:50.240 --> 00:21:52.960
-package, tree-sitter-langs,
-
-00:21:52.960 --> 00:21:54.960
-that takes care of pre-compiling the
-
-00:21:54.960 --> 00:21:56.320
-grammars, the
-
-00:21:56.320 --> 00:21:59.679
-most common grammars for all three major
-
-00:21:59.679 --> 00:22:01.600
-platforms.
-
-00:22:01.600 --> 00:22:04.080
-It also takes care of distributing these
-
-00:22:04.080 --> 00:22:05.360
-binaries
-
-00:22:05.360 --> 00:22:08.080
-and provides some highlighting queries
-
-00:22:08.080 --> 00:22:11.440
-for some of the languages.
-
-00:22:11.440 --> 00:22:13.760
-It should be noted that this package
-
-00:22:13.760 --> 00:22:15.919
-should be treated as a temporary
-
-00:22:15.919 --> 00:22:19.919
-distribution mechanism only
-
-00:22:19.919 --> 00:22:22.240
-to help with bootstrapping tree-sitter
-
-00:22:22.240 --> 00:22:24.720
-adoption.
-
-00:22:24.720 --> 00:22:27.760
-The plan is that eventually these files
-
-00:22:27.760 --> 00:22:29.760
-should be provided by the language major
-
-00:22:29.760 --> 00:22:32.480
-modes themselves,
-
-00:22:32.480 --> 00:22:35.120
-but in order to do that, we need better
-
-00:22:35.120 --> 00:22:36.320
-tooling,
-
-00:22:36.320 --> 00:22:40.240
-so we're not there yet.
-
-00:22:40.240 --> 00:22:42.559
-Since the core already works reasonably
-
-00:22:42.559 --> 00:22:43.280
-well,
-
-00:22:43.280 --> 00:22:44.640
-there are several areas that would
-
-00:22:44.640 --> 00:22:46.320
-benefit from the community's
-
-00:22:46.320 --> 00:22:49.120
-contributions.
-
-00:22:49.120 --> 00:22:51.520
-Tree-sitter's upstream language
-
-00:22:51.520 --> 00:22:52.640
-repositories
-
-00:22:52.640 --> 00:22:54.400
-already contain highlighting queries on
-
-00:22:54.400 --> 00:22:55.679
-their own.
-
-00:22:55.679 --> 00:22:58.480
-However, they are pretty basic, and they
-
-00:22:58.480 --> 00:23:00.480
-may not fit well with existing Emacs
-
-00:23:00.480 --> 00:23:02.559
-conventions.
-
-00:23:02.559 --> 00:23:04.320
-Therefore, the language bundle has its
-
-00:23:04.320 --> 00:23:07.120
-own set of highlighting queries.
-
-00:23:07.120 --> 00:23:10.559
-This requires maintenance until language
-
-00:23:10.559 --> 00:23:11.600
-major modes adopt
-
-00:23:11.600 --> 00:23:13.760
-tree-sitter and maintain the queries on
-
-00:23:13.760 --> 00:23:16.640
-their own.
-
-00:23:16.640 --> 00:23:18.480
-The queries are actually quite easy to
-
-00:23:18.480 --> 00:23:22.000
-write, as you've already seen.
-
-00:23:22.000 --> 00:23:24.240
-You just need to be familiar with the
-
-00:23:24.240 --> 00:23:25.360
-language,
-
-00:23:25.360 --> 00:23:30.000
-familiar enough to come up with sensible
-
-00:23:30.000 --> 00:23:35.200
-highlighting patterns.
-
-00:23:35.200 --> 00:23:37.600
-And if you are a maintainer of a
-
-00:23:37.600 --> 00:23:39.679
-language major mode,
-
-00:23:39.679 --> 00:23:42.320
-you may want to consider integrating
-
-00:23:42.320 --> 00:23:43.360
-tree-sitter into
-
-00:23:43.360 --> 00:23:46.960
-your mode, initially maybe as an
-
-00:23:46.960 --> 00:23:50.080
-optional feature. The integration is
-
-00:23:50.080 --> 00:23:53.279
-actually pretty straightforward,
-
-00:23:53.279 --> 00:23:56.640
-especially for syntax highlighting.
-
-00:23:56.640 --> 00:24:01.520
-Alternatively,
-
-00:24:01.520 --> 00:24:03.760
-you can also try writing a new major
-
-00:24:03.760 --> 00:24:04.640
-mode
-
-00:24:04.640 --> 00:24:08.000
-from scratch that relies on tree-sitter
-
-00:24:08.000 --> 00:24:12.559
-from the very beginning.
-
-00:24:12.559 --> 00:24:16.320
-The code for such a major mode is
-
-00:24:16.320 --> 00:24:19.679
-quite simple. For example,
-
-00:24:19.679 --> 00:24:23.200
-this is the proposed
-
-00:24:23.200 --> 00:24:26.240
-wat-mode for WebAssembly.
-
-00:24:26.240 --> 00:24:31.039
-The code is just
-
-00:24:31.039 --> 00:24:34.559
-about one page of code,
-
-00:24:34.559 --> 00:24:39.520
-not a lot.
-
-00:24:39.520 --> 00:24:42.720
-You can also try writing new minor modes
-
-00:24:42.720 --> 00:24:46.559
-or writing integration packages.
-
-00:24:46.559 --> 00:24:50.080
-For example, a lot of
-
-00:24:50.080 --> 00:24:50.880
-packages
-
-00:24:50.880 --> 00:24:54.559
-may benefit from tree-sitter integration,
-
-00:24:54.559 --> 00:24:58.840
-but no one has written the integration
-
-00:24:58.840 --> 00:25:02.960
-yet.
-
-00:25:02.960 --> 00:25:05.039
-If you are interested in tree-sitter, you
-
-00:25:05.039 --> 00:25:06.720
-can use these links to
-
-00:25:06.720 --> 00:25:10.320
-learn more about it. I think that's it
-
-00:25:10.320 --> 00:25:11.440
-for me today.
-
-00:25:11.440 --> 00:25:18.159
-I'm happy to answer any questions
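
The talk argues that regular expressions cannot handle string interpolation well. A small Python sketch can illustrate the point. It is not tree-sitter code. JavaScript-style template literals are an assumed example syntax, and `NAIVE_TEMPLATE`, `scan_template`, and `scan_substitution` are hypothetical names. A single regex stops at the first inner backtick, while a recursive scanner follows the nesting.

```python
import re

# Hypothetical illustration, not part of tree-sitter.
# Naive rule: a backtick, then any non-backtick characters, then a backtick.
NAIVE_TEMPLATE = re.compile(r"`[^`]*`")


def scan_template(src: str, i: int) -> int:
    """Return the index just past the template literal starting at src[i] == '`'."""
    assert src[i] == "`"
    i += 1
    while i < len(src):
        ch = src[i]
        if ch == "`":
            return i + 1  # closing backtick of this literal
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if src.startswith("${", i):
            i = scan_substitution(src, i + 2)  # interpolation: scan until its '}'
            continue
        i += 1
    raise ValueError("unterminated template literal")


def scan_substitution(src: str, i: int) -> int:
    """Return the index just past the '}' that closes a '${' substitution.

    Braces are counted, and a backtick starts a nested literal (recursion).
    Quoted strings inside the substitution are not handled in this sketch.
    """
    depth = 1
    while i < len(src):
        ch = src[i]
        if ch == "`":
            i = scan_template(src, i)  # nested template literal
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated substitution")


code = "x = `a ${ `b ${c}` } d`;"
start = code.index("`")
print(NAIVE_TEMPLATE.search(code).group())      # `a ${ `
print(code[start:scan_template(code, start)])  # `a ${ `b ${c}` } d`
```

The recursion is what the regular expression lacks. Each `${ ... }` may open a new literal, so finding the end of a string requires tracking arbitrary nesting depth.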
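
The talk describes structural sharing as "replacing only affected nodes in the old tree." This can be sketched with an immutable toy tree in Python. The sketch only models node reuse after an edit. It is not tree-sitter's actual incremental parsing algorithm, and `Node` and `replace_at` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Immutable toy syntax node (hypothetical; not tree-sitter's node type)."""
    type: str
    text: str = ""
    children: Tuple["Node", ...] = ()


def replace_at(root: Node, path: Tuple[int, ...], new_node: Node) -> Node:
    """Return a new tree with the node at `path` (child indices) replaced.

    Only nodes on the path from the root to the edited node are rebuilt.
    Every other subtree is reused (shared) from the old tree.
    """
    if not path:
        return new_node
    i, rest = path[0], path[1:]
    kids = list(root.children)
    kids[i] = replace_at(kids[i], rest, new_node)
    return Node(root.type, root.text, tuple(kids))


old = Node("module", children=(
    Node("function_definition", children=(Node("identifier", "f"), Node("block"))),
    Node("function_definition", children=(Node("identifier", "g"), Node("block"))),
))
# Rename the second function from g to h.
new = replace_at(old, (1, 0), Node("identifier", "h"))

print(new.children[0] is old.children[0])  # True: untouched subtree is shared
print(new.children[1] is old.children[1])  # False: rebuilt along the edit path
print(new.children[1].children[1] is old.children[1].children[1])  # True
```

Because unchanged subtrees are shared rather than copied, the old tree stays valid. An edit only rebuilds the nodes on the path from the root to the change.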
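
The demo includes two query examples: capturing only single-quoted strings with a match predicate, and highlighting the first string argument of `DEFUN` in the Emacs C source. Both can be mimicked over hand-built toy trees in Python. This is a sketch under stated assumptions, not tree-sitter's query engine. `Node`, `walk`, `single_quoted_strings`, and `defun_names` are hypothetical, and the trees are written by hand rather than parsed. The node type names (`string`, `call_expression`, `argument_list`, `string_literal`, `identifier`) follow the upstream Python and C grammars. In tree-sitter's standard query syntax, the first filter corresponds roughly to `((string) @constant (#match? @constant "^'"))`.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Toy stand-in for a syntax node (hypothetical; not tree-sitter's API)."""
    type: str
    text: str
    named: bool = True  # False for anonymous nodes such as "(" or ","
    children: List["Node"] = field(default_factory=list)
    fields: Dict[str, "Node"] = field(default_factory=dict)  # e.g. "function"


def walk(node: Node) -> Iterator[Node]:
    """Yield every node, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def single_quoted_strings(root: Node) -> List[str]:
    """Keep `string` nodes whose text matches ^' (a match-style predicate)."""
    starts_with_quote = re.compile(r"^'")
    return [n.text for n in walk(root)
            if n.type == "string" and starts_with_quote.search(n.text)]


def defun_names(root: Node) -> List[str]:
    """For call_expression nodes whose `function` field is the identifier DEFUN,
    capture the first named argument if it is a string_literal."""
    names = []
    for n in walk(root):
        fn, args = n.fields.get("function"), n.fields.get("arguments")
        if n.type != "call_expression" or fn is None or args is None:
            continue
        if fn.type != "identifier" or fn.text != "DEFUN":
            continue
        named_args = [c for c in args.children if c.named]  # skip "(", ",", ")"
        if named_args and named_args[0].type == "string_literal":
            names.append(named_args[0].text)
    return names


# Python: one single-quoted and one double-quoted string (simplified tree).
py = Node("module", "", children=[Node("string", "'RED'"), Node("string", '"hello"')])

# C: DEFUN ("car", Fcar) represented as a call, as described in the talk.
fn = Node("identifier", "DEFUN")
args = Node("argument_list", '("car", Fcar)', children=[
    Node("(", "(", named=False),
    Node("string_literal", '"car"'),
    Node(",", ",", named=False),
    Node("identifier", "Fcar"),
    Node(")", ")", named=False),
])
call = Node("call_expression", 'DEFUN ("car", Fcar)', children=[fn, args],
            fields={"function": fn, "arguments": args})
c_tree = Node("translation_unit", "", children=[call])

print(single_quoted_strings(py))  # ["'RED'"]
print(defun_names(c_tree))        # ['"car"']
```

Skipping anonymous nodes before taking the first argument reflects the named/anonymous distinction from the talk. Punctuation such as `(` is anonymous, so the first named child is the string literal.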