WEBVTT
00:00:01.520 --> 00:00:04.400
hello everyone my name is Tuấn-Anh Nguyễn
00:00:04.400 --> 00:00:07.200
I've been using Emacs for about 10 years
00:00:07.200 --> 00:00:09.280
today I'm going to talk about tree-sitter
00:00:09.280 --> 00:00:11.519
a new Emacs package that allows Emacs to
00:00:11.519 --> 00:00:13.759
parse multiple programming languages
00:00:13.759 --> 00:00:17.840
in real time
00:00:17.840 --> 00:00:21.840
so what is the problem statement
00:00:21.840 --> 00:00:23.359
in order to support programming
00:00:23.359 --> 00:00:24.960
functionalities for a particular
00:00:24.960 --> 00:00:25.760
language
00:00:25.760 --> 00:00:27.680
a text editor needs to have some degree
00:00:27.680 --> 00:00:29.679
of language understanding
00:00:29.679 --> 00:00:31.840
traditionally text editors have relied
00:00:31.840 --> 00:00:33.840
very heavily on regular expressions for
00:00:33.840 --> 00:00:34.960
this
00:00:34.960 --> 00:00:38.320
Emacs is no different most language
00:00:38.320 --> 00:00:39.280
major modes use
00:00:39.280 --> 00:00:40.879
regular expressions for syntax
00:00:40.879 --> 00:00:42.960
highlighting code navigation
00:00:42.960 --> 00:00:46.239
folding indexing and so on regular
00:00:46.239 --> 00:00:47.440
expressions are
00:00:47.440 --> 00:00:50.559
problematic for a couple of reasons
00:00:50.559 --> 00:00:53.600
they're slow and inaccurate they also
00:00:53.600 --> 00:00:54.000
make
00:00:54.000 --> 00:00:56.800
the code hard to read and write
00:00:56.800 --> 00:00:57.440
sometimes
00:00:57.440 --> 00:00:59.199
it's because the regular expressions
00:00:59.199 --> 00:01:01.199
themselves are very hairy
00:01:01.199 --> 00:01:04.000
and sometimes because they are just not
00:01:04.000 --> 00:01:05.199
powerful enough
00:01:05.199 --> 00:01:07.840
some helper code is usually needed to
00:01:07.840 --> 00:01:11.200
parse more intricate language features
00:01:11.200 --> 00:01:13.280
that also illustrates the core problem
00:01:13.280 --> 00:01:16.159
with regular expressions
00:01:16.159 --> 00:01:18.400
in that they are not powerful enough to
00:01:18.400 --> 00:01:21.119
parse programming languages
00:01:21.119 --> 00:01:22.640
an example feature that regular
00:01:22.640 --> 00:01:25.040
expressions cannot handle very well
00:01:25.040 --> 00:01:27.520
is string interpolation which is a very
00:01:27.520 --> 00:01:28.320
common feature
00:01:28.320 --> 00:01:31.680
in many modern programming languages
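To make the string interpolation point concrete, here is a minimal Python sketch. The `#{...}` interpolation syntax, the sample line, and the helper names (`regex_string_spans`, `scan_string`, `scan_interpolation`) are assumptions made up for illustration; none of them come from the talk or from any Emacs major mode. It contrasts a common string-literal regex with a small recursive scanner that tracks nesting.

```python
import re

# A typical "string literal" regex: an opening quote, then any run of
# non-quote characters or escaped characters, then a closing quote.
STRING_RE = re.compile(r'"(?:[^"\\]|\\.)*"')

# Hypothetical source line: the interpolated expression itself contains
# a string literal, so quotes are nested.
source = 'puts "total: #{format("%d", n)} items"'


def regex_string_spans(text):
    """Return the substrings the regex believes are string literals."""
    return [m.group(0) for m in STRING_RE.finditer(text)]


def scan_string(text, start):
    """Return the index just past the string literal starting at `start`,
    following interpolations so nested quotes do not end it early."""
    assert text[start] == '"'
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character
        elif text.startswith("#{", i):
            i = scan_interpolation(text, i + 2)
        elif ch == '"':
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated string")


def scan_interpolation(text, i):
    """Skip an interpolated expression, returning the index past its '}'."""
    depth = 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            i = scan_string(text, i)  # nested string literal
        elif ch == "{":
            depth += 1
            i += 1
        elif ch == "}":
            depth -= 1
            i += 1
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated interpolation")


start = source.index('"')
end = scan_string(source, start)
print(regex_string_spans(source))
print(source[start:end])
```

On this sample the regex reports two fragments, `"total: #{format("` and `", n)} items"`, while the scanner returns the whole literal. The scanner is already a small hand-written parser, which is the kind of helper code the talk says regex-based modes end up needing.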
00:01:31.680 --> 00:01:34.079
it would be much nicer if Emacs somehow
00:01:34.079 --> 00:01:35.840
had structural understanding of source
00:01:35.840 --> 00:01:36.479
code
00:01:36.479 --> 00:01:39.520
like IDEs do
00:01:39.520 --> 00:01:41.119
there have been multiple efforts to
00:01:41.119 --> 00:01:42.960
bring this kind of programming language
00:01:42.960 --> 00:01:45.280
understanding into Emacs
00:01:45.280 --> 00:01:47.119
there are language-specific parsers
00:01:47.119 --> 00:01:48.640
written in Elisp
00:01:48.640 --> 00:01:50.240
they can be thought of as the next
00:01:50.240 --> 00:01:52.320
logical step of the glue code on top
00:01:52.320 --> 00:01:54.960
of regular expressions moving from
00:01:54.960 --> 00:01:56.000
partial local
00:01:56.000 --> 00:01:58.079
pattern recognition into a full-fledged
00:01:58.079 --> 00:01:59.840
parser
00:01:59.840 --> 00:02:01.439
the most prominent example of this
00:02:01.439 --> 00:02:03.040
approach is probably the famous
00:02:03.040 --> 00:02:06.479
js2-mode
00:02:06.479 --> 00:02:10.080
however this approach has several issues
00:02:10.080 --> 00:02:12.959
parsing is computationally expensive and
00:02:12.959 --> 00:02:13.680
Emacs Lisp
00:02:13.680 --> 00:02:16.800
is not good at that kind of stuff
00:02:16.800 --> 00:02:18.400
furthermore maintenance is very
00:02:18.400 --> 00:02:20.840
troublesome in order to work on these
00:02:20.840 --> 00:02:22.160
parsers
00:02:22.160 --> 00:02:23.599
first you have to know Elisp well
00:02:23.599 --> 00:02:25.599
enough and then you have to be
00:02:25.599 --> 00:02:27.760
comfortable with writing a
00:02:27.760 --> 00:02:30.319
recursive descent parser while
00:02:30.319 --> 00:02:32.080
constantly keeping up with changes to
00:02:32.080 --> 00:02:34.000
the language itself
00:02:34.000 --> 00:02:36.879
which can be evolving very quickly like
00:02:36.879 --> 00:02:39.360
JavaScript for example
00:02:39.360 --> 00:02:41.599
together these constraints significantly
00:02:41.599 --> 00:02:45.680
reduce the pool of potential maintainers
00:02:45.680 --> 00:02:47.760
the biggest issue though in my opinion
00:02:47.760 --> 00:02:49.680
is the lack of a set of generic
00:02:49.680 --> 00:02:52.879
and reusable APIs this makes them very
00:02:52.879 --> 00:02:54.319
hard to use
00:02:54.319 --> 00:02:55.920
for minor modes that want to deal with
00:02:55.920 --> 00:02:57.920
cross-cutting concerns across multiple
00:02:57.920 --> 00:02:59.920
languages
00:02:59.920 --> 00:03:01.760
the other approach which has been
00:03:01.760 --> 00:03:03.599
gaining a lot of momentum in recent
00:03:03.599 --> 00:03:04.319
years
00:03:04.319 --> 00:03:06.560
is externalizing language understanding
00:03:06.560 --> 00:03:08.159
to another process
00:03:08.159 --> 00:03:12.239
also known as the Language Server Protocol
00:03:12.239 --> 00:03:14.480
this second approach is actually a very
00:03:14.480 --> 00:03:16.560
interesting one
00:03:16.560 --> 00:03:18.400
by decoupling language understanding
00:03:18.400 --> 00:03:21.280
from the editing facility itself
00:03:21.280 --> 00:03:23.760
the LSP servers can attract a lot more
00:03:23.760 --> 00:03:25.120
contributors
00:03:25.120 --> 00:03:28.959
which makes maintenance easier however
00:03:28.959 --> 00:03:32.400
they also have several issues of their own
00:03:32.400 --> 00:03:34.720
being a separate process they are
00:03:34.720 --> 00:03:36.000
usually more resource
00:03:36.000 --> 00:03:39.920
intensive and depending on the language
00:03:39.920 --> 00:03:42.159
the LSP server itself can bring with it
00:03:42.159 --> 00:03:44.640
a host of additional dependencies
00:03:44.640 --> 00:03:47.680
external to Emacs which may be messy to
00:03:47.680 --> 00:03:50.640
install and manage
00:03:50.640 --> 00:03:53.760
furthermore JSON over RPC has pretty
00:03:53.760 --> 00:03:55.120
high latency
00:03:55.120 --> 00:03:57.840
for one-off tasks like jumping to source
00:03:57.840 --> 00:04:00.879
or on-demand completion it's great
00:04:00.879 --> 00:04:03.040
but for things like code highlighting
00:04:03.040 --> 00:04:06.000
the latency is just too much
00:04:06.000 --> 00:04:08.319
I was using Rust and I was following the
00:04:08.319 --> 00:04:10.480
community effort to improve its IDE
00:04:10.480 --> 00:04:11.760
support
00:04:11.760 --> 00:04:13.680
hoping to integrate some of that into
00:04:13.680 --> 00:04:15.760
Emacs itself
00:04:15.760 --> 00:04:17.600
then I heard someone from the community
00:04:17.600 --> 00:04:19.759
mention tree-sitter
00:04:19.759 --> 00:04:23.360
and I decided to check it out
00:04:23.360 --> 00:04:25.520
basically tree-sitter is an incremental
00:04:25.520 --> 00:04:28.720
parsing library and a parser generator
00:04:28.720 --> 00:04:31.000
it was introduced by the Atom editor in
00:04:31.000 --> 00:04:33.040
2018
00:04:33.040 --> 00:04:35.680
besides Atom it is also being integrated
00:04:35.680 --> 00:04:36.960
into the Neovim
00:04:36.960 --> 00:04:41.040
editor and GitHub is using it to power
00:04:41.040 --> 00:04:42.479
their source code analysis and
00:04:42.479 --> 00:04:45.840
navigation features
00:04:45.840 --> 00:04:48.639
it is written in C and can be compiled
00:04:48.639 --> 00:04:49.199
for all
00:04:49.199 --> 00:04:53.120
major platforms it can even be compiled
00:04:53.120 --> 00:04:56.080
to WebAssembly to run on the web that's
00:04:56.080 --> 00:04:57.600
how GitHub is using it
00:04:57.600 --> 00:05:00.800
on their website
00:05:00.800 --> 00:05:02.960
so why is tree-sitter an interesting
00:05:02.960 --> 00:05:05.840
solution to this problem
00:05:05.840 --> 00:05:07.360
there are multiple features that make it
00:05:07.360 --> 00:05:10.000
an attractive option
00:05:10.000 --> 00:05:12.400
it is designed to be fast by being
00:05:12.400 --> 00:05:13.680
incremental
00:05:13.680 --> 00:05:15.680
the initial parse of a typical big file
00:05:15.680 --> 00:05:18.160
can take tens of milliseconds
00:05:18.160 --> 00:05:20.240
while subsequent incremental parses
00:05:20.240 --> 00:05:22.560
are sub-millisecond
00:05:22.560 --> 00:05:24.720
it achieves this by using structural
00:05:24.720 --> 00:05:26.240
sharing
00:05:26.240 --> 00:05:29.360
meaning replacing only affected nodes
00:05:29.360 --> 00:05:32.960
in the old tree when it needs to
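The structural sharing idea can be sketched in a few lines of Python. This is only a toy illustration on an immutable tree, not how tree-sitter itself is implemented (the talk says it is written in C); the `Node` class, the node kinds, and the `replace_at` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """An immutable syntax node in a toy tree."""
    kind: str
    text: str = ""
    children: Tuple["Node", ...] = ()


def replace_at(root, path, new_node):
    """Return a new tree with the node at `path` replaced.

    Only the nodes along the path are rebuilt; every other subtree is
    reused as-is, so the old and new trees share most of their structure.
    """
    if not path:
        return new_node
    index, rest = path[0], path[1:]
    children = list(root.children)
    children[index] = replace_at(children[index], rest, new_node)
    return Node(root.kind, root.text, tuple(children))


old_tree = Node("module", children=(
    Node("function_definition", children=(
        Node("identifier", "foo"),
        Node("block", "..."),
    )),
    Node("function_definition", children=(
        Node("identifier", "bar"),
        Node("block", "..."),
    )),
))

# Simulate an edit that renames the second function from "bar" to "baz".
new_tree = replace_at(old_tree, (1, 0), Node("identifier", "baz"))

# The untouched first function is the very same object in both trees.
print(new_tree.children[0] is old_tree.children[0])
```

Because the first function and the second function's body are shared between the old and new trees, the cost of the update depends on the depth of the edited node and not on the size of the file.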
00:05:32.960 --> 00:05:36.000
also unlike LSP being in the same
00:05:36.000 --> 00:05:37.120
process
00:05:37.120 --> 00:05:40.639
it has much lower latency
00:05:40.639 --> 00:05:42.880
secondly it provides a uniform
00:05:42.880 --> 00:05:44.960
programming interface
00:05:44.960 --> 00:05:47.039
the same data structures and functions
00:05:47.039 --> 00:05:48.720
work on parse trees of different
00:05:48.720 --> 00:05:50.400
languages
00:05:50.400 --> 00:05:52.160
syntax nodes of different languages
00:05:52.160 --> 00:05:54.160
differ only by their types
00:05:54.160 --> 00:05:57.360
and their possible child nodes this
00:05:57.360 --> 00:05:58.960
is a big advantage over language
00:05:58.960 --> 00:06:02.240
specific parsers
00:06:02.240 --> 00:06:04.880
thirdly it's written in self-contained
00:06:04.880 --> 00:06:06.880
embeddable C
00:06:06.880 --> 00:06:09.680
as I mentioned previously it can even be
00:06:09.680 --> 00:06:10.400
compiled
00:06:10.400 --> 00:06:13.759
to WebAssembly this makes integrating it
00:06:13.759 --> 00:06:15.199
into various editors
00:06:15.199 --> 00:06:18.240
quite easy without having to install
00:06:18.240 --> 00:06:22.880
any external dependencies
00:06:22.880 --> 00:06:24.639
one thing that is not mentioned here is
00:06:24.639 --> 00:06:28.000
that being a parser generator
00:06:28.000 --> 00:06:31.039
its grammars are declarative
00:06:31.039 --> 00:06:34.880
together with being editor independent
00:06:34.880 --> 00:06:36.720
this makes the pool of potential
00:06:36.720 --> 00:06:38.160
contributors
00:06:38.160 --> 00:06:42.400
much larger so I was convinced
00:06:42.400 --> 00:06:45.520
that tree-sitter is a good fit for Emacs
00:06:45.520 --> 00:06:48.000
last year I started writing the bindings
00:06:48.000 --> 00:06:48.720
using
00:06:48.720 --> 00:06:50.960
dynamic module support introduced in Emacs
00:06:50.960 --> 00:06:53.280
25.
00:06:53.280 --> 00:06:55.360
dynamic module means there is platform
00:06:55.360 --> 00:06:58.479
specific native code involved
00:06:58.479 --> 00:07:00.560
but since there are pre-compiled binaries
00:07:00.560 --> 00:07:02.880
for the three major platforms
00:07:02.880 --> 00:07:06.319
it should work in most places currently
00:07:06.319 --> 00:07:08.319
the core functionalities are in a pretty
00:07:08.319 --> 00:07:09.440
good shape
00:07:09.440 --> 00:07:12.560
syntax highlighting is working nicely
00:07:12.560 --> 00:07:14.840
the whole thing is split into three
00:07:14.840 --> 00:07:16.080
packages
00:07:16.080 --> 00:07:17.759
tree-sitter is the main package that
00:07:17.759 --> 00:07:20.319
other packages should depend on
00:07:20.319 --> 00:07:22.800
tree-sitter-langs is the language bundle
00:07:22.800 --> 00:07:24.000
that includes support
00:07:24.000 --> 00:07:27.199
for most common languages
00:07:27.199 --> 00:07:30.080
and finally the core apis are in the
00:07:30.080 --> 00:07:32.160
package tsc
00:07:32.160 --> 00:07:36.160
which stands for tree-sitter core
00:07:36.160 --> 00:07:38.800
it is the implicit dependency of the
00:07:38.800 --> 00:07:43.520
tree-sitter package
00:07:43.520 --> 00:07:46.000
the main package includes the minor mode
00:07:46.000 --> 00:07:47.520
tree-sitter-mode
00:07:47.520 --> 00:07:49.840
this provides the base for other major
00:07:49.840 --> 00:07:52.560
or minor modes to build on
00:07:52.560 --> 00:07:55.280
using Emacs's change tracking hooks it
00:07:55.280 --> 00:07:55.840
enables
00:07:55.840 --> 00:07:58.080
incremental parsing and provides a
00:07:58.080 --> 00:08:00.800
syntax tree that is always up to date
00:08:00.800 --> 00:08:04.080
after any edits in a buffer
00:08:04.080 --> 00:08:06.560
there is also a basic debug mode that
00:08:06.560 --> 00:08:10.080
shows the parse tree in another buffer
00:08:10.080 --> 00:08:13.360
here is a quick demo
00:08:13.360 --> 00:08:15.759
here I'm in an empty Python buffer with
00:08:15.759 --> 00:08:17.520
tree-sitter enabled
00:08:17.520 --> 00:08:19.440
I'm going to turn on the debug mode to
00:08:19.440 --> 00:08:26.560
see the parse tree
00:08:26.560 --> 00:08:28.720
since the buffer is empty there is only
00:08:28.720 --> 00:08:30.639
one node in the syntax tree the top
00:08:30.639 --> 00:08:33.279
level module node
00:08:33.279 --> 00:09:11.040
let's try typing some code
00:09:11.040 --> 00:09:13.600
as you can see as I type into the Python
00:09:13.600 --> 00:09:14.640
buffer
00:09:14.640 --> 00:09:19.120
the syntax tree updates in real time
00:09:19.120 --> 00:09:21.120
the other minor mode included in the
00:09:21.120 --> 00:09:23.279
main package is tree-sitter-hl-mode
00:09:23.279 --> 00:09:26.640
it overrides font-lock mode and
00:09:26.640 --> 00:09:28.480
provides its own set of faces
00:09:28.480 --> 00:09:31.839
and customization options it is query
00:09:31.839 --> 00:09:32.800
driven
00:09:32.800 --> 00:09:35.200
that means instead of regular
00:09:35.200 --> 00:09:36.240
expressions
00:09:36.240 --> 00:09:38.720
it uses a Lisp-like query language to
00:09:38.720 --> 00:09:40.320
map syntax nodes
00:09:40.320 --> 00:09:43.760
to highlighting faces I'm going to
00:09:43.760 --> 00:09:45.760
open a Python file with small snippets
00:09:45.760 --> 00:09:54.320
that showcase syntax highlighting
00:09:54.320 --> 00:09:55.920
so this is the default highlighting
00:09:55.920 --> 00:10:00.880
provided by python-mode
00:10:00.880 --> 00:10:02.839
this is the highlighting enabled by
00:10:02.839 --> 00:10:04.640
tree-sitter
00:10:04.640 --> 00:10:07.680
as you can see string interpolation
00:10:07.680 --> 00:10:11.680
and decorators are highlighted correctly
00:10:11.680 --> 00:10:17.440
function calls are also highlighted
00:10:17.440 --> 00:10:20.240
you can also note that property
00:10:20.240 --> 00:10:21.839
accessors
00:10:21.839 --> 00:10:24.640
and property assignments are highlighted
00:10:24.640 --> 00:10:27.440
differently
00:10:27.440 --> 00:10:29.360
what I like the most about this is that
00:10:29.360 --> 00:10:30.880
new bindings are consistently
00:10:30.880 --> 00:10:32.640
highlighted
00:10:32.640 --> 00:10:36.320
this includes local variables
00:10:36.320 --> 00:10:39.760
function parameters and property
00:10:39.760 --> 00:10:45.760
mutations
00:10:45.760 --> 00:10:48.000
before going through the tree queries
00:10:48.000 --> 00:10:49.279
and the syntax highlighting
00:10:49.279 --> 00:10:51.680
customization options
00:10:51.680 --> 00:10:53.760
let's take a brief look at the core data
00:10:53.760 --> 00:10:55.040
structures and functions
00:10:55.040 --> 00:10:58.079
that tree-sitter provides
00:10:58.079 --> 00:10:59.839
so parsing is done with the help of a
00:10:59.839 --> 00:11:02.240
generic parser object
00:11:02.240 --> 00:11:04.160
a single parser object can be used to
00:11:04.160 --> 00:11:06.000
parse different languages
00:11:06.000 --> 00:11:08.320
by sending different language objects to
00:11:08.320 --> 00:11:09.279
it
00:11:09.279 --> 00:11:10.880
the language objects themselves are
00:11:10.880 --> 00:11:14.079
loaded from shared libraries
00:11:14.079 --> 00:11:16.079
since tree-sitter-mode already handles
00:11:16.079 --> 00:11:17.360
the parsing part
00:11:17.360 --> 00:11:19.440
we will instead focus on the functions
00:11:19.440 --> 00:11:20.800
that inspect nodes
00:11:20.800 --> 00:11:25.279
in the resulting parse tree
00:11:25.279 --> 00:11:27.200
we can ask tree-sitter what is the
00:11:27.200 --> 00:11:44.240
syntax node at point
00:11:44.240 --> 00:11:47.200
uh it is an opaque object so this is not
00:11:47.200 --> 00:11:48.480
very useful
00:11:48.480 --> 00:12:03.760
we can instead ask what is its type
00:12:03.760 --> 00:12:06.560
so its type is the symbol
00:12:06.560 --> 00:12:08.959
comparison_operator
00:12:08.959 --> 00:12:11.600
in tree-sitter there are two kinds of nodes
00:12:11.600 --> 00:12:13.680
anonymous nodes and named nodes
00:12:13.680 --> 00:12:15.519
anonymous nodes correspond to simple
00:12:15.519 --> 00:12:17.040
grammar elements
00:12:17.040 --> 00:12:19.839
like keywords operators punctuation and
00:12:19.839 --> 00:12:21.279
so on
00:12:21.279 --> 00:12:24.160
named nodes on the other hand are grammar
00:12:24.160 --> 00:12:25.920
elements that are interesting enough on
00:12:25.920 --> 00:12:26.639
their own
00:12:26.639 --> 00:12:30.320
to have a name like an identifier an
00:12:30.320 --> 00:12:31.839
expression
00:12:31.839 --> 00:12:35.440
or a function definition
00:12:35.440 --> 00:12:37.760
named node types are symbols while
00:12:37.760 --> 00:12:42.639
anonymous node types are strings
00:12:42.639 --> 00:12:46.320
for example if we are on this
00:12:46.320 --> 00:12:49.760
comparison operator
00:12:49.760 --> 00:12:55.920
the node type should be a string
00:12:55.920 --> 00:12:57.920
we can also get other information about
00:12:57.920 --> 00:12:58.959
the node
00:12:58.959 --> 00:13:09.680
for example what is its text
00:13:09.680 --> 00:13:20.800
or where it is in the buffer
00:13:20.800 --> 00:13:43.199
or what is its parent
00:13:43.199 --> 00:13:46.160
there are many other APIs to query
00:13:46.160 --> 00:13:46.839
node
00:13:46.839 --> 00:13:52.639
properties
00:13:52.639 --> 00:13:54.399
tree-sitter allows searching for
00:13:54.399 --> 00:13:58.240
structural patterns within a parse tree
00:13:58.240 --> 00:14:01.440
it does so through a Lisp-like language
00:14:01.440 --> 00:14:03.519
this language supports pattern matching
00:14:03.519 --> 00:14:04.639
by node types
00:14:04.639 --> 00:14:07.760
field names and predicates
00:14:07.760 --> 00:14:10.079
it also allows capturing nodes for
00:14:10.079 --> 00:14:12.639
further processing
00:14:12.639 --> 00:14:37.680
let's try to see some examples
00:14:37.680 --> 00:14:41.040
so in this very simple query we just
00:14:41.040 --> 00:14:43.839
try to highlight all the identifiers in
00:14:43.839 --> 00:14:49.040
the buffer
00:14:49.040 --> 00:14:51.920
this @ sign tells tree-sitter to capture a
00:14:51.920 --> 00:14:53.120
node
00:14:53.120 --> 00:14:55.839
in the context of the query builder it's
00:14:55.839 --> 00:14:57.360
not very important
00:14:57.360 --> 00:15:00.320
but in a normal highlighting query this
00:15:00.320 --> 00:15:01.760
will determine
00:15:01.760 --> 00:15:06.639
the face used to highlight the node
00:15:06.639 --> 00:15:08.800
suppose we want to capture all the
00:15:08.800 --> 00:15:10.320
function names
00:15:10.320 --> 00:15:13.519
instead of just any identifier
00:15:13.519 --> 00:15:29.440
you can improve the query like this
00:15:29.440 --> 00:15:31.600
uh this will highlight the whole
00:15:31.600 --> 00:15:32.639
definition
00:15:32.639 --> 00:15:35.519
but we only want to capture the function
00:15:35.519 --> 00:15:36.399
name
00:15:36.399 --> 00:15:39.600
which means the identifier
00:15:39.600 --> 00:15:42.800
here so we
00:15:42.800 --> 00:15:46.320
move the capture to after the identifier
00:15:46.320 --> 00:15:49.600
node
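The queries typed during the demo are not captured in this transcript, so the following Python sketch only mimics the idea of matching by node type and field name and recording captures. The dictionary-based tree, the pattern format, and the names `node`, `match`, and `run_query` are hypothetical; the query strings in the comments follow tree-sitter's general query syntax, with capture names chosen for illustration.

```python
def node(type_, text="", field=None, children=()):
    """Build a toy syntax node; `field` is the name its parent knows it by."""
    return {"type": type_, "text": text, "field": field,
            "children": list(children)}


def match(pattern, n, captures):
    """Match one pattern against one node, recording captures on success.

    A pattern has a 'type', an optional 'capture' name, and optional
    'fields' mapping field names to sub-patterns.
    """
    if pattern["type"] != n["type"]:
        return False
    found = []
    for field, sub in pattern.get("fields", {}).items():
        child = next((c for c in n["children"] if c["field"] == field), None)
        if child is None or not match(sub, child, found):
            return False
    if "capture" in pattern:
        found.append((pattern["capture"], n["text"]))
    captures.extend(found)
    return True


def run_query(patterns, root):
    """Try every pattern at every node, visiting nodes in document order."""
    captures = []
    stack = [root]
    while stack:
        n = stack.pop()
        for p in patterns:
            match(p, n, captures)
        stack.extend(reversed(n["children"]))
    return captures


tree = node("module", children=[
    node("function_definition", children=[
        node("identifier", "greet", field="name"),
        node("block", field="body", children=[
            node("call", children=[
                node("identifier", "print", field="function"),
            ]),
        ]),
    ]),
    node("class_definition", children=[
        node("identifier", "Greeter", field="name"),
    ]),
])

# Roughly: (identifier) @variable
all_identifiers = run_query(
    [{"type": "identifier", "capture": "variable"}], tree)

# Roughly: (function_definition name: (identifier) @function)
#          (class_definition name: (identifier) @type)
names_only = run_query([
    {"type": "function_definition",
     "fields": {"name": {"type": "identifier", "capture": "function"}}},
    {"type": "class_definition",
     "fields": {"name": {"type": "identifier", "capture": "type"}}},
], tree)

print(all_identifiers)
print(names_only)
```

Placing the capture on the inner identifier pattern, and not on the enclosing definition, is what restricts the result to the function and class names.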
00:15:49.600 --> 00:15:51.759
if we want to capture the class names as
00:15:51.759 --> 00:15:52.959
well
00:15:52.959 --> 00:16:10.079
we just add another pattern
00:16:10.079 --> 00:16:20.320
let's look at a more practical example
00:16:20.320 --> 00:16:22.959
here we can see that single quotes
00:16:22.959 --> 00:16:23.759
strings and
00:16:23.759 --> 00:16:25.600
double quotes strings are highlighted
00:16:25.600 --> 00:16:27.279
the same
00:16:27.279 --> 00:16:30.399
but in some places
00:16:30.399 --> 00:16:33.440
because of some coding conventions
00:16:33.440 --> 00:16:35.440
it may be desirable to highlight them
00:16:35.440 --> 00:16:37.279
differently for example if
00:16:37.279 --> 00:16:39.680
the string is single quoted we may want
00:16:39.680 --> 00:16:40.880
to highlight it
00:16:40.880 --> 00:16:44.399
as a constant
00:16:44.399 --> 00:16:46.160
let's try to see whether we can
00:16:46.160 --> 00:16:47.600
distinguish these
00:16:47.600 --> 00:16:56.240
two cases
00:16:56.240 --> 00:17:00.639
so here we get all the strings
00:17:00.639 --> 00:17:04.079
if we want to see if it's single quotes
00:17:04.079 --> 00:17:04.559
or
00:17:04.559 --> 00:17:08.799
double quote strings
00:17:08.799 --> 00:17:11.039
we can try looking at the first
00:17:11.039 --> 00:17:12.480
character
00:17:12.480 --> 00:17:15.280
of the string I mean the first character
00:17:15.280 --> 00:17:16.720
of the node
00:17:16.720 --> 00:17:19.360
to check whether it's a single quote or
00:17:19.360 --> 00:17:33.600
a double quote
00:17:33.600 --> 00:17:36.080
yeah so for that we use
00:17:36.080 --> 00:17:36.799
tree-sitter's
00:17:36.799 --> 00:17:40.160
support for predicates in this case
00:17:40.160 --> 00:17:43.360
we use a match predicate
00:17:43.360 --> 00:17:46.080
to check whether the string where the
00:17:46.080 --> 00:17:46.799
node
00:17:46.799 --> 00:17:50.320
starts with a single quote and with this
00:17:50.320 --> 00:17:51.280
pattern
00:17:51.280 --> 00:17:58.840
we only capture the single quotes
00:17:58.840 --> 00:18:00.400
strings
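The effect of such a match predicate can be imitated in plain Python. The predicate's exact spelling in the query language is not shown in this transcript, so this sketch only reproduces its behavior: test the captured node's text against a regular expression and pick a face accordingly. The function name `face_for_string` and the face names are assumptions for illustration.

```python
import re


def face_for_string(node_text, single_quote_face="keyword",
                    default_face="string"):
    """Pick a face for a string node by testing its text against a regex,
    the way a match predicate filters a captured node."""
    if re.match(r"'", node_text):
        return single_quote_face
    return default_face


# Hypothetical texts of two captured string nodes.
samples = ["'single'", '"double"']
faces = [face_for_string(text) for text in samples]
print(faces)
```

Checking only the first character is the same shortcut described in the talk; it would not recognize a string that starts with a prefix character before the quote.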
00:18:00.400 --> 00:18:03.760
let's try to give it a different face
00:18:03.760 --> 00:18:13.039
so we copy the pattern
00:18:13.039 --> 00:18:18.640
and we add this pattern
00:18:18.640 --> 00:18:25.120
pop item only
00:18:25.120 --> 00:18:28.400
but we also want to give the
00:18:28.400 --> 00:18:31.440
capture a different name
00:18:31.440 --> 00:18:40.840
let's say we want to highlight it as a
00:18:40.840 --> 00:18:46.559
keyword
00:18:46.559 --> 00:19:06.320
and now if we refresh the buffer
00:19:06.320 --> 00:19:08.799
we see that single quote strings are
00:19:08.799 --> 00:19:10.320
highlighted as
00:19:10.320 --> 00:19:14.400
keywords
00:19:14.400 --> 00:19:16.400
the highlighting patterns can also be
00:19:16.400 --> 00:19:19.200
set for a single project
00:19:19.200 --> 00:19:23.440
using directory-local variables
00:19:23.440 --> 00:19:26.880
for example let's take a look at
00:19:26.880 --> 00:19:35.760
Emacs source code
00:19:35.760 --> 00:19:40.400
so in Emacs's C source there are a lot of
00:19:40.400 --> 00:19:43.760
uses of this DEFUN macro
00:19:43.760 --> 00:19:47.679
to define functions
00:19:47.679 --> 00:19:51.200
and you can see
00:19:51.200 --> 00:19:53.520
this is actually the function name but
00:19:53.520 --> 00:19:55.760
it's highlighted as the
00:19:55.760 --> 00:19:59.120
string so what we want
00:19:59.120 --> 00:20:03.679
is to somehow recognize this pattern
00:20:03.679 --> 00:20:07.600
and highlight it
00:20:07.600 --> 00:20:11.280
as highlight this part
00:20:11.280 --> 00:20:14.559
with the function face instead
00:20:14.559 --> 00:20:17.679
in order to do that
00:20:17.679 --> 00:20:20.240
we put a pattern in this project
00:20:20.240 --> 00:20:21.760
directory local
00:20:21.760 --> 00:20:31.760
settings file
00:20:31.760 --> 00:20:34.799
so we can put this pattern in the c
00:20:34.799 --> 00:20:40.159
mode section
00:20:40.159 --> 00:20:48.000
and now if we enable tree-sitter
00:20:48.000 --> 00:20:50.480
you can see that this is highlighted
00:20:50.480 --> 00:20:53.200
uh
00:20:53.200 --> 00:20:55.520
as a normal function definition so this
00:20:55.520 --> 00:20:56.559
is the function
00:20:56.559 --> 00:21:01.200
face like we wanted
00:21:01.200 --> 00:21:03.760
the pattern for this is actually pretty
00:21:03.760 --> 00:21:07.200
simple
00:21:07.200 --> 00:21:10.720
it's only
00:21:10.720 --> 00:21:14.720
only this part so
00:21:14.720 --> 00:21:17.440
if it's a function call where the name
00:21:17.440 --> 00:21:19.679
of the function is DEFUN
00:21:19.679 --> 00:21:21.600
then we highlight the DEFUN as a
00:21:21.600 --> 00:21:24.240
keyword
00:21:24.240 --> 00:21:27.360
and then the first string element we
00:21:27.360 --> 00:21:28.159
highlight it
00:21:28.159 --> 00:21:35.360
as a function name
00:21:35.360 --> 00:21:37.679
since the language objects are actually
00:21:37.679 --> 00:21:39.280
native code
00:21:39.280 --> 00:21:40.799
they have to be compiled for each
00:21:40.799 --> 00:21:43.440
platform that we want to support
00:21:43.440 --> 00:21:45.600
this will become a big obstacle for
00:21:45.600 --> 00:21:48.159
tree-sitter adoption
00:21:48.159 --> 00:21:50.240
therefore I've created a language bundle
00:21:50.240 --> 00:21:52.960
package tree-sitter-langs
00:21:52.960 --> 00:21:54.960
that takes care of pre-compiling the
00:21:54.960 --> 00:21:56.320
grammars the
00:21:56.320 --> 00:21:59.679
most common grammars for all three major
00:21:59.679 --> 00:22:01.600
platforms
00:22:01.600 --> 00:22:04.080
it also takes care of distributing these
00:22:04.080 --> 00:22:05.360
binaries
00:22:05.360 --> 00:22:08.080
and provides some highlighting queries
00:22:08.080 --> 00:22:11.440
for some of the languages
00:22:11.440 --> 00:22:13.760
it should be noted that this package
00:22:13.760 --> 00:22:15.919
should be treated as a temporary
00:22:15.919 --> 00:22:19.919
distribution mechanism only
00:22:19.919 --> 00:22:22.240
to help with bootstrapping tree-sitter's
00:22:22.240 --> 00:22:24.720
adoption
00:22:24.720 --> 00:22:27.760
the plan is that eventually these files
00:22:27.760 --> 00:22:29.760
should be provided by the language major
00:22:29.760 --> 00:22:32.480
modes themselves
00:22:32.480 --> 00:22:35.120
but in order to do that we need better
00:22:35.120 --> 00:22:36.320
tooling
00:22:36.320 --> 00:22:40.240
so we're not there yet
00:22:40.240 --> 00:22:42.559
since the core already works reasonably
00:22:42.559 --> 00:22:43.280
well
00:22:43.280 --> 00:22:44.640
there are several areas that would
00:22:44.640 --> 00:22:46.320
benefit from the community's
00:22:46.320 --> 00:22:49.120
contribution
00:22:49.120 --> 00:22:51.520
so tree-sitter's upstream language
00:22:51.520 --> 00:22:52.640
repositories
00:22:52.640 --> 00:22:54.400
already contain highlighting queries on
00:22:54.400 --> 00:22:55.679
their own
00:22:55.679 --> 00:22:58.480
however they are pretty basic and they
00:22:58.480 --> 00:23:00.480
may not fit well with existing Emacs
00:23:00.480 --> 00:23:02.559
conventions
00:23:02.559 --> 00:23:04.320
therefore the language bundle has its
00:23:04.320 --> 00:23:07.120
own set of highlighting queries
00:23:07.120 --> 00:23:10.559
this requires maintenance until language
00:23:10.559 --> 00:23:11.600
major modes adopt
00:23:11.600 --> 00:23:13.760
tree-sitter and maintain the queries on
00:23:13.760 --> 00:23:16.640
their own
00:23:16.640 --> 00:23:18.480
the queries are actually quite easy to
00:23:18.480 --> 00:23:22.000
write as you've already seen
00:23:22.000 --> 00:23:24.240
you just need to be familiar with the
00:23:24.240 --> 00:23:25.360
language
00:23:25.360 --> 00:23:30.000
familiar enough to come up with sensible
00:23:30.000 --> 00:23:35.200
highlighting patterns
00:23:35.200 --> 00:23:37.600
and if you are a maintainer of a
00:23:37.600 --> 00:23:39.679
language major mode
00:23:39.679 --> 00:23:42.320
you may want to consider integrating
00:23:42.320 --> 00:23:43.360
tree sitter into
00:23:43.360 --> 00:23:46.960
your mode initially maybe as an
00:23:46.960 --> 00:23:50.080
optional feature the integration is
00:23:50.080 --> 00:23:53.279
actually pretty straightforward
00:23:53.279 --> 00:23:56.640
especially for syntax highlighting
00:23:56.640 --> 00:24:01.520
or alternatively
00:24:01.520 --> 00:24:03.760
you can also try writing a new major
00:24:03.760 --> 00:24:04.640
mode
00:24:04.640 --> 00:24:08.000
from scratch that relies on tree-sitter
00:24:08.000 --> 00:24:12.559
from the very beginning
00:24:12.559 --> 00:24:16.320
the code for such a major mode is
00:24:16.320 --> 00:24:19.679
quite simple for example
00:24:19.679 --> 00:24:23.200
this is the proposed
00:24:23.200 --> 00:24:26.240
wat-mode for WebAssembly
00:24:26.240 --> 00:24:31.039
the code is just
00:24:31.039 --> 00:24:34.559
like one page of code not
00:24:34.559 --> 00:24:39.520
not a lot
00:24:39.520 --> 00:24:42.720
you can also try writing new minor modes
00:24:42.720 --> 00:24:46.559
or writing integration packages
00:24:46.559 --> 00:24:50.080
for example a lot of package a lot of
00:24:50.080 --> 00:24:50.880
packages
00:24:50.880 --> 00:24:54.559
may benefit from tree-sitter integration
00:24:54.559 --> 00:24:58.840
but no one has written the integration
00:24:58.840 --> 00:25:02.960
yet
00:25:02.960 --> 00:25:05.039
if you are interested in tree-sitter you
00:25:05.039 --> 00:25:06.720
can use these links to
00:25:06.720 --> 00:25:10.320
learn more about it I think that's it
00:25:10.320 --> 00:25:11.440
for me today
00:25:11.440 --> 00:25:18.159
I'm happy to answer any questions