WEBVTT
00:00.080 --> 00:01.360
Hi. My name is Ethan,
00:01.360 --> 00:02.320
and today I'm going to be speaking
00:02.320 --> 00:04.240
about tree-edit, which is a package
00:04.240 --> 00:06.160
which aims to bring structural editing
00:06.160 --> 00:08.320
to everyday languages.
00:08.320 --> 00:10.559
So what is structural editing?
00:10.559 --> 00:11.657
The way that we typically
00:11.657 --> 00:12.578
write code today
00:12.578 --> 00:14.480
is working with characters, words,
00:14.480 --> 00:16.206
lines, paragraphs, and so on,
00:16.206 --> 00:18.600
and these objects have no real relation
00:18.600 --> 00:21.520
to the structure of programming languages.
00:21.520 --> 00:24.667
In contrast, tree-edit's editing operations
00:24.667 --> 00:26.897
map exactly to the structure
00:26.897 --> 00:28.411
of the programming language,
00:28.411 --> 00:30.303
which is typically in a tree form
00:30.303 --> 00:32.053
with different types of nodes
00:32.053 --> 00:33.920
such as identifiers, expressions,
00:33.920 --> 00:35.957
and statements. Using this structure
00:35.957 --> 00:37.548
can enable much more powerful
00:37.548 --> 00:39.200
editing operations,
00:39.200 --> 00:40.769
and crucially editing operations
00:40.769 --> 00:42.081
that map much more closely
00:42.081 --> 00:44.960
to the way that we think about code.
00:44.960 --> 00:46.140
tree-edit was inspired by
00:46.140 --> 00:47.386
paredit and lispy,
00:47.386 --> 00:48.320
which are two great
00:48.320 --> 00:50.271
Lisp structural editors.
00:50.271 --> 00:52.383
However, what makes tree-edit unique
00:52.383 --> 00:54.480
is that it can work with many languages,
00:54.480 --> 00:55.759
such as some of the
00:55.759 --> 00:59.826
more mainstream languages like C, Java,
00:59.826 --> 01:01.600
Python, and so on.
01:01.600 --> 01:03.273
So now I'm going to show off tree-edit
01:03.273 --> 01:05.705
in action, working with a Java program.
01:05.705 --> 01:07.237
So we can see on the left,
01:07.237 --> 01:09.119
we have a syntax tree,
01:09.119 --> 01:11.560
and the node in bold is what I call
01:11.560 --> 01:13.780
the current node. So instead of
01:13.780 --> 01:15.100
the concept of a cursor,
01:15.100 --> 01:17.600
where we have a point in 2D space,
01:17.600 --> 01:20.285
we instead work with a current node
01:20.285 --> 01:22.729
which all our editing operations
01:22.729 --> 01:23.840
take place upon.
01:23.840 --> 01:26.479
So we can move up and down,
01:26.479 --> 01:28.720
or rather side to side,
01:28.720 --> 01:31.160
move inwards down to the children
01:31.160 --> 01:33.920
of the tree, back up to the parents.
01:33.920 --> 01:36.799
We can also jump to a node by its type.
01:36.799 --> 01:38.768
So we're going to jump to
01:38.768 --> 01:40.880
a variable declaration.
01:40.880 --> 01:44.399
We can jump to an if statement.
01:44.399 --> 01:46.880
And as you might have noticed,
01:46.880 --> 01:48.360
tree-edit by default
01:48.360 --> 01:51.337
uses a vim-style mode of editing,
01:51.337 --> 01:55.119
so it's a verb, which would be jump,
01:55.119 --> 01:56.874
and then a type,
01:56.874 --> 02:00.799
which would be if statement.
02:00.799 --> 02:03.346
So now I'll show off
02:03.346 --> 02:06.144
the syntax tree modification in action.
02:06.144 --> 02:08.000
So if I delete this deleteme node,
02:08.000 --> 02:10.112
we can see the node is deleted,
02:10.112 --> 02:12.049
and also the comma is removed
02:12.049 --> 02:13.920
since it's no longer needed.
02:13.920 --> 02:16.720
We can add some nodes back in.
02:16.720 --> 02:18.160
Here we just have a placeholder node
02:18.160 --> 02:20.391
called tree, which we can swap out
02:20.391 --> 02:21.875
with whatever we like.
02:21.875 --> 02:24.560
So if we want to put in, for example,
02:24.560 --> 02:29.280
a plus or minus operator,
02:29.280 --> 02:30.879
it'll put these two TREE things here
02:30.879 --> 02:32.634
since there needs to be something there,
02:32.634 --> 02:37.360
but we can go fill them out as we like.
02:37.360 --> 02:38.595
So that's what that is.
02:38.595 --> 02:41.920
Then I'll delete these again.
02:41.920 --> 02:43.709
Next we can see raising.
02:43.709 --> 02:45.280
So if I raise reader,
02:45.280 --> 02:46.160
then it will replace
02:46.160 --> 02:47.342
the outer function call
02:47.342 --> 02:48.583
with the node itself.
02:48.583 --> 02:50.948
I could raise it again.
02:50.948 --> 02:53.363
The opposite operation to that
02:53.363 --> 02:57.200
is wrapping. So I can wrap reader
02:57.200 --> 02:59.519
back into function call,
02:59.519 --> 03:03.009
and I could wrap this again
03:03.009 --> 03:08.480
if I wanted to. So that is wrapping.
03:08.480 --> 03:12.640
We can also do it on a statement level,
03:12.640 --> 03:13.760
so if I want to wrap this
03:13.760 --> 03:14.480
in an if statement,
03:14.480 --> 03:17.034
I can wrap the statement,
03:17.034 --> 03:18.400
and there we go.
03:18.400 --> 03:21.280
And let's just raise it back up,
03:21.280 --> 03:23.200
raise it again.
03:23.200 --> 03:25.760
There we go. Finally, I'll show off
03:26.959 --> 03:28.720
slurping and barfing,
03:28.720 --> 03:32.256
which... a little bit gross words,
03:32.256 --> 03:34.879
but I think it accurately describes
03:34.879 --> 03:37.519
the action, so let me just add
03:37.519 --> 03:41.120
a couple breaks here.
03:41.120 --> 03:44.748
So let's say we want
03:44.748 --> 03:46.779
this if statement and a couple of breaks
03:46.779 --> 03:48.319
to be inside of the while,
03:48.319 --> 03:50.959
so we can just slurp this up,
03:50.959 --> 03:52.433
and if we don't actually want them,
03:52.433 --> 03:54.528
we can barf them back out.
03:54.528 --> 03:56.736
So that's where those words
03:56.736 --> 03:57.840
have come from.
03:57.840 --> 04:01.120
And we can just... delete as we please.
04:01.120 --> 04:03.826
So yeah, that's a quick overview
04:03.826 --> 04:07.360
of the tree editing plugin in action.
04:07.360 --> 04:08.900
So now I want to talk a little bit
04:08.900 --> 04:12.080
about the implementation of tree-edit.
04:12.080 --> 04:14.400
Tree-edit uses the tree-sitter parser
04:14.400 --> 04:17.919
to convert text into a syntax tree.
04:17.919 --> 04:21.501
Tree-sitter is used by GitHub
04:21.501 --> 04:22.752
for its syntax highlighting,
04:22.752 --> 04:25.280
and it's available in a bunch of editors,
04:25.280 --> 04:27.120
including Emacs, so it's
04:27.120 --> 04:28.960
a fairly standard tool.
04:28.960 --> 04:30.960
However, the unique part about tree-edit
04:30.960 --> 04:32.479
is how it performs
04:32.479 --> 04:34.479
correct editing operations
04:34.479 --> 04:35.919
on the syntax tree
04:35.919 --> 04:38.320
and then converts that back into text.
04:38.320 --> 04:41.759
So to do that, we use miniKanren,
04:41.759 --> 04:43.759
and miniKanren is an embedded
04:43.759 --> 04:45.120
domain-specific language
04:45.120 --> 04:47.440
for logic programming.
04:47.440 --> 04:50.080
So what exactly does that mean?
04:50.080 --> 04:51.280
In our case, it's just
04:51.280 --> 04:54.240
an Emacs Lisp library called reazon,
04:54.240 --> 04:56.720
which exposes a set of macros
04:56.720 --> 04:58.320
which enables us to program
04:58.320 --> 05:01.360
in this logic programming style.
05:01.360 --> 05:03.280
I'm not going to get into the details
05:03.280 --> 05:05.520
of how logic programming works.
05:05.520 --> 05:07.520
However, one of the most unique aspects
05:07.520 --> 05:09.919
about it is that we can define
05:09.919 --> 05:13.600
a predicate and then figure out
05:13.600 --> 05:15.280
all the inputs to the predicate
05:15.280 --> 05:17.759
that would hold to be true.
05:17.759 --> 05:19.360
So in this case,
05:19.360 --> 05:21.520
we have our query variable q,
05:21.520 --> 05:24.479
which will be what the output is,
05:24.479 --> 05:29.120
and we are asking for all the values of q
05:29.120 --> 05:32.080
that pass this predicate of
05:32.080 --> 05:34.479
being set-equal to 1 2 3 4.
05:34.479 --> 05:36.880
So if we execute this,
05:36.880 --> 05:40.080
it will take a little time...
05:40.080 --> 05:41.520
It shouldn't be taking this long.
05:41.520 --> 05:43.280
Oh, there it goes.
05:43.280 --> 05:45.919
We can see that it's generated
05:45.919 --> 05:47.520
a bunch of different answers
05:47.520 --> 05:51.199
that are all set-equal to 1 2 3 4.
05:51.199 --> 05:52.880
So it's just a bunch of
05:52.880 --> 05:57.280
different permutations of that.
05:57.280 --> 05:59.120
We can extend this notion
05:59.120 --> 06:03.600
to a parser. In tree-edit, we've defined
06:03.600 --> 06:05.360
a parser in reazon,
06:05.360 --> 06:10.800
and we can use that parser to figure out
06:10.800 --> 06:15.919
any tokens that match the type of node
06:15.919 --> 06:16.880
that we're trying to generate.
06:16.880 --> 06:19.600
If I execute this, we can see
06:19.600 --> 06:21.199
that reazon has generated
06:21.199 --> 06:23.440
these five answers that match
06:23.440 --> 06:26.960
what a try statement is in Java.
06:26.960 --> 06:29.680
Here we can see we can have
06:29.680 --> 06:31.919
an infinite amount of catches
06:31.919 --> 06:34.720
optionally ending with a finally,
06:34.720 --> 06:36.160
and we always have to start
06:36.160 --> 06:39.039
with a try and a block.
06:39.039 --> 06:40.000
We can see this again
06:40.000 --> 06:42.400
with an argument list.
06:42.400 --> 06:43.520
We have the opening and closing
06:43.520 --> 06:45.759
parentheses, and expressions
06:45.759 --> 06:49.120
which are comma delimited.
06:49.120 --> 06:51.759
Now, for a more complex example, and
06:51.759 --> 06:53.680
something that is along the lines
06:53.680 --> 06:55.199
of what's in tree-edit,
06:55.199 --> 06:57.919
is if we have this x here
06:57.919 --> 07:01.599
and we want to insert another expression,
07:01.599 --> 07:05.759
so x, y. We can assert
07:05.759 --> 07:07.680
that there's some new tokens,
07:07.680 --> 07:10.160
and we want an expression
07:10.160 --> 07:11.840
to be in those new tokens,
07:11.840 --> 07:13.280
and we can essentially state
07:13.280 --> 07:15.039
where we want these new tokens to go
07:15.039 --> 07:19.759
within the old list of tokens,
07:19.759 --> 07:21.599
so replacing it
07:21.599 --> 07:23.360
after the previous expression,
07:23.360 --> 07:26.000
before the closed parentheses,
07:26.000 --> 07:26.880
and then we can state
07:26.880 --> 07:28.560
that the whole thing parses.
07:28.560 --> 07:30.080
If we run that, we can see that
07:30.080 --> 07:32.479
as we wanted earlier,
07:32.479 --> 07:37.120
which was a comma and then expression,
07:37.120 --> 07:39.120
we have that here as well.
07:39.120 --> 07:41.759
We can see this again.
07:41.759 --> 07:42.720
Here, the only change is that
07:42.720 --> 07:45.280
we've moved the tokens to be
07:45.280 --> 07:46.240
before the expression.
07:46.240 --> 07:48.800
So we want to put an expression
07:48.800 --> 07:50.560
before this x, so we want something
07:50.560 --> 07:52.560
like y, x,
07:52.560 --> 07:54.240
and if we execute that,
07:54.240 --> 07:57.919
we can see that it is correctly asserted
07:57.919 --> 07:59.039
that it would be an expression
07:59.039 --> 08:01.520
and then a comma afterwards.
08:01.520 --> 08:02.960
One last example is
08:02.960 --> 08:04.400
if we have an if statement
08:04.400 --> 08:07.759
and we want to add an extra block,
08:07.759 --> 08:11.599
we can see that it correctly figures out
08:11.599 --> 08:12.400
that we need an else
08:12.400 --> 08:13.840
in order to have another statement
08:13.840 --> 08:16.720
in an if statement.
08:16.720 --> 08:19.759
So, next steps for tree-edit.
08:19.759 --> 08:21.039
The core of tree-edit is in place
08:21.039 --> 08:23.120
but there's a lot of usability features
08:23.120 --> 08:25.360
to add, and a lot of testing
08:25.360 --> 08:26.400
that needs to be done
08:26.400 --> 08:29.599
in order to iron out any bugs that exist.
08:29.599 --> 08:30.960
I'd like to add support
08:30.960 --> 08:35.200
for as many languages as is possible.
08:35.200 --> 08:36.240
I think my next step
08:36.240 --> 08:38.490
will probably be Python.
08:38.490 --> 08:41.279
There's some performance improvements
08:41.279 --> 08:44.080
that need to be made, since using this
08:44.080 --> 08:45.519
logic programming language
08:45.519 --> 08:47.600
is fairly intensive.
08:47.600 --> 08:48.800
There's some optimizations
08:48.800 --> 08:50.560
both on the library side
08:50.560 --> 08:51.519
and on tree-edit side
08:51.519 --> 08:53.360
that can be made.
08:53.360 --> 08:55.519
Contributors are of course welcome,
08:55.519 --> 09:00.000
as tree-edit is an open source project.
09:00.000 --> 09:03.360
For future work, I think the prospect
09:03.360 --> 09:04.480
of voice controlled development
09:04.480 --> 09:06.240
with tree-edit is actually something
09:06.240 --> 09:07.920
that's really exciting,
09:07.920 --> 09:11.120
since syntax can be very cumbersome
09:11.120 --> 09:12.320
when you're working with
09:12.320 --> 09:14.240
voice control software.
09:14.240 --> 09:16.320
I can envision something like
09:16.320 --> 09:19.440
saying, "Jump to identifier,
09:19.440 --> 09:26.640
add plus operator, jump to if statement,
09:26.640 --> 09:30.480
wrap if statement in while."
09:30.480 --> 09:31.519
So that's something
09:31.519 --> 09:33.519
I'd like to investigate.
09:33.519 --> 09:35.040
I also would just like to
09:35.040 --> 09:37.279
provide the core functionality
09:37.279 --> 09:39.120
of [tree-edit] as something
09:39.120 --> 09:40.399
that can be used as a library
09:40.399 --> 09:41.920
for other projects,
09:41.920 --> 09:43.839
such as refactoring packages,
09:43.839 --> 09:46.240
or other non-Vim-style approaches,
09:46.240 --> 09:49.200
and just making the syntax generation
09:49.200 --> 09:52.080
available for reuse.
09:52.080 --> 09:53.760
Finally, I'd like to thank
09:53.760 --> 09:56.399
the authors of reazon
09:56.399 --> 09:58.399
and elisp-tree-sitter,
09:58.399 --> 10:00.185
which in turn packages
10:00.185 --> 10:02.079
tree-sitter itself,
10:02.079 --> 10:05.440
since tree-edit relies very heavily
10:05.440 --> 10:07.680
on these two packages.
10:07.680 --> 10:08.959
I'd also like to thank
10:08.959 --> 10:10.480
the author of lispy,
10:10.480 --> 10:12.720
since a lot of the design decisions
10:12.720 --> 10:14.800
when it comes to the editing operations
10:14.800 --> 10:18.560
are based very heavily on lispy.
10:18.560 --> 10:20.320
So that's the end of my talk.
10:20.320 --> 10:22.959
Thank you for watching.
10:22.959 --> 10:23.959
[captions by sachac]