summaryrefslogtreecommitdiffstats
path: root/2021/captions/emacsconf-2021-structural--tree-edit-structural-editing-for-java-python-c-and-beyond--ethan-leba--main.vtt
blob: 9bba8336f37889173aec0bd6401c61f93af68db9 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
WEBVTT

00:00.080 --> 00:01.360
Hi. My name is Ethan,

00:01.360 --> 00:02.320
and today I'm going to be speaking 

00:02.320 --> 00:04.240
about tree-edit, which is a package

00:04.240 --> 00:06.160
which aims to bring structural editing

00:06.160 --> 00:08.320
to everyday languages.

00:08.320 --> 00:10.559
So what is structural editing?

00:10.559 --> 00:11.657
The way that we typically 

00:11.657 --> 00:12.578
write code today

00:12.578 --> 00:14.480
is working with characters, words,

00:14.480 --> 00:16.206
lines, paragraphs, and so on, 

00:16.206 --> 00:18.600
and these objects have no real relation 

00:18.600 --> 00:21.520
to the structure of programming languages.

00:21.520 --> 00:24.667
In contrast, tree-edit's editing operations

00:24.667 --> 00:26.897
map exactly to the structure 

00:26.897 --> 00:28.411
of the programming language,

00:28.411 --> 00:30.303
which is typically in a tree form

00:30.303 --> 00:32.053
with different types of nodes

00:32.053 --> 00:33.920
such as identifiers, expressions,

00:33.920 --> 00:35.957
and statements. Using this structure 

00:35.957 --> 00:37.548
can enable much more powerful 

00:37.548 --> 00:39.200
editing operations,

00:39.200 --> 00:40.769
and crucially editing operations 

00:40.769 --> 00:42.081
that map much more closely

00:42.081 --> 00:44.960
to the way that we think about code.

00:44.960 --> 00:46.140
tree-edit was inspired by 

00:46.140 --> 00:47.386
paredit and lispy, 

00:47.386 --> 00:48.320
which are two great 

00:48.320 --> 00:50.271
Lisp structural editors.

00:50.271 --> 00:52.383
However, what makes tree-edit unique

00:52.383 --> 00:54.480
is that it can work with many languages,

00:54.480 --> 00:55.759
such as some of the 

00:55.759 --> 00:59.826
more mainstream languages like C, Java,

00:59.826 --> 01:01.600
Python, and so on.

01:01.600 --> 01:03.273
So now I'm going to show off tree-edit 

01:03.273 --> 01:05.705
in action, working with a Java program.

01:05.705 --> 01:07.237
So we can see on the left,

01:07.237 --> 01:09.119
we have a syntax tree,

01:09.119 --> 01:11.560
and the node in bold is what I call

01:11.560 --> 01:13.780
the current node. So instead of

01:13.780 --> 01:15.100
the concept of a cursor,

01:15.100 --> 01:17.600
where we have a point in 2D space,

01:17.600 --> 01:20.285
we instead work with a current node

01:20.285 --> 01:22.729
which all our editing operations 

01:22.729 --> 01:23.840
take place upon.

01:23.840 --> 01:26.479
So we can move up and down,

01:26.479 --> 01:28.720
or rather side to side,

01:28.720 --> 01:31.160
move inwards down to the children

01:31.160 --> 01:33.920
of the tree, back up to the parents.

01:33.920 --> 01:36.799
We can also jump to a node by its type. 

01:36.799 --> 01:38.768
So we're going to jump to

01:38.768 --> 01:40.880
a variable declaration.

01:40.880 --> 01:44.399
We can jump to an if statement.

01:44.399 --> 01:46.880
And as you might have noticed,

01:46.880 --> 01:48.360
tree-edit by default 

01:48.360 --> 01:51.337
uses a vim-style mode of editing,

01:51.337 --> 01:55.119
so it's a verb, which would be jump,

01:55.119 --> 01:56.874
and then a type, 

01:56.874 --> 02:00.799
which would be if statement.

02:00.799 --> 02:03.346
So now I'll show off 

02:03.346 --> 02:06.144
the syntax tree modification in action.

02:06.144 --> 02:08.000
So if I delete this deleteme node,

02:08.000 --> 02:10.112
we can see the node is deleted,

02:10.112 --> 02:12.049
and also the comma is removed

02:12.049 --> 02:13.920
since it's no longer needed.

02:13.920 --> 02:16.720
We can add some nodes back in.

02:16.720 --> 02:18.160
Here we just have a placeholder node

02:18.160 --> 02:20.391
called tree, which we can swap out

02:20.391 --> 02:21.875
with whatever we like. 

02:21.875 --> 02:24.560
So if we want to put in, for example,

02:24.560 --> 02:29.280
a plus or minus operator,

02:29.280 --> 02:30.879
it'll put these two TREE things here

02:30.879 --> 02:32.634
since there needs to be something there,

02:32.634 --> 02:37.360
but we can go fill them out as we like.

02:37.360 --> 02:38.595
So that's what that is. 

02:38.595 --> 02:41.920
Then I'll delete these again.

02:41.920 --> 02:43.709
Next we can see raising. 

02:43.709 --> 02:45.280
So if I raise reader, 

02:45.280 --> 02:46.160
then it will replace

02:46.160 --> 02:47.342
the outer function call

02:47.342 --> 02:48.583
with the node itself. 

02:48.583 --> 02:50.948
I could raise it again.

02:50.948 --> 02:53.363
The opposite operation to that 

02:53.363 --> 02:57.200
is wrapping. So I can wrap reader

02:57.200 --> 02:59.519
back into function call,

02:59.519 --> 03:03.009
and I could wrap this again 

03:03.009 --> 03:08.480
if I wanted to. So that is wrapping.

03:08.480 --> 03:12.640
We can also do it on a statement level,

03:12.640 --> 03:13.760
so if I want to wrap this

03:13.760 --> 03:14.480
in an if statement,

03:14.480 --> 03:17.034
I can wrap the statement,

03:17.034 --> 03:18.400
and there we go.

03:18.400 --> 03:21.280
And let's just raise it back up,

03:21.280 --> 03:23.200
raise it again.

03:23.200 --> 03:25.760
There we go. Finally, I'll show off

03:26.959 --> 03:28.720
slurping and barfing,

03:28.720 --> 03:32.256
which... a little bit gross words, 

03:32.256 --> 03:34.879
but I think it accurately describes

03:34.879 --> 03:37.519
the action, so let me just add

03:37.519 --> 03:41.120
a couple breaks here.

03:41.120 --> 03:44.748
So let's say we want 

03:44.748 --> 03:46.779
this if statement and a couple of breaks 

03:46.779 --> 03:48.319
to be inside of the while,

03:48.319 --> 03:50.959
so we can just slurp this up,

03:50.959 --> 03:52.433
and if we don't actually want them,

03:52.433 --> 03:54.528
we can barf them back out. 

03:54.528 --> 03:56.736
So that's where those words 

03:56.736 --> 03:57.840
have come from.

03:57.840 --> 04:01.120
And we can just... delete as we please.

04:01.120 --> 04:03.826
So yeah, that's a quick overview 

04:03.826 --> 04:07.360
of the tree editing plugin in action.

04:07.360 --> 04:08.900
So now I want to talk a little bit 

04:08.900 --> 04:12.080
about the implementation of tree-edit.

04:12.080 --> 04:14.400
Tree-edit uses the tree-sitter parser

04:14.400 --> 04:17.919
to convert text into a syntax tree.

04:17.919 --> 04:21.501
Tree-sitter is used by GitHub

04:21.501 --> 04:22.752
for its syntax highlighting, 

04:22.752 --> 04:25.280
and it's available in a bunch of editors,

04:25.280 --> 04:27.120
including Emacs, so it's 

04:27.120 --> 04:28.960
a fairly standard tool.

04:28.960 --> 04:30.960
However, the unique part about tree-edit 

04:30.960 --> 04:32.479
is how it performs 

04:32.479 --> 04:34.479
correct editing operations 

04:34.479 --> 04:35.919
on the syntax tree

04:35.919 --> 04:38.320
and then converts that back into text.

04:38.320 --> 04:41.759
So to do that, we use miniKanren,

04:41.759 --> 04:43.759
and miniKanren is an embedded

04:43.759 --> 04:45.120
domain-specific language 

04:45.120 --> 04:47.440
for logic programming. 

04:47.440 --> 04:50.080
So what exactly does that mean?

04:50.080 --> 04:51.280
In our case, it's just 

04:51.280 --> 04:54.240
an Emacs Lisp library called reazon,

04:54.240 --> 04:56.720
which exposes a set of macros 

04:56.720 --> 04:58.320
which enables us to program

04:58.320 --> 05:01.360
in this logic programming style.

05:01.360 --> 05:03.280
I'm not going to get into the details

05:03.280 --> 05:05.520
of how logic programming works.

05:05.520 --> 05:07.520
However, one of the most unique aspects

05:07.520 --> 05:09.919
about it is that we can define 

05:09.919 --> 05:13.600
a predicate and then figure out 

05:13.600 --> 05:15.280
all the inputs to the predicate 

05:15.280 --> 05:17.759
that would hold to be true.

05:17.759 --> 05:19.360
So in this case,

05:19.360 --> 05:21.520
we have our query variable q,

05:21.520 --> 05:24.479
which will be what the output is,

05:24.479 --> 05:29.120
and we are asking for all the values of q

05:29.120 --> 05:32.080
that pass this predicate of

05:32.080 --> 05:34.479
being set-equal to 1 2 3 4. 

05:34.479 --> 05:36.880
So if we execute this,

05:36.880 --> 05:40.080
it will take a little time...

05:40.080 --> 05:41.520
It shouldn't be taking this long. 

05:41.520 --> 05:43.280
Oh, there it goes.

05:43.280 --> 05:45.919
We can see that it's generated 

05:45.919 --> 05:47.520
a bunch of different answers 

05:47.520 --> 05:51.199
that are all set-equal to 1 2 3 4.

05:51.199 --> 05:52.880
So it's just a bunch of 

05:52.880 --> 05:57.280
different permutations of that.

05:57.280 --> 05:59.120
We can extend this notion 

05:59.120 --> 06:03.600
to a parser. In tree-edit, we've defined

06:03.600 --> 06:05.360
a parser in reazon,

06:05.360 --> 06:10.800
and we can use that parser to figure out

06:10.800 --> 06:15.919
any tokens that match the type of node 

06:15.919 --> 06:16.880
that we're trying to generate.

06:16.880 --> 06:19.600
If I execute this, we can see

06:19.600 --> 06:21.199
that reazon has generated 

06:21.199 --> 06:23.440
these five answers that match

06:23.440 --> 06:26.960
what a try statement is in Java.

06:26.960 --> 06:29.680
Here we can see we can have 

06:29.680 --> 06:31.919
an infinite amount of catches

06:31.919 --> 06:34.720
optionally ending with a finally,

06:34.720 --> 06:36.160
and we always have to start 

06:36.160 --> 06:39.039
with a try and a block.

06:39.039 --> 06:40.000
We can see this again 

06:40.000 --> 06:42.400
with an argument list.

06:42.400 --> 06:43.520
We have the opening and closing

06:43.520 --> 06:45.759
parentheses, and expressions 

06:45.759 --> 06:49.120
which are comma delimited.

06:49.120 --> 06:51.759
Now, for a more complex example, and

06:51.759 --> 06:53.680
something that is along the lines 

06:53.680 --> 06:55.199
of what's in tree-edit,

06:55.199 --> 06:57.919
is if we have this x here

06:57.919 --> 07:01.599
and we want to insert another expression,

07:01.599 --> 07:05.759
so x, y. We can assert 

07:05.759 --> 07:07.680
that there's some new tokens,

07:07.680 --> 07:10.160
and we want an expression 

07:10.160 --> 07:11.840
to be in those new tokens,

07:11.840 --> 07:13.280
and we can essentially state 

07:13.280 --> 07:15.039
where we want these new tokens to go

07:15.039 --> 07:19.759
within the old list of tokens,

07:19.759 --> 07:21.599
so replacing it 

07:21.599 --> 07:23.360
after the previous expression, 

07:23.360 --> 07:26.000
before the closed parentheses,

07:26.000 --> 07:26.880
and then we can state 

07:26.880 --> 07:28.560
that the whole thing parses. 

07:28.560 --> 07:30.080
If we run that, we can see that

07:30.080 --> 07:32.479
as we wanted earlier,

07:32.479 --> 07:37.120
which was a comma and then expression,

07:37.120 --> 07:39.120
we have that here as well.

07:39.120 --> 07:41.759
We can see this again.

07:41.759 --> 07:42.720
Here, the only change is that 

07:42.720 --> 07:45.280
we've moved the tokens to be

07:45.280 --> 07:46.240
before the expression.

07:46.240 --> 07:48.800
So we want to put an expression 

07:48.800 --> 07:50.560
before this x, so we want something

07:50.560 --> 07:52.560
like y, x,

07:52.560 --> 07:54.240
and if we execute that,

07:54.240 --> 07:57.919
we can see that it is correctly asserted

07:57.919 --> 07:59.039
that it would be an expression

07:59.039 --> 08:01.520
and then a comma afterwards.

08:01.520 --> 08:02.960
One last example is 

08:02.960 --> 08:04.400
if we have an if statement 

08:04.400 --> 08:07.759
and we want to add an extra block,

08:07.759 --> 08:11.599
we can see that it correctly figures out

08:11.599 --> 08:12.400
that we need an else

08:12.400 --> 08:13.840
in order to have another statement 

08:13.840 --> 08:16.720
in an if statement.

08:16.720 --> 08:19.759
So, next steps for tree-edit.

08:19.759 --> 08:21.039
The core of tree-edit is in place 

08:21.039 --> 08:23.120
but there's a lot of usability features 

08:23.120 --> 08:25.360
to add, and a lot of testing 

08:25.360 --> 08:26.400
that needs to be done

08:26.400 --> 08:29.599
in order to iron out any bugs that exist.

08:29.599 --> 08:30.960
I'd like to add support 

08:30.960 --> 08:35.200
for as many languages as is possible.

08:35.200 --> 08:36.240
I think my next step 

08:36.240 --> 08:38.490
will probably be Python.

08:38.490 --> 08:41.279
There's some performance improvements 

08:41.279 --> 08:44.080
that need to be made, since using this 

08:44.080 --> 08:45.519
logic programming language 

08:45.519 --> 08:47.600
is fairly intensive. 

08:47.600 --> 08:48.800
There's some optimizations 

08:48.800 --> 08:50.560
both on the library side 

08:50.560 --> 08:51.519
and on tree-edit side 

08:51.519 --> 08:53.360
that can be made. 

08:53.360 --> 08:55.519
Contributors are of course welcome,

08:55.519 --> 09:00.000
as tree-edit is an open source project.

09:00.000 --> 09:03.360
For future work, I think the prospect

09:03.360 --> 09:04.480
of voice controlled development

09:04.480 --> 09:06.240
with tree-edit is actually something

09:06.240 --> 09:07.920
that's really exciting,

09:07.920 --> 09:11.120
since syntax can be very cumbersome

09:11.120 --> 09:12.320
when you're working with

09:12.320 --> 09:14.240
voice control software.

09:14.240 --> 09:16.320
I can envision something like

09:16.320 --> 09:19.440
saying, "Jump to identifier,

09:19.440 --> 09:26.640
add plus operator, jump to if statement,

09:26.640 --> 09:30.480
wrap if statement in while."

09:30.480 --> 09:31.519
So that's something 

09:31.519 --> 09:33.519
I'd like to investigate.

09:33.519 --> 09:35.040
I also would just like to 

09:35.040 --> 09:37.279
provide the core functionality 

09:37.279 --> 09:39.120
of [tree-edit] as something

09:39.120 --> 09:40.399
that can be used as a library

09:40.399 --> 09:41.920
for other projects,

09:41.920 --> 09:43.839
such as refactoring packages,

09:43.839 --> 09:46.240
or other non-Vim-style approaches,

09:46.240 --> 09:49.200
and just making the syntax generation

09:49.200 --> 09:52.080
available for reuse.

09:52.080 --> 09:53.760
Finally, I'd like to thank 

09:53.760 --> 09:56.399
the authors of reazon 

09:56.399 --> 09:58.399
and elisp-tree-sitter, 

09:58.399 --> 10:00.185
which in turn packages 

10:00.185 --> 10:02.079
tree-sitter itself,

10:02.079 --> 10:05.440
since tree-edit relies very heavily

10:05.440 --> 10:07.680
on these two packages.

10:07.680 --> 10:08.959
I'd also like to thank 

10:08.959 --> 10:10.480
the author of lispy,

10:10.480 --> 10:12.720
since a lot of the design decisions 

10:12.720 --> 10:14.800
when it comes to the editing operations 

10:14.800 --> 10:18.560
are based very heavily on lispy.

10:18.560 --> 10:20.320
So that's the end of my talk.

10:20.320 --> 10:22.959
Thank you for watching.

10:22.959 --> 10:23.959
[captions by sachac]