WEBVTT 00:00.080 --> 00:01.360 Hi. My name is Ethan, 00:01.360 --> 00:02.320 and today I'm going to be speaking 00:02.320 --> 00:04.240 about tree-edit, which is a package 00:04.240 --> 00:06.160 which aims to bring structural editing 00:06.160 --> 00:08.320 to everyday languages. 00:08.320 --> 00:10.559 So what is structural editing? 00:10.559 --> 00:11.657 The way that we typically 00:11.657 --> 00:12.578 write code today 00:12.578 --> 00:14.480 is working with characters, words, 00:14.480 --> 00:16.206 lines, paragraphs, and so on, 00:16.206 --> 00:18.600 and these objects have no real relation 00:18.600 --> 00:21.520 to the structure of programming languages. 00:21.520 --> 00:24.667 In contrast, tree-edit's editing operations 00:24.667 --> 00:26.897 map exactly to the structure 00:26.897 --> 00:28.411 of the programming language, 00:28.411 --> 00:30.303 which is typically in a tree form 00:30.303 --> 00:32.053 with different types of nodes 00:32.053 --> 00:33.920 such as identifiers, expressions, 00:33.920 --> 00:35.957 and statements. Using this structure 00:35.957 --> 00:37.548 can enable much more powerful 00:37.548 --> 00:39.200 editing operations, 00:39.200 --> 00:40.769 and crucially editing operations 00:40.769 --> 00:42.081 that map much more closely 00:42.081 --> 00:44.960 to the way that we think about code. 00:44.960 --> 00:46.140 tree-edit was inspired by 00:46.140 --> 00:47.386 paredit and lispy, 00:47.386 --> 00:48.320 which are two great 00:48.320 --> 00:50.271 Lisp structural editors. 00:50.271 --> 00:52.383 However, what makes tree-edit unique 00:52.383 --> 00:54.480 is that it can work with many languages, 00:54.480 --> 00:55.759 such as some of the 00:55.759 --> 00:59.826 more mainstream languages like C, Java, 00:59.826 --> 01:01.600 Python, and so on. 01:01.600 --> 01:03.273 So now I'm going to show off tree-edit 01:03.273 --> 01:05.705 in action, working with a Java program. 01:05.705 --> 01:07.237 So we can see on the left, 01:07.237 --> 01:09.119 we have a syntax tree, 01:09.119 --> 01:11.560 and the node in bold is what I call 01:11.560 --> 01:13.780 the current node. So instead of 01:13.780 --> 01:15.100 the concept of a cursor, 01:15.100 --> 01:17.600 where we have a point in 2D space, 01:17.600 --> 01:20.285 we instead work with a current node 01:20.285 --> 01:22.729 which all our editing operations 01:22.729 --> 01:23.840 take place upon. 01:23.840 --> 01:26.479 So we can move up and down, 01:26.479 --> 01:28.720 or rather side to side, 01:28.720 --> 01:31.160 move inwards down to the children 01:31.160 --> 01:33.920 of the tree, back up to the parents. 01:33.920 --> 01:36.799 We can also jump to a node by its type. 01:36.799 --> 01:38.768 So we're going to jump to 01:38.768 --> 01:40.880 a variable declaration. 01:40.880 --> 01:44.399 We can jump to an if statement. 01:44.399 --> 01:46.880 And as you might have noticed, 01:46.880 --> 01:48.360 tree-edit by default 01:48.360 --> 01:51.337 uses a vim-style mode of editing, 01:51.337 --> 01:55.119 so it's a verb, which would be jump, 01:55.119 --> 01:56.874 and then a type, 01:56.874 --> 02:00.799 which would be if statement. 02:00.799 --> 02:03.346 So now I'll show off 02:03.346 --> 02:06.144 the syntax tree modification in action. 02:06.144 --> 02:08.000 So if I delete this deleteme node, 02:08.000 --> 02:10.112 we can see the node is deleted, 02:10.112 --> 02:12.049 and also the comma is removed 02:12.049 --> 02:13.920 since it's no longer needed. 02:13.920 --> 02:16.720 We can add some nodes back in. 02:16.720 --> 02:18.160 Here we just have a placeholder node 02:18.160 --> 02:20.391 called tree, which we can swap out 02:20.391 --> 02:21.875 with whatever we like. 02:21.875 --> 02:24.560 So if we want to put in, for example, 02:24.560 --> 02:29.280 a plus or minus operator, 02:29.280 --> 02:30.879 it'll put these two TREE things here 02:30.879 --> 02:32.634 since there needs to be something there, 02:32.634 --> 02:37.360 but we can go fill them out as we like. 02:37.360 --> 02:38.595 So that's what that is. 02:38.595 --> 02:41.920 Then I'll delete these again. 02:41.920 --> 02:43.709 Next we can see raising. 02:43.709 --> 02:45.280 So if I raise reader, 02:45.280 --> 02:46.160 then it will replace 02:46.160 --> 02:47.342 the outer function call 02:47.342 --> 02:48.583 with the node itself. 02:48.583 --> 02:50.948 I could raise it again. 02:50.948 --> 02:53.363 The opposite operation to that 02:53.363 --> 02:57.200 is wrapping. So I can wrap reader 02:57.200 --> 02:59.519 back into function call, 02:59.519 --> 03:03.009 and I could wrap this again 03:03.009 --> 03:08.480 if I wanted to. So that is wrapping. 03:08.480 --> 03:12.640 We can also do it on a statement level, 03:12.640 --> 03:13.760 so if I want to wrap this 03:13.760 --> 03:14.480 in an if statement, 03:14.480 --> 03:17.034 I can wrap the statement, 03:17.034 --> 03:18.400 and there we go. 03:18.400 --> 03:21.280 And let's just raise it back up, 03:21.280 --> 03:23.200 raise it again. 03:23.200 --> 03:25.760 There we go. Finally, I'll show off 03:26.959 --> 03:28.720 slurping and barfing, 03:28.720 --> 03:32.256 which... a little bit gross words, 03:32.256 --> 03:34.879 but I think it accurately describes 03:34.879 --> 03:37.519 the action, so let me just add 03:37.519 --> 03:41.120 a couple breaks here. 03:41.120 --> 03:44.748 So let's say we want 03:44.748 --> 03:46.779 this if statement and a couple of breaks 03:46.779 --> 03:48.319 to be inside of the while, 03:48.319 --> 03:50.959 so we can just slurp this up, 03:50.959 --> 03:52.433 and if we don't actually want them, 03:52.433 --> 03:54.528 we can barf them back out. 03:54.528 --> 03:56.736 So that's where those words 03:56.736 --> 03:57.840 have come from. 03:57.840 --> 04:01.120 And we can just... delete as we please. 04:01.120 --> 04:03.826 So yeah, that's a quick overview 04:03.826 --> 04:07.360 of the tree editing plugin in action. 04:07.360 --> 04:08.900 So now I want to talk a little bit 04:08.900 --> 04:12.080 about the implementation of tree-edit. 04:12.080 --> 04:14.400 Tree-edit uses the tree-sitter parser 04:14.400 --> 04:17.919 to convert text into a syntax tree. 04:17.919 --> 04:21.501 Tree-sitter is used by GitHub 04:21.501 --> 04:22.752 for its syntax highlighting, 04:22.752 --> 04:25.280 and it's available in a bunch of editors, 04:25.280 --> 04:27.120 including Emacs, so it's 04:27.120 --> 04:28.960 a fairly standard tool. 04:28.960 --> 04:30.960 However, the unique part about tree-edit 04:30.960 --> 04:32.479 is how it performs 04:32.479 --> 04:34.479 correct editing operations 04:34.479 --> 04:35.919 on the syntax tree 04:35.919 --> 04:38.320 and then converts that back into text. 04:38.320 --> 04:41.759 So to do that, we use miniKanren, 04:41.759 --> 04:43.759 and miniKanren is an embedded 04:43.759 --> 04:45.120 domain-specific language 04:45.120 --> 04:47.440 for logic programming. 04:47.440 --> 04:50.080 So what exactly does that mean? 04:50.080 --> 04:51.280 In our case, it's just 04:51.280 --> 04:54.240 an Emacs Lisp library called reazon, 04:54.240 --> 04:56.720 which exposes a set of macros 04:56.720 --> 04:58.320 which enables us to program 04:58.320 --> 05:01.360 in this logic programming style. 05:01.360 --> 05:03.280 I'm not going to get into the details 05:03.280 --> 05:05.520 of how logic programming works. 05:05.520 --> 05:07.520 However, one of the most unique aspects 05:07.520 --> 05:09.919 about it is that we can define 05:09.919 --> 05:13.600 a predicate and then figure out 05:13.600 --> 05:15.280 all the inputs to the predicate 05:15.280 --> 05:17.759 that would hold to be true. 05:17.759 --> 05:19.360 So in this case, 05:19.360 --> 05:21.520 we have our query variable q, 05:21.520 --> 05:24.479 which will be what the output is, 05:24.479 --> 05:29.120 and we are asking for all the values of q 05:29.120 --> 05:32.080 that pass this predicate of 05:32.080 --> 05:34.479 being set-equal to 1 2 3 4. 05:34.479 --> 05:36.880 So if we execute this, 05:36.880 --> 05:40.080 it will take a little time... 05:40.080 --> 05:41.520 It shouldn't be taking this long. 05:41.520 --> 05:43.280 Oh, there it goes. 05:43.280 --> 05:45.919 We can see that it's generated 05:45.919 --> 05:47.520 a bunch of different answers 05:47.520 --> 05:51.199 that are all set-equal to 1 2 3 4. 05:51.199 --> 05:52.880 So it's just a bunch of 05:52.880 --> 05:57.280 different permutations of that. 05:57.280 --> 05:59.120 We can extend this notion 05:59.120 --> 06:03.600 to a parser. In tree-edit, we've defined 06:03.600 --> 06:05.360 a parser in reazon, 06:05.360 --> 06:10.800 and we can use that parser to figure out 06:10.800 --> 06:15.919 any tokens that match the type of node 06:15.919 --> 06:16.880 that we're trying to generate. 06:16.880 --> 06:19.600 If I execute this, we can see 06:19.600 --> 06:21.199 that reazon has generated 06:21.199 --> 06:23.440 these five answers that match 06:23.440 --> 06:26.960 what a try statement is in Java. 06:26.960 --> 06:29.680 Here we can see we can have 06:29.680 --> 06:31.919 an infinite amount of catches 06:31.919 --> 06:34.720 optionally ending with a finally, 06:34.720 --> 06:36.160 and we always have to start 06:36.160 --> 06:39.039 with a try and a block. 06:39.039 --> 06:40.000 We can see this again 06:40.000 --> 06:42.400 with an argument list. 06:42.400 --> 06:43.520 We have the opening and closing 06:43.520 --> 06:45.759 parentheses, and expressions 06:45.759 --> 06:49.120 which are comma delimited. 06:49.120 --> 06:51.759 Now, for a more complex example, and 06:51.759 --> 06:53.680 something that is along the lines 06:53.680 --> 06:55.199 of what's in tree-edit, 06:55.199 --> 06:57.919 is if we have this x here 06:57.919 --> 07:01.599 and we want to insert another expression, 07:01.599 --> 07:05.759 so x, y. We can assert 07:05.759 --> 07:07.680 that there's some new tokens, 07:07.680 --> 07:10.160 and we want an expression 07:10.160 --> 07:11.840 to be in those new tokens, 07:11.840 --> 07:13.280 and we can essentially state 07:13.280 --> 07:15.039 where we want these new tokens to go 07:15.039 --> 07:19.759 within the old list of tokens, 07:19.759 --> 07:21.599 so replacing it 07:21.599 --> 07:23.360 after the previous expression, 07:23.360 --> 07:26.000 before the closed parentheses, 07:26.000 --> 07:26.880 and then we can state 07:26.880 --> 07:28.560 that the whole thing parses. 07:28.560 --> 07:30.080 If we run that, we can see that 07:30.080 --> 07:32.479 as we wanted earlier, 07:32.479 --> 07:37.120 which was a comma and then expression, 07:37.120 --> 07:39.120 we have that here as well. 07:39.120 --> 07:41.759 We can see this again. 07:41.759 --> 07:42.720 Here, the only change is that 07:42.720 --> 07:45.280 we've moved the tokens to be 07:45.280 --> 07:46.240 before the expression. 07:46.240 --> 07:48.800 So we want to put an expression 07:48.800 --> 07:50.560 before this x, so we want something 07:50.560 --> 07:52.560 like y, x, 07:52.560 --> 07:54.240 and if we execute that, 07:54.240 --> 07:57.919 we can see that it is correctly asserted 07:57.919 --> 07:59.039 that it would be an expression 07:59.039 --> 08:01.520 and then a comma afterwards. 08:01.520 --> 08:02.960 One last example is 08:02.960 --> 08:04.400 if we have an if statement 08:04.400 --> 08:07.759 and we want to add an extra block, 08:07.759 --> 08:11.599 we can see that it correctly figures out 08:11.599 --> 08:12.400 that we need an else 08:12.400 --> 08:13.840 in order to have another statement 08:13.840 --> 08:16.720 in an if statement. 08:16.720 --> 08:19.759 So, next steps for tree-edit. 08:19.759 --> 08:21.039 The core of tree-edit is in place 08:21.039 --> 08:23.120 but there's a lot of usability features 08:23.120 --> 08:25.360 to add, and a lot of testing 08:25.360 --> 08:26.400 that needs to be done 08:26.400 --> 08:29.599 in order to iron out any bugs that exist. 08:29.599 --> 08:30.960 I'd like to add support 08:30.960 --> 08:35.200 for as many languages as is possible. 08:35.200 --> 08:36.240 I think my next step 08:36.240 --> 08:38.490 will probably be Python. 08:38.490 --> 08:41.279 There's some performance improvements 08:41.279 --> 08:44.080 that need to be made, since using this 08:44.080 --> 08:45.519 logic programming language 08:45.519 --> 08:47.600 is fairly intensive. 08:47.600 --> 08:48.800 There's some optimizations 08:48.800 --> 08:50.560 both on the library side 08:50.560 --> 08:51.519 and on tree-edit side 08:51.519 --> 08:53.360 that can be made. 08:53.360 --> 08:55.519 Contributors are of course welcome, 08:55.519 --> 09:00.000 as tree-edit is an open source project. 09:00.000 --> 09:03.360 For future work, I think the prospect 09:03.360 --> 09:04.480 of voice controlled development 09:04.480 --> 09:06.240 with tree-edit is actually something 09:06.240 --> 09:07.920 that's really exciting, 09:07.920 --> 09:11.120 since syntax can be very cumbersome 09:11.120 --> 09:12.320 when you're working with 09:12.320 --> 09:14.240 voice control software. 09:14.240 --> 09:16.320 I can envision something like 09:16.320 --> 09:19.440 saying, "Jump to identifier, 09:19.440 --> 09:26.640 add plus operator, jump to if statement, 09:26.640 --> 09:30.480 wrap if statement in while." 09:30.480 --> 09:31.519 So that's something 09:31.519 --> 09:33.519 I'd like to investigate. 09:33.519 --> 09:35.040 I also would just like to 09:35.040 --> 09:37.279 provide the core functionality 09:37.279 --> 09:39.120 of [tree-edit] as something 09:39.120 --> 09:40.399 that can be used as a library 09:40.399 --> 09:41.920 for other projects, 09:41.920 --> 09:43.839 such as refactoring packages, 09:43.839 --> 09:46.240 or other non-Vim-style approaches, 09:46.240 --> 09:49.200 and just making the syntax generation 09:49.200 --> 09:52.080 available for reuse. 09:52.080 --> 09:53.760 Finally, I'd like to thank 09:53.760 --> 09:56.399 the authors of reazon 09:56.399 --> 09:58.399 and elisp-tree-sitter, 09:58.399 --> 10:00.185 which in turn packages 10:00.185 --> 10:02.079 tree-sitter itself, 10:02.079 --> 10:05.440 since tree-edit relies very heavily 10:05.440 --> 10:07.680 on these two packages. 10:07.680 --> 10:08.959 I'd also like to thank 10:08.959 --> 10:10.480 the author of lispy, 10:10.480 --> 10:12.720 since a lot of the design decisions 10:12.720 --> 10:14.800 when it comes to the editing operations 10:14.800 --> 10:18.560 are based very heavily on lispy. 10:18.560 --> 10:20.320 So that's the end of my talk. 10:20.320 --> 10:22.959 Thank you for watching. 10:22.959 --> 10:23.959 [captions by sachac]