WEBVTT 00:00.200 --> 00:02.340 Hi everybody, my name is Andrea Corallo, 00:02.440 --> 00:03.280 and this presentation is about 00:03.382 --> 00:05.802 the Emacs Lisp Native Compiler -- 00:05.904 --> 00:07.083 having GNU Emacs able to 00:07.184 --> 00:08.805 compile and run Emacs Lisp as native code. 00:08.907 --> 00:11.565 This project has been my hobby project 00:11.667 --> 00:14.009 for the last about two years and a half 00:14.111 --> 00:15.408 And we will see a little bit 00:15.509 --> 00:16.969 where this project is coming from, 00:17.070 --> 00:18.913 where we are, and where we want to go. 00:19.015 --> 00:20.571 So essentially everything you need to know 00:20.672 --> 00:22.132 about the Emacs Lisp Native Compiler, 00:22.232 --> 00:23.416 and probably a little more. 00:23.521 --> 00:26.535 Just a little bit of context on Emacs Lisp. 00:26.635 --> 00:30.175 Well, Emacs Lisp is a programming language, 00:30.278 --> 00:33.544 it's indeed a Lisp, 00:33.647 --> 00:36.619 and one year ago I looked for some statistics, 00:36.722 --> 00:38.742 and I was kind of pleased to see 00:38.842 --> 00:40.030 -- surprised to see actually -- 00:40.132 --> 00:40.972 that it's still kind of popular 00:41.073 --> 00:42.844 as a programming language. 00:42.944 --> 00:43.684 It doesn't rank that bad 00:43.785 --> 00:45.554 against other programming languages. 00:45.658 --> 00:48.387 Also, the other important fact about Emacs Lisp 00:48.487 --> 00:51.518 is that there is a lot of Emacs Lisp out there, 00:51.622 --> 00:55.922 and this will have an impact [on this project.] 00:56.025 --> 00:57.632 It's a programming language that is capable 00:57.734 --> 00:59.320 of almost any task, so it's 00:59.420 --> 01:01.728 almost a general-purpose programming language, 01:01.831 --> 01:04.116 and this reflects on Emacs itself, 01:04.217 --> 01:07.232 that it's capable of almost any task. 01:07.335 --> 01:09.679 Indeed this "almost" is something we want to fix, 01:09.781 --> 01:10.720 because we want to do everything, 01:10.820 --> 01:14.218 we want [to do] all of our computing [with Emacs]. 01:14.321 --> 01:16.523 Also, an interesting aspect for me 01:16.625 --> 01:18.903 is that it's not specified by any standard. 01:19.005 --> 01:22.507 This implies it can evolve in a more agile way 01:22.609 --> 01:25.667 without having to change the standard, etc. 01:25.770 --> 01:27.870 And, in fact, it's kind of improving, 01:27.970 --> 01:30.370 I believe relatively fast. 01:30.473 --> 01:32.791 A little bit about Lisp in general. 01:32.892 --> 01:34.952 First, it's the best programming language ever, 01:35.052 --> 01:35.712 we all know it. 01:35.813 --> 01:37.833 It has a lot of very nice properties, 01:37.934 --> 01:40.356 like it's dynamic, it's homoiconic, etc. 01:40.462 --> 01:42.394 But the interesting thing for implementors, 01:42.496 --> 01:43.476 is that, in theory, 01:43.576 --> 01:46.716 it can be implemented with very few primitives. 01:46.817 --> 01:49.485 You build very few primitives that are like magic, 01:49.590 --> 01:51.838 and on top of this, 01:51.938 --> 01:53.770 you can implement the whole language. 01:53.873 --> 01:55.759 This sounds very nice, 01:55.860 --> 01:57.280 and very appealing for implementors 01:57.381 --> 02:00.175 meaning to implement a new Lisp implementation, 02:00.279 --> 02:02.539 or improving or modifying an existing one. 02:02.641 --> 02:04.163 So the question is: 02:04.263 --> 02:07.663 how many primitives do we have to implement 02:07.764 --> 02:09.384 if we want to change 02:09.485 --> 02:13.203 the GNU Emacs Lisp implementation? 02:13.308 --> 02:20.128 Unfortunately, not really as few as we would like. 02:20.234 --> 02:25.294 In GNU Emacs, we have about 1500 primitives, 02:25.400 --> 02:27.970 and the main reason for that is performance. 02:28.071 --> 02:30.961 Actually, GNU Emacs was written 02:31.064 --> 02:34.292 when performance was a big issue, 02:34.393 --> 02:36.493 and nowadays certain parts 02:36.594 --> 02:38.486 are still performance-critical. 02:38.591 --> 02:40.295 We have a lot of C code; 02:40.395 --> 02:46.331 30% of the GNU Emacs codebase is C code, 02:46.435 --> 02:49.177 and we have to maintain this. 02:49.279 --> 02:52.859 But not only that, this is the main barrier 02:52.959 --> 02:55.579 for people that tried in the past 02:55.681 --> 02:57.663 to change the Emacs Lisp implementation. 02:57.765 --> 02:59.101 Because not only do you have to 02:59.202 --> 03:01.582 replace these primitives, 03:01.683 --> 03:03.907 all of them or part of them, 03:04.012 --> 03:06.384 but sometimes they also share (these primitives), 03:06.484 --> 03:07.952 internal data structures. 03:08.055 --> 03:09.745 For instance, it's very difficult to say: 03:09.846 --> 03:12.826 Now I want to go from C 03:12.926 --> 03:13.906 to a different programming language 03:14.006 --> 03:15.638 for implementing these primitives. 03:15.740 --> 03:17.028 So this has been, effectively, 03:17.128 --> 03:20.380 the main barrier for doing this work. 03:20.486 --> 03:22.090 Another interesting aspect 03:22.190 --> 03:23.870 about the GNU Emacs implementation 03:23.970 --> 03:26.190 is that Lisp can run interpreted 03:26.291 --> 03:28.647 or byte-compiled for performance, 03:28.752 --> 03:30.632 and the byte compiler itself 03:30.733 --> 03:32.673 is written in Emacs Lisp. 03:32.773 --> 03:35.033 This implies that GNU Emacs has to go through 03:35.134 --> 03:37.214 a bootstrap procedure in order to be built. 03:37.319 --> 03:38.715 But it's kind of interesting 03:38.815 --> 03:41.835 for something that started as a text editor, 03:41.937 --> 03:43.101 or something like it. 03:43.203 --> 03:47.877 The byte-code that is Emacs Lisp 03:47.979 --> 03:50.279 when it's been byte compiled, 03:50.379 --> 03:53.079 it's running on a stack-based virtual machine 03:53.180 --> 03:55.146 that is implemented in C. 03:55.253 --> 03:59.741 OK, so I've listed a bunch of areas 03:59.842 --> 04:02.074 where Emacs Lisp could improve: 04:02.178 --> 04:06.924 Namespace, Extensibility, Performance, 04:07.025 --> 04:11.239 and Debuggability, and Diagnostic, we could say. 04:11.346 --> 04:14.326 This activity, this project in particular, 04:14.428 --> 04:17.088 is affecting primarily Performance 04:17.189 --> 04:21.314 and the performance area the Execution Engine. 04:21.414 --> 04:22.570 That said, 04:22.671 --> 04:25.855 I think it has an impact also on Extensibility, 04:25.957 --> 04:27.811 and I hope it will have an impact also 04:27.913 --> 04:29.779 on programming diagnostics, 04:29.881 --> 04:32.501 so giving better warnings, [unknown]. 04:32.604 --> 04:36.694 So which are the benefits 04:36.795 --> 04:39.215 of increasing the Emacs Lisp performance? 04:39.316 --> 04:42.976 Indeed, we will have, if we do that, 04:43.078 --> 04:45.668 programs that run faster. 04:45.774 --> 04:48.738 But the main implication of that 04:48.840 --> 04:50.760 is that we could write less C; 04:50.860 --> 04:52.940 we could maintain and debug less C. 04:53.041 --> 04:56.437 That is kind of a time-consuming task. 04:56.542 --> 04:59.502 And we could also allow for 04:59.603 --> 05:01.723 writing performance-critical extensions 05:01.824 --> 05:03.804 directly in Lisp 05:03.909 --> 05:06.305 without having to use systems like 05:06.406 --> 05:09.686 [I think there's a bunch.] 05:09.927 --> 05:14.791 These are very consistent benefits. 05:14.899 --> 05:16.669 OK, Project Goals, 05:16.769 --> 05:18.829 but I think the title of this slide 05:18.930 --> 05:21.520 maybe should be Project Requirements. 05:21.623 --> 05:23.231 So when I started this activity, 05:23.331 --> 05:26.943 I set some requirements for the project, 05:27.049 --> 05:29.469 with the main goal in mind: to go upstream. 05:29.570 --> 05:31.494 So I wanted to create something, 05:31.594 --> 05:34.250 a modified implementation of GNU Emacs, 05:34.353 --> 05:36.675 that was compatible 05:36.776 --> 05:38.596 as close to 100% as possible 05:38.696 --> 05:40.514 to the current implementation. 05:40.620 --> 05:42.720 And when I say "current implementation," 05:42.820 --> 05:44.758 I don't refer to what 05:44.859 --> 05:46.879 the Emacs Lisp Programming Manual specifies 05:46.980 --> 05:49.804 as expected behavior of the implementation, 05:49.907 --> 05:52.207 but really the implementation itself. 05:52.309 --> 05:53.841 This is because there are a lot of corner cases 05:53.942 --> 05:56.390 that are not specified by the manual, 05:56.493 --> 05:58.093 but programs do rely on that, 05:58.193 --> 06:00.523 and given there is a ton of Emacs Lisp 06:00.625 --> 06:01.935 already around, 06:02.037 --> 06:05.437 compatibility was definitely a major requirement. 06:05.541 --> 06:07.727 I wanted to produce something that had 06:07.827 --> 06:09.587 reduced impact on the Emacs codebase, 06:09.687 --> 06:11.543 at least as much as possible. 06:11.644 --> 06:14.724 So I didn't want to rewrite all of GNU Emacs. 06:14.827 --> 06:17.609 Indeed, because it would have been 06:17.710 --> 06:21.068 too much work for one single person, 06:21.171 --> 06:25.371 but also still thinking 06:25.473 --> 06:29.153 to an upstream outcome all of this time. 06:29.258 --> 06:31.974 Another requirement was to have 06:32.076 --> 06:34.700 no, or very reduced, impact on the user, 06:34.804 --> 06:36.456 so I didn't want to change 06:36.556 --> 06:38.604 the way Emacs is used by you. 06:38.708 --> 06:40.618 And last but not least, 06:40.718 --> 06:42.848 introducing new dependencies, 06:42.951 --> 06:45.379 those dependencies had to be Free Software, 06:45.479 --> 06:47.191 possibly GPL, 06:47.295 --> 06:49.995 and also an important requirement 06:50.097 --> 06:52.941 is that these dependencies had to be 06:53.043 --> 06:55.143 some kind of trusted software that we know 06:55.243 --> 06:56.681 is going to be maintained in the future. 06:56.781 --> 06:59.261 Given Emacs has been around since forever, 06:59.363 --> 07:01.183 and will be around forever and ever, 07:01.285 --> 07:05.265 this was another very important point. 07:05.367 --> 07:08.707 A little bit of history of this project/ 07:08.808 --> 07:10.648 a quick timeline. 07:10.748 --> 07:14.388 2019, in May, I did my first commit, 07:14.490 --> 07:17.110 I think it was when I tried to write 07:17.210 --> 07:20.230 my first primitive function ever in C, 07:20.332 --> 07:21.752 in GNU Emacs. 07:21.852 --> 07:24.832 And this was an attempt to try to compile 07:24.934 --> 07:30.314 a function that, once executed, returning [?] 07:30.415 --> 07:33.755 That was it. Six months after (about), 07:33.857 --> 07:37.057 I had something that was kind of working, 07:37.157 --> 07:42.797 So I was able to start up a semi-standard Emacs 07:42.899 --> 07:44.879 and then to compile and load, 07:44.981 --> 07:47.221 and replacing most of the functions 07:47.321 --> 07:50.921 that I had defined floating in my Lisp universe. 07:51.022 --> 07:54.202 Those functions are the functions 07:54.304 --> 07:55.944 that are essentially composing Emacs, 07:56.044 --> 07:57.884 at least the Lisp side of Emacs. 07:57.984 --> 08:01.084 A lot of features were missing, 08:01.186 --> 08:03.426 like I had no Garbage Collector support, 08:03.526 --> 08:07.006 no bootstrap, I was not optimizing these functions 08:07.108 --> 08:10.068 Because optimization [would be broken]. 08:10.169 --> 08:12.749 No image dump support, etc. 08:12.850 --> 08:16.330 But I think this proved the design could work. 08:16.431 --> 08:19.171 So I sent to email to emacs-devel. I said 08:19.272 --> 08:20.852 I have this stuff I'm working on, 08:20.953 --> 08:24.613 and I wanted some feedback from the upstream 08:24.714 --> 08:27.534 to see if there was some interest. 08:27.635 --> 08:30.455 I believe the outcome of this was positive 08:30.557 --> 08:32.837 because about one month after, 08:32.937 --> 08:35.697 I pushed my branch within the Emacs git 08:35.798 --> 08:38.438 as a feature branch, and shortly after, 08:38.539 --> 08:42.679 we started to use the bug tracker to track bugs. 08:42.779 --> 08:45.479 So essentially we moved the development 08:45.580 --> 08:50.040 on the upstream infrastructure. 08:50.140 --> 08:55.620 I believe two years after the first commit, 08:55.721 --> 08:57.781 the project was merged 08:57.882 --> 09:00.922 after literally hundreds of bugs solved, 09:01.022 --> 09:03.842 and improvements, suggestions [unknown] 09:03.943 --> 09:08.623 and this was about six months ago. 09:08.723 --> 09:12.363 Before discussing how the native compiler works, 09:12.464 --> 09:14.224 I think it's worth looking at 09:14.324 --> 09:17.644 how Lisp is implemented in GNU Emacs. 09:17.745 --> 09:19.305 We have Lisp_Objects 09:19.405 --> 09:21.945 floating around our Lisp universe, 09:22.045 --> 09:23.905 and they are internally represented in this way. 09:24.006 --> 09:25.606 We have what is called a tagged pointer, 09:25.706 --> 09:27.406 that is just a regular pointer 09:27.506 --> 09:29.286 that is pointing to the area of memory 09:29.386 --> 09:31.926 where we hold the real data of the object. 09:32.027 --> 09:34.187 But within this tagged pointer, 09:34.287 --> 09:36.747 we reserve a few bits 09:36.848 --> 09:38.968 to indicate the type of object we are pointing to. 09:39.068 --> 09:40.528 This is important because 09:40.628 --> 09:42.288 each time we access an object, 09:42.388 --> 09:46.168 we have to typically check those bits 09:46.269 --> 09:49.029 to check that the object we are manipulating 09:49.129 --> 09:50.529 is of the right kind, 09:50.630 --> 09:52.530 remove those bits, and, if we are happy, 09:52.630 --> 09:55.810 access the object, otherwise [unknown]. 09:55.910 --> 09:57.690 All the objects are like this, 09:57.791 --> 09:59.731 except for typically Fixnums, 09:59.831 --> 10:01.031 that are small integers 10:01.131 --> 10:04.891 that we manage to fit directly within the pointer. 10:04.992 --> 10:07.292 Also for manipulating Fixnums, 10:07.392 --> 10:09.412 we have to check the tag bits each time. 10:09.513 --> 10:13.173 Whenever we are not sure of the type of object 10:13.273 --> 10:16.493 we are manipulating (read: almost every time), 10:16.594 --> 10:18.914 we have to check those bits and remove those bits 10:19.014 --> 10:21.754 before doing any manipulation on the Fixnum. 10:21.854 --> 10:26.014 How Emacs Lisp is byte-compiled and executed 10:26.115 --> 10:27.775 in, let's call it, "Vanilla". 10:27.875 --> 10:30.055 If we have a Lisp expression of this kind: 10:30.156 --> 10:32.776 We take the variable 'a' we do plus 2, 10:32.876 --> 10:34.876 and then we multiply the result by 3, 10:34.976 --> 10:37.376 the byte compiler will produce this LAP code. 10:37.477 --> 10:38.497 LAP code is essentially 10:38.597 --> 10:40.017 the assembly for the byte-code, 10:40.117 --> 10:43.697 so it's the "intermediate representation" 10:43.798 --> 10:48.458 that will assembled into byte-code. (.elc files) 10:48.558 --> 10:50.738 How is this program executed? 10:50.839 --> 10:53.639 As I mentioned, it's executed in a virtual machine 10:53.739 --> 10:55.699 that is stack-based, 10:55.800 --> 10:58.540 but we start with an execution stack that's empty, 10:58.640 --> 11:01.680 and a stack pointer pointing to its bottom. 11:01.780 --> 11:04.280 And we execute the first instruction, 11:04.380 --> 11:07.120 that is pushing in the stack the value of 'a', 11:07.220 --> 11:10.360 in this case, 100. Then we push the constant 2. 11:10.460 --> 11:12.400 Then we do the summation, 11:12.500 --> 11:14.040 and we have the result in the stack. 11:14.140 --> 11:16.860 Same: we push the constant 3, 11:16.960 --> 11:17.620 we do the multiplication, 11:17.720 --> 11:19.260 and we will be able to return. 11:19.360 --> 11:22.700 Now, what's good and what's bad about this? 11:22.800 --> 11:25.800 A good thing is that it's very simple 11:25.900 --> 11:27.460 to start from Lisp 11:27.560 --> 11:31.320 and compile this kind of LAP output. 11:31.420 --> 11:34.240 At least it's reasonably simple. 11:34.340 --> 11:36.240 The compiler is not that complex. 11:36.340 --> 11:39.000 The bad thing is that all this machinery 11:39.100 --> 11:40.660 -- push and pop, etc. -- 11:40.760 --> 11:44.320 it's very different from how a modern CPU works. 11:44.420 --> 11:45.720 Because modern CPUs, 11:45.820 --> 11:47.660 they are not stack-based anymore 11:47.760 --> 11:50.900 but they have instead a fixed number of registers, 11:51.000 --> 11:54.020 and they work with assignment and operation 11:54.120 --> 11:55.040 within these registers 11:55.140 --> 11:57.040 that are generally called "general-purpose." 11:57.140 --> 11:59.260 So to execute this LAP program, 11:59.360 --> 12:00.760 there is another program, 12:00.860 --> 12:02.600 that is the implementation of the VM itself 12:02.700 --> 12:06.400 that is doing conversion during runtime. 12:06.500 --> 12:08.320 So it's interpreting the LAP program 12:08.420 --> 12:11.360 and it's converting it into instructions 12:11.460 --> 12:13.180 that we can execute on the CPU. 12:13.280 --> 12:14.940 This conversion is done each time 12:15.040 --> 12:17.100 we will run some byte-code. 12:17.200 --> 12:19.760 And it's something that we want to avoid. 12:19.860 --> 12:21.560 Instead of this live conversion, 12:21.660 --> 12:26.220 we want to convert once: 12:26.320 --> 12:28.920 our Lisp program into native code, 12:29.020 --> 12:32.380 that is, a binary program that can be executed 12:32.480 --> 12:34.400 directly by our CPU. 12:34.500 --> 12:36.680 We want to save all this unnecessary conversion 12:36.780 --> 12:39.000 that we do each time we are running a program 12:39.100 --> 12:39.760 while we are running it. 12:39.860 --> 12:42.060 And we want to do this process just once, 12:42.160 --> 12:43.740 when we are compiling. 12:43.840 --> 12:46.240 That's the main goal of this activity. 12:46.340 --> 12:50.200 How is the byte compiler implemented? 12:50.300 --> 12:53.940 As any compiler it's a pipeline of transformations. 12:54.040 --> 12:58.560 We go through macro expansion, closure conversion, 12:58.660 --> 13:02.120 we have a bunch of source level optimization. 13:02.220 --> 13:03.940 Then we go into LAP, 13:04.040 --> 13:06.800 that's the transformation we are interested in, 13:06.900 --> 13:10.600 and after a few optimizations on LAP, 13:10.700 --> 13:14.140 LAP is assembled into byte-code. 13:14.240 --> 13:16.880 So if we [list it] 13:16.980 --> 13:19.320 in terms of intermediate representations, 13:19.420 --> 13:23.600 we can simplify this pipeline like this. 13:23.700 --> 13:26.100 We start with Lisp, and at a certain point 13:26.200 --> 13:29.520 we are manipulating the program in LAP form, 13:29.620 --> 13:31.980 and then at the end we produce the byte-code 13:32.080 --> 13:34.560 that is the .elc file that [you run] 13:34.660 --> 13:37.660 What I wanted to realize was something like this. 13:37.760 --> 13:41.200 I wanted to start from LAP, do something, 13:41.300 --> 13:44.600 and jump into GCC using libgccjit 13:44.700 --> 13:45.560 and in particular 13:45.660 --> 13:48.000 the libgccjit Intermediate Representation 13:48.100 --> 13:50.340 that we will discuss. 13:50.440 --> 13:53.120 Now, why I wanted to do something like this? 13:53.220 --> 13:57.300 Essentially, writing a compiler from scratch 13:57.400 --> 14:01.520 for Emacs Lisp would have been a very big task. 14:01.620 --> 14:05.120 So I wanted to rely on, as much as I could, 14:05.220 --> 14:07.180 the Emacs Lisp byte compiler, 14:07.280 --> 14:10.280 because I had to produce something 14:10.380 --> 14:12.920 that was as compatible as possible 14:13.020 --> 14:14.240 to the current implementation. 14:14.340 --> 14:18.340 So this was (I believe) a very good idea 14:18.440 --> 14:20.520 to save an enormous quantity of work 14:20.620 --> 14:22.340 and to produce something 14:22.440 --> 14:24.940 that was compatible in terms of semantics 14:25.040 --> 14:26.820 with the original implementation. 14:26.920 --> 14:30.660 Also, I didn't want to implement a code generator 14:30.760 --> 14:32.860 for each architecture we were targeting, 14:32.960 --> 14:35.900 nor wanted to implement all these optimizations 14:36.000 --> 14:37.840 that are already in GCC, 14:37.940 --> 14:40.120 so I thought it was a good idea 14:40.220 --> 14:44.720 to rely on an existing compiler like GCC [?]. 14:44.820 --> 14:47.240 Let's talk about libgccjit. 14:47.340 --> 14:50.180 It was added by David Malcolm in GCC 5 14:50.280 --> 14:55.680 It allows you to describe a C-ish semantic to GCC 14:55.780 --> 14:57.380 and have it compile. 14:57.480 --> 15:01.400 It's good for [reading] Jitters or AoT compilers. 15:01.500 --> 15:04.400 And if we talk about GCC: 15:04.500 --> 15:07.180 it's a compiler, it's a very good one, 15:07.280 --> 15:09.340 it has support for a remarkable number 15:09.440 --> 15:11.640 of target architectures. 15:11.740 --> 15:15.080 It's also very good at generating [fast code], 15:15.180 --> 15:17.980 and it's been around for a long time; 15:18.080 --> 15:21.200 I believe it's like 1 year younger than Emacs. 15:21.300 --> 15:23.220 It's still very well maintained, 15:23.320 --> 15:25.840 and we can assume it will be maintained 15:25.940 --> 15:28.920 for quite a while. And, as I mentioned, 15:29.020 --> 15:31.120 this was a very important point. 15:31.220 --> 15:33.860 Also, it's GPL; it's Free Software, 15:33.960 --> 15:36.120 and it's developed under the GNU umbrella, 15:36.220 --> 15:40.360 so I thought it was a very good option. 15:40.460 --> 15:43.300 So we can imagine a simple translation 15:43.400 --> 15:46.400 that goes from LAP to this subset of C 15:46.500 --> 15:48.880 that we can describe to libgccjit. 15:48.980 --> 15:52.880 This simple translation we can see here, 15:52.980 --> 15:55.680 it's actually pretty trivial. 15:55.780 --> 15:58.500 Instead of doing operations 15:58.600 --> 16:02.060 within the execution stack we have seen before, 16:02.160 --> 16:05.080 we have just an array that is replacing it, 16:05.180 --> 16:07.380 and it's called 'local' in this case, 16:07.480 --> 16:12.360 and we have assignments within this array, 16:12.460 --> 16:15.140 so that they are done in place of the original 16:15.240 --> 16:17.100 push and pop activity of the stack. 16:17.200 --> 16:18.280 The nice thing is that, 16:18.380 --> 16:20.260 when you have done this translation, 16:20.360 --> 16:23.160 GCC will be able to optimize this, 16:23.260 --> 16:25.840 and remove all the unnecessary operations, 16:25.940 --> 16:27.180 and generate code 16:27.280 --> 16:29.660 for the specific CPU you are targeting, 16:29.760 --> 16:32.020 [which will be running your code]. 16:32.120 --> 16:34.920 This sounds great; it sounds like 16:35.020 --> 16:37.440 a very simple and effective translation, 16:37.540 --> 16:39.920 and in fact the first iteration of my compiler 16:40.020 --> 16:41.120 was doing just this. 16:41.220 --> 16:45.320 It was essentially a big C function 16:45.420 --> 16:48.240 that was taking LAP and doing this conversion 16:48.340 --> 16:50.120 describing the output to libgccjit. 16:50.220 --> 16:53.720 Unfortunately, if you do this, 16:53.820 --> 16:55.740 you will discover that you have 16:55.840 --> 17:00.080 a performance upper bound limit of about 3x. 17:00.180 --> 17:04.360 So it was an option, 17:04.460 --> 17:06.560 but I thought it was a good occasion 17:06.660 --> 17:08.980 for trying to do something more. 17:09.080 --> 17:11.640 And doing something more means 17:11.740 --> 17:13.680 implementing a smarter compiler 17:13.780 --> 17:17.180 that is doing some advanced analysis on the code, 17:17.280 --> 17:18.640 and will be able to perform 17:18.740 --> 17:20.700 Lisp-specific optimizations 17:20.800 --> 17:22.480 -- optimizations that take advantage of 17:22.580 --> 17:24.960 the specific Lisp semantics, 17:25.060 --> 17:27.760 something that GCC is not aware of. 17:27.860 --> 17:31.240 And while I was thinking about that, 17:31.340 --> 17:34.000 I thought that having a smarter compiler 17:34.100 --> 17:38.120 had also other advantages, like a smarter compiler 17:38.220 --> 17:40.120 that understands the semantics 17:40.220 --> 17:41.820 of the programming language being compiled 17:41.920 --> 17:43.680 would be also capable of 17:43.780 --> 17:45.620 giving feedback to the programmers, 17:45.720 --> 17:47.920 like better warnings and errors. 17:48.020 --> 17:51.360 So I was really fascinated about this idea, 17:51.460 --> 17:53.240 and I wanted to change my implementation 17:53.340 --> 17:55.920 because I was not really happy about it. 17:56.020 --> 17:58.320 I had a lot of C code in terms of 17:58.420 --> 18:02.100 lines that were not doing any smart job. 18:02.200 --> 18:07.380 And I wanted to write all the interesting logic 18:07.480 --> 18:09.940 [in Lisp]. 18:10.040 --> 18:12.560 So optimizing outside GCC 18:12.660 --> 18:15.840 before jumping into GCC, 18:15.940 --> 18:20.500 as I mentioned, has two main targets: 18:20.600 --> 18:23.060 Either optimize the code before going into GCC, 18:23.160 --> 18:25.380 or present to GCC some code 18:25.480 --> 18:27.740 that we know GCC can optimize effectively. 18:27.840 --> 18:30.800 And also, this will give, as I mentioned, 18:30.900 --> 18:32.660 better options for the compiler 18:32.760 --> 18:34.420 to provide warnings, errors 18:34.520 --> 18:36.340 -- better diagnostics. 18:36.440 --> 18:38.200 So this is pretty much 18:38.300 --> 18:40.640 what the native compiler looks like nowadays, 18:40.740 --> 18:42.720 in terms of passes. 18:42.820 --> 18:44.560 We have a list of passes, 18:44.660 --> 18:46.620 each of which is taking an input 18:46.720 --> 18:48.120 and producing an output. 18:48.220 --> 18:51.040 So it's doing either analysis on the program 18:51.140 --> 18:52.640 that's being passed, 18:52.740 --> 18:54.760 or it's performing a transformation. 18:54.860 --> 18:57.900 All of these passes are implemented in Lisp, 18:58.000 --> 19:00.440 and only the last pass is implemented in C. 19:00.540 --> 19:05.060 That is the one that is talking to libgccjit. 19:05.160 --> 19:07.760 To do that, I have introduced 19:07.860 --> 19:10.640 a new intermediate representation 19:10.740 --> 19:13.960 that I call LIMPLE, as a tribute to GCC GIMPLE, 19:14.060 --> 19:17.220 that is the main internal representation of GCC, 19:17.320 --> 19:20.600 at least one of the main ones. 19:20.700 --> 19:25.080 Introducing a new intermediate representation 19:25.180 --> 19:27.720 -- a new way of representing my program -- 19:27.820 --> 19:29.540 solved a bunch of problems. 19:29.640 --> 19:33.200 First, it allowed me to implement 19:33.300 --> 19:37.280 non-trivial analysis and transformations, 19:37.380 --> 19:40.000 the ones I needed in my compiler pipeline. 19:40.100 --> 19:42.160 But also, it solved the problem of 19:42.260 --> 19:43.840 what was the boundary between 19:43.940 --> 19:46.120 what I had to implement in Lisp, 19:46.220 --> 19:48.040 and what in C. 19:48.140 --> 19:49.040 Because once I had 19:49.140 --> 19:51.860 my intermediate representation defined, 19:51.960 --> 19:53.780 essentially the boundary between Lisp and C 19:53.880 --> 19:55.640 is just a function, that is, 19:55.740 --> 19:57.780 the one that is implementing the final pass. 19:57.880 --> 19:59.220 That is taking, as an input, 19:59.320 --> 20:01.620 all of my programs in LIMPLE representation 20:01.720 --> 20:03.880 and it's doing [his bit]. 20:03.980 --> 20:07.980 So I was convinced this design at least had sense. 20:08.080 --> 20:10.120 When we go through some of these passes, 20:10.220 --> 20:12.600 just to give you an idea of what these are doing: 20:12.700 --> 20:14.560 the first pass is just responsible for 20:14.660 --> 20:18.100 spilling the LAP from the byte compiler 20:18.200 --> 20:20.160 that effectively here we are using as a front end 20:20.260 --> 20:21.780 for our compiler pipeline. 20:21.880 --> 20:24.040 The second pass, called 'limplify', 20:24.140 --> 20:27.920 will be in charge of converting LAP into LIMPLE. 20:28.020 --> 20:31.260 LIMPLE is an intermediate representation 20:31.360 --> 20:32.860 that is Control Flow Graph based, 20:32.960 --> 20:34.700 and it's capable of SSA. 20:34.800 --> 20:38.640 So we can have a look to what this means. 20:38.740 --> 20:41.300 Let's assume we have our LAP program, 20:41.400 --> 20:42.520 as any program, 20:42.620 --> 20:43.960 that's a simple list of instructions 20:44.060 --> 20:45.640 that we will execute one after the other. 20:45.740 --> 20:47.360 Some of these instructions 20:47.460 --> 20:49.200 are special instructions 20:49.300 --> 20:51.240 that we call conditional branches, 20:51.340 --> 20:52.800 where we check for a condition, 20:52.900 --> 20:54.320 and if this is verified, 20:54.420 --> 20:56.940 we jump to a different address within the program. 20:57.040 --> 20:59.580 (Addresses that here we are calling 'labels'.) 20:59.680 --> 21:03.440 So we can split our program in chunks, 21:03.540 --> 21:08.200 and those chunks we execute without interruption, 21:08.300 --> 21:10.460 so we always enter from the top of those, 21:10.560 --> 21:12.600 and we exit from the bottom. 21:12.700 --> 21:15.980 We can name those, and split them apart, 21:16.080 --> 21:18.980 and these are what we call basic blocks. 21:19.080 --> 21:22.360 And now we have a bunch of these basic blocks 21:22.460 --> 21:23.380 that are floating, 21:23.480 --> 21:25.020 and they are not any more sorted. 21:25.120 --> 21:25.920 This is what is called 21:26.020 --> 21:28.680 a Control Flow Graph based representation. 21:28.780 --> 21:31.400 Now we can get into the SSA topic. 21:31.500 --> 21:33.900 That stands for Static Single Assignment. 21:34.000 --> 21:35.860 I don't want to get into the details, 21:35.960 --> 21:36.720 but just give you a feeling. 21:36.820 --> 21:38.480 I added into our basic blocks 21:38.580 --> 21:41.400 in our Control Flow Graph a few assignments. 21:41.500 --> 21:43.840 We will transform this into SSA 21:43.940 --> 21:45.040 just for the variable 'x', 21:45.140 --> 21:47.360 just for the sake of demonstrating it. 21:47.460 --> 21:49.760 This is done through a number of phases 21:49.860 --> 21:51.760 that are essentially some analysis, 21:51.860 --> 21:52.600 mainly renaming. 21:52.700 --> 21:55.560 But the outcome, the one we see here, 21:55.660 --> 21:59.120 looks quite similar to the original one, 21:59.220 --> 22:01.120 but we can see that the variable 'x' 22:01.220 --> 22:01.960 has been renamed. 22:02.060 --> 22:03.400 And now we don't have anymore just one, 22:03.500 --> 22:06.140 but a number of these variables. 22:06.240 --> 22:08.000 The interesting property is that 22:08.100 --> 22:10.880 each of these variables is assigned just once. 22:10.980 --> 22:13.240 And this allows for the compiler 22:13.340 --> 22:16.760 to do prediction of the value of that variable, 22:16.860 --> 22:19.040 depending on the position 22:19.140 --> 22:19.840 within the Control Flow Graph. 22:19.940 --> 22:21.980 This is very important. For instance, 22:22.080 --> 22:23.440 a very simple case is 'x1' 22:23.540 --> 22:27.440 that we see is assigned once by definition, 22:27.540 --> 22:29.300 in particular here at the beginning. 22:29.400 --> 22:31.040 Here it's very simple to understand 22:31.140 --> 22:33.080 that x1 will have the value 3. 22:33.180 --> 22:35.380 While, for instance, it's more difficult to prove 22:35.480 --> 22:37.060 what is going to be the value of x5, 22:37.160 --> 22:38.620 because it's calling a function, 22:38.720 --> 22:41.940 or we don't know at the moment what x4 is. 22:42.040 --> 22:46.240 So the compiler will gain the capability 22:46.340 --> 22:48.460 to do prediction on all the variables, 22:48.560 --> 22:50.620 and the more we get information on one variable, 22:50.720 --> 22:54.980 the more we can prove about the others. 22:55.460 --> 22:57.280 Coming back to our passes, the next one 22:57.380 --> 22:59.320 is forward propagation. 22:59.420 --> 23:00.600 This pass is responsible for 23:00.700 --> 23:03.240 doing what I briefly mentioned just before: 23:03.340 --> 23:07.160 doing proof over all the different variables 23:07.260 --> 23:09.620 in different positions of the Control Flow Graph, 23:09.720 --> 23:12.700 about the values, types, or ranges. 23:12.800 --> 23:15.080 This pass is also responsible for 23:15.180 --> 23:16.920 executing functions 23:17.020 --> 23:18.640 when we know that the function has no side effect 23:18.740 --> 23:20.520 and the pass managed to 23:20.620 --> 23:22.440 prove all the values of its argument. 23:22.540 --> 23:24.800 So the function is then executed at compile time 23:24.900 --> 23:26.700 and it doesn't even exist anymore 23:26.800 --> 23:27.880 in the produced code. 23:27.980 --> 23:30.300 Then we have another pass, this is 23:30.400 --> 23:33.420 an example of a pass that is very specific: 23:33.520 --> 23:36.120 it's trying to remove the call to funcall 23:36.220 --> 23:38.040 when those are not necessary. 23:38.140 --> 23:39.760 There are a couple situations 23:39.860 --> 23:42.200 where this is very useful. 23:42.300 --> 23:45.240 And not only is this beneficial 23:45.340 --> 23:47.560 because we are generating better code, 23:47.660 --> 23:49.245 but when we manage to do that, 23:49.345 --> 23:52.000 we allow GCC better analysis over the code, 23:52.100 --> 23:54.440 because GCC knows nothing about funcall. 23:54.540 --> 23:57.400 So if we are calling, from 'foo', directly, 'bar', 23:57.500 --> 24:01.280 for GCC it's way easier to do its analysis 24:01.380 --> 24:03.360 on top of this code. 24:03.460 --> 24:06.240 Another interesting pass we can mention is 'tco'. 24:06.340 --> 24:08.800 This is performing Tail Recursion Elimination. 24:08.900 --> 24:11.880 It allows a more functional programming style, 24:11.980 --> 24:13.220 if you want. 24:13.320 --> 24:14.280 We can jump to the last pass 24:14.380 --> 24:16.200 that is called 'final', and as I mentioned, 24:16.300 --> 24:17.520 this one is responsible for 24:17.620 --> 24:19.880 taking our program in LIMPLE representation 24:19.980 --> 24:24.900 and describing it to libgccjit in the gccjit IR. 24:25.000 --> 24:27.480 That's the main task. It's also 24:27.580 --> 24:29.520 defining inline functions 24:29.620 --> 24:32.400 for accessing fundamental data types, and so on. 24:32.500 --> 24:34.460 This pass is also responsible for 24:34.560 --> 24:36.280 using some of the predictions 24:36.380 --> 24:39.560 done by previous passes to generate better code. 24:39.660 --> 24:41.320 Things we had to add 24:41.420 --> 24:43.880 to have all of this machinery work 24:43.980 --> 24:45.240 and to be controllable: 24:45.340 --> 24:47.480 The first one is an opt called 'native-comp-speed' 24:47.580 --> 24:49.920 and it's equivalent to Common Lisp's 'speed'. 24:50.020 --> 24:51.920 It represents the optimization level. 24:52.020 --> 24:53.400 The default is 2 and is 24:53.500 --> 24:55.400 the maximum optimization level 24:55.500 --> 24:58.600 that is meant to reflect 24:58.700 --> 25:00.860 all the original semantics of Emacs Lisp. 25:00.960 --> 25:02.480 So it's the one that should be used by default. 25:02.580 --> 25:04.640 The second one is 'compilation unit' 25:04.740 --> 25:05.960 and it's a kind of new object 25:06.060 --> 25:11.080 that has been added to Emacs. 25:11.180 --> 25:12.960 Let's have a look to 25:13.060 --> 25:14.360 how the Garbage Collector works in this case. 25:14.460 --> 25:15.840 The GNU Emacs Garbage Collector 25:15.940 --> 25:18.660 is a simple mark-and-sweep garbage collector. 25:18.760 --> 25:21.420 It does a tree walk through all the objects 25:21.520 --> 25:25.100 and follows references from one object to another. 25:25.200 --> 25:27.960 All the objects reachable during the mark phase 25:28.060 --> 25:31.240 will be kept in our Lisp universe. 25:31.340 --> 25:33.680 All the other ones will be freed. 25:33.780 --> 25:35.080 In this case we have a bunch of functions, 25:35.180 --> 25:38.760 'foo1', 'foo2', 'bar1', etc., that are defined. 25:38.860 --> 25:40.320 When a function is defined, 25:40.420 --> 25:42.400 it's accessible through its symbol, 25:42.500 --> 25:44.360 so we have the symbol referring to the function. 25:44.460 --> 25:47.720 The function, in this case a native-compiled one, 25:47.820 --> 25:50.040 is referring to the compilation unit. 25:50.140 --> 25:53.020 The compilation unit is essentially 25:53.120 --> 25:58.600 the ELF file that has been compiled, 25:58.700 --> 26:01.160 and contains all those functions 26:01.260 --> 26:03.240 that came from the original .el file, 26:03.340 --> 26:05.100 and that we have loaded into memory. 26:05.200 --> 26:10.000 If, for instance, 'bar1 and 'bar2 are undefined, 26:10.100 --> 26:14.200 functions [3] and 4 will be no longer reachable, 26:14.300 --> 26:16.040 and we will be able to free them 26:16.140 --> 26:18.160 and unload the compilation unit. 26:18.260 --> 26:21.200 We discussed quite a lot about Control Flow Graph, 26:21.300 --> 26:23.560 SSA, and a lot of boring stuff, 26:23.660 --> 26:25.400 and I promised you that we are doing 26:25.500 --> 26:27.320 a lot of interesting proofs over variables, 26:27.420 --> 26:30.220 So let's have some examples of them. 26:30.320 --> 26:31.840 Let's jump into a quick demo 26:31.940 --> 26:34.480 to see what all of this abstract theory 26:34.580 --> 26:37.680 and this esoteric propagation engine can do for us 26:37.780 --> 26:39.240 and how the user can interact with it. 26:39.340 --> 26:42.100 I've defined a bunch of functions, 26:42.200 --> 26:45.240 and I will native-compile and load it. 26:47.500 --> 26:48.840 Alright, Emacs Lisp native compiled and loaded. 26:48.940 --> 26:52.320 At this point, I can disassemble 'foo1' 26:52.420 --> 26:56.320 to make sure it's native code and I'm not lying. 26:56.420 --> 26:58.500 These are the instructions 26:58.600 --> 27:01.320 that will be executed directly by my CPU 27:01.420 --> 27:03.620 when I call this function. 27:03.720 --> 27:07.520 Alright, very cool. 27:07.620 --> 27:16.080 Now, [Lisp:] (symbol-function #'foo1) 27:16.180 --> 27:19.720 Interestingly, this is returning a subroutine, 27:19.820 --> 27:21.820 as it would be a primitive function. 27:21.920 --> 27:23.700 Because this is native code, 27:23.800 --> 27:24.840 even if it's written in Lisp, 27:24.940 --> 27:26.340 has been converted to native code 27:26.440 --> 27:29.200 as if it's a primitive function. 27:29.300 --> 27:31.560 But we can do also a new thing: 27:31.660 --> 27:34.440 asking for the type of the subroutine. 27:34.540 --> 27:38.360 Alright, very cool. It says this is a function, 27:38.460 --> 27:40.280 it's taking one argument of type 't' 27:40.380 --> 27:41.560 (that means anything 27:41.660 --> 27:42.960 because we don't have any information), 27:43.060 --> 27:45.200 and is returning a type 't', 27:45.300 --> 27:47.380 so also there we don't have much information. 27:47.480 --> 27:49.700 OK, very cool, but not very useful. 27:49.800 --> 27:53.680 Let's see #'foo2. #'foo2 is slightly different, 27:53.780 --> 27:55.840 it doesn't take any argument, but it's returning 27:55.940 --> 27:58.040 an integer included between 3 and 3. 27:58.140 --> 28:01.360 Wow, amazing! 28:01.460 --> 28:04.040 Let's get into something a little more complex: 28:04.140 --> 28:09.440 #'foo3 takes one argument we know nothing about, 28:09.540 --> 28:11.740 but it's returning a number. 28:11.840 --> 28:13.280 And why it's returning a number? 28:13.380 --> 28:16.320 Essentially because 1+ is returning a number, 28:16.420 --> 28:18.880 and in all the other cases, 28:18.980 --> 28:20.640 it would signal an error 28:20.740 --> 28:23.380 if it's not happy about its input argument. 28:23.480 --> 28:27.760 Let's have a look to #'foo4. 28:27.860 --> 28:32.920 #'foo4 is a little bit more complex. 28:33.020 --> 28:34.680 It will return nil 28:34.780 --> 28:37.400 if the 'when' condition is not satisfied, 28:37.500 --> 28:39.720 so it's type 'null' here. 28:39.820 --> 28:41.200 It can return a floating point; 28:41.300 --> 28:43.600 we don't do propagation of floating point so far, 28:43.700 --> 28:47.080 or it can return any integer between 4 and 9. 28:47.180 --> 28:52.840 Wow. Let's go on with #'foo5. 28:52.940 --> 28:55.760 #'foo5 is even more complex 28:55.860 --> 28:57.200 because other than 28:57.300 --> 28:59.180 having to satisfy this condition, 28:59.280 --> 29:02.280 we can see that the result of the propagation 29:02.380 --> 29:03.800 of this complex condition 29:03.900 --> 29:05.460 is propagated also across the 'plus'. 29:05.560 --> 29:08.240 So this foo5 can return nil, 29:08.340 --> 29:09.720 a floating point we know nothing about, 29:09.820 --> 29:13.560 or an integer included between 12 and 24. 29:13.660 --> 29:18.080 Let's go on with #'foo6. 29:18.180 --> 29:23.320 #'foo6 is returning anything but an integer. 29:23.420 --> 29:26.520 I think it should be pretty obvious why, 29:26.620 --> 29:28.120 because if it's not an integer we return it, 29:28.220 --> 29:30.000 otherwise we signal an error. 29:30.100 --> 29:32.880 Let's finish with #'foo7 very quickly. 29:32.980 --> 29:37.920 #'foo7 has another very complex condition, 29:38.020 --> 29:40.320 at least for me, but it's also interesting to see 29:40.420 --> 29:42.200 that we are also propagating values for symbols. 29:42.300 --> 29:45.160 So we can return the symbol 'big, 29:45.260 --> 29:46.980 the symbol 'small, 29:47.080 --> 29:51.200 or an integer included between -100 and 100. 29:51.300 --> 29:54.440 Now, the question is: why all of this is useful 29:54.540 --> 29:56.800 other than having Andrea very happy 29:56.900 --> 29:59.560 when he's playing with this all day? 29:59.660 --> 30:01.440 Well, we have to come back one second 30:01.540 --> 30:04.920 to how Lisp_Objects are represented within Emacs. 30:05.020 --> 30:09.480 Lisp_Objects are represented as machine words, 30:09.580 --> 30:12.580 where we reserve a few bits to indicate the type. 30:12.680 --> 30:15.560 And every time we access the object, 30:15.660 --> 30:17.120 when this is a Fixnum, 30:17.220 --> 30:19.920 or a regular object where this is a pointer, 30:20.020 --> 30:21.600 we always have to extract these bits, 30:21.700 --> 30:24.520 make sure that they satisfy a condition, 30:24.620 --> 30:27.120 so make sure that we are going to manipulate 30:27.220 --> 30:28.580 the object of the type we expect, 30:28.680 --> 30:31.040 and then we can extract the object 30:31.140 --> 30:32.540 and do the manipulation. 30:32.640 --> 30:34.420 If the compiler managed to prove 30:34.620 --> 30:37.400 that the contained object is of the right type, 30:37.500 --> 30:42.440 it will be able to not emit the type check, 30:42.540 --> 30:44.500 and save a lot of instructions. 30:44.600 --> 30:48.560 This is a very powerful optimization. 30:48.660 --> 30:50.760 Let's discuss some potential future development 30:50.860 --> 30:52.000 in this area. 30:52.100 --> 30:53.360 First I think it would be extremely nice 30:53.460 --> 30:54.920 to extend the propagation 30:55.020 --> 30:56.920 to types that are not built in. 30:57.020 --> 30:58.160 There are a lot of cases 30:58.260 --> 31:00.600 where we could optimize effectively. 31:00.700 --> 31:02.520 For instance when we do 31:02.620 --> 31:05.720 a lot of accesses to structures 31:05.820 --> 31:06.920 -- lots of stuff like that -- 31:07.020 --> 31:08.880 where we keep on checking and checking 31:08.980 --> 31:10.640 the same object for the type 31:10.740 --> 31:13.080 where it's obvious, where it should be trivial 31:13.180 --> 31:14.720 to prove that it's the right type 31:14.820 --> 31:16.360 after the first access or things like that. 31:16.460 --> 31:18.920 So I believe this is a low-hanging fruit 31:19.020 --> 31:21.180 in terms of performance. 31:21.280 --> 31:23.080 Also I think it would be really nice 31:23.180 --> 31:24.720 to extend the declare mechanism 31:24.820 --> 31:27.600 to allow the user to declare argument types. 31:27.700 --> 31:30.480 (Optionally. Indeed, optionally.) 31:30.580 --> 31:32.880 Doing that would give the compiler 31:32.980 --> 31:35.120 a lot of information to propagate value types 31:35.220 --> 31:36.360 within the function. 31:36.460 --> 31:38.840 But those will allow the compiler 31:38.940 --> 31:43.040 to give the user really good diagnostics 31:43.140 --> 31:45.400 during compile time. Like the compiler could say: 31:45.500 --> 31:47.000 Hey, here you are calling a function 31:47.100 --> 31:49.200 that is expecting this argument of this type, 31:49.300 --> 31:52.000 but I'm proving that you are calling it 31:52.100 --> 31:55.200 with an argument that is of THIS type, 31:55.300 --> 31:57.840 and is not a subtype of the expected one. 31:57.940 --> 32:00.240 So you are doing something not coherent. 32:00.340 --> 32:02.800 This kind of interprocedural logic. 32:02.900 --> 32:05.000 And I think the compiler should also take advantage 32:05.100 --> 32:06.640 (under certain circumstances) 32:06.740 --> 32:08.760 of this interprocedural analysis 32:08.860 --> 32:12.520 in order to optimize even more, when possible. 32:12.620 --> 32:13.720 Also I think we should 32:13.820 --> 32:15.480 work on our [?] to improve the code generation 32:15.580 --> 32:17.100 depending on the prediction 32:17.200 --> 32:18.160 that the compiler is doing. 32:18.260 --> 32:20.120 We already take advantage of those predictions, 32:20.220 --> 32:22.300 but I think we could do better. 32:22.400 --> 32:25.120 A quick look at some performance results. 32:25.220 --> 32:28.720 These are from the elisp-benchmarks package 32:28.820 --> 32:30.760 within GNU ELPA. 32:30.860 --> 32:32.520 This is the performance uplift, 32:32.620 --> 32:38.480 and we can identify about 4 "classes" of results. 32:38.580 --> 32:41.440 The first one there is no performance uplift, 32:41.540 --> 32:42.880 because there is not much we can do, 32:42.980 --> 32:44.720 and the time is probably not spent 32:44.820 --> 32:46.280 within the execution engine. 32:46.380 --> 32:49.000 And the ones around 3x are the ones 32:49.100 --> 32:50.680 Where probably we are not triggering 32:50.780 --> 32:52.640 manually specific optimizations, 32:52.740 --> 32:53.600 but just the fact 32:53.700 --> 32:57.100 that we are converting into native code 32:57.200 --> 33:00.500 is giving us this performance uplift. 33:00.600 --> 33:03.280 Then there is a bunch of other benchmarks 33:03.380 --> 33:05.680 where the Lisp optimizations are triggering, 33:05.780 --> 33:09.620 and the uplift is way bigger, 33:09.720 --> 33:11.900 and then we have 3 benchmarks that at the time 33:12.000 --> 33:13.420 are completely optimized out. 33:13.520 --> 33:15.580 That means the compiler became "so smart" 33:15.680 --> 33:18.160 that it was able to compute the result 33:18.260 --> 33:20.160 in the compile time and just put the result 33:20.260 --> 33:21.400 in the generated binary. 33:21.500 --> 33:23.880 Let's discuss a little bit the compilation model. 33:23.980 --> 33:26.080 This is an Hybrid one; 33:26.180 --> 33:29.420 it's both JIT-like and Ahead-of-Time-like. 33:29.520 --> 33:33.400 Emacs is composed of what we call an Emacs image, 33:33.500 --> 33:36.180 essentially the Emacs binary that we start. 33:36.280 --> 33:37.720 It's including all the C code, 33:37.820 --> 33:41.760 plus all the Lisp code that we preload. 33:41.860 --> 33:46.480 Then we have the rest of the Emacs Lisp codebase 33:46.580 --> 33:49.440 that can be loaded just if it's required. 33:49.540 --> 33:52.760 Same for the external packages, if we have any. 33:52.860 --> 33:55.120 If we build an Emacs Lisp 33:55.220 --> 33:57.940 with native compilation enabled, by default, 33:58.040 --> 34:01.200 only the Emacs image will be native compiled. 34:01.300 --> 34:03.920 All the other code will be compiled 34:04.020 --> 34:06.720 on the fly when it's loaded and executed 34:06.820 --> 34:08.800 the first time, if it's necessary. 34:08.900 --> 34:10.880 Same for the packages, in a transparent way 34:10.980 --> 34:12.640 and asynchronous way. 34:12.740 --> 34:13.480 Also worth noting 34:13.580 --> 34:15.400 that the result of this compilation 34:15.500 --> 34:17.240 will be stored into a cache directory 34:17.340 --> 34:19.900 within the home directory of the user. 34:20.000 --> 34:23.600 So it will be reused in the following sessions 34:23.700 --> 34:25.440 if the same file is loaded, 34:25.540 --> 34:27.520 without having to recompile multiple times 34:27.620 --> 34:29.320 the same file in different sessions. 34:29.420 --> 34:31.520 It works a little bit like this: 34:31.620 --> 34:33.800 When we load the byte-code for the first time, 34:33.900 --> 34:35.900 we spawn a native compilation process. 34:36.000 --> 34:39.320 Meanwhile we keep using the byte-code available. 34:39.420 --> 34:41.180 When the native compilation is finished, 34:41.280 --> 34:43.960 we hot-swap the definition of the functions 34:44.060 --> 34:46.040 that are contained in the file 34:46.140 --> 34:47.960 and start using the native code transparently. 34:48.060 --> 34:49.880 We do this asynchronously, 34:49.980 --> 34:53.140 and for more compilation units at the same time, 34:53.240 --> 34:56.280 so it looks a little bit like this. 34:56.380 --> 34:58.560 Let's try a quick demo of all of this machinery. 34:58.660 --> 35:00.880 I've started a fresh Emacs 35:00.980 --> 35:03.280 with native compilation support, 35:03.380 --> 35:05.060 and at this moment nothing is going on. 35:05.160 --> 35:07.440 We will test the machinery with Tetris, 35:07.540 --> 35:10.460 because I can't imagine anything better 35:10.560 --> 35:12.840 to test this. 35:12.940 --> 35:15.080 What I do expect is that when I launch Tetris 35:15.180 --> 35:17.920 it will be loaded, it will immediately 35:18.020 --> 35:20.460 start execution of the byte-compiled version, 35:20.560 --> 35:23.000 so we won't see any delay in the user experience, 35:23.100 --> 35:25.160 and in the meanwhile, a parallel process 35:25.260 --> 35:28.160 will start to native-compile Tetris itself. 35:28.260 --> 35:30.200 When the native compilation will be finished, 35:30.300 --> 35:33.240 the functions of all Tetris will be hot-swapped. 35:33.340 --> 35:36.280 So we will not see any interruption. 35:36.380 --> 35:39.680 So Tetris started, and it's running, 35:39.780 --> 35:41.600 we have seen no delay, and in the meanwhile, 35:41.700 --> 35:43.520 the native compilation probably already finished, 35:43.620 --> 35:44.960 we can have a look. 35:45.060 --> 35:47.160 In this I see the native compilation log buffer. 35:47.260 --> 35:49.720 So we see that Tetris has been native compiled, 35:49.820 --> 35:51.760 and all of its dependencies. 35:51.860 --> 35:53.840 Now Tetris is still running, 35:53.940 --> 36:00.480 but I can do "C-h f tetris" 36:00.580 --> 36:02.540 and we can see that 'tetris' 36:02.640 --> 36:04.880 is an interactive native compiled 36:04.980 --> 36:07.940 Lisp function, so it has been native-compiled. 36:08.040 --> 36:13.500 I can even disassemble if I want. 36:13.600 --> 36:14.880 OK, so very cool. 36:14.980 --> 36:18.020 I guess we can say this mechanism is working. 36:18.120 --> 36:20.660 Also worth noting that if I go back 36:20.760 --> 36:24.120 to the *Async-native-compile-log* buffer, 36:24.220 --> 36:27.960 we see we have compiled another bunch of files. 36:28.060 --> 36:31.600 I think these are because of my 'C-h f', 36:31.700 --> 36:33.800 this help function command and disassemble, 36:33.900 --> 36:34.920 and so on. 36:35.020 --> 36:37.720 The first time you run Emacs, you will have, 36:37.820 --> 36:41.280 from time to time, these processes spawned. 36:41.380 --> 36:43.100 Emacs is "compiling itself", 36:43.200 --> 36:45.520 and it's replacing the byte-code definition 36:45.620 --> 36:47.640 with the native one. But after a few sessions, 36:47.740 --> 36:49.760 you will not see this anymore, 36:49.860 --> 36:51.360 because the output of this compilation, 36:51.460 --> 36:54.900 as I mentioned, are stored in the user directory. 36:55.000 --> 36:57.560 To conclude: Emacs with native compilation support 36:57.660 --> 36:59.080 is coming up in Emacs 28, 36:59.180 --> 37:01.040 that is gonna be the next major stable release 37:01.140 --> 37:02.080 that will be released. 37:02.180 --> 37:04.840 So we ought to celebrate with a big party, 37:04.940 --> 37:07.320 I believe. But before going to the party, 37:07.420 --> 37:09.080 I'd like to list a few points 37:09.180 --> 37:11.340 that I think have been success factors 37:11.440 --> 37:13.420 in upstreaming this work. 37:13.520 --> 37:15.420 It has been extremely important 37:15.520 --> 37:18.140 to get in touch with upstream as soon as possible, 37:18.240 --> 37:20.680 as soon as I had a proof of concept. 37:20.780 --> 37:22.600 It's been extremely important 37:22.700 --> 37:24.660 to involve the community as much as possible, 37:24.760 --> 37:28.480 and this included keeping a development blog, 37:28.580 --> 37:31.600 and posts about that on emacs-devel, 37:31.700 --> 37:33.720 and also producing material, 37:33.820 --> 37:36.020 participating in conferences, 37:36.120 --> 37:38.320 and giving presentations like the one I'm doing, 37:38.420 --> 37:40.680 to explain what I was doing and how it works. 37:40.780 --> 37:43.040 It has been extremely important, also, 37:43.140 --> 37:45.860 to be able to rely on the upstream infrastructure. 37:45.960 --> 37:47.540 So, to develop the software 37:47.640 --> 37:49.080 as a feature branch in the official git, 37:49.180 --> 37:49.960 but even more, I would say, 37:50.060 --> 37:51.660 to use the official bug tracker 37:51.760 --> 37:52.880 for solving bugs of this branch. 37:52.980 --> 37:54.680 This gave the opportunity 37:54.780 --> 37:58.160 to stay really in close touch with maintainers, 37:58.260 --> 37:59.920 and senior developers of Emacs. 38:00.020 --> 38:02.980 That helped me a lot. And at the same time 38:03.080 --> 38:04.820 they were informed about what I was doing 38:04.920 --> 38:07.360 and what was the status of this feature branch. 38:07.460 --> 38:08.800 Extremely important. 38:08.900 --> 38:11.160 And also I think it played a major role 38:11.260 --> 38:14.120 to try to design this enormous patch 38:14.220 --> 38:18.120 in a way that the impact on the current codebase 38:18.220 --> 38:21.120 was minimized (at least as much as possible). 38:21.220 --> 38:23.660 And also minimizing 38:23.760 --> 38:26.240 the impact on the user operation of the software, 38:26.340 --> 38:28.040 in this case Emacs. 38:28.140 --> 38:29.640 So yes, mandatory Special Thanks: 38:29.740 --> 38:33.360 Emacs developers, and especially maintainers 38:33.460 --> 38:36.680 and senior developers like Stefan Monnier, 38:36.780 --> 38:40.440 that helped me a lot across this long journey. 38:40.540 --> 38:42.800 And, well, all the community 38:42.900 --> 38:45.160 that really got so excited about this project 38:45.260 --> 38:46.240 and gave me the energy 38:46.340 --> 38:49.120 to go through all of this time and development 38:49.220 --> 38:51.980 and bugs and solving, etc. etc. 38:52.080 --> 38:54.920 So yes, it was a really exciting time, 38:55.020 --> 38:58.000 and I think we have to look forward 38:58.100 --> 39:01.400 and start thinking about how to improve all this 39:01.500 --> 39:02.920 for the following years. 39:03.020 --> 39:04.300 And that's it. 39:04.400 --> 39:06.040 I think I should be online for questions. 39:06.140 --> 39:07.480 Thank you very much. 39:07.580 --> 39:07.680 [captions by John Cummings]