From 0dab88ba445dededd508cf1c16422cf055bd4b07 Mon Sep 17 00:00:00 2001 From: Sacha Chua Date: Fri, 3 Dec 2021 17:28:36 -0500 Subject: Add native talk --- ...d-future-developments--andrea-corallo--main.vtt | 2752 ++++++++++++++++++++ 1 file changed, 2752 insertions(+) create mode 100644 2021/captions/emacsconf-2021-native--emacs-lisp-native-compiler-current-status-and-future-developments--andrea-corallo--main.vtt diff --git a/2021/captions/emacsconf-2021-native--emacs-lisp-native-compiler-current-status-and-future-developments--andrea-corallo--main.vtt b/2021/captions/emacsconf-2021-native--emacs-lisp-native-compiler-current-status-and-future-developments--andrea-corallo--main.vtt new file mode 100644 index 00000000..0f7e6a3c --- /dev/null +++ b/2021/captions/emacsconf-2021-native--emacs-lisp-native-compiler-current-status-and-future-developments--andrea-corallo--main.vtt @@ -0,0 +1,2752 @@ +WEBVTT + +00:00.200 --> 00:02.340 +Hi everybody, my name is Andrea Corallo, + +00:02.440 --> 00:03.280 +and this presentation is about + +00:03.382 --> 00:05.802 +the Emacs Lisp Native Compiler -- + +00:05.904 --> 00:07.083 +having GNU Emacs able to + +00:07.184 --> 00:08.805 +compile and run Emacs Lisp as native code. + +00:08.907 --> 00:11.565 +This project has been my hobby project + +00:11.667 --> 00:14.009 +for the last about two years and a half + +00:14.111 --> 00:15.408 +And we will see a little bit + +00:15.509 --> 00:16.969 +where this project is coming from, + +00:17.070 --> 00:18.913 +where we are, and where we want to go. + +00:19.015 --> 00:20.571 +So essentially everything you need to know + +00:20.672 --> 00:22.132 +about the Emacs Lisp Native Compiler, + +00:22.232 --> 00:23.416 +and probably a little more. + +00:23.521 --> 00:26.535 +Just a little bit of context on Emacs Lisp. + +00:26.635 --> 00:30.175 +Well, Emacs Lisp is a programming language, + +00:30.278 --> 00:33.544 +it's indeed a Lisp, + +00:33.647 --> 00:36.619 +and one year ago I looked for some statistics, + +00:36.722 --> 00:38.742 +and I was kind of pleased to see + +00:38.842 --> 00:40.030 +-- surprised to see actually -- + +00:40.132 --> 00:40.972 +that it's still kind of popular + +00:41.073 --> 00:42.844 +as a programming language. + +00:42.944 --> 00:43.684 +It doesn't rank that bad + +00:43.785 --> 00:45.554 +against other programming languages. + +00:45.658 --> 00:48.387 +Also, the other important fact about Emacs Lisp + +00:48.487 --> 00:51.518 +is that there is a lot of Emacs Lisp out there, + +00:51.622 --> 00:55.922 +and this will have an impact [on this project.] + +00:56.025 --> 00:57.632 +It's a programming language that is capable + +00:57.734 --> 00:59.320 +of almost any task, so it's + +00:59.420 --> 01:01.728 +almost a general-purpose programming language, + +01:01.831 --> 01:04.116 +and this reflects on Emacs itself, + +01:04.217 --> 01:07.232 +that it's capable of almost any task. + +01:07.335 --> 01:09.679 +Indeed this "almost" is something we want to fix, + +01:09.781 --> 01:10.720 +because we want to do everything, + +01:10.820 --> 01:14.218 +we want [to do] all of our computing [with Emacs]. + +01:14.321 --> 01:16.523 +Also, an interesting aspect for me + +01:16.625 --> 01:18.903 +is that it's not specified by any standard. + +01:19.005 --> 01:22.507 +This implies it can evolve in a more agile way + +01:22.609 --> 01:25.667 +without having to change the standard, etc. + +01:25.770 --> 01:27.870 +And, in fact, it's kind of improving, + +01:27.970 --> 01:30.370 +I believe relatively fast. + +01:30.473 --> 01:32.791 +A little bit about Lisp in general. + +01:32.892 --> 01:34.952 +First, it's the best programming language ever, + +01:35.052 --> 01:35.712 +we all know it. + +01:35.813 --> 01:37.833 +It has a lot of very nice properties, + +01:37.934 --> 01:40.356 +like it's dynamic, it's homoiconic, etc. + +01:40.462 --> 01:42.394 +But the interesting thing for implementors, + +01:42.496 --> 01:43.476 +is that, in theory, + +01:43.576 --> 01:46.716 +it can be implemented with very few primitives. + +01:46.817 --> 01:49.485 +You build very few primitives that are like magic, + +01:49.590 --> 01:51.838 +and on top of this, + +01:51.938 --> 01:53.770 +you can implement the whole language. + +01:53.873 --> 01:55.759 +This sounds very nice, + +01:55.860 --> 01:57.280 +and very appealing for implementors + +01:57.381 --> 02:00.175 +meaning to implement a new Lisp implementation, + +02:00.279 --> 02:02.539 +or improving or modifying an existing one. + +02:02.641 --> 02:04.163 +So the question is: + +02:04.263 --> 02:07.663 +how many primitives do we have to implement + +02:07.764 --> 02:09.384 +if we want to change + +02:09.485 --> 02:13.203 +the GNU Emacs Lisp implementation? + +02:13.308 --> 02:20.128 +Unfortunately, not really as few as we would like. + +02:20.234 --> 02:25.294 +In GNU Emacs, we have about 1500 primitives, + +02:25.400 --> 02:27.970 +and the main reason for that is performance. + +02:28.071 --> 02:30.961 +Actually, GNU Emacs was written + +02:31.064 --> 02:34.292 +when performance was a big issue, + +02:34.393 --> 02:36.493 +and nowadays certain parts + +02:36.594 --> 02:38.486 +are still performance-critical. + +02:38.591 --> 02:40.295 +We have a lot of C code; + +02:40.395 --> 02:46.331 +30% of the GNU Emacs codebase is C code, + +02:46.435 --> 02:49.177 +and we have to maintain this. + +02:49.279 --> 02:52.859 +But not only that, this is the main barrier + +02:52.959 --> 02:55.579 +for people that tried in the past + +02:55.681 --> 02:57.663 +to change the Emacs Lisp implementation. + +02:57.765 --> 02:59.101 +Because not only do you have to + +02:59.202 --> 03:01.582 +replace these primitives, + +03:01.683 --> 03:03.907 +all of them or part of them, + +03:04.012 --> 03:06.384 +but sometimes they also share (these primitives), + +03:06.484 --> 03:07.952 +internal data structures. + +03:08.055 --> 03:09.745 +For instance, it's very difficult to say: + +03:09.846 --> 03:12.826 +Now I want to go from C + +03:12.926 --> 03:13.906 +to a different programming language + +03:14.006 --> 03:15.638 +for implementing these primitives. + +03:15.740 --> 03:17.028 +So this has been, effectively, + +03:17.128 --> 03:20.380 +the main barrier for doing this work. + +03:20.486 --> 03:22.090 +Another interesting aspect + +03:22.190 --> 03:23.870 +about the GNU Emacs implementation + +03:23.970 --> 03:26.190 +is that Lisp can run interpreted + +03:26.291 --> 03:28.647 +or byte-compiled for performance, + +03:28.752 --> 03:30.632 +and the byte compiler itself + +03:30.733 --> 03:32.673 +is written in Emacs Lisp. + +03:32.773 --> 03:35.033 +This implies that GNU Emacs has to go through + +03:35.134 --> 03:37.214 +a bootstrap procedure in order to be built. + +03:37.319 --> 03:38.715 +But it's kind of interesting + +03:38.815 --> 03:41.835 +for something that started as a text editor, + +03:41.937 --> 03:43.101 +or something like it. + +03:43.203 --> 03:47.877 +The byte-code that is Emacs Lisp + +03:47.979 --> 03:50.279 +when it's been byte compiled, + +03:50.379 --> 03:53.079 +it's running on a stack-based virtual machine + +03:53.180 --> 03:55.146 +that is implemented in C. + +03:55.253 --> 03:59.741 +OK, so I've listed a bunch of areas + +03:59.842 --> 04:02.074 +where Emacs Lisp could improve: + +04:02.178 --> 04:06.924 +Namespace, Extensibility, Performance, + +04:07.025 --> 04:11.239 +and Debuggability, and Diagnostic, we could say. + +04:11.346 --> 04:14.326 +This activity, this project in particular, + +04:14.428 --> 04:17.088 +is affecting primarily Performance + +04:17.189 --> 04:21.314 +and the performance area the Execution Engine. + +04:21.414 --> 04:22.570 +That said, + +04:22.671 --> 04:25.855 +I think it has an impact also on Extensibility, + +04:25.957 --> 04:27.811 +and I hope it will have an impact also + +04:27.913 --> 04:29.779 +on programming diagnostics, + +04:29.881 --> 04:32.501 +so giving better warnings, [unknown]. + +04:32.604 --> 04:36.694 +So which are the benefits + +04:36.795 --> 04:39.215 +of increasing the Emacs Lisp performance? + +04:39.316 --> 04:42.976 +Indeed, we will have, if we do that, + +04:43.078 --> 04:45.668 +programs that run faster. + +04:45.774 --> 04:48.738 +But the main implication of that + +04:48.840 --> 04:50.760 +is that we could write less C; + +04:50.860 --> 04:52.940 +we could maintain and debug less C. + +04:53.041 --> 04:56.437 +That is kind of a time-consuming task. + +04:56.542 --> 04:59.502 +And we could also allow for + +04:59.603 --> 05:01.723 +writing performance-critical extensions + +05:01.824 --> 05:03.804 +directly in Lisp + +05:03.909 --> 05:06.305 +without having to use systems like + +05:06.406 --> 05:09.686 +[I think there's a bunch.] + +05:09.927 --> 05:14.791 +These are very consistent benefits. + +05:14.899 --> 05:16.669 +OK, Project Goals, + +05:16.769 --> 05:18.829 +but I think the title of this slide + +05:18.930 --> 05:21.520 +maybe should be Project Requirements. + +05:21.623 --> 05:23.231 +So when I started this activity, + +05:23.331 --> 05:26.943 +I set some requirements for the project, + +05:27.049 --> 05:29.469 +with the main goal in mind: to go upstream. + +05:29.570 --> 05:31.494 +So I wanted to create something, + +05:31.594 --> 05:34.250 +a modified implementation of GNU Emacs, + +05:34.353 --> 05:36.675 +that was compatible + +05:36.776 --> 05:38.596 +as close to 100% as possible + +05:38.696 --> 05:40.514 +to the current implementation. + +05:40.620 --> 05:42.720 +And when I say "current implementation," + +05:42.820 --> 05:44.758 +I don't refer to what + +05:44.859 --> 05:46.879 +the Emacs Lisp Programming Manual specifies + +05:46.980 --> 05:49.804 +as expected behavior of the implementation, + +05:49.907 --> 05:52.207 +but really the implementation itself. + +05:52.309 --> 05:53.841 +This is because there are a lot of corner cases + +05:53.942 --> 05:56.390 +that are not specified by the manual, + +05:56.493 --> 05:58.093 +but programs do rely on that, + +05:58.193 --> 06:00.523 +and given there is a ton of Emacs Lisp + +06:00.625 --> 06:01.935 +already around, + +06:02.037 --> 06:05.437 +compatibility was definitely a major requirement. + +06:05.541 --> 06:07.727 +I wanted to produce something that had + +06:07.827 --> 06:09.587 +reduced impact on the Emacs codebase, + +06:09.687 --> 06:11.543 +at least as much as possible. + +06:11.644 --> 06:14.724 +So I didn't want to rewrite all of GNU Emacs. + +06:14.827 --> 06:17.609 +Indeed, because it would have been + +06:17.710 --> 06:21.068 +too much work for one single person, + +06:21.171 --> 06:25.371 +but also still thinking + +06:25.473 --> 06:29.153 +to an upstream outcome all of this time. + +06:29.258 --> 06:31.974 +Another requirement was to have + +06:32.076 --> 06:34.700 +no, or very reduced, impact on the user, + +06:34.804 --> 06:36.456 +so I didn't want to change + +06:36.556 --> 06:38.604 +the way Emacs is used by you. + +06:38.708 --> 06:40.618 +And last but not least, + +06:40.718 --> 06:42.848 +introducing new dependencies, + +06:42.951 --> 06:45.379 +those dependencies had to be Free Software, + +06:45.479 --> 06:47.191 +possibly GPL, + +06:47.295 --> 06:49.995 +and also an important requirement + +06:50.097 --> 06:52.941 +is that these dependencies had to be + +06:53.043 --> 06:55.143 +some kind of trusted software that we know + +06:55.243 --> 06:56.681 +is going to be maintained in the future. + +06:56.781 --> 06:59.261 +Given Emacs has been around since forever, + +06:59.363 --> 07:01.183 +and will be around forever and ever, + +07:01.285 --> 07:05.265 +this was another very important point. + +07:05.367 --> 07:08.707 +A little bit of history of this project/ + +07:08.808 --> 07:10.648 +a quick timeline. + +07:10.748 --> 07:14.388 +2019, in May, I did my first commit, + +07:14.490 --> 07:17.110 +I think it was when I tried to write + +07:17.210 --> 07:20.230 +my first primitive function ever in C, + +07:20.332 --> 07:21.752 +in GNU Emacs. + +07:21.852 --> 07:24.832 +And this was an attempt to try to compile + +07:24.934 --> 07:30.314 +a function that, once executed, returning [?] + +07:30.415 --> 07:33.755 +That was it. Six months after (about), + +07:33.857 --> 07:37.057 +I had something that was kind of working, + +07:37.157 --> 07:42.797 +So I was able to start up a semi-standard Emacs + +07:42.899 --> 07:44.879 +and then to compile and load, + +07:44.981 --> 07:47.221 +and replacing most of the functions + +07:47.321 --> 07:50.921 +that I had defined floating in my Lisp universe. + +07:51.022 --> 07:54.202 +Those functions are the functions + +07:54.304 --> 07:55.944 +that are essentially composing Emacs, + +07:56.044 --> 07:57.884 +at least the Lisp side of Emacs. + +07:57.984 --> 08:01.084 +A lot of features were missing, + +08:01.186 --> 08:03.426 +like I had no Garbage Collector support, + +08:03.526 --> 08:07.006 +no bootstrap, I was not optimizing these functions + +08:07.108 --> 08:10.068 +Because optimization [would be broken]. + +08:10.169 --> 08:12.749 +No image dump support, etc. + +08:12.850 --> 08:16.330 +But I think this proved the design could work. + +08:16.431 --> 08:19.171 +So I sent to email to emacs-devel. I said + +08:19.272 --> 08:20.852 +I have this stuff I'm working on, + +08:20.953 --> 08:24.613 +and I wanted some feedback from the upstream + +08:24.714 --> 08:27.534 +to see if there was some interest. + +08:27.635 --> 08:30.455 +I believe the outcome of this was positive + +08:30.557 --> 08:32.837 +because about one month after, + +08:32.937 --> 08:35.697 +I pushed my branch within the Emacs git + +08:35.798 --> 08:38.438 +as a feature branch, and shortly after, + +08:38.539 --> 08:42.679 +we started to use the bug tracker to track bugs. + +08:42.779 --> 08:45.479 +So essentially we moved the development + +08:45.580 --> 08:50.040 +on the upstream infrastructure. + +08:50.140 --> 08:55.620 +I believe two years after the first commit, + +08:55.721 --> 08:57.781 +the project was merged + +08:57.882 --> 09:00.922 +after literally hundreds of bugs solved, + +09:01.022 --> 09:03.842 +and improvements, suggestions [unknown] + +09:03.943 --> 09:08.623 +and this was about six months ago. + +09:08.723 --> 09:12.363 +Before discussing how the native compiler works, + +09:12.464 --> 09:14.224 +I think it's worth looking at + +09:14.324 --> 09:17.644 +how Lisp is implemented in GNU Emacs. + +09:17.745 --> 09:19.305 +We have Lisp_Objects + +09:19.405 --> 09:21.945 +floating around our Lisp universe, + +09:22.045 --> 09:23.905 +and they are internally represented in this way. + +09:24.006 --> 09:25.606 +We have what is called a tagged pointer, + +09:25.706 --> 09:27.406 +that is just a regular pointer + +09:27.506 --> 09:29.286 +that is pointing to the area of memory + +09:29.386 --> 09:31.926 +where we hold the real data of the object. + +09:32.027 --> 09:34.187 +But within this tagged pointer, + +09:34.287 --> 09:36.747 +we reserve a few bits + +09:36.848 --> 09:38.968 +to indicate the type of object we are pointing to. + +09:39.068 --> 09:40.528 +This is important because + +09:40.628 --> 09:42.288 +each time we access an object, + +09:42.388 --> 09:46.168 +we have to typically check those bits + +09:46.269 --> 09:49.029 +to check that the object we are manipulating + +09:49.129 --> 09:50.529 +is of the right kind, + +09:50.630 --> 09:52.530 +remove those bits, and, if we are happy, + +09:52.630 --> 09:55.810 +access the object, otherwise [unknown]. + +09:55.910 --> 09:57.690 +All the objects are like this, + +09:57.791 --> 09:59.731 +except for typically Fixnums, + +09:59.831 --> 10:01.031 +that are small integers + +10:01.131 --> 10:04.891 +that we manage to fit directly within the pointer. + +10:04.992 --> 10:07.292 +Also for manipulating Fixnums, + +10:07.392 --> 10:09.412 +we have to check the tag bits each time. + +10:09.513 --> 10:13.173 +Whenever we are not sure of the type of object + +10:13.273 --> 10:16.493 +we are manipulating (read: almost every time), + +10:16.594 --> 10:18.914 +we have to check those bits and remove those bits + +10:19.014 --> 10:21.754 +before doing any manipulation on the Fixnum. + +10:21.854 --> 10:26.014 +How Emacs Lisp is byte-compiled and executed + +10:26.115 --> 10:27.775 +in, let's call it, "Vanilla". + +10:27.875 --> 10:30.055 +If we have a Lisp expression of this kind: + +10:30.156 --> 10:32.776 +We take the variable 'a' we do plus 2, + +10:32.876 --> 10:34.876 +and then we multiply the result by 3, + +10:34.976 --> 10:37.376 +the byte compiler will produce this LAP code. + +10:37.477 --> 10:38.497 +LAP code is essentially + +10:38.597 --> 10:40.017 +the assembly for the byte-code, + +10:40.117 --> 10:43.697 +so it's the "intermediate representation" + +10:43.798 --> 10:48.458 +that will assembled into byte-code. (.elc files) + +10:48.558 --> 10:50.738 +How is this program executed? + +10:50.839 --> 10:53.639 +As I mentioned, it's executed in a virtual machine + +10:53.739 --> 10:55.699 +that is stack-based, + +10:55.800 --> 10:58.540 +but we start with an execution stack that's empty, + +10:58.640 --> 11:01.680 +and a stack pointer pointing to its bottom. + +11:01.780 --> 11:04.280 +And we execute the first instruction, + +11:04.380 --> 11:07.120 +that is pushing in the stack the value of 'a', + +11:07.220 --> 11:10.360 +in this case, 100. Then we push the constant 2. + +11:10.460 --> 11:12.400 +Then we do the summation, + +11:12.500 --> 11:14.040 +and we have the result in the stack. + +11:14.140 --> 11:16.860 +Same: we push the constant 3, + +11:16.960 --> 11:17.620 +we do the multiplication, + +11:17.720 --> 11:19.260 +and we will be able to return. + +11:19.360 --> 11:22.700 +Now, what's good and what's bad about this? + +11:22.800 --> 11:25.800 +A good thing is that it's very simple + +11:25.900 --> 11:27.460 +to start from Lisp + +11:27.560 --> 11:31.320 +and compile this kind of LAP output. + +11:31.420 --> 11:34.240 +At least it's reasonably simple. + +11:34.340 --> 11:36.240 +The compiler is not that complex. + +11:36.340 --> 11:39.000 +The bad thing is that all this machinery + +11:39.100 --> 11:40.660 +-- push and pop, etc. -- + +11:40.760 --> 11:44.320 +it's very different from how a modern CPU works. + +11:44.420 --> 11:45.720 +Because modern CPUs, + +11:45.820 --> 11:47.660 +they are not stack-based anymore + +11:47.760 --> 11:50.900 +but they have instead a fixed number of registers, + +11:51.000 --> 11:54.020 +and they work with assignment and operation + +11:54.120 --> 11:55.040 +within these registers + +11:55.140 --> 11:57.040 +that are generally called "general-purpose." + +11:57.140 --> 11:59.260 +So to execute this LAP program, + +11:59.360 --> 12:00.760 +there is another program, + +12:00.860 --> 12:02.600 +that is the implementation of the VM itself + +12:02.700 --> 12:06.400 +that is doing conversion during runtime. + +12:06.500 --> 12:08.320 +So it's interpreting the LAP program + +12:08.420 --> 12:11.360 +and it's converting it into instructions + +12:11.460 --> 12:13.180 +that we can execute on the CPU. + +12:13.280 --> 12:14.940 +This conversion is done each time + +12:15.040 --> 12:17.100 +we will run some byte-code. + +12:17.200 --> 12:19.760 +And it's something that we want to avoid. + +12:19.860 --> 12:21.560 +Instead of this live conversion, + +12:21.660 --> 12:26.220 +we want to convert once: + +12:26.320 --> 12:28.920 +our Lisp program into native code, + +12:29.020 --> 12:32.380 +that is, a binary program that can be executed + +12:32.480 --> 12:34.400 +directly by our CPU. + +12:34.500 --> 12:36.680 +We want to save all this unnecessary conversion + +12:36.780 --> 12:39.000 +that we do each time we are running a program + +12:39.100 --> 12:39.760 +while we are running it. + +12:39.860 --> 12:42.060 +And we want to do this process just once, + +12:42.160 --> 12:43.740 +when we are compiling. + +12:43.840 --> 12:46.240 +That's the main goal of this activity. + +12:46.340 --> 12:50.200 +How is the byte compiler implemented? + +12:50.300 --> 12:53.940 +As any compiler it's a pipeline of transformations. + +12:54.040 --> 12:58.560 +We go through macro expansion, closure conversion, + +12:58.660 --> 13:02.120 +we have a bunch of source level optimization. + +13:02.220 --> 13:03.940 +Then we go into LAP, + +13:04.040 --> 13:06.800 +that's the transformation we are interested in, + +13:06.900 --> 13:10.600 +and after a few optimizations on LAP, + +13:10.700 --> 13:14.140 +LAP is assembled into byte-code. + +13:14.240 --> 13:16.880 +So if we [list it] + +13:16.980 --> 13:19.320 +in terms of intermediate representations, + +13:19.420 --> 13:23.600 +we can simplify this pipeline like this. + +13:23.700 --> 13:26.100 +We start with Lisp, and at a certain point + +13:26.200 --> 13:29.520 +we are manipulating the program in LAP form, + +13:29.620 --> 13:31.980 +and then at the end we produce the byte-code + +13:32.080 --> 13:34.560 +that is the .elc file that [you run] + +13:34.660 --> 13:37.660 +What I wanted to realize was something like this. + +13:37.760 --> 13:41.200 +I wanted to start from LAP, do something, + +13:41.300 --> 13:44.600 +and jump into GCC using libgccjit + +13:44.700 --> 13:45.560 +and in particular + +13:45.660 --> 13:48.000 +the libgccjit Intermediate Representation + +13:48.100 --> 13:50.340 +that we will discuss. + +13:50.440 --> 13:53.120 +Now, why I wanted to do something like this? + +13:53.220 --> 13:57.300 +Essentially, writing a compiler from scratch + +13:57.400 --> 14:01.520 +for Emacs Lisp would have been a very big task. + +14:01.620 --> 14:05.120 +So I wanted to rely on, as much as I could, + +14:05.220 --> 14:07.180 +the Emacs Lisp byte compiler, + +14:07.280 --> 14:10.280 +because I had to produce something + +14:10.380 --> 14:12.920 +that was as compatible as possible + +14:13.020 --> 14:14.240 +to the current implementation. + +14:14.340 --> 14:18.340 +So this was (I believe) a very good idea + +14:18.440 --> 14:20.520 +to save an enormous quantity of work + +14:20.620 --> 14:22.340 +and to produce something + +14:22.440 --> 14:24.940 +that was compatible in terms of semantics + +14:25.040 --> 14:26.820 +with the original implementation. + +14:26.920 --> 14:30.660 +Also, I didn't want to implement a code generator + +14:30.760 --> 14:32.860 +for each architecture we were targeting, + +14:32.960 --> 14:35.900 +nor wanted to implement all these optimizations + +14:36.000 --> 14:37.840 +that are already in GCC, + +14:37.940 --> 14:40.120 +so I thought it was a good idea + +14:40.220 --> 14:44.720 +to rely on an existing compiler like GCC [?]. + +14:44.820 --> 14:47.240 +Let's talk about libgccjit. + +14:47.340 --> 14:50.180 +It was added by David Malcolm in GCC 5 + +14:50.280 --> 14:55.680 +It allows you to describe a C-ish semantic to GCC + +14:55.780 --> 14:57.380 +and have it compile. + +14:57.480 --> 15:01.400 +It's good for [reading] Jitters or AoT compilers. + +15:01.500 --> 15:04.400 +And if we talk about GCC: + +15:04.500 --> 15:07.180 +it's a compiler, it's a very good one, + +15:07.280 --> 15:09.340 +it has support for a remarkable number + +15:09.440 --> 15:11.640 +of target architectures. + +15:11.740 --> 15:15.080 +It's also very good at generating [fast code], + +15:15.180 --> 15:17.980 +and it's been around for a long time; + +15:18.080 --> 15:21.200 +I believe it's like 1 year younger than Emacs. + +15:21.300 --> 15:23.220 +It's still very well maintained, + +15:23.320 --> 15:25.840 +and we can assume it will be maintained + +15:25.940 --> 15:28.920 +for quite a while. And, as I mentioned, + +15:29.020 --> 15:31.120 +this was a very important point. + +15:31.220 --> 15:33.860 +Also, it's GPL; it's Free Software, + +15:33.960 --> 15:36.120 +and it's developed under the GNU umbrella, + +15:36.220 --> 15:40.360 +so I thought it was a very good option. + +15:40.460 --> 15:43.300 +So we can imagine a simple translation + +15:43.400 --> 15:46.400 +that goes from LAP to this subset of C + +15:46.500 --> 15:48.880 +that we can describe to libgccjit. + +15:48.980 --> 15:52.880 +This simple translation we can see here, + +15:52.980 --> 15:55.680 +it's actually pretty trivial. + +15:55.780 --> 15:58.500 +Instead of doing operations + +15:58.600 --> 16:02.060 +within the execution stack we have seen before, + +16:02.160 --> 16:05.080 +we have just an array that is replacing it, + +16:05.180 --> 16:07.380 +and it's called 'local' in this case, + +16:07.480 --> 16:12.360 +and we have assignments within this array, + +16:12.460 --> 16:15.140 +so that they are done in place of the original + +16:15.240 --> 16:17.100 +push and pop activity of the stack. + +16:17.200 --> 16:18.280 +The nice thing is that, + +16:18.380 --> 16:20.260 +when you have done this translation, + +16:20.360 --> 16:23.160 +GCC will be able to optimize this, + +16:23.260 --> 16:25.840 +and remove all the unnecessary operations, + +16:25.940 --> 16:27.180 +and generate code + +16:27.280 --> 16:29.660 +for the specific CPU you are targeting, + +16:29.760 --> 16:32.020 +[which will be running your code]. + +16:32.120 --> 16:34.920 +This sounds great; it sounds like + +16:35.020 --> 16:37.440 +a very simple and effective translation, + +16:37.540 --> 16:39.920 +and in fact the first iteration of my compiler + +16:40.020 --> 16:41.120 +was doing just this. + +16:41.220 --> 16:45.320 +It was essentially a big C function + +16:45.420 --> 16:48.240 +that was taking LAP and doing this conversion + +16:48.340 --> 16:50.120 +describing the output to libgccjit. + +16:50.220 --> 16:53.720 +Unfortunately, if you do this, + +16:53.820 --> 16:55.740 +you will discover that you have + +16:55.840 --> 17:00.080 +a performance upper bound limit of about 3x. + +17:00.180 --> 17:04.360 +So it was an option, + +17:04.460 --> 17:06.560 +but I thought it was a good occasion + +17:06.660 --> 17:08.980 +for trying to do something more. + +17:09.080 --> 17:11.640 +And doing something more means + +17:11.740 --> 17:13.680 +implementing a smarter compiler + +17:13.780 --> 17:17.180 +that is doing some advanced analysis on the code, + +17:17.280 --> 17:18.640 +and will be able to perform + +17:18.740 --> 17:20.700 +Lisp-specific optimizations + +17:20.800 --> 17:22.480 +-- optimizations that take advantage of + +17:22.580 --> 17:24.960 +the specific Lisp semantics, + +17:25.060 --> 17:27.760 +something that GCC is not aware of. + +17:27.860 --> 17:31.240 +And while I was thinking about that, + +17:31.340 --> 17:34.000 +I thought that having a smarter compiler + +17:34.100 --> 17:38.120 +had also other advantages, like a smarter compiler + +17:38.220 --> 17:40.120 +that understands the semantics + +17:40.220 --> 17:41.820 +of the programming language being compiled + +17:41.920 --> 17:43.680 +would be also capable of + +17:43.780 --> 17:45.620 +giving feedback to the programmers, + +17:45.720 --> 17:47.920 +like better warnings and errors. + +17:48.020 --> 17:51.360 +So I was really fascinated about this idea, + +17:51.460 --> 17:53.240 +and I wanted to change my implementation + +17:53.340 --> 17:55.920 +because I was not really happy about it. + +17:56.020 --> 17:58.320 +I had a lot of C code in terms of + +17:58.420 --> 18:02.100 +lines that were not doing any smart job. + +18:02.200 --> 18:07.380 +And I wanted to write all the interesting logic + +18:07.480 --> 18:09.940 +[in Lisp]. + +18:10.040 --> 18:12.560 +So optimizing outside GCC + +18:12.660 --> 18:15.840 +before jumping into GCC, + +18:15.940 --> 18:20.500 +as I mentioned, has two main targets: + +18:20.600 --> 18:23.060 +Either optimize the code before going into GCC, + +18:23.160 --> 18:25.380 +or present to GCC some code + +18:25.480 --> 18:27.740 +that we know GCC can optimize effectively. + +18:27.840 --> 18:30.800 +And also, this will give, as I mentioned, + +18:30.900 --> 18:32.660 +better options for the compiler + +18:32.760 --> 18:34.420 +to provide warnings, errors + +18:34.520 --> 18:36.340 +-- better diagnostics. + +18:36.440 --> 18:38.200 +So this is pretty much + +18:38.300 --> 18:40.640 +what the native compiler looks like nowadays, + +18:40.740 --> 18:42.720 +in terms of passes. + +18:42.820 --> 18:44.560 +We have a list of passes, + +18:44.660 --> 18:46.620 +each of which is taking an input + +18:46.720 --> 18:48.120 +and producing an output. + +18:48.220 --> 18:51.040 +So it's doing either analysis on the program + +18:51.140 --> 18:52.640 +that's being passed, + +18:52.740 --> 18:54.760 +or it's performing a transformation. + +18:54.860 --> 18:57.900 +All of these passes are implemented in Lisp, + +18:58.000 --> 19:00.440 +and only the last pass is implemented in C. + +19:00.540 --> 19:05.060 +That is the one that is talking to libgccjit. + +19:05.160 --> 19:07.760 +To do that, I have introduced + +19:07.860 --> 19:10.640 +a new intermediate representation + +19:10.740 --> 19:13.960 +that I call LIMPLE, as a tribute to GCC GIMPLE, + +19:14.060 --> 19:17.220 +that is the main internal representation of GCC, + +19:17.320 --> 19:20.600 +at least one of the main ones. + +19:20.700 --> 19:25.080 +Introducing a new intermediate representation + +19:25.180 --> 19:27.720 +-- a new way of representing my program -- + +19:27.820 --> 19:29.540 +solved a bunch of problems. + +19:29.640 --> 19:33.200 +First, it allowed me to implement + +19:33.300 --> 19:37.280 +non-trivial analysis and transformations, + +19:37.380 --> 19:40.000 +the ones I needed in my compiler pipeline. + +19:40.100 --> 19:42.160 +But also, it solved the problem of + +19:42.260 --> 19:43.840 +what was the boundary between + +19:43.940 --> 19:46.120 +what I had to implement in Lisp, + +19:46.220 --> 19:48.040 +and what in C. + +19:48.140 --> 19:49.040 +Because once I had + +19:49.140 --> 19:51.860 +my intermediate representation defined, + +19:51.960 --> 19:53.780 +essentially the boundary between Lisp and C + +19:53.880 --> 19:55.640 +is just a function, that is, + +19:55.740 --> 19:57.780 +the one that is implementing the final pass. + +19:57.880 --> 19:59.220 +That is taking, as an input, + +19:59.320 --> 20:01.620 +all of my programs in LIMPLE representation + +20:01.720 --> 20:03.880 +and it's doing [his bit]. + +20:03.980 --> 20:07.980 +So I was convinced this design at least had sense. + +20:08.080 --> 20:10.120 +When we go through some of these passes, + +20:10.220 --> 20:12.600 +just to give you an idea of what these are doing: + +20:12.700 --> 20:14.560 +the first pass is just responsible for + +20:14.660 --> 20:18.100 +spilling the LAP from the byte compiler + +20:18.200 --> 20:20.160 +that effectively here we are using as a front end + +20:20.260 --> 20:21.780 +for our compiler pipeline. + +20:21.880 --> 20:24.040 +The second pass, called 'limplify', + +20:24.140 --> 20:27.920 +will be in charge of converting LAP into LIMPLE. + +20:28.020 --> 20:31.260 +LIMPLE is an intermediate representation + +20:31.360 --> 20:32.860 +that is Control Flow Graph based, + +20:32.960 --> 20:34.700 +and it's capable of SSA. + +20:34.800 --> 20:38.640 +So we can have a look to what this means. + +20:38.740 --> 20:41.300 +Let's assume we have our LAP program, + +20:41.400 --> 20:42.520 +as any program, + +20:42.620 --> 20:43.960 +that's a simple list of instructions + +20:44.060 --> 20:45.640 +that we will execute one after the other. + +20:45.740 --> 20:47.360 +Some of these instructions + +20:47.460 --> 20:49.200 +are special instructions + +20:49.300 --> 20:51.240 +that we call conditional branches, + +20:51.340 --> 20:52.800 +where we check for a condition, + +20:52.900 --> 20:54.320 +and if this is verified, + +20:54.420 --> 20:56.940 +we jump to a different address within the program. + +20:57.040 --> 20:59.580 +(Addresses that here we are calling 'labels'.) + +20:59.680 --> 21:03.440 +So we can split our program in chunks, + +21:03.540 --> 21:08.200 +and those chunks we execute without interruption, + +21:08.300 --> 21:10.460 +so we always enter from the top of those, + +21:10.560 --> 21:12.600 +and we exit from the bottom. + +21:12.700 --> 21:15.980 +We can name those, and split them apart, + +21:16.080 --> 21:18.980 +and these are what we call basic blocks. + +21:19.080 --> 21:22.360 +And now we have a bunch of these basic blocks + +21:22.460 --> 21:23.380 +that are floating, + +21:23.480 --> 21:25.020 +and they are not any more sorted. + +21:25.120 --> 21:25.920 +This is what is called + +21:26.020 --> 21:28.680 +a Control Flow Graph based representation. + +21:28.780 --> 21:31.400 +Now we can get into the SSA topic. + +21:31.500 --> 21:33.900 +That stands for Static Single Assignment. + +21:34.000 --> 21:35.860 +I don't want to get into the details, + +21:35.960 --> 21:36.720 +but just give you a feeling. + +21:36.820 --> 21:38.480 +I added into our basic blocks + +21:38.580 --> 21:41.400 +in our Control Flow Graph a few assignments. + +21:41.500 --> 21:43.840 +We will transform this into SSA + +21:43.940 --> 21:45.040 +just for the variable 'x', + +21:45.140 --> 21:47.360 +just for the sake of demonstrating it. + +21:47.460 --> 21:49.760 +This is done through a number of phases + +21:49.860 --> 21:51.760 +that are essentially some analysis, + +21:51.860 --> 21:52.600 +mainly renaming. + +21:52.700 --> 21:55.560 +But the outcome, the one we see here, + +21:55.660 --> 21:59.120 +looks quite similar to the original one, + +21:59.220 --> 22:01.120 +but we can see that the variable 'x' + +22:01.220 --> 22:01.960 +has been renamed. + +22:02.060 --> 22:03.400 +And now we don't have anymore just one, + +22:03.500 --> 22:06.140 +but a number of these variables. + +22:06.240 --> 22:08.000 +The interesting property is that + +22:08.100 --> 22:10.880 +each of these variables is assigned just once. + +22:10.980 --> 22:13.240 +And this allows for the compiler + +22:13.340 --> 22:16.760 +to do prediction of the value of that variable, + +22:16.860 --> 22:19.040 +depending on the position + +22:19.140 --> 22:19.840 +within the Control Flow Graph. + +22:19.940 --> 22:21.980 +This is very important. For instance, + +22:22.080 --> 22:23.440 +a very simple case is 'x1' + +22:23.540 --> 22:27.440 +that we see is assigned once by definition, + +22:27.540 --> 22:29.300 +in particular here at the beginning. + +22:29.400 --> 22:31.040 +Here it's very simple to understand + +22:31.140 --> 22:33.080 +that x1 will have the value 3. + +22:33.180 --> 22:35.380 +While, for instance, it's more difficult to prove + +22:35.480 --> 22:37.060 +what is going to be the value of x5, + +22:37.160 --> 22:38.620 +because it's calling a function, + +22:38.720 --> 22:41.940 +or we don't know at the moment what x4 is. + +22:42.040 --> 22:46.240 +So the compiler will gain the capability + +22:46.340 --> 22:48.460 +to do prediction on all the variables, + +22:48.560 --> 22:50.620 +and the more we get information on one variable, + +22:50.720 --> 22:54.980 +the more we can prove about the others. + +22:55.460 --> 22:57.280 +Coming back to our passes, the next one + +22:57.380 --> 22:59.320 +is forward propagation. + +22:59.420 --> 23:00.600 +This pass is responsible for + +23:00.700 --> 23:03.240 +doing what I briefly mentioned just before: + +23:03.340 --> 23:07.160 +doing proof over all the different variables + +23:07.260 --> 23:09.620 +in different positions of the Control Flow Graph, + +23:09.720 --> 23:12.700 +about the values, types, or ranges. + +23:12.800 --> 23:15.080 +This pass is also responsible for + +23:15.180 --> 23:16.920 +executing functions + +23:17.020 --> 23:18.640 +when we know that the function has no side effect + +23:18.740 --> 23:20.520 +and the pass managed to + +23:20.620 --> 23:22.440 +prove all the values of its argument. + +23:22.540 --> 23:24.800 +So the function is then executed at compile time + +23:24.900 --> 23:26.700 +and it doesn't even exist anymore + +23:26.800 --> 23:27.880 +in the produced code. + +23:27.980 --> 23:30.300 +Then we have another pass, this is + +23:30.400 --> 23:33.420 +an example of a pass that is very specific: + +23:33.520 --> 23:36.120 +it's trying to remove the call to funcall + +23:36.220 --> 23:38.040 +when those are not necessary. + +23:38.140 --> 23:39.760 +There are a couple situations + +23:39.860 --> 23:42.200 +where this is very useful. + +23:42.300 --> 23:45.240 +And not only is this beneficial + +23:45.340 --> 23:47.560 +because we are generating better code, + +23:47.660 --> 23:49.245 +but when we manage to do that, + +23:49.345 --> 23:52.000 +we allow GCC better analysis over the code, + +23:52.100 --> 23:54.440 +because GCC knows nothing about funcall. + +23:54.540 --> 23:57.400 +So if we are calling, from 'foo', directly, 'bar', + +23:57.500 --> 24:01.280 +for GCC it's way easier to do its analysis + +24:01.380 --> 24:03.360 +on top of this code. + +24:03.460 --> 24:06.240 +Another interesting pass we can mention is 'tco'. + +24:06.340 --> 24:08.800 +This is performing Tail Recursion Elimination. + +24:08.900 --> 24:11.880 +It allows a more functional programming style, + +24:11.980 --> 24:13.220 +if you want. + +24:13.320 --> 24:14.280 +We can jump to the last pass + +24:14.380 --> 24:16.200 +that is called 'final', and as I mentioned, + +24:16.300 --> 24:17.520 +this one is responsible for + +24:17.620 --> 24:19.880 +taking our program in LIMPLE representation + +24:19.980 --> 24:24.900 +and describing it to libgccjit in the gccjit IR. + +24:25.000 --> 24:27.480 +That's the main task. It's also + +24:27.580 --> 24:29.520 +defining inline functions + +24:29.620 --> 24:32.400 +for accessing fundamental data types, and so on. + +24:32.500 --> 24:34.460 +This pass is also responsible for + +24:34.560 --> 24:36.280 +using some of the predictions + +24:36.380 --> 24:39.560 +done by previous passes to generate better code. + +24:39.660 --> 24:41.320 +Things we had to add + +24:41.420 --> 24:43.880 +to have all of this machinery work + +24:43.980 --> 24:45.240 +and to be controllable: + +24:45.340 --> 24:47.480 +The first one is an opt called 'native-comp-speed' + +24:47.580 --> 24:49.920 +and it's equivalent to Common Lisp's 'speed'. + +24:50.020 --> 24:51.920 +It represents the optimization level. + +24:52.020 --> 24:53.400 +The default is 2 and is + +24:53.500 --> 24:55.400 +the maximum optimization level + +24:55.500 --> 24:58.600 +that is meant to reflect + +24:58.700 --> 25:00.860 +all the original semantics of Emacs Lisp. + +25:00.960 --> 25:02.480 +So it's the one that should be used by default. + +25:02.580 --> 25:04.640 +The second one is 'compilation unit' + +25:04.740 --> 25:05.960 +and it's a kind of new object + +25:06.060 --> 25:11.080 +that has been added to Emacs. + +25:11.180 --> 25:12.960 +Let's have a look to + +25:13.060 --> 25:14.360 +how the Garbage Collector works in this case. + +25:14.460 --> 25:15.840 +The GNU Emacs Garbage Collector + +25:15.940 --> 25:18.660 +is a simple mark-and-sweep garbage collector. + +25:18.760 --> 25:21.420 +It does a tree walk through all the objects + +25:21.520 --> 25:25.100 +and follows references from one object to another. + +25:25.200 --> 25:27.960 +All the objects reachable during the mark phase + +25:28.060 --> 25:31.240 +will be kept in our Lisp universe. + +25:31.340 --> 25:33.680 +All the other ones will be freed. + +25:33.780 --> 25:35.080 +In this case we have a bunch of functions, + +25:35.180 --> 25:38.760 +'foo1', 'foo2', 'bar1', etc., that are defined. + +25:38.860 --> 25:40.320 +When a function is defined, + +25:40.420 --> 25:42.400 +it's accessible through its symbol, + +25:42.500 --> 25:44.360 +so we have the symbol referring to the function. + +25:44.460 --> 25:47.720 +The function, in this case a native-compiled one, + +25:47.820 --> 25:50.040 +is referring to the compilation unit. + +25:50.140 --> 25:53.020 +The compilation unit is essentially + +25:53.120 --> 25:58.600 +the ELF file that has been compiled, + +25:58.700 --> 26:01.160 +and contains all those functions + +26:01.260 --> 26:03.240 +that came from the original .el file, + +26:03.340 --> 26:05.100 +and that we have loaded into memory. + +26:05.200 --> 26:10.000 +If, for instance, 'bar1 and 'bar2 are undefined, + +26:10.100 --> 26:14.200 +functions [3] and 4 will be no longer reachable, + +26:14.300 --> 26:16.040 +and we will be able to free them + +26:16.140 --> 26:18.160 +and unload the compilation unit. + +26:18.260 --> 26:21.200 +We discussed quite a lot about Control Flow Graph, + +26:21.300 --> 26:23.560 +SSA, and a lot of boring stuff, + +26:23.660 --> 26:25.400 +and I promised you that we are doing + +26:25.500 --> 26:27.320 +a lot of interesting proofs over variables, + +26:27.420 --> 26:30.220 +So let's have some examples of them. + +26:30.320 --> 26:31.840 +Let's jump into a quick demo + +26:31.940 --> 26:34.480 +to see what all of this abstract theory + +26:34.580 --> 26:37.680 +and this esoteric propagation engine can do for us + +26:37.780 --> 26:39.240 +and how the user can interact with it. + +26:39.340 --> 26:42.100 +I've defined a bunch of functions, + +26:42.200 --> 26:45.240 +and I will native-compile and load it. + +26:47.500 --> 26:48.840 +Alright, Emacs Lisp native compiled and loaded. + +26:48.940 --> 26:52.320 +At this point, I can disassemble 'foo1' + +26:52.420 --> 26:56.320 +to make sure it's native code and I'm not lying. + +26:56.420 --> 26:58.500 +These are the instructions + +26:58.600 --> 27:01.320 +that will be executed directly by my CPU + +27:01.420 --> 27:03.620 +when I call this function. + +27:03.720 --> 27:07.520 +Alright, very cool. + +27:07.620 --> 27:16.080 +Now, [Lisp:] (symbol-function #'foo1) + +27:16.180 --> 27:19.720 +Interestingly, this is returning a subroutine, + +27:19.820 --> 27:21.820 +as it would be a primitive function. + +27:21.920 --> 27:23.700 +Because this is native code, + +27:23.800 --> 27:24.840 +even if it's written in Lisp, + +27:24.940 --> 27:26.340 +has been converted to native code + +27:26.440 --> 27:29.200 +as if it's a primitive function. + +27:29.300 --> 27:31.560 +But we can do also a new thing: + +27:31.660 --> 27:34.440 +asking for the type of the subroutine. + +27:34.540 --> 27:38.360 +Alright, very cool. It says this is a function, + +27:38.460 --> 27:40.280 +it's taking one argument of type 't' + +27:40.380 --> 27:41.560 +(that means anything + +27:41.660 --> 27:42.960 +because we don't have any information), + +27:43.060 --> 27:45.200 +and is returning a type 't', + +27:45.300 --> 27:47.380 +so also there we don't have much information. + +27:47.480 --> 27:49.700 +OK, very cool, but not very useful. + +27:49.800 --> 27:53.680 +Let's see #'foo2. #'foo2 is slightly different, + +27:53.780 --> 27:55.840 +it doesn't take any argument, but it's returning + +27:55.940 --> 27:58.040 +an integer included between 3 and 3. + +27:58.140 --> 28:01.360 +Wow, amazing! + +28:01.460 --> 28:04.040 +Let's get into something a little more complex: + +28:04.140 --> 28:09.440 +#'foo3 takes one argument we know nothing about, + +28:09.540 --> 28:11.740 +but it's returning a number. + +28:11.840 --> 28:13.280 +And why it's returning a number? + +28:13.380 --> 28:16.320 +Essentially because 1+ is returning a number, + +28:16.420 --> 28:18.880 +and in all the other cases, + +28:18.980 --> 28:20.640 +it would signal an error + +28:20.740 --> 28:23.380 +if it's not happy about its input argument. + +28:23.480 --> 28:27.760 +Let's have a look to #'foo4. + +28:27.860 --> 28:32.920 +#'foo4 is a little bit more complex. + +28:33.020 --> 28:34.680 +It will return nil + +28:34.780 --> 28:37.400 +if the 'when' condition is not satisfied, + +28:37.500 --> 28:39.720 +so it's type 'null' here. + +28:39.820 --> 28:41.200 +It can return a floating point; + +28:41.300 --> 28:43.600 +we don't do propagation of floating point so far, + +28:43.700 --> 28:47.080 +or it can return any integer between 4 and 9. + +28:47.180 --> 28:52.840 +Wow. Let's go on with #'foo5. + +28:52.940 --> 28:55.760 +#'foo5 is even more complex + +28:55.860 --> 28:57.200 +because other than + +28:57.300 --> 28:59.180 +having to satisfy this condition, + +28:59.280 --> 29:02.280 +we can see that the result of the propagation + +29:02.380 --> 29:03.800 +of this complex condition + +29:03.900 --> 29:05.460 +is propagated also across the 'plus'. + +29:05.560 --> 29:08.240 +So this foo5 can return nil, + +29:08.340 --> 29:09.720 +a floating point we know nothing about, + +29:09.820 --> 29:13.560 +or an integer included between 12 and 24. + +29:13.660 --> 29:18.080 +Let's go on with #'foo6. + +29:18.180 --> 29:23.320 +#'foo6 is returning anything but an integer. + +29:23.420 --> 29:26.520 +I think it should be pretty obvious why, + +29:26.620 --> 29:28.120 +because if it's not an integer we return it, + +29:28.220 --> 29:30.000 +otherwise we signal an error. + +29:30.100 --> 29:32.880 +Let's finish with #'foo7 very quickly. + +29:32.980 --> 29:37.920 +#'foo7 has another very complex condition, + +29:38.020 --> 29:40.320 +at least for me, but it's also interesting to see + +29:40.420 --> 29:42.200 +that we are also propagating values for symbols. + +29:42.300 --> 29:45.160 +So we can return the symbol 'big, + +29:45.260 --> 29:46.980 +the symbol 'small, + +29:47.080 --> 29:51.200 +or an integer included between -100 and 100. + +29:51.300 --> 29:54.440 +Now, the question is: why all of this is useful + +29:54.540 --> 29:56.800 +other than having Andrea very happy + +29:56.900 --> 29:59.560 +when he's playing with this all day? + +29:59.660 --> 30:01.440 +Well, we have to come back one second + +30:01.540 --> 30:04.920 +to how Lisp_Objects are represented within Emacs. + +30:05.020 --> 30:09.480 +Lisp_Objects are represented as machine words, + +30:09.580 --> 30:12.580 +where we reserve a few bits to indicate the type. + +30:12.680 --> 30:15.560 +And every time we access the object, + +30:15.660 --> 30:17.120 +when this is a Fixnum, + +30:17.220 --> 30:19.920 +or a regular object where this is a pointer, + +30:20.020 --> 30:21.600 +we always have to extract these bits, + +30:21.700 --> 30:24.520 +make sure that they satisfy a condition, + +30:24.620 --> 30:27.120 +so make sure that we are going to manipulate + +30:27.220 --> 30:28.580 +the object of the type we expect, + +30:28.680 --> 30:31.040 +and then we can extract the object + +30:31.140 --> 30:32.540 +and do the manipulation. + +30:32.640 --> 30:34.420 +If the compiler managed to prove + +30:34.620 --> 30:37.400 +that the contained object is of the right type, + +30:37.500 --> 30:42.440 +it will be able to not emit the type check, + +30:42.540 --> 30:44.500 +and save a lot of instructions. + +30:44.600 --> 30:48.560 +This is a very powerful optimization. + +30:48.660 --> 30:50.760 +Let's discuss some potential future development + +30:50.860 --> 30:52.000 +in this area. + +30:52.100 --> 30:53.360 +First I think it would be extremely nice + +30:53.460 --> 30:54.920 +to extend the propagation + +30:55.020 --> 30:56.920 +to types that are not built in. + +30:57.020 --> 30:58.160 +There are a lot of cases + +30:58.260 --> 31:00.600 +where we could optimize effectively. + +31:00.700 --> 31:02.520 +For instance when we do + +31:02.620 --> 31:05.720 +a lot of accesses to structures + +31:05.820 --> 31:06.920 +-- lots of stuff like that -- + +31:07.020 --> 31:08.880 +where we keep on checking and checking + +31:08.980 --> 31:10.640 +the same object for the type + +31:10.740 --> 31:13.080 +where it's obvious, where it should be trivial + +31:13.180 --> 31:14.720 +to prove that it's the right type + +31:14.820 --> 31:16.360 +after the first access or things like that. + +31:16.460 --> 31:18.920 +So I believe this is a low-hanging fruit + +31:19.020 --> 31:21.180 +in terms of performance. + +31:21.280 --> 31:23.080 +Also I think it would be really nice + +31:23.180 --> 31:24.720 +to extend the declare mechanism + +31:24.820 --> 31:27.600 +to allow the user to declare argument types. + +31:27.700 --> 31:30.480 +(Optionally. Indeed, optionally.) + +31:30.580 --> 31:32.880 +Doing that would give the compiler + +31:32.980 --> 31:35.120 +a lot of information to propagate value types + +31:35.220 --> 31:36.360 +within the function. + +31:36.460 --> 31:38.840 +But those will allow the compiler + +31:38.940 --> 31:43.040 +to give the user really good diagnostics + +31:43.140 --> 31:45.400 +during compile time. Like the compiler could say: + +31:45.500 --> 31:47.000 +Hey, here you are calling a function + +31:47.100 --> 31:49.200 +that is expecting this argument of this type, + +31:49.300 --> 31:52.000 +but I'm proving that you are calling it + +31:52.100 --> 31:55.200 +with an argument that is of THIS type, + +31:55.300 --> 31:57.840 +and is not a subtype of the expected one. + +31:57.940 --> 32:00.240 +So you are doing something not coherent. + +32:00.340 --> 32:02.800 +This kind of interprocedural logic. + +32:02.900 --> 32:05.000 +And I think the compiler should also take advantage + +32:05.100 --> 32:06.640 +(under certain circumstances) + +32:06.740 --> 32:08.760 +of this interprocedural analysis + +32:08.860 --> 32:12.520 +in order to optimize even more, when possible. + +32:12.620 --> 32:13.720 +Also I think we should + +32:13.820 --> 32:15.480 +work on our [?] to improve the code generation + +32:15.580 --> 32:17.100 +depending on the prediction + +32:17.200 --> 32:18.160 +that the compiler is doing. + +32:18.260 --> 32:20.120 +We already take advantage of those predictions, + +32:20.220 --> 32:22.300 +but I think we could do better. + +32:22.400 --> 32:25.120 +A quick look at some performance results. + +32:25.220 --> 32:28.720 +These are from the elisp-benchmarks package + +32:28.820 --> 32:30.760 +within GNU ELPA. + +32:30.860 --> 32:32.520 +This is the performance uplift, + +32:32.620 --> 32:38.480 +and we can identify about 4 "classes" of results. + +32:38.580 --> 32:41.440 +The first one there is no performance uplift, + +32:41.540 --> 32:42.880 +because there is not much we can do, + +32:42.980 --> 32:44.720 +and the time is probably not spent + +32:44.820 --> 32:46.280 +within the execution engine. + +32:46.380 --> 32:49.000 +And the ones around 3x are the ones + +32:49.100 --> 32:50.680 +Where probably we are not triggering + +32:50.780 --> 32:52.640 +manually specific optimizations, + +32:52.740 --> 32:53.600 +but just the fact + +32:53.700 --> 32:57.100 +that we are converting into native code + +32:57.200 --> 33:00.500 +is giving us this performance uplift. + +33:00.600 --> 33:03.280 +Then there is a bunch of other benchmarks + +33:03.380 --> 33:05.680 +where the Lisp optimizations are triggering, + +33:05.780 --> 33:09.620 +and the uplift is way bigger, + +33:09.720 --> 33:11.900 +and then we have 3 benchmarks that at the time + +33:12.000 --> 33:13.420 +are completely optimized out. + +33:13.520 --> 33:15.580 +That means the compiler became "so smart" + +33:15.680 --> 33:18.160 +that it was able to compute the result + +33:18.260 --> 33:20.160 +in the compile time and just put the result + +33:20.260 --> 33:21.400 +in the generated binary. + +33:21.500 --> 33:23.880 +Let's discuss a little bit the compilation model. + +33:23.980 --> 33:26.080 +This is an Hybrid one; + +33:26.180 --> 33:29.420 +it's both JIT-like and Ahead-of-Time-like. + +33:29.520 --> 33:33.400 +Emacs is composed of what we call an Emacs image, + +33:33.500 --> 33:36.180 +essentially the Emacs binary that we start. + +33:36.280 --> 33:37.720 +It's including all the C code, + +33:37.820 --> 33:41.760 +plus all the Lisp code that we preload. + +33:41.860 --> 33:46.480 +Then we have the rest of the Emacs Lisp codebase + +33:46.580 --> 33:49.440 +that can be loaded just if it's required. + +33:49.540 --> 33:52.760 +Same for the external packages, if we have any. + +33:52.860 --> 33:55.120 +If we build an Emacs Lisp + +33:55.220 --> 33:57.940 +with native compilation enabled, by default, + +33:58.040 --> 34:01.200 +only the Emacs image will be native compiled. + +34:01.300 --> 34:03.920 +All the other code will be compiled + +34:04.020 --> 34:06.720 +on the fly when it's loaded and executed + +34:06.820 --> 34:08.800 +the first time, if it's necessary. + +34:08.900 --> 34:10.880 +Same for the packages, in a transparent way + +34:10.980 --> 34:12.640 +and asynchronous way. + +34:12.740 --> 34:13.480 +Also worth noting + +34:13.580 --> 34:15.400 +that the result of this compilation + +34:15.500 --> 34:17.240 +will be stored into a cache directory + +34:17.340 --> 34:19.900 +within the home directory of the user. + +34:20.000 --> 34:23.600 +So it will be reused in the following sessions + +34:23.700 --> 34:25.440 +if the same file is loaded, + +34:25.540 --> 34:27.520 +without having to recompile multiple times + +34:27.620 --> 34:29.320 +the same file in different sessions. + +34:29.420 --> 34:31.520 +It works a little bit like this: + +34:31.620 --> 34:33.800 +When we load the byte-code for the first time, + +34:33.900 --> 34:35.900 +we spawn a native compilation process. + +34:36.000 --> 34:39.320 +Meanwhile we keep using the byte-code available. + +34:39.420 --> 34:41.180 +When the native compilation is finished, + +34:41.280 --> 34:43.960 +we hot-swap the definition of the functions + +34:44.060 --> 34:46.040 +that are contained in the file + +34:46.140 --> 34:47.960 +and start using the native code transparently. + +34:48.060 --> 34:49.880 +We do this asynchronously, + +34:49.980 --> 34:53.140 +and for more compilation units at the same time, + +34:53.240 --> 34:56.280 +so it looks a little bit like this. + +34:56.380 --> 34:58.560 +Let's try a quick demo of all of this machinery. + +34:58.660 --> 35:00.880 +I've started a fresh Emacs + +35:00.980 --> 35:03.280 +with native compilation support, + +35:03.380 --> 35:05.060 +and at this moment nothing is going on. + +35:05.160 --> 35:07.440 +We will test the machinery with Tetris, + +35:07.540 --> 35:10.460 +because I can't imagine anything better + +35:10.560 --> 35:12.840 +to test this. + +35:12.940 --> 35:15.080 +What I do expect is that when I launch Tetris + +35:15.180 --> 35:17.920 +it will be loaded, it will immediately + +35:18.020 --> 35:20.460 +start execution of the byte-compiled version, + +35:20.560 --> 35:23.000 +so we won't see any delay in the user experience, + +35:23.100 --> 35:25.160 +and in the meanwhile, a parallel process + +35:25.260 --> 35:28.160 +will start to native-compile Tetris itself. + +35:28.260 --> 35:30.200 +When the native compilation will be finished, + +35:30.300 --> 35:33.240 +the functions of all Tetris will be hot-swapped. + +35:33.340 --> 35:36.280 +So we will not see any interruption. + +35:36.380 --> 35:39.680 +So Tetris started, and it's running, + +35:39.780 --> 35:41.600 +we have seen no delay, and in the meanwhile, + +35:41.700 --> 35:43.520 +the native compilation probably already finished, + +35:43.620 --> 35:44.960 +we can have a look. + +35:45.060 --> 35:47.160 +In this I see the native compilation log buffer. + +35:47.260 --> 35:49.720 +So we see that Tetris has been native compiled, + +35:49.820 --> 35:51.760 +and all of its dependencies. + +35:51.860 --> 35:53.840 +Now Tetris is still running, + +35:53.940 --> 36:00.480 +but I can do "C-h f tetris" + +36:00.580 --> 36:02.540 +and we can see that 'tetris' + +36:02.640 --> 36:04.880 +is an interactive native compiled + +36:04.980 --> 36:07.940 +Lisp function, so it has been native-compiled. + +36:08.040 --> 36:13.500 +I can even disassemble if I want. + +36:13.600 --> 36:14.880 +OK, so very cool. + +36:14.980 --> 36:18.020 +I guess we can say this mechanism is working. + +36:18.120 --> 36:20.660 +Also worth noting that if I go back + +36:20.760 --> 36:24.120 +to the *Async-native-compile-log* buffer, + +36:24.220 --> 36:27.960 +we see we have compiled another bunch of files. + +36:28.060 --> 36:31.600 +I think these are because of my 'C-h f', + +36:31.700 --> 36:33.800 +this help function command and disassemble, + +36:33.900 --> 36:34.920 +and so on. + +36:35.020 --> 36:37.720 +The first time you run Emacs, you will have, + +36:37.820 --> 36:41.280 +from time to time, these processes spawned. + +36:41.380 --> 36:43.100 +Emacs is "compiling itself", + +36:43.200 --> 36:45.520 +and it's replacing the byte-code definition + +36:45.620 --> 36:47.640 +with the native one. But after a few sessions, + +36:47.740 --> 36:49.760 +you will not see this anymore, + +36:49.860 --> 36:51.360 +because the output of this compilation, + +36:51.460 --> 36:54.900 +as I mentioned, are stored in the user directory. + +36:55.000 --> 36:57.560 +To conclude: Emacs with native compilation support + +36:57.660 --> 36:59.080 +is coming up in Emacs 28, + +36:59.180 --> 37:01.040 +that is gonna be the next major stable release + +37:01.140 --> 37:02.080 +that will be released. + +37:02.180 --> 37:04.840 +So we ought to celebrate with a big party, + +37:04.940 --> 37:07.320 +I believe. But before going to the party, + +37:07.420 --> 37:09.080 +I'd like to list a few points + +37:09.180 --> 37:11.340 +that I think have been success factors + +37:11.440 --> 37:13.420 +in upstreaming this work. + +37:13.520 --> 37:15.420 +It has been extremely important + +37:15.520 --> 37:18.140 +to get in touch with upstream as soon as possible, + +37:18.240 --> 37:20.680 +as soon as I had a proof of concept. + +37:20.780 --> 37:22.600 +It's been extremely important + +37:22.700 --> 37:24.660 +to involve the community as much as possible, + +37:24.760 --> 37:28.480 +and this included keeping a development blog, + +37:28.580 --> 37:31.600 +and posts about that on emacs-devel, + +37:31.700 --> 37:33.720 +and also producing material, + +37:33.820 --> 37:36.020 +participating in conferences, + +37:36.120 --> 37:38.320 +and giving presentations like the one I'm doing, + +37:38.420 --> 37:40.680 +to explain what I was doing and how it works. + +37:40.780 --> 37:43.040 +It has been extremely important, also, + +37:43.140 --> 37:45.860 +to be able to rely on the upstream infrastructure. + +37:45.960 --> 37:47.540 +So, to develop the software + +37:47.640 --> 37:49.080 +as a feature branch in the official git, + +37:49.180 --> 37:49.960 +but even more, I would say, + +37:50.060 --> 37:51.660 +to use the official bug tracker + +37:51.760 --> 37:52.880 +for solving bugs of this branch. + +37:52.980 --> 37:54.680 +This gave the opportunity + +37:54.780 --> 37:58.160 +to stay really in close touch with maintainers, + +37:58.260 --> 37:59.920 +and senior developers of Emacs. + +38:00.020 --> 38:02.980 +That helped me a lot. And at the same time + +38:03.080 --> 38:04.820 +they were informed about what I was doing + +38:04.920 --> 38:07.360 +and what was the status of this feature branch. + +38:07.460 --> 38:08.800 +Extremely important. + +38:08.900 --> 38:11.160 +And also I think it played a major role + +38:11.260 --> 38:14.120 +to try to design this enormous patch + +38:14.220 --> 38:18.120 +in a way that the impact on the current codebase + +38:18.220 --> 38:21.120 +was minimized (at least as much as possible). + +38:21.220 --> 38:23.660 +And also minimizing + +38:23.760 --> 38:26.240 +the impact on the user operation of the software, + +38:26.340 --> 38:28.040 +in this case Emacs. + +38:28.140 --> 38:29.640 +So yes, mandatory Special Thanks: + +38:29.740 --> 38:33.360 +Emacs developers, and especially maintainers + +38:33.460 --> 38:36.680 +and senior developers like Stefan Monnier, + +38:36.780 --> 38:40.440 +that helped me a lot across this long journey. + +38:40.540 --> 38:42.800 +And, well, all the community + +38:42.900 --> 38:45.160 +that really got so excited about this project + +38:45.260 --> 38:46.240 +and gave me the energy + +38:46.340 --> 38:49.120 +to go through all of this time and development + +38:49.220 --> 38:51.980 +and bugs and solving, etc. etc. + +38:52.080 --> 38:54.920 +So yes, it was a really exciting time, + +38:55.020 --> 38:58.000 +and I think we have to look forward + +38:58.100 --> 39:01.400 +and start thinking about how to improve all this + +39:01.500 --> 39:02.920 +for the following years. + +39:03.020 --> 39:04.300 +And that's it. + +39:04.400 --> 39:06.040 +I think I should be online for questions. + +39:06.140 --> 39:07.480 +Thank you very much. + +39:07.580 --> 39:07.680 +[captions by John Cummings] -- cgit v1.2.3