From 0dab88ba445dededd508cf1c16422cf055bd4b07 Mon Sep 17 00:00:00 2001
From: Sacha Chua <sacha@sachachua.com>
Date: Fri, 3 Dec 2021 17:28:36 -0500
Subject: Add native talk

---
 ...d-future-developments--andrea-corallo--main.vtt | 2752 ++++++++++++++++++++
 1 file changed, 2752 insertions(+)
 create mode 100644 2021/captions/emacsconf-2021-native--emacs-lisp-native-compiler-current-status-and-future-developments--andrea-corallo--main.vtt

diff --git a/2021/captions/emacsconf-2021-native--emacs-lisp-native-compiler-current-status-and-future-developments--andrea-corallo--main.vtt b/2021/captions/emacsconf-2021-native--emacs-lisp-native-compiler-current-status-and-future-developments--andrea-corallo--main.vtt
new file mode 100644
index 00000000..0f7e6a3c
--- /dev/null
+++ b/2021/captions/emacsconf-2021-native--emacs-lisp-native-compiler-current-status-and-future-developments--andrea-corallo--main.vtt
@@ -0,0 +1,2752 @@
+WEBVTT
+
+00:00.200 --> 00:02.340
+Hi everybody, my name is Andrea Corallo,
+
+00:02.440 --> 00:03.280
+and this presentation is about
+
+00:03.382 --> 00:05.802
+the Emacs Lisp Native Compiler --
+
+00:05.904 --> 00:07.083
+having GNU Emacs able to
+
+00:07.184 --> 00:08.805
+compile and run Emacs Lisp as native code.
+
+00:08.907 --> 00:11.565
+This project has been my hobby project
+
+00:11.667 --> 00:14.009
+for the last about two years and a half
+
+00:14.111 --> 00:15.408
+And we will see a little bit
+
+00:15.509 --> 00:16.969
+where this project is coming from,
+
+00:17.070 --> 00:18.913
+where we are, and where we want to go.
+
+00:19.015 --> 00:20.571
+So essentially everything you need to know
+
+00:20.672 --> 00:22.132
+about the Emacs Lisp Native Compiler,
+
+00:22.232 --> 00:23.416
+and probably a little more.
+
+00:23.521 --> 00:26.535
+Just a little bit of context on Emacs Lisp.
+
+00:26.635 --> 00:30.175
+Well, Emacs Lisp is a programming language,
+
+00:30.278 --> 00:33.544
+it's indeed a Lisp,
+
+00:33.647 --> 00:36.619
+and one year ago I looked for some statistics,
+
+00:36.722 --> 00:38.742
+and I was kind of pleased to see
+
+00:38.842 --> 00:40.030
+-- surprised to see actually --
+
+00:40.132 --> 00:40.972
+that it's still kind of popular
+
+00:41.073 --> 00:42.844
+as a programming language.
+
+00:42.944 --> 00:43.684
+It doesn't rank that bad
+
+00:43.785 --> 00:45.554
+against other programming languages.
+
+00:45.658 --> 00:48.387
+Also, the other important fact about Emacs Lisp
+
+00:48.487 --> 00:51.518
+is that there is a lot of Emacs Lisp out there,
+
+00:51.622 --> 00:55.922
+and this will have an impact [on this project.]
+
+00:56.025 --> 00:57.632
+It's a programming language that is capable
+
+00:57.734 --> 00:59.320
+of almost any task, so it's
+
+00:59.420 --> 01:01.728
+almost a general-purpose programming language,
+
+01:01.831 --> 01:04.116
+and this reflects on Emacs itself,
+
+01:04.217 --> 01:07.232
+that it's capable of almost any task.
+
+01:07.335 --> 01:09.679
+Indeed this "almost" is something we want to fix,
+
+01:09.781 --> 01:10.720
+because we want to do everything,
+
+01:10.820 --> 01:14.218
+we want [to do] all of our computing [with Emacs].
+
+01:14.321 --> 01:16.523
+Also, an interesting aspect for me
+
+01:16.625 --> 01:18.903
+is that it's not specified by any standard.
+
+01:19.005 --> 01:22.507
+This implies it can evolve in a more agile way
+
+01:22.609 --> 01:25.667
+without having to change the standard, etc.
+
+01:25.770 --> 01:27.870
+And, in fact, it's kind of improving,
+
+01:27.970 --> 01:30.370
+I believe relatively fast.
+
+01:30.473 --> 01:32.791
+A little bit about Lisp in general.
+
+01:32.892 --> 01:34.952
+First, it's the best programming language ever,
+
+01:35.052 --> 01:35.712
+we all know it.
+
+01:35.813 --> 01:37.833
+It has a lot of very nice properties,
+
+01:37.934 --> 01:40.356
+like it's dynamic, it's homoiconic, etc.
+
+01:40.462 --> 01:42.394
+But the interesting thing for implementors,
+
+01:42.496 --> 01:43.476
+is that, in theory,
+
+01:43.576 --> 01:46.716
+it can be implemented with very few primitives.
+
+01:46.817 --> 01:49.485
+You build very few primitives that are like magic,
+
+01:49.590 --> 01:51.838
+and on top of this,
+
+01:51.938 --> 01:53.770
+you can implement the whole language.
+
+01:53.873 --> 01:55.759
+This sounds very nice,
+
+01:55.860 --> 01:57.280
+and very appealing for implementors
+
+01:57.381 --> 02:00.175
+meaning to implement a new Lisp implementation,
+
+02:00.279 --> 02:02.539
+or improving or modifying an existing one.
+
+02:02.641 --> 02:04.163
+So the question is:
+
+02:04.263 --> 02:07.663
+how many primitives do we have to implement
+
+02:07.764 --> 02:09.384
+if we want to change
+
+02:09.485 --> 02:13.203
+the GNU Emacs Lisp implementation?
+
+02:13.308 --> 02:20.128
+Unfortunately, not really as few as we would like.
+
+02:20.234 --> 02:25.294
+In GNU Emacs, we have about 1500 primitives,
+
+02:25.400 --> 02:27.970
+and the main reason for that is performance.
+
+02:28.071 --> 02:30.961
+Actually, GNU Emacs was written
+
+02:31.064 --> 02:34.292
+when performance was a big issue,
+
+02:34.393 --> 02:36.493
+and nowadays certain parts
+
+02:36.594 --> 02:38.486
+are still performance-critical.
+
+02:38.591 --> 02:40.295
+We have a lot of C code;
+
+02:40.395 --> 02:46.331
+30% of the GNU Emacs codebase is C code,
+
+02:46.435 --> 02:49.177
+and we have to maintain this.
+
+02:49.279 --> 02:52.859
+But not only that, this is the main barrier
+
+02:52.959 --> 02:55.579
+for people that tried in the past
+
+02:55.681 --> 02:57.663
+to change the Emacs Lisp implementation.
+
+02:57.765 --> 02:59.101
+Because not only do you have to
+
+02:59.202 --> 03:01.582
+replace these primitives,
+
+03:01.683 --> 03:03.907
+all of them or part of them,
+
+03:04.012 --> 03:06.384
+but sometimes they also share (these primitives),
+
+03:06.484 --> 03:07.952
+internal data structures.
+
+03:08.055 --> 03:09.745
+For instance, it's very difficult to say:
+
+03:09.846 --> 03:12.826
+Now I want to go from C
+
+03:12.926 --> 03:13.906
+to a different programming language
+
+03:14.006 --> 03:15.638
+for implementing these primitives.
+
+03:15.740 --> 03:17.028
+So this has been, effectively,
+
+03:17.128 --> 03:20.380
+the main barrier for doing this work.
+
+03:20.486 --> 03:22.090
+Another interesting aspect
+
+03:22.190 --> 03:23.870
+about the GNU Emacs implementation
+
+03:23.970 --> 03:26.190
+is that Lisp can run interpreted
+
+03:26.291 --> 03:28.647
+or byte-compiled for performance,
+
+03:28.752 --> 03:30.632
+and the byte compiler itself
+
+03:30.733 --> 03:32.673
+is written in Emacs Lisp.
+
+03:32.773 --> 03:35.033
+This implies that GNU Emacs has to go through
+
+03:35.134 --> 03:37.214
+a bootstrap procedure in order to be built.
+
+03:37.319 --> 03:38.715
+But it's kind of interesting
+
+03:38.815 --> 03:41.835
+for something that started as a text editor,
+
+03:41.937 --> 03:43.101
+or something like it.
+
+03:43.203 --> 03:47.877
+The byte-code that is Emacs Lisp
+
+03:47.979 --> 03:50.279
+when it's been byte compiled,
+
+03:50.379 --> 03:53.079
+it's running on a stack-based virtual machine
+
+03:53.180 --> 03:55.146
+that is implemented in C.
+
+03:55.253 --> 03:59.741
+OK, so I've listed a bunch of areas
+
+03:59.842 --> 04:02.074
+where Emacs Lisp could improve:
+
+04:02.178 --> 04:06.924
+Namespace, Extensibility, Performance,
+
+04:07.025 --> 04:11.239
+and Debuggability, and Diagnostic, we could say.
+
+04:11.346 --> 04:14.326
+This activity, this project in particular,
+
+04:14.428 --> 04:17.088
+is affecting primarily Performance
+
+04:17.189 --> 04:21.314
+and the performance area the Execution Engine.
+
+04:21.414 --> 04:22.570
+That said,
+
+04:22.671 --> 04:25.855
+I think it has an impact also on Extensibility,
+
+04:25.957 --> 04:27.811
+and I hope it will have an impact also
+
+04:27.913 --> 04:29.779
+on programming diagnostics,
+
+04:29.881 --> 04:32.501
+so giving better warnings, [unknown].
+
+04:32.604 --> 04:36.694
+So which are the benefits
+
+04:36.795 --> 04:39.215
+of increasing the Emacs Lisp performance?
+
+04:39.316 --> 04:42.976
+Indeed, we will have, if we do that,
+
+04:43.078 --> 04:45.668
+programs that run faster.
+
+04:45.774 --> 04:48.738
+But the main implication of that
+
+04:48.840 --> 04:50.760
+is that we could write less C;
+
+04:50.860 --> 04:52.940
+we could maintain and debug less C.
+
+04:53.041 --> 04:56.437
+That is kind of a time-consuming task.
+
+04:56.542 --> 04:59.502
+And we could also allow for
+
+04:59.603 --> 05:01.723
+writing performance-critical extensions
+
+05:01.824 --> 05:03.804
+directly in Lisp
+
+05:03.909 --> 05:06.305
+without having to use systems like
+
+05:06.406 --> 05:09.686
+[I think there's a bunch.]
+
+05:09.927 --> 05:14.791
+These are very consistent benefits.
+
+05:14.899 --> 05:16.669
+OK, Project Goals,
+
+05:16.769 --> 05:18.829
+but I think the title of this slide
+
+05:18.930 --> 05:21.520
+maybe should be Project Requirements.
+
+05:21.623 --> 05:23.231
+So when I started this activity,
+
+05:23.331 --> 05:26.943
+I set some requirements for the project,
+
+05:27.049 --> 05:29.469
+with the main goal in mind: to go upstream.
+
+05:29.570 --> 05:31.494
+So I wanted to create something,
+
+05:31.594 --> 05:34.250
+a modified implementation of GNU Emacs,
+
+05:34.353 --> 05:36.675
+that was compatible
+
+05:36.776 --> 05:38.596
+as close to 100% as possible
+
+05:38.696 --> 05:40.514
+to the current implementation.
+
+05:40.620 --> 05:42.720
+And when I say "current implementation,"
+
+05:42.820 --> 05:44.758
+I don't refer to what
+
+05:44.859 --> 05:46.879
+the Emacs Lisp Programming Manual specifies
+
+05:46.980 --> 05:49.804
+as expected behavior of the implementation,
+
+05:49.907 --> 05:52.207
+but really the implementation itself.
+
+05:52.309 --> 05:53.841
+This is because there are a lot of corner cases
+
+05:53.942 --> 05:56.390
+that are not specified by the manual,
+
+05:56.493 --> 05:58.093
+but programs do rely on that,
+
+05:58.193 --> 06:00.523
+and given there is a ton of Emacs Lisp
+
+06:00.625 --> 06:01.935
+already around,
+
+06:02.037 --> 06:05.437
+compatibility was definitely a major requirement.
+
+06:05.541 --> 06:07.727
+I wanted to produce something that had
+
+06:07.827 --> 06:09.587
+reduced impact on the Emacs codebase,
+
+06:09.687 --> 06:11.543
+at least as much as possible.
+
+06:11.644 --> 06:14.724
+So I didn't want to rewrite all of GNU Emacs.
+
+06:14.827 --> 06:17.609
+Indeed, because it would have been
+
+06:17.710 --> 06:21.068
+too much work for one single person,
+
+06:21.171 --> 06:25.371
+but also still thinking
+
+06:25.473 --> 06:29.153
+to an upstream outcome all of this time.
+
+06:29.258 --> 06:31.974
+Another requirement was to have
+
+06:32.076 --> 06:34.700
+no, or very reduced, impact on the user,
+
+06:34.804 --> 06:36.456
+so I didn't want to change
+
+06:36.556 --> 06:38.604
+the way Emacs is used by you.
+
+06:38.708 --> 06:40.618
+And last but not least,
+
+06:40.718 --> 06:42.848
+introducing new dependencies,
+
+06:42.951 --> 06:45.379
+those dependencies had to be Free Software,
+
+06:45.479 --> 06:47.191
+possibly GPL,
+
+06:47.295 --> 06:49.995
+and also an important requirement
+
+06:50.097 --> 06:52.941
+is that these dependencies had to be
+
+06:53.043 --> 06:55.143
+some kind of trusted software that we know
+
+06:55.243 --> 06:56.681
+is going to be maintained in the future.
+
+06:56.781 --> 06:59.261
+Given Emacs has been around since forever,
+
+06:59.363 --> 07:01.183
+and will be around forever and ever,
+
+07:01.285 --> 07:05.265
+this was another very important point.
+
+07:05.367 --> 07:08.707
+A little bit of history of this project/
+
+07:08.808 --> 07:10.648
+a quick timeline.
+
+07:10.748 --> 07:14.388
+2019, in May, I did my first commit,
+
+07:14.490 --> 07:17.110
+I think it was when I tried to write
+
+07:17.210 --> 07:20.230
+my first primitive function ever in C,
+
+07:20.332 --> 07:21.752
+in GNU Emacs.
+
+07:21.852 --> 07:24.832
+And this was an attempt to try to compile
+
+07:24.934 --> 07:30.314
+a function that, once executed, returning [?]
+
+07:30.415 --> 07:33.755
+That was it. Six months after (about),
+
+07:33.857 --> 07:37.057
+I had something that was kind of working,
+
+07:37.157 --> 07:42.797
+So I was able to start up a semi-standard Emacs
+
+07:42.899 --> 07:44.879
+and then to compile and load,
+
+07:44.981 --> 07:47.221
+and replacing most of the functions
+
+07:47.321 --> 07:50.921
+that I had defined floating in my Lisp universe.
+
+07:51.022 --> 07:54.202
+Those functions are the functions
+
+07:54.304 --> 07:55.944
+that are essentially composing Emacs,
+
+07:56.044 --> 07:57.884
+at least the Lisp side of Emacs.
+
+07:57.984 --> 08:01.084
+A lot of features were missing,
+
+08:01.186 --> 08:03.426
+like I had no Garbage Collector support,
+
+08:03.526 --> 08:07.006
+no bootstrap, I was not optimizing these functions
+
+08:07.108 --> 08:10.068
+Because optimization [would be broken].
+
+08:10.169 --> 08:12.749
+No image dump support, etc.
+
+08:12.850 --> 08:16.330
+But I think this proved the design could work.
+
+08:16.431 --> 08:19.171
+So I sent to email to emacs-devel. I said
+
+08:19.272 --> 08:20.852
+I have this stuff I'm working on,
+
+08:20.953 --> 08:24.613
+and I wanted some feedback from the upstream
+
+08:24.714 --> 08:27.534
+to see if there was some interest.
+
+08:27.635 --> 08:30.455
+I believe the outcome of this was positive
+
+08:30.557 --> 08:32.837
+because about one month after,
+
+08:32.937 --> 08:35.697
+I pushed my branch within the Emacs git
+
+08:35.798 --> 08:38.438
+as a feature branch, and shortly after,
+
+08:38.539 --> 08:42.679
+we started to use the bug tracker to track bugs.
+
+08:42.779 --> 08:45.479
+So essentially we moved the development
+
+08:45.580 --> 08:50.040
+on the upstream infrastructure.
+
+08:50.140 --> 08:55.620
+I believe two years after the first commit,
+
+08:55.721 --> 08:57.781
+the project was merged
+
+08:57.882 --> 09:00.922
+after literally hundreds of bugs solved,
+
+09:01.022 --> 09:03.842
+and improvements, suggestions [unknown]
+
+09:03.943 --> 09:08.623
+and this was about six months ago.
+
+09:08.723 --> 09:12.363
+Before discussing how the native compiler works,
+
+09:12.464 --> 09:14.224
+I think it's worth looking at
+
+09:14.324 --> 09:17.644
+how Lisp is implemented in GNU Emacs.
+
+09:17.745 --> 09:19.305
+We have Lisp_Objects
+
+09:19.405 --> 09:21.945
+floating around our Lisp universe,
+
+09:22.045 --> 09:23.905
+and they are internally represented in this way.
+
+09:24.006 --> 09:25.606
+We have what is called a tagged pointer,
+
+09:25.706 --> 09:27.406
+that is just a regular pointer
+
+09:27.506 --> 09:29.286
+that is pointing to the area of memory
+
+09:29.386 --> 09:31.926
+where we hold the real data of the object.
+
+09:32.027 --> 09:34.187
+But within this tagged pointer,
+
+09:34.287 --> 09:36.747
+we reserve a few bits
+
+09:36.848 --> 09:38.968
+to indicate the type of object we are pointing to.
+
+09:39.068 --> 09:40.528
+This is important because
+
+09:40.628 --> 09:42.288
+each time we access an object,
+
+09:42.388 --> 09:46.168
+we have to typically check those bits
+
+09:46.269 --> 09:49.029
+to check that the object we are manipulating
+
+09:49.129 --> 09:50.529
+is of the right kind,
+
+09:50.630 --> 09:52.530
+remove those bits, and, if we are happy,
+
+09:52.630 --> 09:55.810
+access the object, otherwise [unknown].
+
+09:55.910 --> 09:57.690
+All the objects are like this,
+
+09:57.791 --> 09:59.731
+except for typically Fixnums,
+
+09:59.831 --> 10:01.031
+that are small integers
+
+10:01.131 --> 10:04.891
+that we manage to fit directly within the pointer.
+
+10:04.992 --> 10:07.292
+Also for manipulating Fixnums,
+
+10:07.392 --> 10:09.412
+we have to check the tag bits each time.
+
+10:09.513 --> 10:13.173
+Whenever we are not sure of the type of object
+
+10:13.273 --> 10:16.493
+we are manipulating (read: almost every time),
+
+10:16.594 --> 10:18.914
+we have to check those bits and remove those bits
+
+10:19.014 --> 10:21.754
+before doing any manipulation on the Fixnum.
+
+10:21.854 --> 10:26.014
+How Emacs Lisp is byte-compiled and executed
+
+10:26.115 --> 10:27.775
+in, let's call it, "Vanilla".
+
+10:27.875 --> 10:30.055
+If we have a Lisp expression of this kind:
+
+10:30.156 --> 10:32.776
+We take the variable 'a' we do plus 2,
+
+10:32.876 --> 10:34.876
+and then we multiply the result by 3,
+
+10:34.976 --> 10:37.376
+the byte compiler will produce this LAP code.
+
+10:37.477 --> 10:38.497
+LAP code is essentially
+
+10:38.597 --> 10:40.017
+the assembly for the byte-code,
+
+10:40.117 --> 10:43.697
+so it's the "intermediate representation"
+
+10:43.798 --> 10:48.458
+that will assembled into byte-code. (.elc files)
+
+10:48.558 --> 10:50.738
+How is this program executed?
+
+10:50.839 --> 10:53.639
+As I mentioned, it's executed in a virtual machine
+
+10:53.739 --> 10:55.699
+that is stack-based,
+
+10:55.800 --> 10:58.540
+but we start with an execution stack that's empty,
+
+10:58.640 --> 11:01.680
+and a stack pointer pointing to its bottom.
+
+11:01.780 --> 11:04.280
+And we execute the first instruction,
+
+11:04.380 --> 11:07.120
+that is pushing in the stack the value of 'a',
+
+11:07.220 --> 11:10.360
+in this case, 100. Then we push the constant 2.
+
+11:10.460 --> 11:12.400
+Then we do the summation,
+
+11:12.500 --> 11:14.040
+and we have the result in the stack.
+
+11:14.140 --> 11:16.860
+Same: we push the constant 3,
+
+11:16.960 --> 11:17.620
+we do the multiplication,
+
+11:17.720 --> 11:19.260
+and we will be able to return.
+
+11:19.360 --> 11:22.700
+Now, what's good and what's bad about this?
+
+11:22.800 --> 11:25.800
+A good thing is that it's very simple
+
+11:25.900 --> 11:27.460
+to start from Lisp
+
+11:27.560 --> 11:31.320
+and compile this kind of LAP output.
+
+11:31.420 --> 11:34.240
+At least it's reasonably simple.
+
+11:34.340 --> 11:36.240
+The compiler is not that complex.
+
+11:36.340 --> 11:39.000
+The bad thing is that all this machinery
+
+11:39.100 --> 11:40.660
+-- push and pop, etc. --
+
+11:40.760 --> 11:44.320
+it's very different from how a modern CPU works.
+
+11:44.420 --> 11:45.720
+Because modern CPUs,
+
+11:45.820 --> 11:47.660
+they are not stack-based anymore
+
+11:47.760 --> 11:50.900
+but they have instead a fixed number of registers,
+
+11:51.000 --> 11:54.020
+and they work with assignment and operation
+
+11:54.120 --> 11:55.040
+within these registers
+
+11:55.140 --> 11:57.040
+that are generally called "general-purpose."
+
+11:57.140 --> 11:59.260
+So to execute this LAP program,
+
+11:59.360 --> 12:00.760
+there is another program,
+
+12:00.860 --> 12:02.600
+that is the implementation of the VM itself
+
+12:02.700 --> 12:06.400
+that is doing conversion during runtime.
+
+12:06.500 --> 12:08.320
+So it's interpreting the LAP program
+
+12:08.420 --> 12:11.360
+and it's converting it into instructions
+
+12:11.460 --> 12:13.180
+that we can execute on the CPU.
+
+12:13.280 --> 12:14.940
+This conversion is done each time
+
+12:15.040 --> 12:17.100
+we will run some byte-code.
+
+12:17.200 --> 12:19.760
+And it's something that we want to avoid.
+
+12:19.860 --> 12:21.560
+Instead of this live conversion,
+
+12:21.660 --> 12:26.220
+we want to convert once:
+
+12:26.320 --> 12:28.920
+our Lisp program into native code,
+
+12:29.020 --> 12:32.380
+that is, a binary program that can be executed
+
+12:32.480 --> 12:34.400
+directly by our CPU.
+
+12:34.500 --> 12:36.680
+We want to save all this unnecessary conversion
+
+12:36.780 --> 12:39.000
+that we do each time we are running a program
+
+12:39.100 --> 12:39.760
+while we are running it.
+
+12:39.860 --> 12:42.060
+And we want to do this process just once,
+
+12:42.160 --> 12:43.740
+when we are compiling.
+
+12:43.840 --> 12:46.240
+That's the main goal of this activity.
+
+12:46.340 --> 12:50.200
+How is the byte compiler implemented?
+
+12:50.300 --> 12:53.940
+As any compiler it's a pipeline of transformations.
+
+12:54.040 --> 12:58.560
+We go through macro expansion, closure conversion,
+
+12:58.660 --> 13:02.120
+we have a bunch of source level optimization.
+
+13:02.220 --> 13:03.940
+Then we go into LAP,
+
+13:04.040 --> 13:06.800
+that's the transformation we are interested in,
+
+13:06.900 --> 13:10.600
+and after a few optimizations on LAP,
+
+13:10.700 --> 13:14.140
+LAP is assembled into byte-code.
+
+13:14.240 --> 13:16.880
+So if we [list it]
+
+13:16.980 --> 13:19.320
+in terms of intermediate representations,
+
+13:19.420 --> 13:23.600
+we can simplify this pipeline like this.
+
+13:23.700 --> 13:26.100
+We start with Lisp, and at a certain point
+
+13:26.200 --> 13:29.520
+we are manipulating the program in LAP form,
+
+13:29.620 --> 13:31.980
+and then at the end we produce the byte-code
+
+13:32.080 --> 13:34.560
+that is the .elc file that [you run]
+
+13:34.660 --> 13:37.660
+What I wanted to realize was something like this.
+
+13:37.760 --> 13:41.200
+I wanted to start from LAP, do something,
+
+13:41.300 --> 13:44.600
+and jump into GCC using libgccjit
+
+13:44.700 --> 13:45.560
+and in particular
+
+13:45.660 --> 13:48.000
+the libgccjit Intermediate Representation
+
+13:48.100 --> 13:50.340
+that we will discuss.
+
+13:50.440 --> 13:53.120
+Now, why I wanted to do something like this?
+
+13:53.220 --> 13:57.300
+Essentially, writing a compiler from scratch
+
+13:57.400 --> 14:01.520
+for Emacs Lisp would have been a very big task.
+
+14:01.620 --> 14:05.120
+So I wanted to rely on, as much as I could,
+
+14:05.220 --> 14:07.180
+the Emacs Lisp byte compiler,
+
+14:07.280 --> 14:10.280
+because I had to produce something
+
+14:10.380 --> 14:12.920
+that was as compatible as possible
+
+14:13.020 --> 14:14.240
+to the current implementation.
+
+14:14.340 --> 14:18.340
+So this was (I believe) a very good idea
+
+14:18.440 --> 14:20.520
+to save an enormous quantity of work
+
+14:20.620 --> 14:22.340
+and to produce something
+
+14:22.440 --> 14:24.940
+that was compatible in terms of semantics
+
+14:25.040 --> 14:26.820
+with the original implementation.
+
+14:26.920 --> 14:30.660
+Also, I didn't want to implement a code generator
+
+14:30.760 --> 14:32.860
+for each architecture we were targeting,
+
+14:32.960 --> 14:35.900
+nor wanted to implement all these optimizations
+
+14:36.000 --> 14:37.840
+that are already in GCC,
+
+14:37.940 --> 14:40.120
+so I thought it was a good idea
+
+14:40.220 --> 14:44.720
+to rely on an existing compiler like GCC [?].
+
+14:44.820 --> 14:47.240
+Let's talk about libgccjit.
+
+14:47.340 --> 14:50.180
+It was added by David Malcolm in GCC 5
+
+14:50.280 --> 14:55.680
+It allows you to describe a C-ish semantic to GCC
+
+14:55.780 --> 14:57.380
+and have it compile.
+
+14:57.480 --> 15:01.400
+It's good for [reading] Jitters or AoT compilers.
+
+15:01.500 --> 15:04.400
+And if we talk about GCC:
+
+15:04.500 --> 15:07.180
+it's a compiler, it's a very good one,
+
+15:07.280 --> 15:09.340
+it has support for a remarkable number
+
+15:09.440 --> 15:11.640
+of target architectures.
+
+15:11.740 --> 15:15.080
+It's also very good at generating [fast code],
+
+15:15.180 --> 15:17.980
+and it's been around for a long time;
+
+15:18.080 --> 15:21.200
+I believe it's like 1 year younger than Emacs.
+
+15:21.300 --> 15:23.220
+It's still very well maintained,
+
+15:23.320 --> 15:25.840
+and we can assume it will be maintained
+
+15:25.940 --> 15:28.920
+for quite a while. And, as I mentioned,
+
+15:29.020 --> 15:31.120
+this was a very important point.
+
+15:31.220 --> 15:33.860
+Also, it's GPL; it's Free Software,
+
+15:33.960 --> 15:36.120
+and it's developed under the GNU umbrella,
+
+15:36.220 --> 15:40.360
+so I thought it was a very good option.
+
+15:40.460 --> 15:43.300
+So we can imagine a simple translation
+
+15:43.400 --> 15:46.400
+that goes from LAP to this subset of C
+
+15:46.500 --> 15:48.880
+that we can describe to libgccjit.
+
+15:48.980 --> 15:52.880
+This simple translation we can see here,
+
+15:52.980 --> 15:55.680
+it's actually pretty trivial.
+
+15:55.780 --> 15:58.500
+Instead of doing operations
+
+15:58.600 --> 16:02.060
+within the execution stack we have seen before,
+
+16:02.160 --> 16:05.080
+we have just an array that is replacing it,
+
+16:05.180 --> 16:07.380
+and it's called 'local' in this case,
+
+16:07.480 --> 16:12.360
+and we have assignments within this array,
+
+16:12.460 --> 16:15.140
+so that they are done in place of the original
+
+16:15.240 --> 16:17.100
+push and pop activity of the stack.
+
+16:17.200 --> 16:18.280
+The nice thing is that,
+
+16:18.380 --> 16:20.260
+when you have done this translation,
+
+16:20.360 --> 16:23.160
+GCC will be able to optimize this,
+
+16:23.260 --> 16:25.840
+and remove all the unnecessary operations,
+
+16:25.940 --> 16:27.180
+and generate code
+
+16:27.280 --> 16:29.660
+for the specific CPU you are targeting,
+
+16:29.760 --> 16:32.020
+[which will be running your code].
+
+16:32.120 --> 16:34.920
+This sounds great; it sounds like
+
+16:35.020 --> 16:37.440
+a very simple and effective translation,
+
+16:37.540 --> 16:39.920
+and in fact the first iteration of my compiler
+
+16:40.020 --> 16:41.120
+was doing just this.
+
+16:41.220 --> 16:45.320
+It was essentially a big C function
+
+16:45.420 --> 16:48.240
+that was taking LAP and doing this conversion
+
+16:48.340 --> 16:50.120
+describing the output to libgccjit.
+
+16:50.220 --> 16:53.720
+Unfortunately, if you do this,
+
+16:53.820 --> 16:55.740
+you will discover that you have
+
+16:55.840 --> 17:00.080
+a performance upper bound limit of about 3x.
+
+17:00.180 --> 17:04.360
+So it was an option,
+
+17:04.460 --> 17:06.560
+but I thought it was a good occasion
+
+17:06.660 --> 17:08.980
+for trying to do something more.
+
+17:09.080 --> 17:11.640
+And doing something more means
+
+17:11.740 --> 17:13.680
+implementing a smarter compiler
+
+17:13.780 --> 17:17.180
+that is doing some advanced analysis on the code,
+
+17:17.280 --> 17:18.640
+and will be able to perform
+
+17:18.740 --> 17:20.700
+Lisp-specific optimizations
+
+17:20.800 --> 17:22.480
+-- optimizations that take advantage of
+
+17:22.580 --> 17:24.960
+the specific Lisp semantics,
+
+17:25.060 --> 17:27.760
+something that GCC is not aware of.
+
+17:27.860 --> 17:31.240
+And while I was thinking about that,
+
+17:31.340 --> 17:34.000
+I thought that having a smarter compiler
+
+17:34.100 --> 17:38.120
+had also other advantages, like a smarter compiler
+
+17:38.220 --> 17:40.120
+that understands the semantics
+
+17:40.220 --> 17:41.820
+of the programming language being compiled
+
+17:41.920 --> 17:43.680
+would be also capable of
+
+17:43.780 --> 17:45.620
+giving feedback to the programmers,
+
+17:45.720 --> 17:47.920
+like better warnings and errors.
+
+17:48.020 --> 17:51.360
+So I was really fascinated about this idea,
+
+17:51.460 --> 17:53.240
+and I wanted to change my implementation
+
+17:53.340 --> 17:55.920
+because I was not really happy about it.
+
+17:56.020 --> 17:58.320
+I had a lot of C code in terms of
+
+17:58.420 --> 18:02.100
+lines that were not doing any smart job.
+
+18:02.200 --> 18:07.380
+And I wanted to write all the interesting logic
+
+18:07.480 --> 18:09.940
+[in Lisp].
+
+18:10.040 --> 18:12.560
+So optimizing outside GCC
+
+18:12.660 --> 18:15.840
+before jumping into GCC,
+
+18:15.940 --> 18:20.500
+as I mentioned, has two main targets:
+
+18:20.600 --> 18:23.060
+Either optimize the code before going into GCC,
+
+18:23.160 --> 18:25.380
+or present to GCC some code
+
+18:25.480 --> 18:27.740
+that we know GCC can optimize effectively.
+
+18:27.840 --> 18:30.800
+And also, this will give, as I mentioned,
+
+18:30.900 --> 18:32.660
+better options for the compiler
+
+18:32.760 --> 18:34.420
+to provide warnings, errors
+
+18:34.520 --> 18:36.340
+-- better diagnostics.
+
+18:36.440 --> 18:38.200
+So this is pretty much
+
+18:38.300 --> 18:40.640
+what the native compiler looks like nowadays,
+
+18:40.740 --> 18:42.720
+in terms of passes.
+
+18:42.820 --> 18:44.560
+We have a list of passes,
+
+18:44.660 --> 18:46.620
+each of which is taking an input
+
+18:46.720 --> 18:48.120
+and producing an output.
+
+18:48.220 --> 18:51.040
+So it's doing either analysis on the program
+
+18:51.140 --> 18:52.640
+that's being passed,
+
+18:52.740 --> 18:54.760
+or it's performing a transformation.
+
+18:54.860 --> 18:57.900
+All of these passes are implemented in Lisp,
+
+18:58.000 --> 19:00.440
+and only the last pass is implemented in C.
+
+19:00.540 --> 19:05.060
+That is the one that is talking to libgccjit.
+
+19:05.160 --> 19:07.760
+To do that, I have introduced
+
+19:07.860 --> 19:10.640
+a new intermediate representation
+
+19:10.740 --> 19:13.960
+that I call LIMPLE, as a tribute to GCC GIMPLE,
+
+19:14.060 --> 19:17.220
+that is the main internal representation of GCC,
+
+19:17.320 --> 19:20.600
+at least one of the main ones.
+
+19:20.700 --> 19:25.080
+Introducing a new intermediate representation
+
+19:25.180 --> 19:27.720
+-- a new way of representing my program --
+
+19:27.820 --> 19:29.540
+solved a bunch of problems.
+
+19:29.640 --> 19:33.200
+First, it allowed me to implement
+
+19:33.300 --> 19:37.280
+non-trivial analysis and transformations,
+
+19:37.380 --> 19:40.000
+the ones I needed in my compiler pipeline.
+
+19:40.100 --> 19:42.160
+But also, it solved the problem of
+
+19:42.260 --> 19:43.840
+what was the boundary between
+
+19:43.940 --> 19:46.120
+what I had to implement in Lisp,
+
+19:46.220 --> 19:48.040
+and what in C.
+
+19:48.140 --> 19:49.040
+Because once I had
+
+19:49.140 --> 19:51.860
+my intermediate representation defined,
+
+19:51.960 --> 19:53.780
+essentially the boundary between Lisp and C
+
+19:53.880 --> 19:55.640
+is just a function, that is,
+
+19:55.740 --> 19:57.780
+the one that is implementing the final pass.
+
+19:57.880 --> 19:59.220
+That is taking, as an input,
+
+19:59.320 --> 20:01.620
+all of my programs in LIMPLE representation
+
+20:01.720 --> 20:03.880
+and it's doing [his bit].
+
+20:03.980 --> 20:07.980
+So I was convinced this design at least had sense.
+
+20:08.080 --> 20:10.120
+When we go through some of these passes,
+
+20:10.220 --> 20:12.600
+just to give you an idea of what these are doing:
+
+20:12.700 --> 20:14.560
+the first pass is just responsible for
+
+20:14.660 --> 20:18.100
+spilling the LAP from the byte compiler
+
+20:18.200 --> 20:20.160
+that effectively here we are using as a front end
+
+20:20.260 --> 20:21.780
+for our compiler pipeline.
+
+20:21.880 --> 20:24.040
+The second pass, called 'limplify',
+
+20:24.140 --> 20:27.920
+will be in charge of converting LAP into LIMPLE.
+
+20:28.020 --> 20:31.260
+LIMPLE is an intermediate representation
+
+20:31.360 --> 20:32.860
+that is Control Flow Graph based,
+
+20:32.960 --> 20:34.700
+and it's capable of SSA.
+
+20:34.800 --> 20:38.640
+So we can have a look to what this means.
+
+20:38.740 --> 20:41.300
+Let's assume we have our LAP program,
+
+20:41.400 --> 20:42.520
+as any program,
+
+20:42.620 --> 20:43.960
+that's a simple list of instructions
+
+20:44.060 --> 20:45.640
+that we will execute one after the other.
+
+20:45.740 --> 20:47.360
+Some of these instructions
+
+20:47.460 --> 20:49.200
+are special instructions
+
+20:49.300 --> 20:51.240
+that we call conditional branches,
+
+20:51.340 --> 20:52.800
+where we check for a condition,
+
+20:52.900 --> 20:54.320
+and if this is verified,
+
+20:54.420 --> 20:56.940
+we jump to a different address within the program.
+
+20:57.040 --> 20:59.580
+(Addresses that here we are calling 'labels'.)
+
+20:59.680 --> 21:03.440
+So we can split our program in chunks,
+
+21:03.540 --> 21:08.200
+and those chunks we execute without interruption,
+
+21:08.300 --> 21:10.460
+so we always enter from the top of those,
+
+21:10.560 --> 21:12.600
+and we exit from the bottom.
+
+21:12.700 --> 21:15.980
+We can name those, and split them apart,
+
+21:16.080 --> 21:18.980
+and these are what we call basic blocks.
+
+21:19.080 --> 21:22.360
+And now we have a bunch of these basic blocks
+
+21:22.460 --> 21:23.380
+that are floating,
+
+21:23.480 --> 21:25.020
+and they are not any more sorted.
+
+21:25.120 --> 21:25.920
+This is what is called
+
+21:26.020 --> 21:28.680
+a Control Flow Graph based representation.
+
+21:28.780 --> 21:31.400
+Now we can get into the SSA topic.
+
+21:31.500 --> 21:33.900
+That stands for Static Single Assignment.
+
+21:34.000 --> 21:35.860
+I don't want to get into the details,
+
+21:35.960 --> 21:36.720
+but just give you a feeling.
+
+21:36.820 --> 21:38.480
+I added into our basic blocks
+
+21:38.580 --> 21:41.400
+in our Control Flow Graph a few assignments.
+
+21:41.500 --> 21:43.840
+We will transform this into SSA
+
+21:43.940 --> 21:45.040
+just for the variable 'x',
+
+21:45.140 --> 21:47.360
+just for the sake of demonstrating it.
+
+21:47.460 --> 21:49.760
+This is done through a number of phases
+
+21:49.860 --> 21:51.760
+that are essentially some analysis,
+
+21:51.860 --> 21:52.600
+mainly renaming.
+
+21:52.700 --> 21:55.560
+But the outcome, the one we see here,
+
+21:55.660 --> 21:59.120
+looks quite similar to the original one,
+
+21:59.220 --> 22:01.120
+but we can see that the variable 'x'
+
+22:01.220 --> 22:01.960
+has been renamed.
+
+22:02.060 --> 22:03.400
+And now we don't have anymore just one,
+
+22:03.500 --> 22:06.140
+but a number of these variables.
+
+22:06.240 --> 22:08.000
+The interesting property is that
+
+22:08.100 --> 22:10.880
+each of these variables is assigned just once.
+
+22:10.980 --> 22:13.240
+And this allows for the compiler
+
+22:13.340 --> 22:16.760
+to do prediction of the value of that variable,
+
+22:16.860 --> 22:19.040
+depending on the position
+
+22:19.140 --> 22:19.840
+within the Control Flow Graph.
+
+22:19.940 --> 22:21.980
+This is very important. For instance,
+
+22:22.080 --> 22:23.440
+a very simple case is 'x1'
+
+22:23.540 --> 22:27.440
+that we see is assigned once by definition,
+
+22:27.540 --> 22:29.300
+in particular here at the beginning.
+
+22:29.400 --> 22:31.040
+Here it's very simple to understand
+
+22:31.140 --> 22:33.080
+that x1 will have the value 3.
+
+22:33.180 --> 22:35.380
+While, for instance, it's more difficult to prove
+
+22:35.480 --> 22:37.060
+what is going to be the value of x5,
+
+22:37.160 --> 22:38.620
+because it's calling a function,
+
+22:38.720 --> 22:41.940
+or we don't know at the moment what x4 is.
+
+22:42.040 --> 22:46.240
+So the compiler will gain the capability
+
+22:46.340 --> 22:48.460
+to do prediction on all the variables,
+
+22:48.560 --> 22:50.620
+and the more we get information on one variable,
+
+22:50.720 --> 22:54.980
+the more we can prove about the others.
+
+22:55.460 --> 22:57.280
+Coming back to our passes, the next one
+
+22:57.380 --> 22:59.320
+is forward propagation.
+
+22:59.420 --> 23:00.600
+This pass is responsible for
+
+23:00.700 --> 23:03.240
+doing what I briefly mentioned just before:
+
+23:03.340 --> 23:07.160
+doing proof over all the different variables
+
+23:07.260 --> 23:09.620
+in different positions of the Control Flow Graph,
+
+23:09.720 --> 23:12.700
+about the values, types, or ranges.
+
+23:12.800 --> 23:15.080
+This pass is also responsible for
+
+23:15.180 --> 23:16.920
+executing functions
+
+23:17.020 --> 23:18.640
+when we know that the function has no side effect
+
+23:18.740 --> 23:20.520
+and the pass managed to
+
+23:20.620 --> 23:22.440
+prove all the values of its argument.
+
+23:22.540 --> 23:24.800
+So the function is then executed at compile time
+
+23:24.900 --> 23:26.700
+and it doesn't even exist anymore
+
+23:26.800 --> 23:27.880
+in the produced code.
+
+23:27.980 --> 23:30.300
+Then we have another pass, this is
+
+23:30.400 --> 23:33.420
+an example of a pass that is very specific:
+
+23:33.520 --> 23:36.120
+it's trying to remove the call to funcall
+
+23:36.220 --> 23:38.040
+when those are not necessary.
+
+23:38.140 --> 23:39.760
+There are a couple situations
+
+23:39.860 --> 23:42.200
+where this is very useful.
+
+23:42.300 --> 23:45.240
+And not only is this beneficial
+
+23:45.340 --> 23:47.560
+because we are generating better code,
+
+23:47.660 --> 23:49.245
+but when we manage to do that,
+
+23:49.345 --> 23:52.000
+we allow GCC better analysis over the code,
+
+23:52.100 --> 23:54.440
+because GCC knows nothing about funcall.
+
+23:54.540 --> 23:57.400
+So if we are calling, from 'foo', directly, 'bar',
+
+23:57.500 --> 24:01.280
+for GCC it's way easier to do its analysis
+
+24:01.380 --> 24:03.360
+on top of this code.
+
+24:03.460 --> 24:06.240
+Another interesting pass we can mention is 'tco'.
+
+24:06.340 --> 24:08.800
+This is performing Tail Recursion Elimination.
+
+24:08.900 --> 24:11.880
+It allows a more functional programming style,
+
+24:11.980 --> 24:13.220
+if you want.
+
+24:13.320 --> 24:14.280
+We can jump to the last pass
+
+24:14.380 --> 24:16.200
+that is called 'final', and as I mentioned,
+
+24:16.300 --> 24:17.520
+this one is responsible for
+
+24:17.620 --> 24:19.880
+taking our program in LIMPLE representation
+
+24:19.980 --> 24:24.900
+and describing it to libgccjit in the gccjit IR.
+
+24:25.000 --> 24:27.480
+That's the main task. It's also
+
+24:27.580 --> 24:29.520
+defining inline functions
+
+24:29.620 --> 24:32.400
+for accessing fundamental data types, and so on.
+
+24:32.500 --> 24:34.460
+This pass is also responsible for
+
+24:34.560 --> 24:36.280
+using some of the predictions
+
+24:36.380 --> 24:39.560
+done by previous passes to generate better code.
+
+24:39.660 --> 24:41.320
+Things we had to add
+
+24:41.420 --> 24:43.880
+to have all of this machinery work
+
+24:43.980 --> 24:45.240
+and to be controllable:
+
+24:45.340 --> 24:47.480
+The first one is an opt called 'native-comp-speed'
+
+24:47.580 --> 24:49.920
+and it's equivalent to Common Lisp's 'speed'.
+
+24:50.020 --> 24:51.920
+It represents the optimization level.
+
+24:52.020 --> 24:53.400
+The default is 2 and is
+
+24:53.500 --> 24:55.400
+the maximum optimization level
+
+24:55.500 --> 24:58.600
+that is meant to reflect
+
+24:58.700 --> 25:00.860
+all the original semantics of Emacs Lisp.
+
+25:00.960 --> 25:02.480
+So it's the one that should be used by default.
+
+25:02.580 --> 25:04.640
+The second one is 'compilation unit'
+
+25:04.740 --> 25:05.960
+and it's a kind of new object
+
+25:06.060 --> 25:11.080
+that has been added to Emacs.
+
+25:11.180 --> 25:12.960
+Let's have a look to
+
+25:13.060 --> 25:14.360
+how the Garbage Collector works in this case.
+
+25:14.460 --> 25:15.840
+The GNU Emacs Garbage Collector
+
+25:15.940 --> 25:18.660
+is a simple mark-and-sweep garbage collector.
+
+25:18.760 --> 25:21.420
+It does a tree walk through all the objects
+
+25:21.520 --> 25:25.100
+and follows references from one object to another.
+
+25:25.200 --> 25:27.960
+All the objects reachable during the mark phase
+
+25:28.060 --> 25:31.240
+will be kept in our Lisp universe.
+
+25:31.340 --> 25:33.680
+All the other ones will be freed.
+
+25:33.780 --> 25:35.080
+In this case we have a bunch of functions,
+
+25:35.180 --> 25:38.760
+'foo1', 'foo2', 'bar1', etc., that are defined.
+
+25:38.860 --> 25:40.320
+When a function is defined,
+
+25:40.420 --> 25:42.400
+it's accessible through its symbol,
+
+25:42.500 --> 25:44.360
+so we have the symbol referring to the function.
+
+25:44.460 --> 25:47.720
+The function, in this case a native-compiled one,
+
+25:47.820 --> 25:50.040
+is referring to the compilation unit.
+
+25:50.140 --> 25:53.020
+The compilation unit is essentially
+
+25:53.120 --> 25:58.600
+the ELF file that has been compiled,
+
+25:58.700 --> 26:01.160
+and contains all those functions
+
+26:01.260 --> 26:03.240
+that came from the original .el file,
+
+26:03.340 --> 26:05.100
+and that we have loaded into memory.
+
+26:05.200 --> 26:10.000
+If, for instance, 'bar1 and 'bar2 are undefined,
+
+26:10.100 --> 26:14.200
+functions [3] and 4 will be no longer reachable,
+
+26:14.300 --> 26:16.040
+and we will be able to free them
+
+26:16.140 --> 26:18.160
+and unload the compilation unit.
+
+26:18.260 --> 26:21.200
+We discussed quite a lot about Control Flow Graph,
+
+26:21.300 --> 26:23.560
+SSA, and a lot of boring stuff,
+
+26:23.660 --> 26:25.400
+and I promised you that we are doing
+
+26:25.500 --> 26:27.320
+a lot of interesting proofs over variables,
+
+26:27.420 --> 26:30.220
+So let's have some examples of them.
+
+26:30.320 --> 26:31.840
+Let's jump into a quick demo
+
+26:31.940 --> 26:34.480
+to see what all of this abstract theory
+
+26:34.580 --> 26:37.680
+and this esoteric propagation engine can do for us
+
+26:37.780 --> 26:39.240
+and how the user can interact with it.
+
+26:39.340 --> 26:42.100
+I've defined a bunch of functions,
+
+26:42.200 --> 26:45.240
+and I will native-compile and load it.
+
+26:47.500 --> 26:48.840
+Alright, Emacs Lisp native compiled and loaded.
+
+26:48.940 --> 26:52.320
+At this point, I can disassemble 'foo1'
+
+26:52.420 --> 26:56.320
+to make sure it's native code and I'm not lying.
+
+26:56.420 --> 26:58.500
+These are the instructions
+
+26:58.600 --> 27:01.320
+that will be executed directly by my CPU
+
+27:01.420 --> 27:03.620
+when I call this function.
+
+27:03.720 --> 27:07.520
+Alright, very cool.
+
+27:07.620 --> 27:16.080
+Now, [Lisp:] (symbol-function #'foo1)
+
+27:16.180 --> 27:19.720
+Interestingly, this is returning a subroutine,
+
+27:19.820 --> 27:21.820
+as it would be a primitive function.
+
+27:21.920 --> 27:23.700
+Because this is native code,
+
+27:23.800 --> 27:24.840
+even if it's written in Lisp,
+
+27:24.940 --> 27:26.340
+has been converted to native code
+
+27:26.440 --> 27:29.200
+as if it's a primitive function.
+
+27:29.300 --> 27:31.560
+But we can do also a new thing:
+
+27:31.660 --> 27:34.440
+asking for the type of the subroutine.
+
+27:34.540 --> 27:38.360
+Alright, very cool. It says this is a function,
+
+27:38.460 --> 27:40.280
+it's taking one argument of type 't'
+
+27:40.380 --> 27:41.560
+(that means anything
+
+27:41.660 --> 27:42.960
+because we don't have any information),
+
+27:43.060 --> 27:45.200
+and is returning a type 't',
+
+27:45.300 --> 27:47.380
+so also there we don't have much information.
+
+27:47.480 --> 27:49.700
+OK, very cool, but not very useful.
+
+27:49.800 --> 27:53.680
+Let's see #'foo2. #'foo2 is slightly different,
+
+27:53.780 --> 27:55.840
+it doesn't take any argument, but it's returning
+
+27:55.940 --> 27:58.040
+an integer included between 3 and 3.
+
+27:58.140 --> 28:01.360
+Wow, amazing!
+
+28:01.460 --> 28:04.040
+Let's get into something a little more complex:
+
+28:04.140 --> 28:09.440
+#'foo3 takes one argument we know nothing about,
+
+28:09.540 --> 28:11.740
+but it's returning a number.
+
+28:11.840 --> 28:13.280
+And why it's returning a number?
+
+28:13.380 --> 28:16.320
+Essentially because 1+ is returning a number,
+
+28:16.420 --> 28:18.880
+and in all the other cases,
+
+28:18.980 --> 28:20.640
+it would signal an error
+
+28:20.740 --> 28:23.380
+if it's not happy about its input argument.
+
+28:23.480 --> 28:27.760
+Let's have a look to #'foo4.
+
+28:27.860 --> 28:32.920
+#'foo4 is a little bit more complex.
+
+28:33.020 --> 28:34.680
+It will return nil
+
+28:34.780 --> 28:37.400
+if the 'when' condition is not satisfied,
+
+28:37.500 --> 28:39.720
+so it's type 'null' here.
+
+28:39.820 --> 28:41.200
+It can return a floating point;
+
+28:41.300 --> 28:43.600
+we don't do propagation of floating point so far,
+
+28:43.700 --> 28:47.080
+or it can return any integer between 4 and 9.
+
+28:47.180 --> 28:52.840
+Wow. Let's go on with #'foo5.
+
+28:52.940 --> 28:55.760
+#'foo5 is even more complex
+
+28:55.860 --> 28:57.200
+because other than
+
+28:57.300 --> 28:59.180
+having to satisfy this condition,
+
+28:59.280 --> 29:02.280
+we can see that the result of the propagation
+
+29:02.380 --> 29:03.800
+of this complex condition
+
+29:03.900 --> 29:05.460
+is propagated also across the 'plus'.
+
+29:05.560 --> 29:08.240
+So this foo5 can return nil,
+
+29:08.340 --> 29:09.720
+a floating point we know nothing about,
+
+29:09.820 --> 29:13.560
+or an integer included between 12 and 24.
+
+29:13.660 --> 29:18.080
+Let's go on with #'foo6.
+
+29:18.180 --> 29:23.320
+#'foo6 is returning anything but an integer.
+
+29:23.420 --> 29:26.520
+I think it should be pretty obvious why,
+
+29:26.620 --> 29:28.120
+because if it's not an integer we return it,
+
+29:28.220 --> 29:30.000
+otherwise we signal an error.
+
+29:30.100 --> 29:32.880
+Let's finish with #'foo7 very quickly.
+
+29:32.980 --> 29:37.920
+#'foo7 has another very complex condition,
+
+29:38.020 --> 29:40.320
+at least for me, but it's also interesting to see
+
+29:40.420 --> 29:42.200
+that we are also propagating values for symbols.
+
+29:42.300 --> 29:45.160
+So we can return the symbol 'big,
+
+29:45.260 --> 29:46.980
+the symbol 'small,
+
+29:47.080 --> 29:51.200
+or an integer included between -100 and 100.
+
+29:51.300 --> 29:54.440
+Now, the question is: why all of this is useful
+
+29:54.540 --> 29:56.800
+other than having Andrea very happy
+
+29:56.900 --> 29:59.560
+when he's playing with this all day?
+
+29:59.660 --> 30:01.440
+Well, we have to come back one second
+
+30:01.540 --> 30:04.920
+to how Lisp_Objects are represented within Emacs.
+
+30:05.020 --> 30:09.480
+Lisp_Objects are represented as machine words,
+
+30:09.580 --> 30:12.580
+where we reserve a few bits to indicate the type.
+
+30:12.680 --> 30:15.560
+And every time we access the object,
+
+30:15.660 --> 30:17.120
+when this is a Fixnum,
+
+30:17.220 --> 30:19.920
+or a regular object where this is a pointer,
+
+30:20.020 --> 30:21.600
+we always have to extract these bits,
+
+30:21.700 --> 30:24.520
+make sure that they satisfy a condition,
+
+30:24.620 --> 30:27.120
+so make sure that we are going to manipulate
+
+30:27.220 --> 30:28.580
+the object of the type we expect,
+
+30:28.680 --> 30:31.040
+and then we can extract the object
+
+30:31.140 --> 30:32.540
+and do the manipulation.
+
+30:32.640 --> 30:34.420
+If the compiler managed to prove
+
+30:34.620 --> 30:37.400
+that the contained object is of the right type,
+
+30:37.500 --> 30:42.440
+it will be able to not emit the type check,
+
+30:42.540 --> 30:44.500
+and save a lot of instructions.
+
+30:44.600 --> 30:48.560
+This is a very powerful optimization.
+
+30:48.660 --> 30:50.760
+Let's discuss some potential future development
+
+30:50.860 --> 30:52.000
+in this area.
+
+30:52.100 --> 30:53.360
+First I think it would be extremely nice
+
+30:53.460 --> 30:54.920
+to extend the propagation
+
+30:55.020 --> 30:56.920
+to types that are not built in.
+
+30:57.020 --> 30:58.160
+There are a lot of cases
+
+30:58.260 --> 31:00.600
+where we could optimize effectively.
+
+31:00.700 --> 31:02.520
+For instance when we do
+
+31:02.620 --> 31:05.720
+a lot of accesses to structures
+
+31:05.820 --> 31:06.920
+-- lots of stuff like that --
+
+31:07.020 --> 31:08.880
+where we keep on checking and checking
+
+31:08.980 --> 31:10.640
+the same object for the type
+
+31:10.740 --> 31:13.080
+where it's obvious, where it should be trivial
+
+31:13.180 --> 31:14.720
+to prove that it's the right type
+
+31:14.820 --> 31:16.360
+after the first access or things like that.
+
+31:16.460 --> 31:18.920
+So I believe this is a low-hanging fruit
+
+31:19.020 --> 31:21.180
+in terms of performance.
+
+31:21.280 --> 31:23.080
+Also I think it would be really nice
+
+31:23.180 --> 31:24.720
+to extend the declare mechanism
+
+31:24.820 --> 31:27.600
+to allow the user to declare argument types.
+
+31:27.700 --> 31:30.480
+(Optionally. Indeed, optionally.)
+
+31:30.580 --> 31:32.880
+Doing that would give the compiler
+
+31:32.980 --> 31:35.120
+a lot of information to propagate value types
+
+31:35.220 --> 31:36.360
+within the function.
+
+31:36.460 --> 31:38.840
+But those will allow the compiler
+
+31:38.940 --> 31:43.040
+to give the user really good diagnostics
+
+31:43.140 --> 31:45.400
+during compile time. Like the compiler could say:
+
+31:45.500 --> 31:47.000
+Hey, here you are calling a function
+
+31:47.100 --> 31:49.200
+that is expecting this argument of this type,
+
+31:49.300 --> 31:52.000
+but I'm proving that you are calling it
+
+31:52.100 --> 31:55.200
+with an argument that is of THIS type,
+
+31:55.300 --> 31:57.840
+and is not a subtype of the expected one.
+
+31:57.940 --> 32:00.240
+So you are doing something not coherent.
+
+32:00.340 --> 32:02.800
+This kind of interprocedural logic.
+
+32:02.900 --> 32:05.000
+And I think the compiler should also take advantage
+
+32:05.100 --> 32:06.640
+(under certain circumstances)
+
+32:06.740 --> 32:08.760
+of this interprocedural analysis
+
+32:08.860 --> 32:12.520
+in order to optimize even more, when possible.
+
+32:12.620 --> 32:13.720
+Also I think we should
+
+32:13.820 --> 32:15.480
+work on our [?] to improve the code generation
+
+32:15.580 --> 32:17.100
+depending on the prediction
+
+32:17.200 --> 32:18.160
+that the compiler is doing.
+
+32:18.260 --> 32:20.120
+We already take advantage of those predictions,
+
+32:20.220 --> 32:22.300
+but I think we could do better.
+
+32:22.400 --> 32:25.120
+A quick look at some performance results.
+
+32:25.220 --> 32:28.720
+These are from the elisp-benchmarks package
+
+32:28.820 --> 32:30.760
+within GNU ELPA.
+
+32:30.860 --> 32:32.520
+This is the performance uplift,
+
+32:32.620 --> 32:38.480
+and we can identify about 4 "classes" of results.
+
+32:38.580 --> 32:41.440
+The first one there is no performance uplift,
+
+32:41.540 --> 32:42.880
+because there is not much we can do,
+
+32:42.980 --> 32:44.720
+and the time is probably not spent
+
+32:44.820 --> 32:46.280
+within the execution engine.
+
+32:46.380 --> 32:49.000
+And the ones around 3x are the ones
+
+32:49.100 --> 32:50.680
+Where probably we are not triggering
+
+32:50.780 --> 32:52.640
+manually specific optimizations,
+
+32:52.740 --> 32:53.600
+but just the fact
+
+32:53.700 --> 32:57.100
+that we are converting into native code
+
+32:57.200 --> 33:00.500
+is giving us this performance uplift.
+
+33:00.600 --> 33:03.280
+Then there is a bunch of other benchmarks
+
+33:03.380 --> 33:05.680
+where the Lisp optimizations are triggering,
+
+33:05.780 --> 33:09.620
+and the uplift is way bigger,
+
+33:09.720 --> 33:11.900
+and then we have 3 benchmarks that at the time
+
+33:12.000 --> 33:13.420
+are completely optimized out.
+
+33:13.520 --> 33:15.580
+That means the compiler became "so smart"
+
+33:15.680 --> 33:18.160
+that it was able to compute the result
+
+33:18.260 --> 33:20.160
+in the compile time and just put the result
+
+33:20.260 --> 33:21.400
+in the generated binary.
+
+33:21.500 --> 33:23.880
+Let's discuss a little bit the compilation model.
+
+33:23.980 --> 33:26.080
+This is an Hybrid one;
+
+33:26.180 --> 33:29.420
+it's both JIT-like and Ahead-of-Time-like.
+
+33:29.520 --> 33:33.400
+Emacs is composed of what we call an Emacs image,
+
+33:33.500 --> 33:36.180
+essentially the Emacs binary that we start.
+
+33:36.280 --> 33:37.720
+It's including all the C code,
+
+33:37.820 --> 33:41.760
+plus all the Lisp code that we preload.
+
+33:41.860 --> 33:46.480
+Then we have the rest of the Emacs Lisp codebase
+
+33:46.580 --> 33:49.440
+that can be loaded just if it's required.
+
+33:49.540 --> 33:52.760
+Same for the external packages, if we have any.
+
+33:52.860 --> 33:55.120
+If we build an Emacs Lisp
+
+33:55.220 --> 33:57.940
+with native compilation enabled, by default,
+
+33:58.040 --> 34:01.200
+only the Emacs image will be native compiled.
+
+34:01.300 --> 34:03.920
+All the other code will be compiled
+
+34:04.020 --> 34:06.720
+on the fly when it's loaded and executed
+
+34:06.820 --> 34:08.800
+the first time, if it's necessary.
+
+34:08.900 --> 34:10.880
+Same for the packages, in a transparent way
+
+34:10.980 --> 34:12.640
+and asynchronous way.
+
+34:12.740 --> 34:13.480
+Also worth noting
+
+34:13.580 --> 34:15.400
+that the result of this compilation
+
+34:15.500 --> 34:17.240
+will be stored into a cache directory
+
+34:17.340 --> 34:19.900
+within the home directory of the user.
+
+34:20.000 --> 34:23.600
+So it will be reused in the following sessions
+
+34:23.700 --> 34:25.440
+if the same file is loaded,
+
+34:25.540 --> 34:27.520
+without having to recompile multiple times
+
+34:27.620 --> 34:29.320
+the same file in different sessions.
+
+34:29.420 --> 34:31.520
+It works a little bit like this:
+
+34:31.620 --> 34:33.800
+When we load the byte-code for the first time,
+
+34:33.900 --> 34:35.900
+we spawn a native compilation process.
+
+34:36.000 --> 34:39.320
+Meanwhile we keep using the byte-code available.
+
+34:39.420 --> 34:41.180
+When the native compilation is finished,
+
+34:41.280 --> 34:43.960
+we hot-swap the definition of the functions
+
+34:44.060 --> 34:46.040
+that are contained in the file
+
+34:46.140 --> 34:47.960
+and start using the native code transparently.
+
+34:48.060 --> 34:49.880
+We do this asynchronously,
+
+34:49.980 --> 34:53.140
+and for more compilation units at the same time,
+
+34:53.240 --> 34:56.280
+so it looks a little bit like this.
+
+34:56.380 --> 34:58.560
+Let's try a quick demo of all of this machinery.
+
+34:58.660 --> 35:00.880
+I've started a fresh Emacs
+
+35:00.980 --> 35:03.280
+with native compilation support,
+
+35:03.380 --> 35:05.060
+and at this moment nothing is going on.
+
+35:05.160 --> 35:07.440
+We will test the machinery with Tetris,
+
+35:07.540 --> 35:10.460
+because I can't imagine anything better
+
+35:10.560 --> 35:12.840
+to test this.
+
+35:12.940 --> 35:15.080
+What I do expect is that when I launch Tetris
+
+35:15.180 --> 35:17.920
+it will be loaded, it will immediately
+
+35:18.020 --> 35:20.460
+start execution of the byte-compiled version,
+
+35:20.560 --> 35:23.000
+so we won't see any delay in the user experience,
+
+35:23.100 --> 35:25.160
+and in the meanwhile, a parallel process
+
+35:25.260 --> 35:28.160
+will start to native-compile Tetris itself.
+
+35:28.260 --> 35:30.200
+When the native compilation will be finished,
+
+35:30.300 --> 35:33.240
+the functions of all Tetris will be hot-swapped.
+
+35:33.340 --> 35:36.280
+So we will not see any interruption.
+
+35:36.380 --> 35:39.680
+So Tetris started, and it's running,
+
+35:39.780 --> 35:41.600
+we have seen no delay, and in the meanwhile,
+
+35:41.700 --> 35:43.520
+the native compilation probably already finished,
+
+35:43.620 --> 35:44.960
+we can have a look.
+
+35:45.060 --> 35:47.160
+In this I see the native compilation log buffer.
+
+35:47.260 --> 35:49.720
+So we see that Tetris has been native compiled,
+
+35:49.820 --> 35:51.760
+and all of its dependencies.
+
+35:51.860 --> 35:53.840
+Now Tetris is still running,
+
+35:53.940 --> 36:00.480
+but I can do "C-h f tetris"
+
+36:00.580 --> 36:02.540
+and we can see that 'tetris'
+
+36:02.640 --> 36:04.880
+is an interactive native compiled
+
+36:04.980 --> 36:07.940
+Lisp function, so it has been native-compiled.
+
+36:08.040 --> 36:13.500
+I can even disassemble if I want.
+
+36:13.600 --> 36:14.880
+OK, so very cool.
+
+36:14.980 --> 36:18.020
+I guess we can say this mechanism is working.
+
+36:18.120 --> 36:20.660
+Also worth noting that if I go back
+
+36:20.760 --> 36:24.120
+to the *Async-native-compile-log* buffer,
+
+36:24.220 --> 36:27.960
+we see we have compiled another bunch of files.
+
+36:28.060 --> 36:31.600
+I think these are because of my 'C-h f',
+
+36:31.700 --> 36:33.800
+this help function command and disassemble,
+
+36:33.900 --> 36:34.920
+and so on.
+
+36:35.020 --> 36:37.720
+The first time you run Emacs, you will have,
+
+36:37.820 --> 36:41.280
+from time to time, these processes spawned.
+
+36:41.380 --> 36:43.100
+Emacs is "compiling itself",
+
+36:43.200 --> 36:45.520
+and it's replacing the byte-code definition
+
+36:45.620 --> 36:47.640
+with the native one. But after a few sessions,
+
+36:47.740 --> 36:49.760
+you will not see this anymore,
+
+36:49.860 --> 36:51.360
+because the output of this compilation,
+
+36:51.460 --> 36:54.900
+as I mentioned, are stored in the user directory.
+
+36:55.000 --> 36:57.560
+To conclude: Emacs with native compilation support
+
+36:57.660 --> 36:59.080
+is coming up in Emacs 28,
+
+36:59.180 --> 37:01.040
+that is gonna be the next major stable release
+
+37:01.140 --> 37:02.080
+that will be released.
+
+37:02.180 --> 37:04.840
+So we ought to celebrate with a big party,
+
+37:04.940 --> 37:07.320
+I believe. But before going to the party,
+
+37:07.420 --> 37:09.080
+I'd like to list a few points
+
+37:09.180 --> 37:11.340
+that I think have been success factors
+
+37:11.440 --> 37:13.420
+in upstreaming this work.
+
+37:13.520 --> 37:15.420
+It has been extremely important
+
+37:15.520 --> 37:18.140
+to get in touch with upstream as soon as possible,
+
+37:18.240 --> 37:20.680
+as soon as I had a proof of concept.
+
+37:20.780 --> 37:22.600
+It's been extremely important
+
+37:22.700 --> 37:24.660
+to involve the community as much as possible,
+
+37:24.760 --> 37:28.480
+and this included keeping a development blog,
+
+37:28.580 --> 37:31.600
+and posts about that on emacs-devel,
+
+37:31.700 --> 37:33.720
+and also producing material,
+
+37:33.820 --> 37:36.020
+participating in conferences,
+
+37:36.120 --> 37:38.320
+and giving presentations like the one I'm doing,
+
+37:38.420 --> 37:40.680
+to explain what I was doing and how it works.
+
+37:40.780 --> 37:43.040
+It has been extremely important, also,
+
+37:43.140 --> 37:45.860
+to be able to rely on the upstream infrastructure.
+
+37:45.960 --> 37:47.540
+So, to develop the software
+
+37:47.640 --> 37:49.080
+as a feature branch in the official git,
+
+37:49.180 --> 37:49.960
+but even more, I would say,
+
+37:50.060 --> 37:51.660
+to use the official bug tracker
+
+37:51.760 --> 37:52.880
+for solving bugs of this branch.
+
+37:52.980 --> 37:54.680
+This gave the opportunity
+
+37:54.780 --> 37:58.160
+to stay really in close touch with maintainers,
+
+37:58.260 --> 37:59.920
+and senior developers of Emacs.
+
+38:00.020 --> 38:02.980
+That helped me a lot. And at the same time
+
+38:03.080 --> 38:04.820
+they were informed about what I was doing
+
+38:04.920 --> 38:07.360
+and what was the status of this feature branch.
+
+38:07.460 --> 38:08.800
+Extremely important.
+
+38:08.900 --> 38:11.160
+And also I think it played a major role
+
+38:11.260 --> 38:14.120
+to try to design this enormous patch
+
+38:14.220 --> 38:18.120
+in a way that the impact on the current codebase
+
+38:18.220 --> 38:21.120
+was minimized (at least as much as possible).
+
+38:21.220 --> 38:23.660
+And also minimizing
+
+38:23.760 --> 38:26.240
+the impact on the user operation of the software,
+
+38:26.340 --> 38:28.040
+in this case Emacs.
+
+38:28.140 --> 38:29.640
+So yes, mandatory Special Thanks:
+
+38:29.740 --> 38:33.360
+Emacs developers, and especially maintainers
+
+38:33.460 --> 38:36.680
+and senior developers like Stefan Monnier,
+
+38:36.780 --> 38:40.440
+that helped me a lot across this long journey.
+
+38:40.540 --> 38:42.800
+And, well, all the community
+
+38:42.900 --> 38:45.160
+that really got so excited about this project
+
+38:45.260 --> 38:46.240
+and gave me the energy
+
+38:46.340 --> 38:49.120
+to go through all of this time and development
+
+38:49.220 --> 38:51.980
+and bugs and solving, etc. etc.
+
+38:52.080 --> 38:54.920
+So yes, it was a really exciting time,
+
+38:55.020 --> 38:58.000
+and I think we have to look forward
+
+38:58.100 --> 39:01.400
+and start thinking about how to improve all this
+
+39:01.500 --> 39:02.920
+for the following years.
+
+39:03.020 --> 39:04.300
+And that's it.
+
+39:04.400 --> 39:06.040
+I think I should be online for questions.
+
+39:06.140 --> 39:07.480
+Thank you very much.
+
+39:07.580 --> 39:07.680
+[captions by John Cummings]
-- 
cgit v1.2.3