WEBVTT
00:00.200 --> 00:02.340
Hi everybody, my name is Andrea Corallo,
00:02.440 --> 00:03.280
and this presentation is about
00:03.382 --> 00:05.802
the Emacs Lisp Native Compiler --
00:05.904 --> 00:07.083
having GNU Emacs able to
00:07.184 --> 00:08.805
compile and run Emacs Lisp as native code.
00:08.907 --> 00:11.565
This project has been my hobby project
00:11.667 --> 00:14.009
for about the last two and a half years.
00:14.111 --> 00:15.408
And we will see a little bit
00:15.509 --> 00:16.969
where this project is coming from,
00:17.070 --> 00:18.913
where we are, and where we want to go.
00:19.015 --> 00:20.571
So essentially everything you need to know
00:20.672 --> 00:22.132
about the Emacs Lisp Native Compiler,
00:22.232 --> 00:23.416
and probably a little more.
00:23.521 --> 00:26.535
Just a little bit of context on Emacs Lisp.
00:26.635 --> 00:30.175
Well, Emacs Lisp is a programming language,
00:30.278 --> 00:33.544
it's indeed a Lisp,
00:33.647 --> 00:36.619
and one year ago I looked for some statistics,
00:36.722 --> 00:38.742
and I was kind of pleased to see
00:38.842 --> 00:40.030
-- surprised to see actually --
00:40.132 --> 00:40.972
that it's still kind of popular
00:41.073 --> 00:42.844
as a programming language.
00:42.944 --> 00:43.684
It doesn't rank that badly
00:43.785 --> 00:45.554
against other programming languages.
00:45.658 --> 00:48.387
Also, the other important fact about Emacs Lisp
00:48.487 --> 00:51.518
is that there is a lot of Emacs Lisp out there,
00:51.622 --> 00:55.922
and this will have an impact [on this project.]
00:56.025 --> 00:57.632
It's a programming language that is capable
00:57.734 --> 00:59.320
of almost any task, so it's
00:59.420 --> 01:01.728
almost a general-purpose programming language,
01:01.831 --> 01:04.116
and this reflects on Emacs itself,
01:04.217 --> 01:07.232
that it's capable of almost any task.
01:07.335 --> 01:09.679
Indeed this "almost" is something we want to fix,
01:09.781 --> 01:10.720
because we want to do everything,
01:10.820 --> 01:14.218
we want [to do] all of our computing [with Emacs].
01:14.321 --> 01:16.523
Also, an interesting aspect for me
01:16.625 --> 01:18.903
is that it's not specified by any standard.
01:19.005 --> 01:22.507
This implies it can evolve in a more agile way
01:22.609 --> 01:25.667
without having to change the standard, etc.
01:25.770 --> 01:27.870
And, in fact, it's kind of improving,
01:27.970 --> 01:30.370
I believe relatively fast.
01:30.473 --> 01:32.791
A little bit about Lisp in general.
01:32.892 --> 01:34.952
First, it's the best programming language ever,
01:35.052 --> 01:35.712
we all know it.
01:35.813 --> 01:37.833
It has a lot of very nice properties,
01:37.934 --> 01:40.356
like it's dynamic, it's homoiconic, etc.
01:40.462 --> 01:42.394
But the interesting thing for implementors,
01:42.496 --> 01:43.476
is that, in theory,
01:43.576 --> 01:46.716
it can be implemented with very few primitives.
01:46.817 --> 01:49.485
You build very few primitives that are like magic,
01:49.590 --> 01:51.838
and on top of this,
01:51.938 --> 01:53.770
you can implement the whole language.
01:53.873 --> 01:55.759
This sounds very nice,
01:55.860 --> 01:57.280
and very appealing for implementors
01:57.381 --> 02:00.175
aiming to create a new Lisp implementation,
02:00.279 --> 02:02.539
or improving or modifying an existing one.
02:02.641 --> 02:04.163
So the question is:
02:04.263 --> 02:07.663
how many primitives do we have to implement
02:07.764 --> 02:09.384
if we want to change
02:09.485 --> 02:13.203
the GNU Emacs Lisp implementation?
02:13.308 --> 02:20.128
Unfortunately, not really as few as we would like.
02:20.234 --> 02:25.294
In GNU Emacs, we have about 1500 primitives,
02:25.400 --> 02:27.970
and the main reason for that is performance.
02:28.071 --> 02:30.961
Actually, GNU Emacs was written
02:31.064 --> 02:34.292
when performance was a big issue,
02:34.393 --> 02:36.493
and nowadays certain parts
02:36.594 --> 02:38.486
are still performance-critical.
02:38.591 --> 02:40.295
We have a lot of C code;
02:40.395 --> 02:46.331
30% of the GNU Emacs codebase is C code,
02:46.435 --> 02:49.177
and we have to maintain this.
02:49.279 --> 02:52.859
But not only that, this is the main barrier
02:52.959 --> 02:55.579
for people that tried in the past
02:55.681 --> 02:57.663
to change the Emacs Lisp implementation.
02:57.765 --> 02:59.101
Because not only do you have to
02:59.202 --> 03:01.582
replace these primitives,
03:01.683 --> 03:03.907
all of them or part of them,
03:04.012 --> 03:06.384
but these primitives sometimes also share
03:06.484 --> 03:07.952
internal data structures.
03:08.055 --> 03:09.745
For instance, it's very difficult to say:
03:09.846 --> 03:12.826
Now I want to go from C
03:12.926 --> 03:13.906
to a different programming language
03:14.006 --> 03:15.638
for implementing these primitives.
03:15.740 --> 03:17.028
So this has been, effectively,
03:17.128 --> 03:20.380
the main barrier for doing this work.
03:20.486 --> 03:22.090
Another interesting aspect
03:22.190 --> 03:23.870
about the GNU Emacs implementation
03:23.970 --> 03:26.190
is that Lisp can run interpreted
03:26.291 --> 03:28.647
or byte-compiled for performance,
03:28.752 --> 03:30.632
and the byte compiler itself
03:30.733 --> 03:32.673
is written in Emacs Lisp.
03:32.773 --> 03:35.033
This implies that GNU Emacs has to go through
03:35.134 --> 03:37.214
a bootstrap procedure in order to be built.
03:37.319 --> 03:38.715
But it's kind of interesting
03:38.815 --> 03:41.835
for something that started as a text editor,
03:41.937 --> 03:43.101
or something like it.
03:43.203 --> 03:47.877
The byte-code -- that is, Emacs Lisp
03:47.979 --> 03:50.279
once it has been byte-compiled --
03:50.379 --> 03:53.079
runs on a stack-based virtual machine
03:53.180 --> 03:55.146
that is implemented in C.
03:55.253 --> 03:59.741
OK, so I've listed a bunch of areas
03:59.842 --> 04:02.074
where Emacs Lisp could improve:
04:02.178 --> 04:06.924
Namespace, Extensibility, Performance,
04:07.025 --> 04:11.239
and Debuggability and Diagnostics, we could say.
04:11.346 --> 04:14.326
This activity, this project in particular,
04:14.428 --> 04:17.088
is affecting primarily Performance
04:17.189 --> 04:21.314
and, within the performance area, the Execution Engine.
04:21.414 --> 04:22.570
That said,
04:22.671 --> 04:25.855
I think it has an impact also on Extensibility,
04:25.957 --> 04:27.811
and I hope it will have an impact also
04:27.913 --> 04:29.779
on programming diagnostics,
04:29.881 --> 04:32.501
so giving better warnings, [unknown].
04:32.604 --> 04:36.694
So what are the benefits
04:36.795 --> 04:39.215
of increasing the Emacs Lisp performance?
04:39.316 --> 04:42.976
Indeed, we will have, if we do that,
04:43.078 --> 04:45.668
programs that run faster.
04:45.774 --> 04:48.738
But the main implication of that
04:48.840 --> 04:50.760
is that we could write less C;
04:50.860 --> 04:52.940
we could maintain and debug less C.
04:53.041 --> 04:56.437
That is kind of a time-consuming task.
04:56.542 --> 04:59.502
And we could also allow for
04:59.603 --> 05:01.723
writing performance-critical extensions
05:01.824 --> 05:03.804
directly in Lisp
05:03.909 --> 05:06.305
without having to use systems like
05:06.406 --> 05:09.686
[I think there's a bunch.]
05:09.927 --> 05:14.791
These are quite substantial benefits.
05:14.899 --> 05:16.669
OK, Project Goals,
05:16.769 --> 05:18.829
but I think the title of this slide
05:18.930 --> 05:21.520
maybe should be Project Requirements.
05:21.623 --> 05:23.231
So when I started this activity,
05:23.331 --> 05:26.943
I set some requirements for the project,
05:27.049 --> 05:29.469
with the main goal in mind: to go upstream.
05:29.570 --> 05:31.494
So I wanted to create something,
05:31.594 --> 05:34.250
a modified implementation of GNU Emacs,
05:34.353 --> 05:36.675
that was compatible
05:36.776 --> 05:38.596
as close to 100% as possible
05:38.696 --> 05:40.514
with the current implementation.
05:40.620 --> 05:42.720
And when I say "current implementation,"
05:42.820 --> 05:44.758
I don't refer to what
05:44.859 --> 05:46.879
the Emacs Lisp Programming Manual specifies
05:46.980 --> 05:49.804
as expected behavior of the implementation,
05:49.907 --> 05:52.207
but really the implementation itself.
05:52.309 --> 05:53.841
This is because there are a lot of corner cases
05:53.942 --> 05:56.390
that are not specified by the manual,
05:56.493 --> 05:58.093
but programs do rely on them,
05:58.193 --> 06:00.523
and given there is a ton of Emacs Lisp
06:00.625 --> 06:01.935
already around,
06:02.037 --> 06:05.437
compatibility was definitely a major requirement.
06:05.541 --> 06:07.727
I wanted to produce something that had
06:07.827 --> 06:09.587
reduced impact on the Emacs codebase,
06:09.687 --> 06:11.543
at least as much as possible.
06:11.644 --> 06:14.724
So I didn't want to rewrite all of GNU Emacs.
06:14.827 --> 06:17.609
Indeed, because it would have been
06:17.710 --> 06:21.068
too much work for one single person,
06:21.171 --> 06:25.371
but also because I was still thinking
06:25.473 --> 06:29.153
of an upstream outcome the whole time.
06:29.258 --> 06:31.974
Another requirement was to have
06:32.076 --> 06:34.700
no, or very reduced, impact on the user,
06:34.804 --> 06:36.456
so I didn't want to change
06:36.556 --> 06:38.604
the way Emacs is used by you.
06:38.708 --> 06:40.618
And last but not least,
06:40.718 --> 06:42.848
introducing new dependencies,
06:42.951 --> 06:45.379
those dependencies had to be Free Software,
06:45.479 --> 06:47.191
possibly GPL,
06:47.295 --> 06:49.995
and also an important requirement
06:50.097 --> 06:52.941
is that these dependencies had to be
06:53.043 --> 06:55.143
some kind of trusted software that we know
06:55.243 --> 06:56.681
is going to be maintained in the future.
06:56.781 --> 06:59.261
Given Emacs has been around since forever,
06:59.363 --> 07:01.183
and will be around forever and ever,
07:01.285 --> 07:05.265
this was another very important point.
07:05.367 --> 07:08.707
A little bit of history of this project --
07:08.808 --> 07:10.648
a quick timeline.
07:10.748 --> 07:14.388
In May 2019, I made my first commit;
07:14.490 --> 07:17.110
I think it was when I tried to write
07:17.210 --> 07:20.230
my first primitive function ever in C,
07:20.332 --> 07:21.752
in GNU Emacs.
07:21.852 --> 07:24.832
And this was an attempt to try to compile
07:24.934 --> 07:30.314
a function that, once executed, returning [?]
07:30.415 --> 07:33.755
That was it. About six months later,
07:33.857 --> 07:37.057
I had something that was kind of working.
07:37.157 --> 07:42.797
So I was able to start up a semi-standard Emacs
07:42.899 --> 07:44.879
and then to compile and load,
07:44.981 --> 07:47.221
and replace most of the functions
07:47.321 --> 07:50.921
that I had defined floating in my Lisp universe.
07:51.022 --> 07:54.202
Those functions are the functions
07:54.304 --> 07:55.944
that are essentially composing Emacs,
07:56.044 --> 07:57.884
at least the Lisp side of Emacs.
07:57.984 --> 08:01.084
A lot of features were missing,
08:01.186 --> 08:03.426
like I had no Garbage Collector support,
08:03.526 --> 08:07.006
no bootstrap, and I was not optimizing these functions,
08:07.108 --> 08:10.068
Because optimization [would be broken].
08:10.169 --> 08:12.749
No image dump support, etc.
08:12.850 --> 08:16.330
But I think this proved the design could work.
08:16.431 --> 08:19.171
So I sent an email to emacs-devel. I said
08:19.272 --> 08:20.852
I have this stuff I'm working on,
08:20.953 --> 08:24.613
and I wanted some feedback from the upstream
08:24.714 --> 08:27.534
to see if there was some interest.
08:27.635 --> 08:30.455
I believe the outcome of this was positive
08:30.557 --> 08:32.837
because about one month after,
08:32.937 --> 08:35.697
I pushed my branch to the Emacs git repository
08:35.798 --> 08:38.438
as a feature branch, and shortly after,
08:38.539 --> 08:42.679
we started to use the bug tracker to track bugs.
08:42.779 --> 08:45.479
So essentially we moved the development
08:45.580 --> 08:50.040
onto the upstream infrastructure.
08:50.140 --> 08:55.620
I believe two years after the first commit,
08:55.721 --> 08:57.781
the project was merged
08:57.882 --> 09:00.922
after literally hundreds of bugs solved,
09:01.022 --> 09:03.842
and improvements, suggestions [unknown]
09:03.943 --> 09:08.623
and this was about six months ago.
09:08.723 --> 09:12.363
Before discussing how the native compiler works,
09:12.464 --> 09:14.224
I think it's worth looking at
09:14.324 --> 09:17.644
how Lisp is implemented in GNU Emacs.
09:17.745 --> 09:19.305
We have Lisp_Objects
09:19.405 --> 09:21.945
floating around our Lisp universe,
09:22.045 --> 09:23.905
and they are internally represented in this way.
09:24.006 --> 09:25.606
We have what is called a tagged pointer,
09:25.706 --> 09:27.406
that is just a regular pointer
09:27.506 --> 09:29.286
that is pointing to the area of memory
09:29.386 --> 09:31.926
where we hold the real data of the object.
09:32.027 --> 09:34.187
But within this tagged pointer,
09:34.287 --> 09:36.747
we reserve a few bits
09:36.848 --> 09:38.968
to indicate the type of object we are pointing to.
09:39.068 --> 09:40.528
This is important because
09:40.628 --> 09:42.288
each time we access an object,
09:42.388 --> 09:46.168
we typically have to check those bits
09:46.269 --> 09:49.029
to check that the object we are manipulating
09:49.129 --> 09:50.529
is of the right kind,
09:50.630 --> 09:52.530
remove those bits, and, if we are happy,
09:52.630 --> 09:55.810
access the object, otherwise [unknown].
09:55.910 --> 09:57.690
All the objects are like this,
09:57.791 --> 09:59.731
except, typically, for Fixnums,
09:59.831 --> 10:01.031
that are small integers
10:01.131 --> 10:04.891
that we manage to fit directly within the pointer.
10:04.992 --> 10:07.292
Also for manipulating Fixnums,
10:07.392 --> 10:09.412
we have to check the tag bits each time.
10:09.513 --> 10:13.173
Whenever we are not sure of the type of object
10:13.273 --> 10:16.493
we are manipulating (read: almost every time),
10:16.594 --> 10:18.914
we have to check those bits and remove those bits
10:19.014 --> 10:21.754
before doing any manipulation on the Fixnum.
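The tag handling described above can be sketched in C. This is an illustrative layout only: the tag width, the tag values, and the helper names are invented for the example, not GNU Emacs's actual ones.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative tagged-pointer sketch (not Emacs's real layout).
   The low 3 bits of a word hold a type tag; Fixnums store their
   value directly in the remaining bits instead of pointing to
   heap-allocated data. */
typedef uintptr_t LispObj;

enum { TAG_FIXNUM = 0, TAG_CONS = 1, TAG_STRING = 2 };
#define TAG_BITS 3
#define TAG_MASK ((1u << TAG_BITS) - 1)

static int tag_of(LispObj o) { return (int)(o & TAG_MASK); }

/* Boxing a small integer: shift left to make room for the tag. */
static LispObj make_fixnum(intptr_t n) {
  return ((uintptr_t)n << TAG_BITS) | TAG_FIXNUM;
}

/* Before using a Fixnum we must check the tag, then strip it:
   this check-and-untag dance is the per-access cost the talk
   describes. */
static intptr_t fixnum_value(LispObj o) {
  assert(tag_of(o) == TAG_FIXNUM);
  return (intptr_t)o >> TAG_BITS;  /* arithmetic shift drops the tag */
}
```

Every Lisp-level addition of two Fixnums pays for two such checks plus a re-tag of the result, which is exactly the overhead a compiler can sometimes prove away.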
10:21.854 --> 10:26.014
How Emacs Lisp is byte-compiled and executed
10:26.115 --> 10:27.775
in, let's call it, "Vanilla".
10:27.875 --> 10:30.055
If we have a Lisp expression of this kind:
10:30.156 --> 10:32.776
We take the variable 'a', add 2,
10:32.876 --> 10:34.876
and then we multiply the result by 3,
10:34.976 --> 10:37.376
the byte compiler will produce this LAP code.
10:37.477 --> 10:38.497
LAP code is essentially
10:38.597 --> 10:40.017
the assembly for the byte-code,
10:40.117 --> 10:43.697
so it's the "intermediate representation"
10:43.798 --> 10:48.458
that will be assembled into byte-code (.elc files).
10:48.558 --> 10:50.738
How is this program executed?
10:50.839 --> 10:53.639
As I mentioned, it's executed in a virtual machine
10:53.739 --> 10:55.699
that is stack-based,
10:55.800 --> 10:58.540
We start with an execution stack that's empty,
10:58.640 --> 11:01.680
and a stack pointer pointing to its bottom.
11:01.780 --> 11:04.280
And we execute the first instruction,
11:04.380 --> 11:07.120
which pushes onto the stack the value of 'a',
11:07.220 --> 11:10.360
in this case, 100. Then we push the constant 2.
11:10.460 --> 11:12.400
Then we do the summation,
11:12.500 --> 11:14.040
and we have the result in the stack.
11:14.140 --> 11:16.860
Same: we push the constant 3,
11:16.960 --> 11:17.620
we do the multiplication,
11:17.720 --> 11:19.260
and we will be able to return.
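The walkthrough above can be condensed into a toy stack machine in C. The opcode names and the instruction struct are invented for illustration; Emacs's real byte-code VM is far richer.

```c
#include <assert.h>

/* Toy stack machine executing the LAP-like sequence for
   (* (+ a 2) 3): push a; push 2; add; push 3; mul; return. */
enum op { OP_PUSH, OP_ADD, OP_MUL, OP_RET };

struct insn { enum op op; long arg; };

static long run(const struct insn *prog) {
  long stack[16];
  int sp = 0;                         /* stack pointer starts at the bottom */
  for (;; prog++) {
    switch (prog->op) {
    case OP_PUSH: stack[sp++] = prog->arg; break;
    case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
    case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
    case OP_RET:  return stack[sp - 1];   /* result is the stack top */
    }
  }
}
```

With a = 100, the sequence push 100, push 2, add, push 3, mul, ret leaves 306 on top of the stack, matching the hand execution in the talk.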
11:19.360 --> 11:22.700
Now, what's good and what's bad about this?
11:22.800 --> 11:25.800
A good thing is that it's very simple
11:25.900 --> 11:27.460
to start from Lisp
11:27.560 --> 11:31.320
and compile this kind of LAP output.
11:31.420 --> 11:34.240
At least it's reasonably simple.
11:34.340 --> 11:36.240
The compiler is not that complex.
11:36.340 --> 11:39.000
The bad thing is that all this machinery
11:39.100 --> 11:40.660
-- push and pop, etc. --
11:40.760 --> 11:44.320
it's very different from how a modern CPU works.
11:44.420 --> 11:45.720
Because modern CPUs,
11:45.820 --> 11:47.660
they are not stack-based anymore
11:47.760 --> 11:50.900
but they have instead a fixed number of registers,
11:51.000 --> 11:54.020
and they work with assignments and operations
11:54.120 --> 11:55.040
within these registers
11:55.140 --> 11:57.040
that are generally called "general-purpose."
11:57.140 --> 11:59.260
So to execute this LAP program,
11:59.360 --> 12:00.760
there is another program,
12:00.860 --> 12:02.600
that is the implementation of the VM itself
12:02.700 --> 12:06.400
that is doing conversion during runtime.
12:06.500 --> 12:08.320
So it's interpreting the LAP program
12:08.420 --> 12:11.360
and it's converting it into instructions
12:11.460 --> 12:13.180
that we can execute on the CPU.
12:13.280 --> 12:14.940
This conversion is done each time
12:15.040 --> 12:17.100
we will run some byte-code.
12:17.200 --> 12:19.760
And it's something that we want to avoid.
12:19.860 --> 12:21.560
Instead of this live conversion,
12:21.660 --> 12:26.220
we want to convert once:
12:26.320 --> 12:28.920
our Lisp program into native code,
12:29.020 --> 12:32.380
that is, a binary program that can be executed
12:32.480 --> 12:34.400
directly by our CPU.
12:34.500 --> 12:36.680
We want to save all this unnecessary conversion
12:36.780 --> 12:39.000
that we do each time we are running a program
12:39.100 --> 12:39.760
while we are running it.
12:39.860 --> 12:42.060
And we want to do this process just once,
12:42.160 --> 12:43.740
when we are compiling.
12:43.840 --> 12:46.240
That's the main goal of this activity.
12:46.340 --> 12:50.200
How is the byte compiler implemented?
12:50.300 --> 12:53.940
Like any compiler, it's a pipeline of transformations.
12:54.040 --> 12:58.560
We go through macro expansion, closure conversion,
12:58.660 --> 13:02.120
we have a bunch of source-level optimizations.
13:02.220 --> 13:03.940
Then we go into LAP,
13:04.040 --> 13:06.800
that's the transformation we are interested in,
13:06.900 --> 13:10.600
and after a few optimizations on LAP,
13:10.700 --> 13:14.140
LAP is assembled into byte-code.
13:14.240 --> 13:16.880
So if we [list it]
13:16.980 --> 13:19.320
in terms of intermediate representations,
13:19.420 --> 13:23.600
we can simplify this pipeline like this.
13:23.700 --> 13:26.100
We start with Lisp, and at a certain point
13:26.200 --> 13:29.520
we are manipulating the program in LAP form,
13:29.620 --> 13:31.980
and then at the end we produce the byte-code
13:32.080 --> 13:34.560
that is the .elc file that [you run]
13:34.660 --> 13:37.660
What I wanted to realize was something like this.
13:37.760 --> 13:41.200
I wanted to start from LAP, do something,
13:41.300 --> 13:44.600
and jump into GCC using libgccjit
13:44.700 --> 13:45.560
and in particular
13:45.660 --> 13:48.000
the libgccjit Intermediate Representation
13:48.100 --> 13:50.340
that we will discuss.
13:50.440 --> 13:53.120
Now, why did I want to do something like this?
13:53.220 --> 13:57.300
Essentially, writing a compiler from scratch
13:57.400 --> 14:01.520
for Emacs Lisp would have been a very big task.
14:01.620 --> 14:05.120
So I wanted to rely on, as much as I could,
14:05.220 --> 14:07.180
the Emacs Lisp byte compiler,
14:07.280 --> 14:10.280
because I had to produce something
14:10.380 --> 14:12.920
that was as compatible as possible
14:13.020 --> 14:14.240
with the current implementation.
14:14.340 --> 14:18.340
So this was (I believe) a very good idea
14:18.440 --> 14:20.520
to save an enormous quantity of work
14:20.620 --> 14:22.340
and to produce something
14:22.440 --> 14:24.940
that was compatible in terms of semantics
14:25.040 --> 14:26.820
with the original implementation.
14:26.920 --> 14:30.660
Also, I didn't want to implement a code generator
14:30.760 --> 14:32.860
for each architecture we were targeting,
14:32.960 --> 14:35.900
nor wanted to implement all these optimizations
14:36.000 --> 14:37.840
that are already in GCC,
14:37.940 --> 14:40.120
so I thought it was a good idea
14:40.220 --> 14:44.720
to rely on an existing compiler like GCC [?].
14:44.820 --> 14:47.240
Let's talk about libgccjit.
14:47.340 --> 14:50.180
It was added by David Malcolm in GCC 5.
14:50.280 --> 14:55.680
It allows you to describe C-ish semantics to GCC
14:55.780 --> 14:57.380
and have it compiled.
14:57.480 --> 15:01.400
It's good for [reading] Jitters or AoT compilers.
15:01.500 --> 15:04.400
And if we talk about GCC:
15:04.500 --> 15:07.180
it's a compiler, it's a very good one,
15:07.280 --> 15:09.340
it has support for a remarkable number
15:09.440 --> 15:11.640
of target architectures.
15:11.740 --> 15:15.080
It's also very good at generating [fast code],
15:15.180 --> 15:17.980
and it's been around for a long time;
15:18.080 --> 15:21.200
I believe it's like 1 year younger than Emacs.
15:21.300 --> 15:23.220
It's still very well maintained,
15:23.320 --> 15:25.840
and we can assume it will be maintained
15:25.940 --> 15:28.920
for quite a while. And, as I mentioned,
15:29.020 --> 15:31.120
this was a very important point.
15:31.220 --> 15:33.860
Also, it's GPL; it's Free Software,
15:33.960 --> 15:36.120
and it's developed under the GNU umbrella,
15:36.220 --> 15:40.360
so I thought it was a very good option.
15:40.460 --> 15:43.300
So we can imagine a simple translation
15:43.400 --> 15:46.400
that goes from LAP to this subset of C
15:46.500 --> 15:48.880
that we can describe to libgccjit.
15:48.980 --> 15:52.880
This simple translation we can see here,
15:52.980 --> 15:55.680
it's actually pretty trivial.
15:55.780 --> 15:58.500
Instead of doing operations
15:58.600 --> 16:02.060
within the execution stack we have seen before,
16:02.160 --> 16:05.080
we have just an array that is replacing it,
16:05.180 --> 16:07.380
and it's called 'local' in this case,
16:07.480 --> 16:12.360
and we have assignments within this array,
16:12.460 --> 16:15.140
so that they are done in place of the original
16:15.240 --> 16:17.100
push and pop activity of the stack.
16:17.200 --> 16:18.280
The nice thing is that,
16:18.380 --> 16:20.260
when you have done this translation,
16:20.360 --> 16:23.160
GCC will be able to optimize this,
16:23.260 --> 16:25.840
and remove all the unnecessary operations,
16:25.940 --> 16:27.180
and generate code
16:27.280 --> 16:29.660
for the specific CPU you are targeting,
16:29.760 --> 16:32.020
[which will be running your code].
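Hand-written, the kind of C-like code this first translation describes to libgccjit for (* (+ a 2) 3) might look like the sketch below. The 'local' array stands in for the execution stack, and the function name is hypothetical.

```c
#include <assert.h>

/* Sketch of the naive LAP-to-C translation: the VM stack becomes a
   plain local array, and each push/pop becomes an indexed assignment.
   GCC can then propagate the constants and delete the array entirely,
   leaving something close to `return (a + 2) * 3;`. */
static long compiled_fn(long a) {
  long local[3];
  local[0] = a;                       /* push a        */
  local[1] = 2;                       /* push 2        */
  local[0] = local[0] + local[1];     /* add           */
  local[1] = 3;                       /* push 3        */
  local[0] = local[0] * local[1];     /* mul           */
  return local[0];                    /* return top    */
}
```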
16:32.120 --> 16:34.920
This sounds great; it sounds like
16:35.020 --> 16:37.440
a very simple and effective translation,
16:37.540 --> 16:39.920
and in fact the first iteration of my compiler
16:40.020 --> 16:41.120
was doing just this.
16:41.220 --> 16:45.320
It was essentially a big C function
16:45.420 --> 16:48.240
that was taking LAP and doing this conversion
16:48.340 --> 16:50.120
describing the output to libgccjit.
16:50.220 --> 16:53.720
Unfortunately, if you do this,
16:53.820 --> 16:55.740
you will discover that you have
16:55.840 --> 17:00.080
a performance upper bound of about 3x.
17:00.180 --> 17:04.360
So it was an option,
17:04.460 --> 17:06.560
but I thought it was a good occasion
17:06.660 --> 17:08.980
for trying to do something more.
17:09.080 --> 17:11.640
And doing something more means
17:11.740 --> 17:13.680
implementing a smarter compiler
17:13.780 --> 17:17.180
that is doing some advanced analysis on the code,
17:17.280 --> 17:18.640
and will be able to perform
17:18.740 --> 17:20.700
Lisp-specific optimizations
17:20.800 --> 17:22.480
-- optimizations that take advantage of
17:22.580 --> 17:24.960
the specific Lisp semantics,
17:25.060 --> 17:27.760
something that GCC is not aware of.
17:27.860 --> 17:31.240
And while I was thinking about that,
17:31.340 --> 17:34.000
I thought that having a smarter compiler
17:34.100 --> 17:38.120
had also other advantages, like a smarter compiler
17:38.220 --> 17:40.120
that understands the semantics
17:40.220 --> 17:41.820
of the programming language being compiled
17:41.920 --> 17:43.680
would be also capable of
17:43.780 --> 17:45.620
giving feedback to the programmers,
17:45.720 --> 17:47.920
like better warnings and errors.
17:48.020 --> 17:51.360
So I was really fascinated by this idea,
17:51.460 --> 17:53.240
and I wanted to change my implementation
17:53.340 --> 17:55.920
because I was not really happy about it.
17:56.020 --> 17:58.320
I had a lot of C code in terms of
17:58.420 --> 18:02.100
lines that were not doing anything smart.
18:02.200 --> 18:07.380
And I wanted to write all the interesting logic
18:07.480 --> 18:09.940
[in Lisp].
18:10.040 --> 18:12.560
So optimizing outside GCC
18:12.660 --> 18:15.840
before jumping into GCC,
18:15.940 --> 18:20.500
as I mentioned, has two main targets:
18:20.600 --> 18:23.060
Either optimize the code before going into GCC,
18:23.160 --> 18:25.380
or present to GCC some code
18:25.480 --> 18:27.740
that we know GCC can optimize effectively.
18:27.840 --> 18:30.800
And also, this will give, as I mentioned,
18:30.900 --> 18:32.660
better options for the compiler
18:32.760 --> 18:34.420
to provide warnings, errors
18:34.520 --> 18:36.340
-- better diagnostics.
18:36.440 --> 18:38.200
So this is pretty much
18:38.300 --> 18:40.640
what the native compiler looks like nowadays,
18:40.740 --> 18:42.720
in terms of passes.
18:42.820 --> 18:44.560
We have a list of passes,
18:44.660 --> 18:46.620
each of which is taking an input
18:46.720 --> 18:48.120
and producing an output.
18:48.220 --> 18:51.040
So it's doing either analysis on the program
18:51.140 --> 18:52.640
that's being passed,
18:52.740 --> 18:54.760
or it's performing a transformation.
18:54.860 --> 18:57.900
All of these passes are implemented in Lisp,
18:58.000 --> 19:00.440
and only the last pass is implemented in C.
19:00.540 --> 19:05.060
That is the one that is talking to libgccjit.
19:05.160 --> 19:07.760
To do that, I have introduced
19:07.860 --> 19:10.640
a new intermediate representation
19:10.740 --> 19:13.960
that I call LIMPLE, as a tribute to GCC GIMPLE,
19:14.060 --> 19:17.220
that is the main internal representation of GCC,
19:17.320 --> 19:20.600
at least one of the main ones.
19:20.700 --> 19:25.080
Introducing a new intermediate representation
19:25.180 --> 19:27.720
-- a new way of representing my program --
19:27.820 --> 19:29.540
solved a bunch of problems.
19:29.640 --> 19:33.200
First, it allowed me to implement
19:33.300 --> 19:37.280
non-trivial analysis and transformations,
19:37.380 --> 19:40.000
the ones I needed in my compiler pipeline.
19:40.100 --> 19:42.160
But also, it solved the problem of
19:42.260 --> 19:43.840
what was the boundary between
19:43.940 --> 19:46.120
what I had to implement in Lisp,
19:46.220 --> 19:48.040
and what in C.
19:48.140 --> 19:49.040
Because once I had
19:49.140 --> 19:51.860
my intermediate representation defined,
19:51.960 --> 19:53.780
essentially the boundary between Lisp and C
19:53.880 --> 19:55.640
is just a function, that is,
19:55.740 --> 19:57.780
the one that is implementing the final pass.
19:57.880 --> 19:59.220
That is taking, as an input,
19:59.320 --> 20:01.620
all of my programs in LIMPLE representation
20:01.720 --> 20:03.880
and it's doing [its bit].
20:03.980 --> 20:07.980
So I was convinced this design at least made sense.
20:08.080 --> 20:10.120
When we go through some of these passes,
20:10.220 --> 20:12.600
just to give you an idea of what these are doing:
20:12.700 --> 20:14.560
the first pass is just responsible for
20:14.660 --> 20:18.100
spilling the LAP from the byte compiler
20:18.200 --> 20:20.160
which we are effectively using here as a front end
20:20.260 --> 20:21.780
for our compiler pipeline.
20:21.880 --> 20:24.040
The second pass, called 'limplify',
20:24.140 --> 20:27.920
will be in charge of converting LAP into LIMPLE.
20:28.020 --> 20:31.260
LIMPLE is an intermediate representation
20:31.360 --> 20:32.860
that is Control Flow Graph based,
20:32.960 --> 20:34.700
and it's capable of SSA.
20:34.800 --> 20:38.640
So we can have a look at what this means.
20:38.740 --> 20:41.300
Let's assume we have our LAP program,
20:41.400 --> 20:42.520
as any program,
20:42.620 --> 20:43.960
that's a simple list of instructions
20:44.060 --> 20:45.640
that we will execute one after the other.
20:45.740 --> 20:47.360
Some of these instructions
20:47.460 --> 20:49.200
are special instructions
20:49.300 --> 20:51.240
that we call conditional branches,
20:51.340 --> 20:52.800
where we check for a condition,
20:52.900 --> 20:54.320
and if this is verified,
20:54.420 --> 20:56.940
we jump to a different address within the program.
20:57.040 --> 20:59.580
(Addresses that here we are calling 'labels'.)
20:59.680 --> 21:03.440
So we can split our program into chunks,
21:03.540 --> 21:08.200
and those chunks we execute without interruption,
21:08.300 --> 21:10.460
so we always enter from the top of those,
21:10.560 --> 21:12.600
and we exit from the bottom.
21:12.700 --> 21:15.980
We can name those, and split them apart,
21:16.080 --> 21:18.980
and these are what we call basic blocks.
21:19.080 --> 21:22.360
And now we have a bunch of these basic blocks
21:22.460 --> 21:23.380
that are floating,
21:23.480 --> 21:25.020
and they are no longer ordered.
21:25.120 --> 21:25.920
This is what is called
21:26.020 --> 21:28.680
a Control Flow Graph based representation.
21:28.780 --> 21:31.400
Now we can get into the SSA topic.
21:31.500 --> 21:33.900
That stands for Static Single Assignment.
21:34.000 --> 21:35.860
I don't want to get into the details,
21:35.960 --> 21:36.720
but just give you a feeling.
21:36.820 --> 21:38.480
I added into our basic blocks
21:38.580 --> 21:41.400
in our Control Flow Graph a few assignments.
21:41.500 --> 21:43.840
We will transform this into SSA
21:43.940 --> 21:45.040
just for the variable 'x',
21:45.140 --> 21:47.360
just for the sake of demonstrating it.
21:47.460 --> 21:49.760
This is done through a number of phases
21:49.860 --> 21:51.760
that are essentially some analysis,
21:51.860 --> 21:52.600
mainly renaming.
21:52.700 --> 21:55.560
But the outcome, the one we see here,
21:55.660 --> 21:59.120
looks quite similar to the original one,
21:59.220 --> 22:01.120
but we can see that the variable 'x'
22:01.220 --> 22:01.960
has been renamed.
22:02.060 --> 22:03.400
And now we no longer have just one,
22:03.500 --> 22:06.140
but a number of these variables.
22:06.240 --> 22:08.000
The interesting property is that
22:08.100 --> 22:10.880
each of these variables is assigned just once.
22:10.980 --> 22:13.240
And this allows for the compiler
22:13.340 --> 22:16.760
to do prediction of the value of that variable,
22:16.860 --> 22:19.040
depending on the position
22:19.140 --> 22:19.840
within the Control Flow Graph.
22:19.940 --> 22:21.980
This is very important. For instance,
22:22.080 --> 22:23.440
a very simple case is 'x1'
22:23.540 --> 22:27.440
that we see is assigned once by definition,
22:27.540 --> 22:29.300
in particular here at the beginning.
22:29.400 --> 22:31.040
Here it's very simple to understand
22:31.140 --> 22:33.080
that x1 will have the value 3.
22:33.180 --> 22:35.380
While, for instance, it's more difficult to prove
22:35.480 --> 22:37.060
what is going to be the value of x5,
22:37.160 --> 22:38.620
because it's calling a function,
22:38.720 --> 22:41.940
or we don't know at the moment what x4 is.
22:42.040 --> 22:46.240
So the compiler will gain the capability
22:46.340 --> 22:48.460
to do prediction on all the variables,
22:48.560 --> 22:50.620
and the more information we get on one variable,
22:50.720 --> 22:54.980
the more we can prove about the others.
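Written out by hand in C, the renaming idea looks like the sketch below. The variable names mirror the x1, x2, ... of the example, and the conditional plays the role of a phi node merging two branch values.

```c
#include <assert.h>

/* SSA renaming illustrated by hand: each renamed variable is assigned
   exactly once, so its value can be reasoned about from its single
   definition site. */
static long ssa_example(int cond) {
  long x1 = 3;               /* provably always 3                    */
  long x2 = x1 + 1;          /* follows from x1: provably always 4   */
  long x3 = x1 * 10;         /* follows from x1: provably always 30  */
  long x4 = cond ? x2 : x3;  /* "phi": merges x2 and x3, so 4 or 30  */
  return x4;
}
```

A call to an unknown function, like the x5 of the talk's example, would break this chain of proofs: the compiler could no longer predict the value, only (perhaps) its type.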
22:55.460 --> 22:57.280
Coming back to our passes, the next one
22:57.380 --> 22:59.320
is forward propagation.
22:59.420 --> 23:00.600
This pass is responsible for
23:00.700 --> 23:03.240
doing what I briefly mentioned just before:
23:03.340 --> 23:07.160
doing proofs over all the different variables
23:07.260 --> 23:09.620
in different positions of the Control Flow Graph,
23:09.720 --> 23:12.700
about the values, types, or ranges.
23:12.800 --> 23:15.080
This pass is also responsible for
23:15.180 --> 23:16.920
executing functions
23:17.020 --> 23:18.640
when we know that the function has no side effects
23:18.740 --> 23:20.520
and the pass managed to
23:20.620 --> 23:22.440
prove the values of all its arguments.
23:22.540 --> 23:24.800
So the function is then executed at compile time
23:24.900 --> 23:26.700
and it doesn't even exist anymore
23:26.800 --> 23:27.880
in the produced code.
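This compile-time execution is essentially constant folding. A minimal sketch, assuming a hand-picked whitelist of pure functions (the names and representation are illustrative, not the compiler's real tables):

```python
# Toy constant folding: if a call's function is known to be pure
# (no side effects) and all its arguments are already constants,
# run it at compile time and replace the call with the result.
import operator

PURE = {"+": operator.add, "*": operator.mul}  # assumed-pure ops

def fold(expr):
    if not isinstance(expr, tuple):      # a constant leaf
        return expr
    fn, *args = expr
    args = [fold(a) for a in args]       # fold inner calls first
    if fn in PURE and all(isinstance(a, int) for a in args):
        return PURE[fn](*args)           # executed at "compile time"
    return (fn, *args)                   # leave unknown calls alone

print(fold(("+", ("*", 2, 3), 4)))      # -> 10
print(fold(("+", ("user-fn", 1), 4)))   # -> ('+', ('user-fn', 1), 4)
```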
23:27.980 --> 23:30.300
Then we have another pass, this is
23:30.400 --> 23:33.420
an example of a pass that is very specific:
23:33.520 --> 23:36.120
it's trying to remove the call to funcall
23:36.220 --> 23:38.040
when those are not necessary.
23:38.140 --> 23:39.760
There are a couple of situations
23:39.860 --> 23:42.200
where this is very useful.
23:42.300 --> 23:45.240
And not only is this beneficial
23:45.340 --> 23:47.560
because we are generating better code,
23:47.660 --> 23:49.245
but when we manage to do that,
23:49.345 --> 23:52.000
we allow GCC better analysis over the code,
23:52.100 --> 23:54.440
because GCC knows nothing about funcall.
23:54.540 --> 23:57.400
So if we are calling, from 'foo', directly, 'bar',
23:57.500 --> 24:01.280
for GCC it's way easier to do its analysis
24:01.380 --> 24:03.360
on top of this code.
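The idea can be sketched as a hypothetical rewrite rule (not the compiler's actual code): when propagation has proven that the callee of a funcall is a specific constant function, the indirect call is rewritten as a direct one that later analysis can reason about.

```python
# Toy "funcall elimination": when the callee of an indirect call
# is proven to be a constant function, rewrite it as a direct
# call, which downstream analysis can understand.

def remove_funcall(insn, known_consts):
    op, *args = insn
    if op == "funcall" and args and args[0] in known_consts:
        callee = known_consts[args[0]]   # the proven value, e.g. 'bar
        return ("call", callee, *args[1:])
    return insn                          # unknown callee: leave as is

known = {"f1": "bar"}                    # propagation proved f1 = 'bar
print(remove_funcall(("funcall", "f1", "x2"), known))
# -> ('call', 'bar', 'x2')
```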
24:03.460 --> 24:06.240
Another interesting pass we can mention is 'tco'.
24:06.340 --> 24:08.800
This is performing Tail Recursion Elimination.
24:08.900 --> 24:11.880
It allows a more functional programming style,
24:11.980 --> 24:13.220
if you want.
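The effect of tail recursion elimination can be shown by hand on a small accumulator function. This Python sketch (not the compiler's output) shows the before and after: the self call in tail position becomes a jump back to the top, so the function runs in constant stack space.

```python
# Effect of tail-recursion elimination, sketched by hand.
import math

def fact_rec(n, acc=1):                  # tail-recursive form
    if n == 0:
        return acc
    return fact_rec(n - 1, acc * n)      # self call in tail position

def fact_tco(n, acc=1):                  # after a 'tco'-style pass
    while True:                          # the tail call became a jump
        if n == 0:
            return acc
        n, acc = n - 1, acc * n          # rebind args, loop

print(fact_rec(5), fact_tco(5))          # -> 120 120
# fact_tco handles depths that would overflow the call stack:
print(fact_tco(2000) == math.factorial(2000))  # -> True
```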
24:13.320 --> 24:14.280
We can jump to the last pass
24:14.380 --> 24:16.200
that is called 'final', and as I mentioned,
24:16.300 --> 24:17.520
this one is responsible for
24:17.620 --> 24:19.880
taking our program in LIMPLE representation
24:19.980 --> 24:24.900
and describing it to libgccjit in the gccjit IR.
24:25.000 --> 24:27.480
That's the main task. It's also
24:27.580 --> 24:29.520
defining inline functions
24:29.620 --> 24:32.400
for accessing fundamental data types, and so on.
24:32.500 --> 24:34.460
This pass is also responsible for
24:34.560 --> 24:36.280
using some of the predictions
24:36.380 --> 24:39.560
done by previous passes to generate better code.
24:39.660 --> 24:41.320
Things we had to add
24:41.420 --> 24:43.880
to make all of this machinery work
24:43.980 --> 24:45.240
and be controllable:
24:45.340 --> 24:47.480
The first one is an option called 'native-comp-speed'
24:47.580 --> 24:49.920
and it's equivalent to Common Lisp's 'speed'.
24:50.020 --> 24:51.920
It represents the optimization level.
24:52.020 --> 24:53.400
The default is 2, which is
24:52.020 --> 24:53.400
the maximum optimization level
24:53.500 --> 24:55.400
that is meant to preserve
24:55.500 --> 24:58.600
all the original semantics of Emacs Lisp.
25:00.960 --> 25:02.480
So it's the one that should be used by default.
25:02.580 --> 25:04.640
The second one is 'compilation unit'
25:04.740 --> 25:05.960
and it's a kind of new object
25:06.060 --> 25:11.080
that has been added to Emacs.
25:11.180 --> 25:12.960
Let's have a look at
25:13.060 --> 25:14.360
how the Garbage Collector works in this case.
25:14.460 --> 25:15.840
The GNU Emacs Garbage Collector
25:15.940 --> 25:18.660
is a simple mark-and-sweep garbage collector.
25:18.760 --> 25:21.420
It does a graph walk through all the objects
25:21.520 --> 25:25.100
and follows references from one object to another.
25:25.200 --> 25:27.960
All the objects reachable during the mark phase
25:28.060 --> 25:31.240
will be kept in our Lisp universe.
25:31.340 --> 25:33.680
All the other ones will be freed.
25:33.780 --> 25:35.080
In this case we have a bunch of functions,
25:35.180 --> 25:38.760
'foo1', 'foo2', 'bar1', etc., that are defined.
25:38.860 --> 25:40.320
When a function is defined,
25:40.420 --> 25:42.400
it's accessible through its symbol,
25:42.500 --> 25:44.360
so we have the symbol referring to the function.
25:44.460 --> 25:47.720
The function, in this case a native-compiled one,
25:47.820 --> 25:50.040
is referring to the compilation unit.
25:50.140 --> 25:53.020
The compilation unit is essentially
25:53.120 --> 25:58.600
the ELF file that has been compiled,
25:58.700 --> 26:01.160
and contains all those functions
26:01.260 --> 26:03.240
that came from the original .el file,
26:03.340 --> 26:05.100
and that we have loaded into memory.
26:05.200 --> 26:10.000
If, for instance, 'bar1 and 'bar2 become undefined,
26:10.100 --> 26:14.200
functions 3 and 4 will no longer be reachable,
26:14.300 --> 26:16.040
and we will be able to free them
26:16.140 --> 26:18.160
and unload the compilation unit.
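A minimal mark-and-sweep sketch of the situation just described, with made-up object names: symbols root functions, each function references its compilation unit, and whatever the mark phase cannot reach is swept.

```python
# Toy mark-and-sweep: walk the object graph from the roots,
# mark everything reachable, and sweep (free) the rest.

heap = {                        # object -> objects it references
    "symbol-foo1": ["fn1"],
    "symbol-bar1": ["fn3"],
    "fn1": ["comp-unit"],       # each function refers to its unit
    "fn3": ["comp-unit"],
    "comp-unit": [],
}
# 'bar1 has become undefined: its symbol no longer roots fn3.
roots = ["symbol-foo1"]

def gc(heap, roots):
    marked, stack = set(), list(roots)
    while stack:                # mark phase: follow references
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(heap.get(obj, []))
    # sweep phase: everything unmarked is freed
    return {o: refs for o, refs in heap.items() if o in marked}

print(sorted(gc(heap, roots)))  # -> ['comp-unit', 'fn1', 'symbol-foo1']
```

fn3 is freed; the compilation unit itself survives only while some live function still refers to it.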
26:18.260 --> 26:21.200
We have discussed Control Flow Graphs quite a lot,
26:21.300 --> 26:23.560
SSA, and a lot of boring stuff,
26:23.660 --> 26:25.400
and I promised you that we are doing
26:25.500 --> 26:27.320
a lot of interesting proofs over variables.
26:27.420 --> 26:30.220
So let's have some examples of them.
26:30.320 --> 26:31.840
Let's jump into a quick demo
26:31.940 --> 26:34.480
to see what all of this abstract theory
26:34.580 --> 26:37.680
and this esoteric propagation engine can do for us
26:37.780 --> 26:39.240
and how the user can interact with it.
26:39.340 --> 26:42.100
I've defined a bunch of functions,
26:42.200 --> 26:45.240
and I will native-compile and load it.
26:47.500 --> 26:48.840
Alright, Emacs Lisp native compiled and loaded.
26:48.940 --> 26:52.320
At this point, I can disassemble 'foo1'
26:52.420 --> 26:56.320
to make sure it's native code and I'm not lying.
26:56.420 --> 26:58.500
These are the instructions
26:58.600 --> 27:01.320
that will be executed directly by my CPU
27:01.420 --> 27:03.620
when I call this function.
27:03.720 --> 27:07.520
Alright, very cool.
27:07.620 --> 27:16.080
Now, [Lisp:] (symbol-function #'foo1)
27:16.180 --> 27:19.720
Interestingly, this is returning a subroutine,
27:19.820 --> 27:21.820
as if it were a primitive function.
27:21.920 --> 27:23.700
That's because this is native code:
27:23.800 --> 27:24.840
even though it's written in Lisp,
27:24.940 --> 27:26.340
it has been compiled to native code
27:26.440 --> 27:29.200
just like a primitive function.
27:29.300 --> 27:31.560
But we can do also a new thing:
27:31.660 --> 27:34.440
asking for the type of the subroutine.
27:34.540 --> 27:38.360
Alright, very cool. It says this is a function,
27:38.460 --> 27:40.280
it's taking one argument of type 't'
27:40.380 --> 27:41.560
(that means anything
27:41.660 --> 27:42.960
because we don't have any information),
27:43.060 --> 27:45.200
and is returning a type 't',
27:45.300 --> 27:47.380
so also there we don't have much information.
27:47.480 --> 27:49.700
OK, very cool, but not very useful.
27:49.800 --> 27:53.680
Let's see #'foo2. #'foo2 is slightly different,
27:53.780 --> 27:55.840
it doesn't take any argument, but it's returning
27:55.940 --> 27:58.040
an integer included between 3 and 3.
27:58.140 --> 28:01.360
Wow, amazing!
28:01.460 --> 28:04.040
Let's get into something a little more complex:
28:04.140 --> 28:09.440
#'foo3 takes one argument we know nothing about,
28:09.540 --> 28:11.740
but it's returning a number.
28:11.840 --> 28:13.280
And why is it returning a number?
28:13.380 --> 28:16.320
Essentially because 1+ is returning a number,
28:16.420 --> 28:18.880
and in all the other cases,
28:18.980 --> 28:20.640
it would signal an error
28:20.740 --> 28:23.380
if it's not happy about its input argument.
28:23.480 --> 28:27.760
Let's have a look at #'foo4.
28:27.860 --> 28:32.920
#'foo4 is a little bit more complex.
28:33.020 --> 28:34.680
It will return nil
28:34.780 --> 28:37.400
if the 'when' condition is not satisfied,
28:37.500 --> 28:39.720
so it's type 'null' here.
28:39.820 --> 28:41.200
It can return a floating point
28:41.300 --> 28:43.600
(we don't propagate floating-point values so far),
28:43.700 --> 28:47.080
or it can return any integer between 4 and 9.
28:47.180 --> 28:52.840
Wow. Let's go on with #'foo5.
28:52.940 --> 28:55.760
#'foo5 is even more complex
28:55.860 --> 28:57.200
because in addition to
28:57.300 --> 28:59.180
satisfying this condition,
28:59.280 --> 29:02.280
we can see that the result of the propagation
29:02.380 --> 29:03.800
of this complex condition
29:03.900 --> 29:05.460
is also propagated across the 'plus'.
29:05.560 --> 29:08.240
So this foo5 can return nil,
29:08.340 --> 29:09.720
a floating point we know nothing about,
29:09.820 --> 29:13.560
or an integer included between 12 and 24.
29:13.660 --> 29:18.080
Let's go on with #'foo6.
29:18.180 --> 29:23.320
#'foo6 is returning anything but an integer.
29:23.420 --> 29:26.520
I think it should be pretty obvious why,
29:26.620 --> 29:28.120
because if it's not an integer we return it,
29:28.220 --> 29:30.000
otherwise we signal an error.
29:30.100 --> 29:32.880
Let's finish with #'foo7 very quickly.
29:32.980 --> 29:37.920
#'foo7 has another very complex condition,
29:38.020 --> 29:40.320
at least for me, but it's also interesting to see
29:40.420 --> 29:42.200
that we are also propagating values for symbols.
29:42.300 --> 29:45.160
So we can return the symbol 'big,
29:45.260 --> 29:46.980
the symbol 'small,
29:47.080 --> 29:51.200
or an integer included between -100 and 100.
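The integer ranges shown in this demo come from interval arithmetic over the operations. A minimal, illustrative sketch of how a range flows through an addition (not the compiler's actual code):

```python
# Toy range propagation: given the interval each operand lies in,
# compute the interval of the result of an addition. This is the
# kind of fact the compiler attaches to each SSA variable.

def add_ranges(a, b):
    # (lo, hi) intervals: the sum lies in the element-wise sums.
    return (a[0] + b[0], a[1] + b[1])

# If x is proven to be in [4, 9] and we compute (+ x x):
print(add_ranges((4, 9), (4, 9)))   # -> (8, 18)
```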
29:51.300 --> 29:54.440
Now, the question is: why is all of this useful
29:54.540 --> 29:56.800
other than having Andrea very happy
29:56.900 --> 29:59.560
when he's playing with this all day?
29:59.660 --> 30:01.440
Well, we have to come back one second
30:01.540 --> 30:04.920
to how Lisp_Objects are represented within Emacs.
30:05.020 --> 30:09.480
Lisp_Objects are represented as machine words,
30:09.580 --> 30:12.580
where we reserve a few bits to indicate the type.
30:12.680 --> 30:15.560
And every time we access the object,
30:15.660 --> 30:17.120
whether it's a fixnum
30:17.220 --> 30:19.920
or a regular object, where the word is a pointer,
30:20.020 --> 30:21.600
we always have to extract these bits,
30:21.700 --> 30:24.520
make sure that they satisfy a condition,
30:24.620 --> 30:27.120
so make sure that we are going to manipulate
30:27.220 --> 30:28.580
the object of the type we expect,
30:28.680 --> 30:31.040
and then we can extract the object
30:31.140 --> 30:32.540
and do the manipulation.
30:32.640 --> 30:34.420
If the compiler manages to prove
30:34.620 --> 30:37.400
that the contained object is of the right type,
30:37.500 --> 30:42.440
it will be able to skip emitting the type check,
30:42.540 --> 30:44.500
and save a lot of instructions.
30:44.600 --> 30:48.560
This is a very powerful optimization.
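A toy model of such tagged words (the tag layout here is invented for illustration and is not Emacs's actual representation): the generic accessor must check the tag before untagging, while the accessor emitted under a proven type can drop the check entirely.

```python
# Toy tagged machine words: a few low bits of each word hold the
# type tag. Tag values here are made up, not Emacs's real layout.
TAG_BITS = 3
TAG_FIXNUM = 0b010

def make_fixnum(n):
    return (n << TAG_BITS) | TAG_FIXNUM

def checked_fixnum_value(word):
    # Generic access: check the tag, then untag. This is the
    # test emitted when the compiler knows nothing about the type.
    if word & ((1 << TAG_BITS) - 1) != TAG_FIXNUM:
        raise TypeError("wrong-type-argument")
    return word >> TAG_BITS

def unchecked_fixnum_value(word):
    # Access emitted when propagation has already proven the type:
    # the check (and its branch) disappears.
    return word >> TAG_BITS

w = make_fixnum(42)
print(checked_fixnum_value(w), unchecked_fixnum_value(w))  # -> 42 42
```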
30:48.660 --> 30:50.760
Let's discuss some potential future development
30:50.860 --> 30:52.000
in this area.
30:52.100 --> 30:53.360
First I think it would be extremely nice
30:53.460 --> 30:54.920
to extend the propagation
30:55.020 --> 30:56.920
to types that are not built in.
30:57.020 --> 30:58.160
There are a lot of cases
30:58.260 --> 31:00.600
where we could optimize effectively.
31:00.700 --> 31:02.520
For instance, when we do
31:02.620 --> 31:05.720
a lot of accesses to structures
31:05.820 --> 31:06.920
-- lots of stuff like that --
31:07.020 --> 31:08.880
we keep on checking the type
31:08.980 --> 31:10.640
of the same object over and over,
31:10.740 --> 31:13.080
even where it should be trivial
31:13.180 --> 31:14.720
to prove that it's the right type
31:14.820 --> 31:16.360
after the first access.
31:16.460 --> 31:18.920
So I believe this is a low-hanging fruit
31:19.020 --> 31:21.180
in terms of performance.
31:21.280 --> 31:23.080
Also I think it would be really nice
31:23.180 --> 31:24.720
to extend the declare mechanism
31:24.820 --> 31:27.600
to allow the user to declare argument types.
31:27.700 --> 31:30.480
(Optionally. Indeed, optionally.)
31:30.580 --> 31:32.880
Doing that would give the compiler
31:32.980 --> 31:35.120
a lot of information to propagate value types
31:35.220 --> 31:36.360
within the function.
31:36.460 --> 31:38.840
But that would also allow the compiler
31:38.940 --> 31:43.040
to give the user really good diagnostics
31:43.140 --> 31:45.400
at compile time. For instance, the compiler could say:
31:45.500 --> 31:47.000
Hey, here you are calling a function
31:47.100 --> 31:49.200
that is expecting this argument of this type,
31:49.300 --> 31:52.000
but I'm proving that you are calling it
31:52.100 --> 31:55.200
with an argument that is of THIS type,
31:55.300 --> 31:57.840
and is not a subtype of the expected one.
31:57.940 --> 32:00.240
So you are doing something inconsistent.
32:00.340 --> 32:02.800
This kind of interprocedural logic.
32:02.900 --> 32:05.000
And I think the compiler should also take advantage
32:05.100 --> 32:06.640
(under certain circumstances)
32:06.740 --> 32:08.760
of this interprocedural analysis
32:08.860 --> 32:12.520
in order to optimize even more, when possible.
32:12.620 --> 32:13.720
Also I think we should
32:13.820 --> 32:15.480
work on our [?] to improve the code generation
32:15.580 --> 32:17.100
depending on the prediction
32:17.200 --> 32:18.160
that the compiler is doing.
32:18.260 --> 32:20.120
We already take advantage of those predictions,
32:20.220 --> 32:22.300
but I think we could do better.
32:22.400 --> 32:25.120
A quick look at some performance results.
32:25.220 --> 32:28.720
These are from the elisp-benchmarks package
32:28.820 --> 32:30.760
within GNU ELPA.
32:30.860 --> 32:32.520
This is the performance uplift,
32:32.620 --> 32:38.480
and we can identify about 4 "classes" of results.
32:38.580 --> 32:41.440
In the first one, there is no performance uplift,
32:41.540 --> 32:42.880
because there is not much we can do,
32:42.980 --> 32:44.720
and the time is probably not spent
32:44.820 --> 32:46.280
within the execution engine.
32:46.380 --> 32:49.000
And the ones around 3x are the ones
32:49.100 --> 32:50.680
where we are probably not triggering
32:50.780 --> 32:52.640
any of the specific optimizations,
32:52.740 --> 32:53.600
but just the fact
32:53.700 --> 32:57.100
that we are converting into native code
32:57.200 --> 33:00.500
is giving us this performance uplift.
33:00.600 --> 33:03.280
Then there is a bunch of other benchmarks
33:03.380 --> 33:05.680
where the Lisp optimizations are triggering,
33:05.780 --> 33:09.620
and the uplift is way bigger,
33:09.720 --> 33:11.900
and then we have 3 benchmarks that, at this time,
33:12.000 --> 33:13.420
are completely optimized out.
33:13.520 --> 33:15.580
That means the compiler became "so smart"
33:15.680 --> 33:18.160
that it was able to compute the result
33:18.260 --> 33:20.160
at compile time and just put the result
33:20.260 --> 33:21.400
in the generated binary.
33:21.500 --> 33:23.880
Let's discuss a little bit the compilation model.
33:23.980 --> 33:26.080
This is a hybrid one;
33:26.180 --> 33:29.420
it's both JIT-like and Ahead-of-Time-like.
33:29.520 --> 33:33.400
Emacs is composed of what we call an Emacs image,
33:33.500 --> 33:36.180
essentially the Emacs binary that we start.
33:36.280 --> 33:37.720
It's including all the C code,
33:37.820 --> 33:41.760
plus all the Lisp code that we preload.
33:41.860 --> 33:46.480
Then we have the rest of the Emacs Lisp codebase
33:46.580 --> 33:49.440
that can be loaded just if it's required.
33:49.540 --> 33:52.760
Same for the external packages, if we have any.
33:52.860 --> 33:55.120
If we build Emacs
33:55.220 --> 33:57.940
with native compilation enabled, by default,
33:58.040 --> 34:01.200
only the Emacs image will be native compiled.
34:01.300 --> 34:03.920
All the other code will be compiled
34:04.020 --> 34:06.720
on the fly when it's loaded and executed
34:06.820 --> 34:08.800
the first time, if it's necessary.
34:08.900 --> 34:10.880
Same for the packages, in a transparent
34:10.980 --> 34:12.640
and asynchronous way.
34:12.740 --> 34:13.480
Also worth noting
34:13.580 --> 34:15.400
that the result of this compilation
34:15.500 --> 34:17.240
will be stored into a cache directory
34:17.340 --> 34:19.900
within the home directory of the user.
34:20.000 --> 34:23.600
So it will be reused in the following sessions
34:23.700 --> 34:25.440
if the same file is loaded,
34:25.540 --> 34:27.520
without having to recompile the same file
34:27.620 --> 34:29.320
multiple times in different sessions.
34:29.420 --> 34:31.520
It works a little bit like this:
34:31.620 --> 34:33.800
When we load the byte-code for the first time,
34:33.900 --> 34:35.900
we spawn a native compilation process.
34:36.000 --> 34:39.320
Meanwhile we keep using the byte-code available.
34:39.420 --> 34:41.180
When the native compilation is finished,
34:41.280 --> 34:43.960
we hot-swap the definition of the functions
34:44.060 --> 34:46.040
that are contained in the file
34:46.140 --> 34:47.960
and start using the native code transparently.
34:48.060 --> 34:49.880
We do this asynchronously,
34:49.980 --> 34:53.140
and for more compilation units at the same time,
34:53.240 --> 34:56.280
so it looks a little bit like this.
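The load-compile-swap flow can be sketched with a background thread (a simplified model, not Emacs's actual process-based implementation): calls keep hitting the byte-code definition until the native one is ready, then the function cell is hot-swapped.

```python
# Sketch of the deferred-compilation flow: keep calling the
# byte-code definition while a background "compiler" produces
# the native one, then hot-swap the function cell.
import threading, time

functions = {"tetris": lambda: "bytecode result"}    # function cell

def native_compile_and_swap(name):
    time.sleep(0.1)                      # pretend compilation takes time
    functions[name] = lambda: "native result"        # hot-swap

worker = threading.Thread(target=native_compile_and_swap,
                          args=("tetris",))
worker.start()
print(functions["tetris"]())             # -> bytecode result (no delay)
worker.join()                            # compilation finished
print(functions["tetris"]())             # -> native result
```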
34:56.380 --> 34:58.560
Let's try a quick demo of all of this machinery.
34:58.660 --> 35:00.880
I've started a fresh Emacs
35:00.980 --> 35:03.280
with native compilation support,
35:03.380 --> 35:05.060
and at this moment nothing is going on.
35:05.160 --> 35:07.440
We will test the machinery with Tetris,
35:07.540 --> 35:10.460
because I can't imagine anything better
35:10.560 --> 35:12.840
to test this.
35:12.940 --> 35:15.080
What I do expect is that when I launch Tetris
35:15.180 --> 35:17.920
it will be loaded, it will immediately
35:18.020 --> 35:20.460
start execution of the byte-compiled version,
35:20.560 --> 35:23.000
so we won't see any delay in the user experience,
35:23.100 --> 35:25.160
and in the meanwhile, a parallel process
35:25.260 --> 35:28.160
will start to native-compile Tetris itself.
35:28.260 --> 35:30.200
When the native compilation is finished,
35:30.300 --> 35:33.240
all of Tetris's functions will be hot-swapped.
35:33.340 --> 35:36.280
So we will not see any interruption.
35:36.380 --> 35:39.680
So Tetris started, and it's running,
35:39.780 --> 35:41.600
we have seen no delay, and in the meanwhile,
35:41.700 --> 35:43.520
the native compilation has probably already finished,
35:43.620 --> 35:44.960
we can have a look.
35:45.060 --> 35:47.160
Here we can see the native compilation log buffer.
35:47.260 --> 35:49.720
So we see that Tetris has been native compiled,
35:49.820 --> 35:51.760
and all of its dependencies.
35:51.860 --> 35:53.840
Now Tetris is still running,
35:53.940 --> 36:00.480
but I can do "C-h f tetris"
36:00.580 --> 36:02.540
and we can see that 'tetris'
36:02.640 --> 36:04.880
is an interactive native compiled
36:04.980 --> 36:07.940
Lisp function, so it has been native-compiled.
36:08.040 --> 36:13.500
I can even disassemble if I want.
36:13.600 --> 36:14.880
OK, so very cool.
36:14.980 --> 36:18.020
I guess we can say this mechanism is working.
36:18.120 --> 36:20.660
Also worth noting that if I go back
36:20.760 --> 36:24.120
to the *Async-native-compile-log* buffer,
36:24.220 --> 36:27.960
we see we have compiled another bunch of files.
36:28.060 --> 36:31.600
I think these are because of my 'C-h f',
36:31.700 --> 36:33.800
(the describe-function command) and disassemble,
36:33.900 --> 36:34.920
and so on.
36:35.020 --> 36:37.720
The first time you run Emacs, you will have,
36:37.820 --> 36:41.280
from time to time, these processes spawned.
36:41.380 --> 36:43.100
Emacs is "compiling itself",
36:43.200 --> 36:45.520
and it's replacing the byte-code definition
36:45.620 --> 36:47.640
with the native one. But after a few sessions,
36:47.740 --> 36:49.760
you will not see this anymore,
36:49.860 --> 36:51.360
because the output of this compilation,
36:51.460 --> 36:54.900
as I mentioned, is stored in the user directory.
36:55.000 --> 36:57.560
To conclude: Emacs with native compilation support
36:57.660 --> 36:59.080
is coming up in Emacs 28,
36:59.180 --> 37:01.040
which is going to be the next major
37:01.140 --> 37:02.080
stable release.
37:02.180 --> 37:04.840
So we ought to celebrate with a big party,
37:04.940 --> 37:07.320
I believe. But before going to the party,
37:07.420 --> 37:09.080
I'd like to list a few points
37:09.180 --> 37:11.340
that I think have been success factors
37:11.440 --> 37:13.420
in upstreaming this work.
37:13.520 --> 37:15.420
It has been extremely important
37:15.520 --> 37:18.140
to get in touch with upstream as soon as possible,
37:18.240 --> 37:20.680
as soon as I had a proof of concept.
37:20.780 --> 37:22.600
It's been extremely important
37:22.700 --> 37:24.660
to involve the community as much as possible,
37:24.760 --> 37:28.480
and this included keeping a development blog,
37:28.580 --> 37:31.600
and posting about it on emacs-devel,
37:31.700 --> 37:33.720
and also producing material,
37:33.820 --> 37:36.020
participating in conferences,
37:36.120 --> 37:38.320
and giving presentations like the one I'm doing,
37:38.420 --> 37:40.680
to explain what I was doing and how it works.
37:40.780 --> 37:43.040
It has been extremely important, also,
37:43.140 --> 37:45.860
to be able to rely on the upstream infrastructure.
37:45.960 --> 37:47.540
So, to develop the software
37:47.640 --> 37:49.080
as a feature branch in the official git,
37:49.180 --> 37:49.960
but even more, I would say,
37:50.060 --> 37:51.660
to use the official bug tracker
37:51.760 --> 37:52.880
for tracking and solving bugs in this branch.
37:52.980 --> 37:54.680
This gave the opportunity
37:54.780 --> 37:58.160
to stay in really close touch with the maintainers,
37:58.260 --> 37:59.920
and senior developers of Emacs.
38:00.020 --> 38:02.980
That helped me a lot. And at the same time
38:03.080 --> 38:04.820
they were informed about what I was doing
38:04.920 --> 38:07.360
and what was the status of this feature branch.
38:07.460 --> 38:08.800
Extremely important.
38:08.900 --> 38:11.160
And also I think it played a major role
38:11.260 --> 38:14.120
to try to design this enormous patch
38:14.220 --> 38:18.120
in a way that the impact on the current codebase
38:18.220 --> 38:21.120
was minimized (at least as much as possible).
38:21.220 --> 38:23.660
And also minimizing
38:23.760 --> 38:26.240
the impact on the user operation of the software,
38:26.340 --> 38:28.040
in this case Emacs.
38:28.140 --> 38:29.640
So yes, mandatory Special Thanks:
38:29.740 --> 38:33.360
Emacs developers, and especially maintainers
38:33.460 --> 38:36.680
and senior developers like Stefan Monnier,
38:36.780 --> 38:40.440
who helped me a lot across this long journey.
38:40.540 --> 38:42.800
And, well, all the community
38:42.900 --> 38:45.160
that really got so excited about this project
38:45.260 --> 38:46.240
and gave me the energy
38:46.340 --> 38:49.120
to go through all of this time and development
38:49.220 --> 38:51.980
and bug-solving, and so on.
38:52.080 --> 38:54.920
So yes, it was a really exciting time,
38:55.020 --> 38:58.000
and I think we have to look forward
38:58.100 --> 39:01.400
and start thinking about how to improve all this
39:01.500 --> 39:02.920
for the following years.
39:03.020 --> 39:04.300
And that's it.
39:04.400 --> 39:06.040
I think I should be online for questions.
39:06.140 --> 39:07.480
Thank you very much.
39:07.580 --> 39:07.680
[captions by John Cummings]