WEBVTT

00:00.200 --> 00:02.340
Hi everybody, my name is Andrea Corallo,

00:02.440 --> 00:03.280
and this presentation is about

00:03.382 --> 00:05.802
the Emacs Lisp Native Compiler --

00:05.904 --> 00:07.083
having GNU Emacs able to

00:07.184 --> 00:08.805
compile and run Emacs Lisp as native code.

00:08.907 --> 00:11.565
This project has been my hobby project

00:11.667 --> 00:14.009
for the last about two years and a half

00:14.111 --> 00:15.408
And we will see a little bit

00:15.509 --> 00:16.969
where this project is coming from,

00:17.070 --> 00:18.913
where we are, and where we want to go.

00:19.015 --> 00:20.571
So essentially everything you need to know

00:20.672 --> 00:22.132
about the Emacs Lisp Native Compiler,

00:22.232 --> 00:23.416
and probably a little more.

00:23.521 --> 00:26.535
Just a little bit of context on Emacs Lisp.

00:26.635 --> 00:30.175
Well, Emacs Lisp is a programming language,

00:30.278 --> 00:33.544
it's indeed a Lisp,

00:33.647 --> 00:36.619
and one year ago I looked for some statistics,

00:36.722 --> 00:38.742
and I was kind of pleased to see

00:38.842 --> 00:40.030
-- surprised to see actually --

00:40.132 --> 00:40.972
that it's still kind of popular

00:41.073 --> 00:42.844
as a programming language.

00:42.944 --> 00:43.684
It doesn't rank that bad

00:43.785 --> 00:45.554
against other programming languages.

00:45.658 --> 00:48.387
Also, the other important fact about Emacs Lisp

00:48.487 --> 00:51.518
is that there is a lot of Emacs Lisp out there,

00:51.622 --> 00:55.922
and this will have an impact [on this project.]

00:56.025 --> 00:57.632
It's a programming language that is capable

00:57.734 --> 00:59.320
of almost any task, so it's

00:59.420 --> 01:01.728
almost a general-purpose programming language,

01:01.831 --> 01:04.116
and this reflects on Emacs itself,

01:04.217 --> 01:07.232
that it's capable of almost any task.

01:07.335 --> 01:09.679
Indeed this "almost" is something we want to fix,

01:09.781 --> 01:10.720
because we want to do everything,

01:10.820 --> 01:14.218
we want [to do] all of our computing [with Emacs].

01:14.321 --> 01:16.523
Also, an interesting aspect for me

01:16.625 --> 01:18.903
is that it's not specified by any standard.

01:19.005 --> 01:22.507
This implies it can evolve in a more agile way

01:22.609 --> 01:25.667
without having to change the standard, etc.

01:25.770 --> 01:27.870
And, in fact, it's kind of improving,

01:27.970 --> 01:30.370
I believe relatively fast.

01:30.473 --> 01:32.791
A little bit about Lisp in general.

01:32.892 --> 01:34.952
First, it's the best programming language ever,

01:35.052 --> 01:35.712
we all know it.

01:35.813 --> 01:37.833
It has a lot of very nice properties,

01:37.934 --> 01:40.356
like it's dynamic, it's homoiconic, etc.

01:40.462 --> 01:42.394
But the interesting thing for implementors,

01:42.496 --> 01:43.476
is that, in theory,

01:43.576 --> 01:46.716
it can be implemented with very few primitives.

01:46.817 --> 01:49.485
You build very few primitives that are like magic,

01:49.590 --> 01:51.838
and on top of this,

01:51.938 --> 01:53.770
you can implement the whole language.

01:53.873 --> 01:55.759
This sounds very nice,

01:55.860 --> 01:57.280
and very appealing for implementors

01:57.381 --> 02:00.175
meaning to implement a new Lisp implementation,

02:00.279 --> 02:02.539
or improving or modifying an existing one.

02:02.641 --> 02:04.163
So the question is:

02:04.263 --> 02:07.663
how many primitives do we have to implement

02:07.764 --> 02:09.384
if we want to change

02:09.485 --> 02:13.203
the GNU Emacs Lisp implementation?

02:13.308 --> 02:20.128
Unfortunately, not really as few as we would like.

02:20.234 --> 02:25.294
In GNU Emacs, we have about 1500 primitives,

02:25.400 --> 02:27.970
and the main reason for that is performance.

02:28.071 --> 02:30.961
Actually, GNU Emacs was written

02:31.064 --> 02:34.292
when performance was a big issue,

02:34.393 --> 02:36.493
and nowadays certain parts

02:36.594 --> 02:38.486
are still performance-critical.

02:38.591 --> 02:40.295
We have a lot of C code;

02:40.395 --> 02:46.331
30% of the GNU Emacs codebase is C code,

02:46.435 --> 02:49.177
and we have to maintain this.

02:49.279 --> 02:52.859
But not only that, this is the main barrier

02:52.959 --> 02:55.579
for people that tried in the past

02:55.681 --> 02:57.663
to change the Emacs Lisp implementation.

02:57.765 --> 02:59.101
Because not only do you have to

02:59.202 --> 03:01.582
replace these primitives,

03:01.683 --> 03:03.907
all of them or part of them,

03:04.012 --> 03:06.384
but sometimes they also share (these primitives),

03:06.484 --> 03:07.952
internal data structures.

03:08.055 --> 03:09.745
For instance, it's very difficult to say:

03:09.846 --> 03:12.826
Now I want to go from C

03:12.926 --> 03:13.906
to a different programming language

03:14.006 --> 03:15.638
for implementing these primitives.

03:15.740 --> 03:17.028
So this has been, effectively,

03:17.128 --> 03:20.380
the main barrier for doing this work.

03:20.486 --> 03:22.090
Another interesting aspect

03:22.190 --> 03:23.870
about the GNU Emacs implementation

03:23.970 --> 03:26.190
is that Lisp can run interpreted

03:26.291 --> 03:28.647
or byte-compiled for performance,

03:28.752 --> 03:30.632
and the byte compiler itself

03:30.733 --> 03:32.673
is written in Emacs Lisp.

03:32.773 --> 03:35.033
This implies that GNU Emacs has to go through

03:35.134 --> 03:37.214
a bootstrap procedure in order to be built.

03:37.319 --> 03:38.715
But it's kind of interesting

03:38.815 --> 03:41.835
for something that started as a text editor,

03:41.937 --> 03:43.101
or something like it.

03:43.203 --> 03:47.877
The byte-code that is Emacs Lisp

03:47.979 --> 03:50.279
when it's been byte compiled,

03:50.379 --> 03:53.079
it's running on a stack-based virtual machine

03:53.180 --> 03:55.146
that is implemented in C.

03:55.253 --> 03:59.741
OK, so I've listed a bunch of areas

03:59.842 --> 04:02.074
where Emacs Lisp could improve:

04:02.178 --> 04:06.924
Namespace, Extensibility, Performance,

04:07.025 --> 04:11.239
and Debuggability, and Diagnostic, we could say.

04:11.346 --> 04:14.326
This activity, this project in particular,

04:14.428 --> 04:17.088
is affecting primarily Performance

04:17.189 --> 04:21.314
and the performance area the Execution Engine.

04:21.414 --> 04:22.570
That said,

04:22.671 --> 04:25.855
I think it has an impact also on Extensibility,

04:25.957 --> 04:27.811
and I hope it will have an impact also

04:27.913 --> 04:29.779
on programming diagnostics,

04:29.881 --> 04:32.501
so giving better warnings, [unknown].

04:32.604 --> 04:36.694
So which are the benefits

04:36.795 --> 04:39.215
of increasing the Emacs Lisp performance?

04:39.316 --> 04:42.976
Indeed, we will have, if we do that,

04:43.078 --> 04:45.668
programs that run faster.

04:45.774 --> 04:48.738
But the main implication of that

04:48.840 --> 04:50.760
is that we could write less C;

04:50.860 --> 04:52.940
we could maintain and debug less C.

04:53.041 --> 04:56.437
That is kind of a time-consuming task.

04:56.542 --> 04:59.502
And we could also allow for

04:59.603 --> 05:01.723
writing performance-critical extensions

05:01.824 --> 05:03.804
directly in Lisp

05:03.909 --> 05:06.305
without having to use systems like

05:06.406 --> 05:09.686
[I think there's a bunch.]

05:09.927 --> 05:14.791
These are very consistent benefits.

05:14.899 --> 05:16.669
OK, Project Goals,

05:16.769 --> 05:18.829
but I think the title of this slide

05:18.930 --> 05:21.520
maybe should be Project Requirements.

05:21.623 --> 05:23.231
So when I started this activity,

05:23.331 --> 05:26.943
I set some requirements for the project,

05:27.049 --> 05:29.469
with the main goal in mind: to go upstream.

05:29.570 --> 05:31.494
So I wanted to create something,

05:31.594 --> 05:34.250
a modified implementation of GNU Emacs,

05:34.353 --> 05:36.675
that was compatible

05:36.776 --> 05:38.596
as close to 100% as possible

05:38.696 --> 05:40.514
to the current implementation.

05:40.620 --> 05:42.720
And when I say "current implementation,"

05:42.820 --> 05:44.758
I don't refer to what

05:44.859 --> 05:46.879
the Emacs Lisp Programming Manual specifies

05:46.980 --> 05:49.804
as expected behavior of the implementation,

05:49.907 --> 05:52.207
but really the implementation itself.

05:52.309 --> 05:53.841
This is because there are a lot of corner cases

05:53.942 --> 05:56.390
that are not specified by the manual,

05:56.493 --> 05:58.093
but programs do rely on that,

05:58.193 --> 06:00.523
and given there is a ton of Emacs Lisp

06:00.625 --> 06:01.935
already around,

06:02.037 --> 06:05.437
compatibility was definitely a major requirement.

06:05.541 --> 06:07.727
I wanted to produce something that had

06:07.827 --> 06:09.587
reduced impact on the Emacs codebase,

06:09.687 --> 06:11.543
at least as much as possible.

06:11.644 --> 06:14.724
So I didn't want to rewrite all of GNU Emacs.

06:14.827 --> 06:17.609
Indeed, because it would have been

06:17.710 --> 06:21.068
too much work for one single person,

06:21.171 --> 06:25.371
but also still thinking

06:25.473 --> 06:29.153
to an upstream outcome all of this time.

06:29.258 --> 06:31.974
Another requirement was to have

06:32.076 --> 06:34.700
no, or very reduced, impact on the user,

06:34.804 --> 06:36.456
so I didn't want to change

06:36.556 --> 06:38.604
the way Emacs is used by you.

06:38.708 --> 06:40.618
And last but not least,

06:40.718 --> 06:42.848
introducing new dependencies,

06:42.951 --> 06:45.379
those dependencies had to be Free Software,

06:45.479 --> 06:47.191
possibly GPL,

06:47.295 --> 06:49.995
and also an important requirement

06:50.097 --> 06:52.941
is that these dependencies had to be

06:53.043 --> 06:55.143
some kind of trusted software that we know

06:55.243 --> 06:56.681
is going to be maintained in the future.

06:56.781 --> 06:59.261
Given Emacs has been around since forever,

06:59.363 --> 07:01.183
and will be around forever and ever,

07:01.285 --> 07:05.265
this was another very important point.

07:05.367 --> 07:08.707
A little bit of history of this project/

07:08.808 --> 07:10.648
a quick timeline.

07:10.748 --> 07:14.388
2019, in May, I did my first commit,

07:14.490 --> 07:17.110
I think it was when I tried to write

07:17.210 --> 07:20.230
my first primitive function ever in C,

07:20.332 --> 07:21.752
in GNU Emacs.

07:21.852 --> 07:24.832
And this was an attempt to try to compile

07:24.934 --> 07:30.314
a function that, once executed, returning [?]

07:30.415 --> 07:33.755
That was it. Six months after (about),

07:33.857 --> 07:37.057
I had something that was kind of working,

07:37.157 --> 07:42.797
So I was able to start up a semi-standard Emacs

07:42.899 --> 07:44.879
and then to compile and load,

07:44.981 --> 07:47.221
and replacing most of the functions

07:47.321 --> 07:50.921
that I had defined floating in my Lisp universe.

07:51.022 --> 07:54.202
Those functions are the functions

07:54.304 --> 07:55.944
that are essentially composing Emacs,

07:56.044 --> 07:57.884
at least the Lisp side of Emacs.

07:57.984 --> 08:01.084
A lot of features were missing,

08:01.186 --> 08:03.426
like I had no Garbage Collector support,

08:03.526 --> 08:07.006
no bootstrap, I was not optimizing these functions

08:07.108 --> 08:10.068
Because optimization [would be broken].

08:10.169 --> 08:12.749
No image dump support, etc.

08:12.850 --> 08:16.330
But I think this proved the design could work.

08:16.431 --> 08:19.171
So I sent to email to emacs-devel. I said

08:19.272 --> 08:20.852
I have this stuff I'm working on,

08:20.953 --> 08:24.613
and I wanted some feedback from the upstream

08:24.714 --> 08:27.534
to see if there was some interest.

08:27.635 --> 08:30.455
I believe the outcome of this was positive

08:30.557 --> 08:32.837
because about one month after,

08:32.937 --> 08:35.697
I pushed my branch within the Emacs git

08:35.798 --> 08:38.438
as a feature branch, and shortly after,

08:38.539 --> 08:42.679
we started to use the bug tracker to track bugs.

08:42.779 --> 08:45.479
So essentially we moved the development

08:45.580 --> 08:50.040
on the upstream infrastructure.

08:50.140 --> 08:55.620
I believe two years after the first commit,

08:55.721 --> 08:57.781
the project was merged

08:57.882 --> 09:00.922
after literally hundreds of bugs solved,

09:01.022 --> 09:03.842
and improvements, suggestions [unknown]

09:03.943 --> 09:08.623
and this was about six months ago.

09:08.723 --> 09:12.363
Before discussing how the native compiler works,

09:12.464 --> 09:14.224
I think it's worth looking at

09:14.324 --> 09:17.644
how Lisp is implemented in GNU Emacs.

09:17.745 --> 09:19.305
We have Lisp_Objects

09:19.405 --> 09:21.945
floating around our Lisp universe,

09:22.045 --> 09:23.905
and they are internally represented in this way.

09:24.006 --> 09:25.606
We have what is called a tagged pointer,

09:25.706 --> 09:27.406
that is just a regular pointer

09:27.506 --> 09:29.286
that is pointing to the area of memory

09:29.386 --> 09:31.926
where we hold the real data of the object.

09:32.027 --> 09:34.187
But within this tagged pointer,

09:34.287 --> 09:36.747
we reserve a few bits

09:36.848 --> 09:38.968
to indicate the type of object we are pointing to.

09:39.068 --> 09:40.528
This is important because

09:40.628 --> 09:42.288
each time we access an object,

09:42.388 --> 09:46.168
we have to typically check those bits

09:46.269 --> 09:49.029
to check that the object we are manipulating

09:49.129 --> 09:50.529
is of the right kind,

09:50.630 --> 09:52.530
remove those bits, and, if we are happy,

09:52.630 --> 09:55.810
access the object, otherwise [unknown].

09:55.910 --> 09:57.690
All the objects are like this,

09:57.791 --> 09:59.731
except for typically Fixnums,

09:59.831 --> 10:01.031
that are small integers

10:01.131 --> 10:04.891
that we manage to fit directly within the pointer.

10:04.992 --> 10:07.292
Also for manipulating Fixnums,

10:07.392 --> 10:09.412
we have to check the tag bits each time.

10:09.513 --> 10:13.173
Whenever we are not sure of the type of object

10:13.273 --> 10:16.493
we are manipulating (read: almost every time),

10:16.594 --> 10:18.914
we have to check those bits and remove those bits

10:19.014 --> 10:21.754
before doing any manipulation on the Fixnum.

10:21.854 --> 10:26.014
How Emacs Lisp is byte-compiled and executed

10:26.115 --> 10:27.775
in, let's call it, "Vanilla".

10:27.875 --> 10:30.055
If we have a Lisp expression of this kind:

10:30.156 --> 10:32.776
We take the variable 'a' we do plus 2,

10:32.876 --> 10:34.876
and then we multiply the result by 3,

10:34.976 --> 10:37.376
the byte compiler will produce this LAP code.

10:37.477 --> 10:38.497
LAP code is essentially

10:38.597 --> 10:40.017
the assembly for the byte-code,

10:40.117 --> 10:43.697
so it's the "intermediate representation"

10:43.798 --> 10:48.458
that will assembled into byte-code. (.elc files)

10:48.558 --> 10:50.738
How is this program executed?

10:50.839 --> 10:53.639
As I mentioned, it's executed in a virtual machine

10:53.739 --> 10:55.699
that is stack-based,

10:55.800 --> 10:58.540
but we start with an execution stack that's empty,

10:58.640 --> 11:01.680
and a stack pointer pointing to its bottom.

11:01.780 --> 11:04.280
And we execute the first instruction,

11:04.380 --> 11:07.120
that is pushing in the stack the value of 'a',

11:07.220 --> 11:10.360
in this case, 100. Then we push the constant 2.

11:10.460 --> 11:12.400
Then we do the summation,

11:12.500 --> 11:14.040
and we have the result in the stack.

11:14.140 --> 11:16.860
Same: we push the constant 3,

11:16.960 --> 11:17.620
we do the multiplication,

11:17.720 --> 11:19.260
and we will be able to return.

11:19.360 --> 11:22.700
Now, what's good and what's bad about this?

11:22.800 --> 11:25.800
A good thing is that it's very simple

11:25.900 --> 11:27.460
to start from Lisp

11:27.560 --> 11:31.320
and compile this kind of LAP output.

11:31.420 --> 11:34.240
At least it's reasonably simple.

11:34.340 --> 11:36.240
The compiler is not that complex.

11:36.340 --> 11:39.000
The bad thing is that all this machinery

11:39.100 --> 11:40.660
-- push and pop, etc. --

11:40.760 --> 11:44.320
it's very different from how a modern CPU works.

11:44.420 --> 11:45.720
Because modern CPUs,

11:45.820 --> 11:47.660
they are not stack-based anymore

11:47.760 --> 11:50.900
but they have instead a fixed number of registers,

11:51.000 --> 11:54.020
and they work with assignment and operation

11:54.120 --> 11:55.040
within these registers

11:55.140 --> 11:57.040
that are generally called "general-purpose."

11:57.140 --> 11:59.260
So to execute this LAP program,

11:59.360 --> 12:00.760
there is another program,

12:00.860 --> 12:02.600
that is the implementation of the VM itself

12:02.700 --> 12:06.400
that is doing conversion during runtime.

12:06.500 --> 12:08.320
So it's interpreting the LAP program

12:08.420 --> 12:11.360
and it's converting it into instructions

12:11.460 --> 12:13.180
that we can execute on the CPU.

12:13.280 --> 12:14.940
This conversion is done each time

12:15.040 --> 12:17.100
we will run some byte-code.

12:17.200 --> 12:19.760
And it's something that we want to avoid.

12:19.860 --> 12:21.560
Instead of this live conversion,

12:21.660 --> 12:26.220
we want to convert once:

12:26.320 --> 12:28.920
our Lisp program into native code,

12:29.020 --> 12:32.380
that is, a binary program that can be executed

12:32.480 --> 12:34.400
directly by our CPU.

12:34.500 --> 12:36.680
We want to save all this unnecessary conversion

12:36.780 --> 12:39.000
that we do each time we are running a program

12:39.100 --> 12:39.760
while we are running it.

12:39.860 --> 12:42.060
And we want to do this process just once,

12:42.160 --> 12:43.740
when we are compiling.

12:43.840 --> 12:46.240
That's the main goal of this activity.

12:46.340 --> 12:50.200
How is the byte compiler implemented?

12:50.300 --> 12:53.940
As any compiler it's a pipeline of transformations.

12:54.040 --> 12:58.560
We go through macro expansion, closure conversion,

12:58.660 --> 13:02.120
we have a bunch of source level optimization.

13:02.220 --> 13:03.940
Then we go into LAP,

13:04.040 --> 13:06.800
that's the transformation we are interested in,

13:06.900 --> 13:10.600
and after a few optimizations on LAP,

13:10.700 --> 13:14.140
LAP is assembled into byte-code.

13:14.240 --> 13:16.880
So if we [list it]

13:16.980 --> 13:19.320
in terms of intermediate representations,

13:19.420 --> 13:23.600
we can simplify this pipeline like this.

13:23.700 --> 13:26.100
We start with Lisp, and at a certain point

13:26.200 --> 13:29.520
we are manipulating the program in LAP form,

13:29.620 --> 13:31.980
and then at the end we produce the byte-code

13:32.080 --> 13:34.560
that is the .elc file that [you run]

13:34.660 --> 13:37.660
What I wanted to realize was something like this.

13:37.760 --> 13:41.200
I wanted to start from LAP, do something,

13:41.300 --> 13:44.600
and jump into GCC using libgccjit

13:44.700 --> 13:45.560
and in particular

13:45.660 --> 13:48.000
the libgccjit Intermediate Representation

13:48.100 --> 13:50.340
that we will discuss.

13:50.440 --> 13:53.120
Now, why I wanted to do something like this?

13:53.220 --> 13:57.300
Essentially, writing a compiler from scratch

13:57.400 --> 14:01.520
for Emacs Lisp would have been a very big task.

14:01.620 --> 14:05.120
So I wanted to rely on, as much as I could,

14:05.220 --> 14:07.180
the Emacs Lisp byte compiler,

14:07.280 --> 14:10.280
because I had to produce something

14:10.380 --> 14:12.920
that was as compatible as possible

14:13.020 --> 14:14.240
to the current implementation.

14:14.340 --> 14:18.340
So this was (I believe) a very good idea

14:18.440 --> 14:20.520
to save an enormous quantity of work

14:20.620 --> 14:22.340
and to produce something

14:22.440 --> 14:24.940
that was compatible in terms of semantics

14:25.040 --> 14:26.820
with the original implementation.

14:26.920 --> 14:30.660
Also, I didn't want to implement a code generator

14:30.760 --> 14:32.860
for each architecture we were targeting,

14:32.960 --> 14:35.900
nor wanted to implement all these optimizations

14:36.000 --> 14:37.840
that are already in GCC,

14:37.940 --> 14:40.120
so I thought it was a good idea

14:40.220 --> 14:44.720
to rely on an existing compiler like GCC [?].

14:44.820 --> 14:47.240
Let's talk about libgccjit.

14:47.340 --> 14:50.180
It was added by David Malcolm in GCC 5

14:50.280 --> 14:55.680
It allows you to describe a C-ish semantic to GCC

14:55.780 --> 14:57.380
and have it compile.

14:57.480 --> 15:01.400
It's good for [reading] Jitters or AoT compilers.

15:01.500 --> 15:04.400
And if we talk about GCC:

15:04.500 --> 15:07.180
it's a compiler, it's a very good one,

15:07.280 --> 15:09.340
it has support for a remarkable number

15:09.440 --> 15:11.640
of target architectures.

15:11.740 --> 15:15.080
It's also very good at generating [fast code],

15:15.180 --> 15:17.980
and it's been around for a long time;

15:18.080 --> 15:21.200
I believe it's like 1 year younger than Emacs.

15:21.300 --> 15:23.220
It's still very well maintained,

15:23.320 --> 15:25.840
and we can assume it will be maintained

15:25.940 --> 15:28.920
for quite a while. And, as I mentioned,

15:29.020 --> 15:31.120
this was a very important point.

15:31.220 --> 15:33.860
Also, it's GPL; it's Free Software,

15:33.960 --> 15:36.120
and it's developed under the GNU umbrella,

15:36.220 --> 15:40.360
so I thought it was a very good option.

15:40.460 --> 15:43.300
So we can imagine a simple translation

15:43.400 --> 15:46.400
that goes from LAP to this subset of C

15:46.500 --> 15:48.880
that we can describe to libgccjit.

15:48.980 --> 15:52.880
This simple translation we can see here,

15:52.980 --> 15:55.680
it's actually pretty trivial.

15:55.780 --> 15:58.500
Instead of doing operations

15:58.600 --> 16:02.060
within the execution stack we have seen before,

16:02.160 --> 16:05.080
we have just an array that is replacing it,

16:05.180 --> 16:07.380
and it's called 'local' in this case,

16:07.480 --> 16:12.360
and we have assignments within this array,

16:12.460 --> 16:15.140
so that they are done in place of the original

16:15.240 --> 16:17.100
push and pop activity of the stack.

16:17.200 --> 16:18.280
The nice thing is that,

16:18.380 --> 16:20.260
when you have done this translation,

16:20.360 --> 16:23.160
GCC will be able to optimize this,

16:23.260 --> 16:25.840
and remove all the unnecessary operations,

16:25.940 --> 16:27.180
and generate code

16:27.280 --> 16:29.660
for the specific CPU you are targeting,

16:29.760 --> 16:32.020
[which will be running your code].

16:32.120 --> 16:34.920
This sounds great; it sounds like

16:35.020 --> 16:37.440
a very simple and effective translation,

16:37.540 --> 16:39.920
and in fact the first iteration of my compiler

16:40.020 --> 16:41.120
was doing just this.

16:41.220 --> 16:45.320
It was essentially a big C function

16:45.420 --> 16:48.240
that was taking LAP and doing this conversion

16:48.340 --> 16:50.120
describing the output to libgccjit.

16:50.220 --> 16:53.720
Unfortunately, if you do this,

16:53.820 --> 16:55.740
you will discover that you have

16:55.840 --> 17:00.080
a performance upper bound limit of about 3x.

17:00.180 --> 17:04.360
So it was an option,

17:04.460 --> 17:06.560
but I thought it was a good occasion

17:06.660 --> 17:08.980
for trying to do something more.

17:09.080 --> 17:11.640
And doing something more means

17:11.740 --> 17:13.680
implementing a smarter compiler

17:13.780 --> 17:17.180
that is doing some advanced analysis on the code,

17:17.280 --> 17:18.640
and will be able to perform

17:18.740 --> 17:20.700
Lisp-specific optimizations

17:20.800 --> 17:22.480
-- optimizations that take advantage of

17:22.580 --> 17:24.960
the specific Lisp semantics,

17:25.060 --> 17:27.760
something that GCC is not aware of.

17:27.860 --> 17:31.240
And while I was thinking about that,

17:31.340 --> 17:34.000
I thought that having a smarter compiler

17:34.100 --> 17:38.120
had also other advantages, like a smarter compiler

17:38.220 --> 17:40.120
that understands the semantics

17:40.220 --> 17:41.820
of the programming language being compiled

17:41.920 --> 17:43.680
would be also capable of

17:43.780 --> 17:45.620
giving feedback to the programmers,

17:45.720 --> 17:47.920
like better warnings and errors.

17:48.020 --> 17:51.360
So I was really fascinated about this idea,

17:51.460 --> 17:53.240
and I wanted to change my implementation

17:53.340 --> 17:55.920
because I was not really happy about it.

17:56.020 --> 17:58.320
I had a lot of C code in terms of

17:58.420 --> 18:02.100
lines that were not doing any smart job.

18:02.200 --> 18:07.380
And I wanted to write all the interesting logic

18:07.480 --> 18:09.940
[in Lisp].

18:10.040 --> 18:12.560
So optimizing outside GCC

18:12.660 --> 18:15.840
before jumping into GCC,

18:15.940 --> 18:20.500
as I mentioned, has two main targets:

18:20.600 --> 18:23.060
Either optimize the code before going into GCC,

18:23.160 --> 18:25.380
or present to GCC some code

18:25.480 --> 18:27.740
that we know GCC can optimize effectively.

18:27.840 --> 18:30.800
And also, this will give, as I mentioned,

18:30.900 --> 18:32.660
better options for the compiler

18:32.760 --> 18:34.420
to provide warnings, errors

18:34.520 --> 18:36.340
-- better diagnostics.

18:36.440 --> 18:38.200
So this is pretty much

18:38.300 --> 18:40.640
what the native compiler looks like nowadays,

18:40.740 --> 18:42.720
in terms of passes.

18:42.820 --> 18:44.560
We have a list of passes,

18:44.660 --> 18:46.620
each of which is taking an input

18:46.720 --> 18:48.120
and producing an output.

18:48.220 --> 18:51.040
So it's doing either analysis on the program

18:51.140 --> 18:52.640
that's being passed,

18:52.740 --> 18:54.760
or it's performing a transformation.

18:54.860 --> 18:57.900
All of these passes are implemented in Lisp,

18:58.000 --> 19:00.440
and only the last pass is implemented in C.

19:00.540 --> 19:05.060
That is the one that is talking to libgccjit.

19:05.160 --> 19:07.760
To do that, I have introduced

19:07.860 --> 19:10.640
a new intermediate representation

19:10.740 --> 19:13.960
that I call LIMPLE, as a tribute to GCC GIMPLE,

19:14.060 --> 19:17.220
that is the main internal representation of GCC,

19:17.320 --> 19:20.600
at least one of the main ones.

19:20.700 --> 19:25.080
Introducing a new intermediate representation

19:25.180 --> 19:27.720
-- a new way of representing my program --

19:27.820 --> 19:29.540
solved a bunch of problems.

19:29.640 --> 19:33.200
First, it allowed me to implement

19:33.300 --> 19:37.280
non-trivial analysis and transformations,

19:37.380 --> 19:40.000
the ones I needed in my compiler pipeline.

19:40.100 --> 19:42.160
But also, it solved the problem of

19:42.260 --> 19:43.840
what was the boundary between

19:43.940 --> 19:46.120
what I had to implement in Lisp,

19:46.220 --> 19:48.040
and what in C.

19:48.140 --> 19:49.040
Because once I had

19:49.140 --> 19:51.860
my intermediate representation defined,

19:51.960 --> 19:53.780
essentially the boundary between Lisp and C

19:53.880 --> 19:55.640
is just a function, that is,

19:55.740 --> 19:57.780
the one that is implementing the final pass.

19:57.880 --> 19:59.220
That is taking, as an input,

19:59.320 --> 20:01.620
all of my programs in LIMPLE representation

20:01.720 --> 20:03.880
and it's doing [his bit].

20:03.980 --> 20:07.980
So I was convinced this design at least had sense.

20:08.080 --> 20:10.120
When we go through some of these passes,

20:10.220 --> 20:12.600
just to give you an idea of what these are doing:

20:12.700 --> 20:14.560
the first pass is just responsible for

20:14.660 --> 20:18.100
spilling the LAP from the byte compiler

20:18.200 --> 20:20.160
that effectively here we are using as a front end

20:20.260 --> 20:21.780
for our compiler pipeline.

20:21.880 --> 20:24.040
The second pass, called 'limplify',

20:24.140 --> 20:27.920
will be in charge of converting LAP into LIMPLE.

20:28.020 --> 20:31.260
LIMPLE is an intermediate representation

20:31.360 --> 20:32.860
that is Control Flow Graph based,

20:32.960 --> 20:34.700
and it's capable of SSA.

20:34.800 --> 20:38.640
So we can have a look to what this means.

20:38.740 --> 20:41.300
Let's assume we have our LAP program,

20:41.400 --> 20:42.520
as any program,

20:42.620 --> 20:43.960
that's a simple list of instructions

20:44.060 --> 20:45.640
that we will execute one after the other.

20:45.740 --> 20:47.360
Some of these instructions

20:47.460 --> 20:49.200
are special instructions

20:49.300 --> 20:51.240
that we call conditional branches,

20:51.340 --> 20:52.800
where we check for a condition,

20:52.900 --> 20:54.320
and if this is verified,

20:54.420 --> 20:56.940
we jump to a different address within the program.

20:57.040 --> 20:59.580
(Addresses that here we are calling 'labels'.)

20:59.680 --> 21:03.440
So we can split our program in chunks,

21:03.540 --> 21:08.200
and those chunks we execute without interruption,

21:08.300 --> 21:10.460
so we always enter from the top of those,

21:10.560 --> 21:12.600
and we exit from the bottom.

21:12.700 --> 21:15.980
We can name those, and split them apart,

21:16.080 --> 21:18.980
and these are what we call basic blocks.

21:19.080 --> 21:22.360
And now we have a bunch of these basic blocks

21:22.460 --> 21:23.380
that are floating,

21:23.480 --> 21:25.020
and they are not any more sorted.

21:25.120 --> 21:25.920
This is what is called

21:26.020 --> 21:28.680
a Control Flow Graph based representation.

21:28.780 --> 21:31.400
Now we can get into the SSA topic.

21:31.500 --> 21:33.900
That stands for Static Single Assignment.

21:34.000 --> 21:35.860
I don't want to get into the details,

21:35.960 --> 21:36.720
but just give you a feeling.

21:36.820 --> 21:38.480
I added into our basic blocks

21:38.580 --> 21:41.400
in our Control Flow Graph a few assignments.

21:41.500 --> 21:43.840
We will transform this into SSA

21:43.940 --> 21:45.040
just for the variable 'x',

21:45.140 --> 21:47.360
just for the sake of demonstrating it.

21:47.460 --> 21:49.760
This is done through a number of phases

21:49.860 --> 21:51.760
that are essentially some analysis,

21:51.860 --> 21:52.600
mainly renaming.

21:52.700 --> 21:55.560
But the outcome, the one we see here,

21:55.660 --> 21:59.120
looks quite similar to the original one,

21:59.220 --> 22:01.120
but we can see that the variable 'x'

22:01.220 --> 22:01.960
has been renamed.

22:02.060 --> 22:03.400
And now we don't have anymore just one,

22:03.500 --> 22:06.140
but a number of these variables.

22:06.240 --> 22:08.000
The interesting property is that

22:08.100 --> 22:10.880
each of these variables is assigned just once.

22:10.980 --> 22:13.240
And this allows for the compiler

22:13.340 --> 22:16.760
to do prediction of the value of that variable,

22:16.860 --> 22:19.040
depending on the position

22:19.140 --> 22:19.840
within the Control Flow Graph.

22:19.940 --> 22:21.980
This is very important. For instance,

22:22.080 --> 22:23.440
a very simple case is 'x1'

22:23.540 --> 22:27.440
that we see is assigned once by definition,

22:27.540 --> 22:29.300
in particular here at the beginning.

22:29.400 --> 22:31.040
Here it's very simple to understand

22:31.140 --> 22:33.080
that x1 will have the value 3.

22:33.180 --> 22:35.380
While, for instance, it's more difficult to prove

22:35.480 --> 22:37.060
what is going to be the value of x5,

22:37.160 --> 22:38.620
because it's calling a function,

22:38.720 --> 22:41.940
or we don't know at the moment what x4 is.

22:42.040 --> 22:46.240
So the compiler will gain the capability

22:46.340 --> 22:48.460
to do prediction on all the variables,

22:48.560 --> 22:50.620
and the more we get information on one variable,

22:50.720 --> 22:54.980
the more we can prove about the others.

22:55.460 --> 22:57.280
Coming back to our passes, the next one

22:57.380 --> 22:59.320
is forward propagation.

22:59.420 --> 23:00.600
This pass is responsible for

23:00.700 --> 23:03.240
doing what I briefly mentioned just before:

23:03.340 --> 23:07.160
doing proof over all the different variables

23:07.260 --> 23:09.620
in different positions of the Control Flow Graph,

23:09.720 --> 23:12.700
about the values, types, or ranges.

23:12.800 --> 23:15.080
This pass is also responsible for

23:15.180 --> 23:16.920
executing functions

23:17.020 --> 23:18.640
when we know that the function has no side effect

23:18.740 --> 23:20.520
and the pass managed to

23:20.620 --> 23:22.440
prove all the values of its argument.

23:22.540 --> 23:24.800
So the function is then executed at compile time

23:24.900 --> 23:26.700
and it doesn't even exist anymore

23:26.800 --> 23:27.880
in the produced code.

23:27.980 --> 23:30.300
Then we have another pass, this is

23:30.400 --> 23:33.420
an example of a pass that is very specific:

23:33.520 --> 23:36.120
it's trying to remove the call to funcall

23:36.220 --> 23:38.040
when those are not necessary.

23:38.140 --> 23:39.760
There are a couple situations

23:39.860 --> 23:42.200
where this is very useful.

23:42.300 --> 23:45.240
And not only is this beneficial

23:45.340 --> 23:47.560
because we are generating better code,

23:47.660 --> 23:49.245
but when we manage to do that,

23:49.345 --> 23:52.000
we allow GCC better analysis over the code,

23:52.100 --> 23:54.440
because GCC knows nothing about funcall.

23:54.540 --> 23:57.400
So if we are calling, from 'foo', directly, 'bar',

23:57.500 --> 24:01.280
for GCC it's way easier to do its analysis

24:01.380 --> 24:03.360
on top of this code.

24:03.460 --> 24:06.240
Another interesting pass we can mention is 'tco'.

24:06.340 --> 24:08.800
This is performing Tail Recursion Elimination.

24:08.900 --> 24:11.880
It allows a more functional programming style,

24:11.980 --> 24:13.220
if you want.

24:13.320 --> 24:14.280
We can jump to the last pass

24:14.380 --> 24:16.200
that is called 'final', and as I mentioned,

24:16.300 --> 24:17.520
this one is responsible for

24:17.620 --> 24:19.880
taking our program in LIMPLE representation

24:19.980 --> 24:24.900
and describing it to libgccjit in the gccjit IR.

24:25.000 --> 24:27.480
That's the main task. It's also

24:27.580 --> 24:29.520
defining inline functions

24:29.620 --> 24:32.400
for accessing fundamental data types, and so on.

24:32.500 --> 24:34.460
This pass is also responsible for

24:34.560 --> 24:36.280
using some of the predictions

24:36.380 --> 24:39.560
done by previous passes to generate better code.

24:39.660 --> 24:41.320
Things we had to add

24:41.420 --> 24:43.880
to have all of this machinery work

24:43.980 --> 24:45.240
and to be controllable:

24:45.340 --> 24:47.480
The first one is an opt called 'native-comp-speed'

24:47.580 --> 24:49.920
and it's equivalent to Common Lisp's 'speed'.

24:50.020 --> 24:51.920
It represents the optimization level.

24:52.020 --> 24:53.400
The default is 2 and is

24:53.500 --> 24:55.400
the maximum optimization level

24:55.500 --> 24:58.600
that is meant to reflect

24:58.700 --> 25:00.860
all the original semantics of Emacs Lisp.

25:00.960 --> 25:02.480
So it's the one that should be used by default.

25:02.580 --> 25:04.640
The second one is 'compilation unit'

25:04.740 --> 25:05.960
and it's a kind of new object

25:06.060 --> 25:11.080
that has been added to Emacs.

25:11.180 --> 25:12.960
Let's have a look to

25:13.060 --> 25:14.360
how the Garbage Collector works in this case.

25:14.460 --> 25:15.840
The GNU Emacs Garbage Collector

25:15.940 --> 25:18.660
is a simple mark-and-sweep garbage collector.

25:18.760 --> 25:21.420
It does a tree walk through all the objects

25:21.520 --> 25:25.100
and follows references from one object to another.

25:25.200 --> 25:27.960
All the objects reachable during the mark phase

25:28.060 --> 25:31.240
will be kept in our Lisp universe.

25:31.340 --> 25:33.680
All the other ones will be freed.

25:33.780 --> 25:35.080
In this case we have a bunch of functions,

25:35.180 --> 25:38.760
'foo1', 'foo2', 'bar1', etc., that are defined.

25:38.860 --> 25:40.320
When a function is defined,

25:40.420 --> 25:42.400
it's accessible through its symbol,

25:42.500 --> 25:44.360
so we have the symbol referring to the function.

25:44.460 --> 25:47.720
The function, in this case a native-compiled one,

25:47.820 --> 25:50.040
is referring to the compilation unit.

25:50.140 --> 25:53.020
The compilation unit is essentially

25:53.120 --> 25:58.600
the ELF file that has been compiled,

25:58.700 --> 26:01.160
and contains all those functions

26:01.260 --> 26:03.240
that came from the original .el file,

26:03.340 --> 26:05.100
and that we have loaded into memory.

26:05.200 --> 26:10.000
If, for instance, 'bar1 and 'bar2 are undefined,

26:10.100 --> 26:14.200
functions [3] and 4 will be no longer reachable,

26:14.300 --> 26:16.040
and we will be able to free them

26:16.140 --> 26:18.160
and unload the compilation unit.

26:18.260 --> 26:21.200
We discussed quite a lot about Control Flow Graph,

26:21.300 --> 26:23.560
SSA, and a lot of boring stuff,

26:23.660 --> 26:25.400
and I promised you that we are doing

26:25.500 --> 26:27.320
a lot of interesting proofs over variables,

26:27.420 --> 26:30.220
So let's have some examples of them.

26:30.320 --> 26:31.840
Let's jump into a quick demo

26:31.940 --> 26:34.480
to see what all of this abstract theory

26:34.580 --> 26:37.680
and this esoteric propagation engine can do for us

26:37.780 --> 26:39.240
and how the user can interact with it.

26:39.340 --> 26:42.100
I've defined a bunch of functions,

26:42.200 --> 26:45.240
and I will native-compile and load it.

26:47.500 --> 26:48.840
Alright, Emacs Lisp native compiled and loaded.

26:48.940 --> 26:52.320
At this point, I can disassemble 'foo1'

26:52.420 --> 26:56.320
to make sure it's native code and I'm not lying.

26:56.420 --> 26:58.500
These are the instructions

26:58.600 --> 27:01.320
that will be executed directly by my CPU

27:01.420 --> 27:03.620
when I call this function.

27:03.720 --> 27:07.520
Alright, very cool.

27:07.620 --> 27:16.080
Now, [Lisp:] (symbol-function #'foo1)

27:16.180 --> 27:19.720
Interestingly, this is returning a subroutine,

27:19.820 --> 27:21.820
as it would be a primitive function.

27:21.920 --> 27:23.700
Because this is native code,

27:23.800 --> 27:24.840
even if it's written in Lisp,

27:24.940 --> 27:26.340
has been converted to native code

27:26.440 --> 27:29.200
as if it's a primitive function.

27:29.300 --> 27:31.560
But we can do also a new thing:

27:31.660 --> 27:34.440
asking for the type of the subroutine.

27:34.540 --> 27:38.360
Alright, very cool. It says this is a function,

27:38.460 --> 27:40.280
it's taking one argument of type 't'

27:40.380 --> 27:41.560
(that means anything

27:41.660 --> 27:42.960
because we don't have any information),

27:43.060 --> 27:45.200
and is returning a type 't',

27:45.300 --> 27:47.380
so also there we don't have much information.

27:47.480 --> 27:49.700
OK, very cool, but not very useful.

27:49.800 --> 27:53.680
Let's see #'foo2. #'foo2 is slightly different,

27:53.780 --> 27:55.840
it doesn't take any argument, but it's returning

27:55.940 --> 27:58.040
an integer included between 3 and 3.

27:58.140 --> 28:01.360
Wow, amazing!

28:01.460 --> 28:04.040
Let's get into something a little more complex:

28:04.140 --> 28:09.440
#'foo3 takes one argument we know nothing about,

28:09.540 --> 28:11.740
but it's returning a number.

28:11.840 --> 28:13.280
And why it's returning a number?

28:13.380 --> 28:16.320
Essentially because 1+ is returning a number,

28:16.420 --> 28:18.880
and in all the other cases,

28:18.980 --> 28:20.640
it would signal an error

28:20.740 --> 28:23.380
if it's not happy about its input argument.

28:23.480 --> 28:27.760
Let's have a look to #'foo4.

28:27.860 --> 28:32.920
#'foo4 is a little bit more complex.

28:33.020 --> 28:34.680
It will return nil

28:34.780 --> 28:37.400
if the 'when' condition is not satisfied,

28:37.500 --> 28:39.720
so it's type 'null' here.

28:39.820 --> 28:41.200
It can return a floating point;

28:41.300 --> 28:43.600
we don't do propagation of floating point so far,

28:43.700 --> 28:47.080
or it can return any integer between 4 and 9.

28:47.180 --> 28:52.840
Wow. Let's go on with #'foo5.

28:52.940 --> 28:55.760
#'foo5 is even more complex

28:55.860 --> 28:57.200
because other than

28:57.300 --> 28:59.180
having to satisfy this condition,

28:59.280 --> 29:02.280
we can see that the result of the propagation

29:02.380 --> 29:03.800
of this complex condition

29:03.900 --> 29:05.460
is propagated also across the 'plus'.

29:05.560 --> 29:08.240
So this foo5 can return nil,

29:08.340 --> 29:09.720
a floating point we know nothing about,

29:09.820 --> 29:13.560
or an integer included between 12 and 24.

29:13.660 --> 29:18.080
Let's go on with #'foo6.

29:18.180 --> 29:23.320
#'foo6 is returning anything but an integer.

29:23.420 --> 29:26.520
I think it should be pretty obvious why,

29:26.620 --> 29:28.120
because if it's not an integer we return it,

29:28.220 --> 29:30.000
otherwise we signal an error.

29:30.100 --> 29:32.880
Let's finish with #'foo7 very quickly.

29:32.980 --> 29:37.920
#'foo7 has another very complex condition,

29:38.020 --> 29:40.320
at least for me, but it's also interesting to see

29:40.420 --> 29:42.200
that we are also propagating values for symbols.

29:42.300 --> 29:45.160
So we can return the symbol 'big,

29:45.260 --> 29:46.980
the symbol 'small,

29:47.080 --> 29:51.200
or an integer included between -100 and 100.

29:51.300 --> 29:54.440
Now, the question is: why all of this is useful

29:54.540 --> 29:56.800
other than having Andrea very happy

29:56.900 --> 29:59.560
when he's playing with this all day?

29:59.660 --> 30:01.440
Well, we have to come back one second

30:01.540 --> 30:04.920
to how Lisp_Objects are represented within Emacs.

30:05.020 --> 30:09.480
Lisp_Objects are represented as machine words,

30:09.580 --> 30:12.580
where we reserve a few bits to indicate the type.

30:12.680 --> 30:15.560
And every time we access the object,

30:15.660 --> 30:17.120
when this is a Fixnum,

30:17.220 --> 30:19.920
or a regular object where this is a pointer,

30:20.020 --> 30:21.600
we always have to extract these bits,

30:21.700 --> 30:24.520
make sure that they satisfy a condition,

30:24.620 --> 30:27.120
so make sure that we are going to manipulate

30:27.220 --> 30:28.580
the object of the type we expect,

30:28.680 --> 30:31.040
and then we can extract the object

30:31.140 --> 30:32.540
and do the manipulation.

30:32.640 --> 30:34.420
If the compiler managed to prove

30:34.620 --> 30:37.400
that the contained object is of the right type,

30:37.500 --> 30:42.440
it will be able to not emit the type check,

30:42.540 --> 30:44.500
and save a lot of instructions.

30:44.600 --> 30:48.560
This is a very powerful optimization.

30:48.660 --> 30:50.760
Let's discuss some potential future development

30:50.860 --> 30:52.000
in this area.

30:52.100 --> 30:53.360
First I think it would be extremely nice

30:53.460 --> 30:54.920
to extend the propagation

30:55.020 --> 30:56.920
to types that are not built in.

30:57.020 --> 30:58.160
There are a lot of cases

30:58.260 --> 31:00.600
where we could optimize effectively.

31:00.700 --> 31:02.520
For instance when we do

31:02.620 --> 31:05.720
a lot of accesses to structures

31:05.820 --> 31:06.920
-- lots of stuff like that --

31:07.020 --> 31:08.880
where we keep on checking and checking

31:08.980 --> 31:10.640
the same object for the type

31:10.740 --> 31:13.080
where it's obvious, where it should be trivial

31:13.180 --> 31:14.720
to prove that it's the right type

31:14.820 --> 31:16.360
after the first access or things like that.

31:16.460 --> 31:18.920
So I believe this is a low-hanging fruit

31:19.020 --> 31:21.180
in terms of performance.

31:21.280 --> 31:23.080
Also I think it would be really nice

31:23.180 --> 31:24.720
to extend the declare mechanism

31:24.820 --> 31:27.600
to allow the user to declare argument types.

31:27.700 --> 31:30.480
(Optionally. Indeed, optionally.)

31:30.580 --> 31:32.880
Doing that would give the compiler

31:32.980 --> 31:35.120
a lot of information to propagate value types

31:35.220 --> 31:36.360
within the function.

31:36.460 --> 31:38.840
But those will allow the compiler

31:38.940 --> 31:43.040
to give the user really good diagnostics

31:43.140 --> 31:45.400
during compile time. Like the compiler could say:

31:45.500 --> 31:47.000
Hey, here you are calling a function

31:47.100 --> 31:49.200
that is expecting this argument of this type,

31:49.300 --> 31:52.000
but I'm proving that you are calling it

31:52.100 --> 31:55.200
with an argument that is of THIS type,

31:55.300 --> 31:57.840
and is not a subtype of the expected one.

31:57.940 --> 32:00.240
So you are doing something not coherent.

32:00.340 --> 32:02.800
This kind of interprocedural logic.

32:02.900 --> 32:05.000
And I think the compiler should also take advantage

32:05.100 --> 32:06.640
(under certain circumstances)

32:06.740 --> 32:08.760
of this interprocedural analysis

32:08.860 --> 32:12.520
in order to optimize even more, when possible.

32:12.620 --> 32:13.720
Also I think we should

32:13.820 --> 32:15.480
work on our [?] to improve the code generation

32:15.580 --> 32:17.100
depending on the prediction

32:17.200 --> 32:18.160
that the compiler is doing.

32:18.260 --> 32:20.120
We already take advantage of those predictions,

32:20.220 --> 32:22.300
but I think we could do better.

32:22.400 --> 32:25.120
A quick look at some performance results.

32:25.220 --> 32:28.720
These are from the elisp-benchmarks package

32:28.820 --> 32:30.760
within GNU ELPA.

32:30.860 --> 32:32.520
This is the performance uplift,

32:32.620 --> 32:38.480
and we can identify about 4 "classes" of results.

32:38.580 --> 32:41.440
The first one there is no performance uplift,

32:41.540 --> 32:42.880
because there is not much we can do,

32:42.980 --> 32:44.720
and the time is probably not spent

32:44.820 --> 32:46.280
within the execution engine.

32:46.380 --> 32:49.000
And the ones around 3x are the ones

32:49.100 --> 32:50.680
Where probably we are not triggering

32:50.780 --> 32:52.640
manually specific optimizations,

32:52.740 --> 32:53.600
but just the fact

32:53.700 --> 32:57.100
that we are converting into native code

32:57.200 --> 33:00.500
is giving us this performance uplift.

33:00.600 --> 33:03.280
Then there is a bunch of other benchmarks

33:03.380 --> 33:05.680
where the Lisp optimizations are triggering,

33:05.780 --> 33:09.620
and the uplift is way bigger,

33:09.720 --> 33:11.900
and then we have 3 benchmarks that at the time

33:12.000 --> 33:13.420
are completely optimized out.

33:13.520 --> 33:15.580
That means the compiler became "so smart"

33:15.680 --> 33:18.160
that it was able to compute the result

33:18.260 --> 33:20.160
in the compile time and just put the result

33:20.260 --> 33:21.400
in the generated binary.

33:21.500 --> 33:23.880
Let's discuss a little bit the compilation model.

33:23.980 --> 33:26.080
This is an Hybrid one;

33:26.180 --> 33:29.420
it's both JIT-like and Ahead-of-Time-like.

33:29.520 --> 33:33.400
Emacs is composed of what we call an Emacs image,

33:33.500 --> 33:36.180
essentially the Emacs binary that we start.

33:36.280 --> 33:37.720
It's including all the C code,

33:37.820 --> 33:41.760
plus all the Lisp code that we preload.

33:41.860 --> 33:46.480
Then we have the rest of the Emacs Lisp codebase

33:46.580 --> 33:49.440
that can be loaded just if it's required.

33:49.540 --> 33:52.760
Same for the external packages, if we have any.

33:52.860 --> 33:55.120
If we build an Emacs Lisp

33:55.220 --> 33:57.940
with native compilation enabled, by default,

33:58.040 --> 34:01.200
only the Emacs image will be native compiled.

34:01.300 --> 34:03.920
All the other code will be compiled

34:04.020 --> 34:06.720
on the fly when it's loaded and executed

34:06.820 --> 34:08.800
the first time, if it's necessary.

34:08.900 --> 34:10.880
Same for the packages, in a transparent way

34:10.980 --> 34:12.640
and asynchronous way.

34:12.740 --> 34:13.480
Also worth noting

34:13.580 --> 34:15.400
that the result of this compilation

34:15.500 --> 34:17.240
will be stored into a cache directory

34:17.340 --> 34:19.900
within the home directory of the user.

34:20.000 --> 34:23.600
So it will be reused in the following sessions

34:23.700 --> 34:25.440
if the same file is loaded,

34:25.540 --> 34:27.520
without having to recompile multiple times

34:27.620 --> 34:29.320
the same file in different sessions.

34:29.420 --> 34:31.520
It works a little bit like this:

34:31.620 --> 34:33.800
When we load the byte-code for the first time,

34:33.900 --> 34:35.900
we spawn a native compilation process.

34:36.000 --> 34:39.320
Meanwhile we keep using the byte-code available.

34:39.420 --> 34:41.180
When the native compilation is finished,

34:41.280 --> 34:43.960
we hot-swap the definition of the functions

34:44.060 --> 34:46.040
that are contained in the file

34:46.140 --> 34:47.960
and start using the native code transparently.

34:48.060 --> 34:49.880
We do this asynchronously,

34:49.980 --> 34:53.140
and for more compilation units at the same time,

34:53.240 --> 34:56.280
so it looks a little bit like this.

34:56.380 --> 34:58.560
Let's try a quick demo of all of this machinery.

34:58.660 --> 35:00.880
I've started a fresh Emacs

35:00.980 --> 35:03.280
with native compilation support,

35:03.380 --> 35:05.060
and at this moment nothing is going on.

35:05.160 --> 35:07.440
We will test the machinery with Tetris,

35:07.540 --> 35:10.460
because I can't imagine anything better

35:10.560 --> 35:12.840
to test this.

35:12.940 --> 35:15.080
What I do expect is that when I launch Tetris

35:15.180 --> 35:17.920
it will be loaded, it will immediately

35:18.020 --> 35:20.460
start execution of the byte-compiled version,

35:20.560 --> 35:23.000
so we won't see any delay in the user experience,

35:23.100 --> 35:25.160
and in the meanwhile, a parallel process

35:25.260 --> 35:28.160
will start to native-compile Tetris itself.

35:28.260 --> 35:30.200
When the native compilation will be finished,

35:30.300 --> 35:33.240
the functions of all Tetris will be hot-swapped.

35:33.340 --> 35:36.280
So we will not see any interruption.

35:36.380 --> 35:39.680
So Tetris started, and it's running,

35:39.780 --> 35:41.600
we have seen no delay, and in the meanwhile,

35:41.700 --> 35:43.520
the native compilation probably already finished,

35:43.620 --> 35:44.960
we can have a look.

35:45.060 --> 35:47.160
In this I see the native compilation log buffer.

35:47.260 --> 35:49.720
So we see that Tetris has been native compiled,

35:49.820 --> 35:51.760
and all of its dependencies.

35:51.860 --> 35:53.840
Now Tetris is still running,

35:53.940 --> 36:00.480
but I can do "C-h f tetris"

36:00.580 --> 36:02.540
and we can see that 'tetris'

36:02.640 --> 36:04.880
is an interactive native compiled

36:04.980 --> 36:07.940
Lisp function, so it has been native-compiled.

36:08.040 --> 36:13.500
I can even disassemble if I want.

36:13.600 --> 36:14.880
OK, so very cool.

36:14.980 --> 36:18.020
I guess we can say this mechanism is working.

36:18.120 --> 36:20.660
Also worth noting that if I go back

36:20.760 --> 36:24.120
to the *Async-native-compile-log* buffer,

36:24.220 --> 36:27.960
we see we have compiled another bunch of files.

36:28.060 --> 36:31.600
I think these are because of my 'C-h f',

36:31.700 --> 36:33.800
this help function command and disassemble,

36:33.900 --> 36:34.920
and so on.

36:35.020 --> 36:37.720
The first time you run Emacs, you will have,

36:37.820 --> 36:41.280
from time to time, these processes spawned.

36:41.380 --> 36:43.100
Emacs is "compiling itself",

36:43.200 --> 36:45.520
and it's replacing the byte-code definition

36:45.620 --> 36:47.640
with the native one. But after a few sessions,

36:47.740 --> 36:49.760
you will not see this anymore,

36:49.860 --> 36:51.360
because the output of this compilation,

36:51.460 --> 36:54.900
as I mentioned, are stored in the user directory.

36:55.000 --> 36:57.560
To conclude: Emacs with native compilation support

36:57.660 --> 36:59.080
is coming up in Emacs 28,

36:59.180 --> 37:01.040
that is gonna be the next major stable release

37:01.140 --> 37:02.080
that will be released.

37:02.180 --> 37:04.840
So we ought to celebrate with a big party,

37:04.940 --> 37:07.320
I believe. But before going to the party,

37:07.420 --> 37:09.080
I'd like to list a few points

37:09.180 --> 37:11.340
that I think have been success factors

37:11.440 --> 37:13.420
in upstreaming this work.

37:13.520 --> 37:15.420
It has been extremely important

37:15.520 --> 37:18.140
to get in touch with upstream as soon as possible,

37:18.240 --> 37:20.680
as soon as I had a proof of concept.

37:20.780 --> 37:22.600
It's been extremely important

37:22.700 --> 37:24.660
to involve the community as much as possible,

37:24.760 --> 37:28.480
and this included keeping a development blog,

37:28.580 --> 37:31.600
and posts about that on emacs-devel,

37:31.700 --> 37:33.720
and also producing material,

37:33.820 --> 37:36.020
participating in conferences,

37:36.120 --> 37:38.320
and giving presentations like the one I'm doing,

37:38.420 --> 37:40.680
to explain what I was doing and how it works.

37:40.780 --> 37:43.040
It has been extremely important, also,

37:43.140 --> 37:45.860
to be able to rely on the upstream infrastructure.

37:45.960 --> 37:47.540
So, to develop the software

37:47.640 --> 37:49.080
as a feature branch in the official git,

37:49.180 --> 37:49.960
but even more, I would say,

37:50.060 --> 37:51.660
to use the official bug tracker

37:51.760 --> 37:52.880
for solving bugs of this branch.

37:52.980 --> 37:54.680
This gave the opportunity

37:54.780 --> 37:58.160
to stay really in close touch with maintainers,

37:58.260 --> 37:59.920
and senior developers of Emacs.

38:00.020 --> 38:02.980
That helped me a lot. And at the same time

38:03.080 --> 38:04.820
they were informed about what I was doing

38:04.920 --> 38:07.360
and what was the status of this feature branch.

38:07.460 --> 38:08.800
Extremely important.

38:08.900 --> 38:11.160
And also I think it played a major role

38:11.260 --> 38:14.120
to try to design this enormous patch

38:14.220 --> 38:18.120
in a way that the impact on the current codebase

38:18.220 --> 38:21.120
was minimized (at least as much as possible).

38:21.220 --> 38:23.660
And also minimizing

38:23.760 --> 38:26.240
the impact on the user operation of the software,

38:26.340 --> 38:28.040
in this case Emacs.

38:28.140 --> 38:29.640
So yes, mandatory Special Thanks:

38:29.740 --> 38:33.360
Emacs developers, and especially maintainers

38:33.460 --> 38:36.680
and senior developers like Stefan Monnier,

38:36.780 --> 38:40.440
that helped me a lot across this long journey.

38:40.540 --> 38:42.800
And, well, all the community

38:42.900 --> 38:45.160
that really got so excited about this project

38:45.260 --> 38:46.240
and gave me the energy

38:46.340 --> 38:49.120
to go through all of this time and development

38:49.220 --> 38:51.980
and bugs and solving, etc. etc.

38:52.080 --> 38:54.920
So yes, it was a really exciting time,

38:55.020 --> 38:58.000
and I think we have to look forward

38:58.100 --> 39:01.400
and start thinking about how to improve all this

39:01.500 --> 39:02.920
for the following years.

39:03.020 --> 39:04.300
And that's it.

39:04.400 --> 39:06.040
I think I should be online for questions.

39:06.140 --> 39:07.480
Thank you very much.

39:07.580 --> 39:07.680
[captions by John Cummings]