[Captioned by kana]

Hello! This is Kana! Today I'll be talking about Just-In-Time compilation, or JIT, for Emacs Lisp, based on my work-in-progress Emacs clone, Juicemacs.

Juicemacs aims to explore a few things that I've been wondering about for a while. For example, what if we had better, or even transparent, concurrency in ELisp? Or, can we have a concurrent GUI, one that neither blocks Lisp code nor is blocked by it? And finally, what can JIT compilation do for ELisp? Will it provide better performance?

However, a major problem with explorations in Emacs clones is that Emacs is a whole universe. That means that, to make these explorations meaningful for Emacs users, we need to cover a lot of Emacs features before we can even begin.

For example, one of the features of Emacs is that it supports a lot of encodings. Let's look at this string: it can be encoded in both Unicode and Shift-JIS, a Japanese encoding system. But currently, Unicode does not have an official mapping for this "ki" (﨑) character. So when you map from Shift-JIS to Unicode in most programming languages, you end up with something like this: a replacement character. Emacs, however, actually extends the Unicode range threefold and uses the extra range to losslessly support characters like this. So if you want to support this feature, that basically rules out all string libraries built on Unicode assumptions.

For another, you need to support the regular expressions in Emacs, which are really quite irregular. For example, they can assert the position of the cursor, and they use character tables, modifiable from Lisp code, to determine case mappings. All of that makes it really hard, or even impossible, to use any existing regexp library.

Also, you need a working garbage collector. You need threading primitives, because Emacs already has some threading support.
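As a concrete aside on those first two points (this is plain GNU Emacs behavior, not something shown in the talk; you can evaluate it in a scratch buffer):

    ;; Emacs character codes extend past the Unicode maximum of #x10FFFF,
    ;; which is what lets decoders keep characters that have no clean
    ;; Unicode mapping instead of replacing them.
    (max-char)                          ; => 4194303, i.e. #x3FFFFF
    (characterp #x110000)               ; => t, a valid character beyond Unicode

    ;; Emacs regexps can assert "only at the cursor position" with \=,
    ;; something most regexp libraries have no equivalent for.
    (re-search-forward "\\=foo" nil t)  ; matches only if "foo" starts at point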
You might also want the performance of your clone to match Emacs, even with its native compilation enabled. Not to mention that you also need a GUI for an editor. And so on.

For Juicemacs, building on Java and a compiler framework called Truffle helps with getting better performance, and by choosing a language with a good GC, we can focus more on the challenges above.

Currently, Juicemacs has implemented three out of the at least four interpreters in Emacs: one for Lisp code, one for bytecode, and one for regular expressions, all of them JIT-capable. Besides these, Emacs also has around two thousand built-in functions in C, and Juicemacs has around four hundred of them implemented. That is not many, but surprisingly it is enough to bootstrap Emacs and run the portable dumper, or pdump for short. Let's have a try.

So this is the binary produced by Java Native Image, and it is loading all the files needed for bootstrapping. Then it dumps the memory to a file to be loaded later, giving us fast startup. As we can see here, it throws some frame errors, because Juicemacs doesn't have an editor UI or functional frames yet. But otherwise, it can already run quite a lot of Lisp code. For example, this code uses the benchmark library to measure the performance of this Fibonacci function, and we can see that the JIT engine is already kicking in and making execution faster.

In addition to that, with a bit of a workaround, Juicemacs can also run some of the ERT (Emacs Lisp Regression Testing) suites that come with Emacs. So... yes, there are a bunch of test failures, which means we are not that compatible with Emacs yet and need more work. But the whole testing procedure runs fine, and it produces proper stack traces, which is quite useful for debugging Juicemacs.
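The Fibonacci timing in the demo was along these lines. This is a hedged reconstruction: the function and the exact arguments shown on screen may differ, and my/fib is just a placeholder name.

    ;; Naive recursive Fibonacci plus the built-in benchmark library.
    (require 'benchmark)

    (defun my/fib (n)
      (if (< n 2)
          n
        (+ (my/fib (- n 1)) (my/fib (- n 2)))))

    ;; benchmark-run returns (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
    ;; In a JIT runtime, repeating this form should get faster once
    ;; compilation kicks in.
    (benchmark-run 10 (my/fib 25))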
So with that, a rather functional JIT runtime, let's now look into today's topic: JIT compilation for ELisp.

You probably know that Emacs has supported native compilation, or nativecomp for short, for some time now. It mainly uses GCC (through libgccjit) to compile Lisp code into native code ahead of time. At runtime, Emacs loads those compiled files and gets the performance of native code.

However, for installed packages, for example, we might want to compile them when we actually use them instead of ahead of time. Emacs supports this through the native-comp-jit-compilation flag. What it does is this: at runtime, Emacs sends loaded files to external Emacs worker processes, which then compile those files asynchronously. When the compilation is done, the current Emacs session loads the compiled code back and improves its performance on the fly.

When you look at this procedure, however, it is ahead-of-time compilation done at runtime. And that is what current Emacs calls JIT compilation. But if you look at some other JIT engines, you'll see much more complex architectures.

Take LuaJIT as an example. In addition to this red line here, which takes us from an interpreted state to a compiled native state, which is also what Emacs does, LuaJIT also supports going from a compiled state back to its interpreter. This process is called "deoptimization". In contrast to its name, deoptimization actually enables a huge category of JIT optimizations, called speculation.

Basically, with speculation, the compiler can use runtime statistics to speculate, that is, to make bolder assumptions in the compiled code. When those assumptions are invalidated, the runtime deoptimizes the code, updates the statistics, and then recompiles the code based on new assumptions, which makes the code more performant.

Let's look at an example. Here is a really simple function that adds one to the input number. But in Emacs, it is not that simple, because Emacs has three categories of numbers: fixnums (machine-word-sized integers), floating-point numbers, and big integers. When we compile this, we need to handle all three cases.
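Here is a minimal sketch of that example; the function name is made up for illustration.

    (defun my/add-one (x)
      (1+ x))

    (my/add-one 41)           ; fixnum: the intended fast path
    (my/add-one 41.0)         ; float: needs a different code path
    (my/add-one (expt 2 80))  ; bignum: needs yet another code path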
If we analyze the code produced by Emacs, as shown by the graph here, we can see that it has two paths: one fast path that does fast fixnum addition, and one slow path that calls out to an external plus-one function to handle floats and big integers.

Now, if we pass integers into this function, it is pretty fast, because we stay on the fast path. However, if we pass in a float, it has to go through the slow path, doing an extra function call, which is slow.

What speculation can offer here is flexible fast paths. When we pass a float into this function, which currently has only fixnums on its fast path, it also has to go through the slow path. But the difference is that a speculative runtime can deoptimize and recompile the code to adapt to this. When it recompiles, it can put floats onto the fast path, and now floating-point operations are fast as well. This kind of speculation is why speculative runtimes can be really fast.

Let's take a look at some benchmarks. They were obtained with the elisp-benchmarks library on ELPA. The blue line here is for nativecomp, and the blue areas mean that nativecomp is slower; likewise, the green areas mean that Juicemacs is slower. At a glance, the two (or four) of them actually seem more or less on par, to me. But let's take a closer look at some of them.

The first few benchmarks are the classic Fibonacci benchmarks. We know that the series is formed by adding the previous two numbers in the series. Looking at this expression here, Fibonacci benchmarks are quite intensive in number additions, subtractions, and, if you use recursion, function calls. That is exactly why the Fibonacci series makes a good benchmark.

And looking at the results here... wow. Emacs nativecomp executes instantaneously. Seemingly, it is a total defeat for Juicemacs. Now, if you are into benchmarks, you know something is wrong here: we are comparing different things.
So let's look under the hood and disassemble the function with this convenient Emacs command called disassemble... And these two lines of code are what we get.

We can already see what is going on here: GCC sees that Fibonacci is a pure function, because it returns the same value for the same arguments, so GCC chooses to do the computation at compile time and inserts the final number directly into the compiled code.

That is actually great! It shows that nativecomp knows about pure functions and can do all kinds of things with them, like removing them or constant-folding them. Juicemacs just does not do that. However, we are also interested in the things we mentioned earlier: the performance of number additions and function calls. So, in order to let the benchmark show those, we need to modify it a bit, simply by making things non-constant.

With that, Emacs gets much slower. And again, let's look at what is happening behind these numbers. As before, with the disassemble command, we can look into the assembly, and once more we can already see what is happening.

Juicemacs, thanks to its speculative nature, supports fast paths for all three kinds of numbers. Currently, however, Emacs nativecomp does not have any fast path for the operations here, the additions, subtractions, and comparisons, which is exactly what the Fibonacci benchmarks measure. Emacs, at this point, has to call generic external functions for them, and that is slow.
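If you want to poke at this yourself, the inspection step is just the built-in disassemble command, and the "make things non-constant" tweak can be as simple as reading the benchmark input from a variable. This is a hedged sketch reusing the my/fib placeholder from earlier; the real benchmark code differs.

    ;; Show the code the byte compiler or nativecomp produced for a function.
    (disassemble 'my/fib)

    ;; Feed the benchmark an input the compiler cannot constant-fold away.
    (defvar my/fib-input 25)
    (benchmark-run 10 (my/fib my/fib-input))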
But is nativecomp really that slow? I also ran the same benchmark in Common Lisp, with SBCL, and nativecomp is already fast compared to untyped SBCL. That is because SBCL also emits call instructions when it has no type information. However, once we declare the types, SBCL is able to compile a fast path for fixnums, which puts its performance on par with speculative JIT engines (that is, Juicemacs), because now both of us are on fast paths.

Additionally, if we are bold enough to pass the (safety 0) setting to SBCL, it removes all the slow paths and type checks, and its performance gets close to what you get with C. Well, we probably do not want safety 0 most of the time. But even so, if nativecomp were to get fast paths for more constructs, there certainly is quite some room for performance improvement.

Let's look at some more benchmarks. For this inclist, or increment-list, benchmark, Juicemacs is really slow. Part of that comes from the cost of Java boxing integers. Emacs nativecomp, on the other hand, actually has fast paths for all of the operations in this particular benchmark. That is why it can be so fast here, and it also shows that nativecomp has a lot of potential for improvement.

There is another benchmark here that uses advice. Emacs Lisp supports using advice to override functions, by wrapping the original function and an advice function, the two of them, inside a glue function. In this benchmark, we advise the Fibonacci function to cache the first ten entries to speed up computation, as can be seen from the speed-up in the Juicemacs results. However, it seems that nativecomp does not yet compile glue functions, and that makes advice slower.
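The advice in that benchmark works roughly like this. Again, this is a hedged sketch reusing the my/fib placeholder; the benchmark's real code differs.

    ;; Cache the first ten Fibonacci results, then wrap my/fib with
    ;; :around advice that consults the cache before calling the original.
    (defvar my/fib-cache (make-hash-table))
    (dotimes (i 10)
      (puthash i (my/fib i) my/fib-cache))

    (defun my/fib-cached (orig-fn n)
      (or (gethash n my/fib-cache)
          (funcall orig-fn n)))

    (advice-add 'my/fib :around #'my/fib-cached)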
With these benchmarks in hand, let's discuss the big question: should GNU Emacs adopt speculative JIT compilation? The hidden question is really: is it worth it? And my personal answer is: maybe not. The first reason is that slow paths, like floats, are actually not that frequent in Emacs, and optimizing the fast paths, like fixnums, can already get us very good performance. The second, and main, reason is that speculative JIT is very hard.

LuaJIT, for example, took a genius to build. Even with the help of GCC, we would need to hand-write all the fast-path, slow-path, and switching logic. We would need to find a way to deoptimize, which requires mapping machine registers back to the interpreter stack. And speculation needs runtime information, which also costs extra memory.

Moreover, as some of the benchmarks above show, there is low-hanging fruit in nativecomp that might get us better performance with relatively little effort. Compared to that, a JIT engine is a huge, huge undertaking.

For Juicemacs, though, the JIT engine comes a lot cheaper, because we are cheating by building on an existing compiler framework called Truffle. Truffle is a meta-compiler framework, which means that it lets you write an interpreter, add the required annotations, and it will automatically turn the interpreter into a JIT runtime.

For example, here is a typical bytecode interpreter. After you add the required annotations, Truffle knows that the bytecode here is constant and that it should unroll this loop to inline all of the bytecode. Then, when Truffle compiles the code, it knows that the first iteration does "x plus one" and the second does "return", and it compiles all of that into "return x plus 1", which is exactly what we would expect when compiling this pseudo-code.

Building on that, we can also easily implement speculation, by using the transferToInterpreterAndInvalidate function provided by Truffle, which Truffle automatically turns into deoptimization. Now, for example, when this add function is given two floats, it goes through the slow path here, which might lead to a compiled slow path, or to deoptimization. Going the deoptimization way, it can then update the runtime statistics. When the code is compiled again, Truffle knows from those statistics that we have floats, and the floating-point addition branch is then incorporated into the fast path.
To put it into Java code... most operations are just as simple as this, with fast paths for integers, floats, and big integers. The simplicity of this not only saves us work, it also lets Juicemacs explore more things, more rapidly.

And I have actually done some silly explorations. For example, I tried to constant-fold more things. Many of us have an Emacs config that stays largely unchanged, at least during one Emacs session. That means many of the global variables in ELisp are effectively constant. With speculation, we can speculate on the stable ones and try to inline them as constants. This might improve performance, or maybe not? We would need a full editor to get real-world data.

I also tried changing cons lists to be backed by arrays, because maybe arrays are faster, I guess? But in the end, setcdr requires some kind of indirection, and that actually makes performance worse. And for regular expressions, I tried borrowing techniques from PCRE's JIT, which is quite fast in itself, but that approach is unfortunately unsupported by the Java Truffle runtime.

So, looking at these: well, explorations can certainly fail. But with Truffle and Java, they are, for now, not that hard to implement, and very often they teach us something in return, whether or not they fail.
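One concrete lesson from the cons-list experiment, shown in plain ELisp rather than Juicemacs code: setcdr can point the tail of any cell at a completely different structure, which is exactly what breaks a flat array representation without an extra level of indirection.

    (let ((xs (list 1 2 3 4)))
      ;; Splice a different tail into the middle of the list.
      (setcdr (cdr xs) '(9 10))
      xs)  ; => (1 2 9 10), the old tail (3 4) is no longer part of xs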
Finally, let's talk about some explorations that we might get into in the future. For the JIT engine, I am currently looking into the implementation of nativecomp, to maybe reuse some of its optimizations. For the GUI, I am very, very slowly working on one. If it ever gets finished, there is one thing I am really looking forward to implementing: inlining widgets, or even other buffers, directly into a buffer. People sometimes complain about Emacs's GUI capabilities, but I personally think that supporting inlining, say a whole buffer inside another buffer as a rectangle, could get us very far in layout abilities. And this approach should also be compatible with terminals. I really want to see how this idea plays out with Juicemacs.

And of course, there is Lisp concurrency. Currently I am thinking of a JavaScript-like, transparent, single-threaded model, using Java's virtual threads.

But anyway, if you are interested in JIT compilation, Truffle, or anything above, or maybe you have your own ideas, you are very welcome to reach out! Juicemacs still needs many more built-in functions implemented, and any help would be very much appreciated. And I promise, it can be a very fun playground for learning about Emacs and doing crazy things.

Thank you!