WEBVTT captioned by kana
00:00:01.200 --> 00:00:02.803
Hello! This is Kana!
00:00:02.903 --> 00:00:04.367
And today I'll be talking about
00:00:04.368 --> 00:00:06.067
Just-In-Time compilation, or JIT,
00:00:06.068 --> 00:00:07.363
for Emacs Lisp,
00:00:07.463 --> 00:00:11.163
based on my work-in-progress Emacs clone, Juicemacs.
00:00:11.263 --> 00:00:13.533
Juicemacs aims to explore a few things
00:00:13.534 --> 00:00:15.843
that I've been wondering about for a while.
00:00:15.943 --> 00:00:18.567
For example, what if we had better or even
00:00:18.568 --> 00:00:21.223
transparent concurrency in ELisp?
00:00:21.323 --> 00:00:23.243
Or, can we have a concurrent GUI?
00:00:23.343 --> 00:00:26.783
One that neither blocks Lisp code, nor is blocked by it?
00:00:26.883 --> 00:00:31.067
And finally what can JIT compilation do for ELisp?
00:00:31.068 --> 00:00:34.083
Will it provide better performance?
00:00:34.183 --> 00:00:37.400
However, a major problem with explorations
00:00:37.401 --> 00:00:38.623
in Emacs clones is that,
00:00:38.723 --> 00:00:40.863
Emacs is a whole universe.
00:00:40.963 --> 00:00:43.600
And that means, to make these explorations
00:00:43.601 --> 00:00:45.383
meaningful for Emacs users,
00:00:45.483 --> 00:00:47.967
we need to cover a lot of Emacs features,
00:00:47.968 --> 00:00:50.543
before we can ever begin.
00:00:50.643 --> 00:00:53.923
For example, one of the features of Emacs is that,
00:00:54.023 --> 00:00:56.003
it supports a lot of encodings.
00:00:56.103 --> 00:00:59.267
Let's look at this string: it can be encoded
00:00:59.268 --> 00:01:03.643
in both Unicode and Shift-JIS, a Japanese encoding system.
00:01:03.743 --> 00:01:07.067
But currently, Unicode does not have
00:01:07.068 --> 00:01:09.803
an official mapping for this "ki" (﨑) character.
00:01:09.903 --> 00:01:12.767
So when we map from Shift-JIS to Unicode,
00:01:12.768 --> 00:01:14.423
in most programming languages,
00:01:14.523 --> 00:01:16.533
you end up with something like this:
00:01:16.534 --> 00:01:19.143
it's a replacement character.
00:01:19.243 --> 00:01:22.067
But in Emacs, it actually extends
00:01:22.068 --> 00:01:23.883
the Unicode range threefold,
00:01:23.983 --> 00:01:26.833
and uses the extra range to losslessly
00:01:26.834 --> 00:01:29.483
support characters like this.
00:01:29.583 --> 00:01:31.923
So if you want to support this feature,
00:01:32.023 --> 00:01:34.033
that basically rules out all string
00:01:34.034 --> 00:01:37.243
libraries with Unicode assumptions.
00:01:37.843 --> 00:01:40.067
For another, you need to support
00:01:40.068 --> 00:01:41.883
the regular expressions in Emacs,
00:01:41.983 --> 00:01:45.023
which are really quite irregular.
00:01:45.123 --> 00:01:46.900
For example, it supports asserting
00:01:46.901 --> 00:01:49.403
about the user's cursor position.
00:01:49.503 --> 00:01:52.033
And it also uses some character tables,
00:01:52.034 --> 00:01:53.883
that can be modified from Lisp code,
00:01:53.983 --> 00:01:56.163
to determine case mappings.
00:01:56.263 --> 00:01:59.567
And all that makes it really hard, or even
00:01:59.568 --> 00:02:05.123
impossible to use any existing regexp libraries.
00:02:05.223 --> 00:02:07.883
Also, you need a functional garbage collector.
00:02:07.983 --> 00:02:09.867
You need threading primitives, because
00:02:09.868 --> 00:02:12.323
Emacs already has some threading support.
00:02:12.423 --> 00:02:14.533
And you might want the performance of your clone
00:02:14.534 --> 00:02:18.963
to match Emacs, even with its native compilation enabled.
00:02:19.063 --> 00:02:21.500
Not to mention you also need a GUI for an editor.
00:02:21.501 --> 00:02:23.543
And so on.
00:02:23.643 --> 00:02:25.633
For Juicemacs, building on Java and
00:02:25.634 --> 00:02:27.563
a compiler framework called Truffle
00:02:27.663 --> 00:02:30.503
helps us get better performance;
00:02:30.603 --> 00:02:32.933
and by choosing a language with a good GC,
00:02:32.934 --> 00:02:38.063
we can actually focus more on the challenges above.
00:02:38.163 --> 00:02:41.433
Currently, Juicemacs has implemented three of the
00:02:41.434 --> 00:02:43.983
at least four interpreters in Emacs.
00:02:44.083 --> 00:02:46.363
One for Lisp code, one for bytecode,
00:02:46.463 --> 00:02:48.567
and one for regular expressions,
00:02:48.568 --> 00:02:50.903
all of them JIT-capable.
00:02:51.003 --> 00:02:53.667
Other than these, Emacs also has around
00:02:53.668 --> 00:02:56.083
two thousand built-in functions in C code.
00:02:56.183 --> 00:02:57.333
And Juicemacs has around
00:02:57.334 --> 00:02:59.763
four hundred of them implemented.
00:02:59.863 --> 00:03:03.603
It's not that many, but, surprisingly, it is enough
00:03:03.703 --> 00:03:05.200
to bootstrap Emacs and run
00:03:05.201 --> 00:03:08.483
the portable dumper, or pdump for short.
00:03:08.583 --> 00:03:11.243
Let's have a try.
00:03:11.343 --> 00:03:11.703
00:03:11.803 --> 00:03:14.923
So this is the binary produced by Java native image.
00:03:15.023 --> 00:03:17.167
And it's loading all the files
00:03:17.168 --> 00:03:18.763
needed for bootstrapping.
00:03:18.863 --> 00:03:22.233
Then it dumps the memory to a file to
00:03:22.234 --> 00:03:24.923
be loaded later, giving us fast startup.
00:03:25.023 --> 00:03:28.723
As we can see here, it throws some frame errors
00:03:28.823 --> 00:03:31.400
because Juicemacs doesn't have an editor UI
00:03:31.401 --> 00:03:33.283
or functional frames yet.
00:03:33.383 --> 00:03:35.367
But otherwise, it can already run
00:03:35.368 --> 00:03:36.643
quite a lot of Lisp code.
00:03:36.743 --> 00:03:40.400
For example, this code uses the benchmark library
00:03:40.401 --> 00:03:44.403
to measure the performance of this Fibonacci function.
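NOTE
The on-screen code is not captured in the captions. A minimal sketch of what such a benchmark could look like, using the standard benchmark-run macro and a hypothetical naive fib (not necessarily the exact code from the demo):
  (require 'benchmark)
  (defun fib (n)                        ; naive recursive Fibonacci
    (if (< n 2) n
      (+ (fib (- n 1)) (fib (- n 2)))))
  (benchmark-run 10 (fib 30))           ; => (elapsed-seconds gc-runs gc-seconds)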
00:03:44.503 --> 00:03:47.067
And we can see here, the JIT engine is
00:03:47.068 --> 00:03:51.163
already kicking in and making the execution faster.
00:03:51.263 --> 00:03:53.483
In addition to that, with a bit of a workaround,
00:03:53.583 --> 00:03:56.467
Juicemacs can also run some of the ERT,
00:03:56.468 --> 00:04:01.043
or Emacs Lisp Regression Testing, tests that come with Emacs.
00:04:01.143 --> 00:04:05.823
So... Yes, there are a bunch of test failures,
00:04:05.923 --> 00:04:07.933
which means we are not that compatible
00:04:07.934 --> 00:04:09.523
with Emacs and need more work.
00:04:09.623 --> 00:04:12.803
But the whole testing procedure runs fine,
00:04:12.903 --> 00:04:14.767
and it has proper stack traces,
00:04:14.768 --> 00:04:17.803
which is quite useful for debugging Juicemacs.
00:04:17.903 --> 00:04:21.033
So with that, a rather functional JIT runtime,
00:04:21.034 --> 00:04:25.983
let's now look into today's topic: JIT compilation for ELisp.
00:04:26.083 --> 00:04:28.533
So, you probably know that Emacs has supported
00:04:28.534 --> 00:04:32.083
native-compilation, or nativecomp in short, for some time now.
00:04:32.183 --> 00:04:35.033
It mainly uses GCC to compile Lisp code
00:04:35.034 --> 00:04:37.363
into native code, ahead of time.
00:04:37.463 --> 00:04:41.433
And during runtime, Emacs loads those compiled files,
00:04:41.434 --> 00:04:44.523
and gets the performance of native code.
00:04:44.623 --> 00:04:47.643
However, for example, for installed packages,
00:04:47.743 --> 00:04:49.059
we might want to compile them when we
00:04:49.060 --> 00:04:51.823
actually use them instead of ahead of time.
00:04:51.923 --> 00:04:53.733
And Emacs supports this through
00:04:53.734 --> 00:04:55.683
this native-comp-jit-compilation flag.
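NOTE
A minimal configuration sketch for the flag mentioned above (the value shown is only an illustration; whether you want it on depends on your setup):
  (setq native-comp-jit-compilation t)  ; compile loaded .elc files asynchronously at run time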
00:04:55.783 --> 00:04:59.767
What it does is, during runtime, Emacs sends
00:04:59.768 --> 00:05:03.203
loaded files to external Emacs worker processes,
00:05:03.303 --> 00:05:06.903
which will then compile those files asynchronously.
00:05:07.003 --> 00:05:09.043
And when the compilation is done,
00:05:09.143 --> 00:05:11.967
the current Emacs session will load the compiled code back
00:05:11.968 --> 00:05:16.323
and improve its performance, on the fly.
00:05:16.423 --> 00:05:18.643
When you look at this procedure, however, it is,
00:05:18.743 --> 00:05:21.563
ahead-of-time compilation, done at runtime.
00:05:21.663 --> 00:05:25.123
And it is what current Emacs calls JIT compilation.
00:05:25.223 --> 00:05:27.867
But if you look at some other JIT engines,
00:05:27.868 --> 00:05:31.803
you'll see much more complex architectures.
00:05:31.903 --> 00:05:34.233
So, take LuaJIT as an example:
00:05:34.234 --> 00:05:36.163
in addition to this red line here,
00:05:36.263 --> 00:05:38.767
which leads us from an interpreted state
00:05:38.768 --> 00:05:40.643
to a compiled native state,
00:05:40.743 --> 00:05:42.163
which is also what Emacs does,
00:05:42.263 --> 00:05:44.333
LuaJIT also supports going from
00:05:44.334 --> 00:05:47.523
a compiled state back to its interpreter.
00:05:47.623 --> 00:05:51.483
And this process is called "deoptimization".
00:05:51.583 --> 00:05:55.300
Contrary to its name, deoptimization here actually
00:05:55.301 --> 00:05:58.563
enables a huge category of JIT optimizations.
00:05:58.663 --> 00:06:00.163
This category is called speculation.
00:06:01.463 --> 00:06:04.600
Basically, with speculation, the compiler
00:06:04.601 --> 00:06:07.683
can use runtime statistics to speculate,
00:06:07.783 --> 00:06:11.443
to make bolder assumptions in the compiled code.
00:06:11.543 --> 00:06:13.983
And when the assumptions are invalidated,
00:06:14.083 --> 00:06:18.323
the runtime deoptimizes the code, updates statistics,
00:06:18.423 --> 00:06:21.133
and then recompiles the code based on new assumptions,
00:06:21.134 --> 00:06:24.443
and that will make the code more performant.
00:06:24.543 --> 00:06:26.763
Let's look at an example.
00:06:28.463 --> 00:06:30.967
So, here is a really simple function,
00:06:30.968 --> 00:06:33.083
that adds one to the input number.
00:06:33.183 --> 00:06:36.167
But in Emacs, it is not that simple,
00:06:36.168 --> 00:06:38.203
because Emacs has three categories of numbers,
00:06:38.303 --> 00:06:42.700
that is, fixnums (machine-word-sized integers),
00:06:42.701 --> 00:06:45.603
floats, and big integers.
00:06:45.703 --> 00:06:47.600
And when we compile this, we need
00:06:47.601 --> 00:06:49.363
to handle all three cases.
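NOTE
A minimal sketch of the kind of function being discussed (add-one is a hypothetical name):
  (defun add-one (x) (1+ x))   ; looks trivial, but x may be a fixnum, a float, or a bignum
  (add-one 41)                 ; fixnum => 42
  (add-one 1.5)                ; float  => 2.5
  (add-one (expt 2 100))       ; bignum => 1267650600228229401496703205377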
00:06:49.463 --> 00:06:52.600
And if we analyze the code produced by Emacs,
00:06:52.601 --> 00:06:54.683
as is shown by this gray graph here,
00:06:54.783 --> 00:06:58.083
we can see that it has, two paths:
00:06:58.183 --> 00:07:01.403
One fast path that does fast fixnum addition;
00:07:01.503 --> 00:07:03.967
and one slow path that calls out
00:07:03.968 --> 00:07:06.523
to an external plus-one function,
00:07:06.623 --> 00:07:09.683
to handle floats and big integers.
00:07:09.783 --> 00:07:13.167
Now, if we pass integers into this function,
00:07:13.168 --> 00:07:16.283
it's pretty fast because it's on the fast path.
00:07:16.383 --> 00:07:19.767
However, if we pass in a float,
00:07:19.768 --> 00:07:21.843
then it has to go through the slow path,
00:07:21.943 --> 00:07:25.563
doing an extra function call, which is slow.
00:07:25.663 --> 00:07:28.733
How speculation might help here is that
00:07:28.734 --> 00:07:31.443
it allows flexible fast paths.
00:07:31.543 --> 00:07:34.563
When we pass a float into this function,
00:07:34.663 --> 00:07:37.400
which currently has only fixnums on the fast path,
00:07:37.401 --> 00:07:40.723
it also has to go through the slow path.
00:07:40.823 --> 00:07:44.567
But the difference is that, a speculative runtime can
00:07:44.568 --> 00:07:47.763
deoptimize and recompile the code to adapt to this.
00:07:47.863 --> 00:07:50.367
And when it recompiles, it might add
00:07:50.368 --> 00:07:52.643
floats onto the fast path,
00:07:52.743 --> 00:07:55.003
and now floating-point operations are also fast.
00:07:55.103 --> 00:07:58.567
And this kind of speculation is why
00:07:58.568 --> 00:08:03.603
speculative runtimes can be really fast.
00:08:03.703 --> 00:08:05.723
Let's take a look at some benchmarks.
00:08:05.823 --> 00:08:09.423
They're obtained with the elisp-benchmarks library on ELPA.
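NOTE
For reference, a rough sketch of how one might run that suite from a stock Emacs (assuming the GNU ELPA package's entry point is elisp-benchmarks-run, as its documentation describes):
  (package-install 'elisp-benchmarks)
  (require 'elisp-benchmarks)
  (elisp-benchmarks-run)       ; runs the suite and reports per-benchmark timings when finished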
00:08:09.523 --> 00:08:12.600
The blue line here is for nativecomp,
00:08:12.601 --> 00:08:16.043
and these blue areas mean that nativecomp is slower.
00:08:16.143 --> 00:08:19.133
And, likewise, green areas mean that
00:08:19.134 --> 00:08:20.523
Juicemacs is slower.
00:08:20.623 --> 00:08:22.867
At a glance, the two (or four)
00:08:22.868 --> 00:08:25.143
actually seem somewhat on par, to me.
00:08:25.243 --> 00:08:30.383
But, let's take a closer look at some of them.
00:08:30.483 --> 00:08:32.667
So, the first few benchmarks are the classic,
00:08:32.668 --> 00:08:33.983
Fibonacci benchmarks.
00:08:34.083 --> 00:08:36.933
We know that, the series is formed by
00:08:36.934 --> 00:08:39.203
adding the previous two numbers in the series.
00:08:39.303 --> 00:08:41.700
And looking at this expression here,
00:08:41.701 --> 00:08:44.043
Fibonacci benchmarks are quite heavy
00:08:44.143 --> 00:08:46.800
on number additions, subtractions,
00:08:46.801 --> 00:08:49.103
and function calls, if you use recursion.
00:08:49.203 --> 00:08:51.000
And it is exactly why
00:08:51.001 --> 00:08:54.323
the Fibonacci series is a good benchmark.
00:08:54.423 --> 00:08:57.243
And looking at the results here... wow.
00:08:57.343 --> 00:08:59.843
Emacs nativecomp executes instantaneously.
00:08:59.943 --> 00:09:04.523
It's a total defeat for Juicemacs, seemingly.
00:09:04.623 --> 00:09:08.043
Now, if you're into benchmarks, you know something is wrong here:
00:09:08.143 --> 00:09:11.683
we are comparing different things.
00:09:11.783 --> 00:09:14.200
So let's look under the hood
00:09:14.201 --> 00:09:15.483
and disassemble the function
00:09:15.583 --> 00:09:17.567
with this convenient Emacs command
00:09:17.568 --> 00:09:19.063
called disassemble...
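NOTE
For reference, the same command can also be invoked from Lisp (fib here is the hypothetical function name from the earlier sketch):
  (disassemble #'fib)          ; pops up a buffer with the function's disassembly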
00:09:19.163 --> 00:09:23.043
And these two lines of code are what we got.
00:09:23.143 --> 00:09:24.700
So, we already can see
00:09:24.701 --> 00:09:26.123
what's going on here:
00:09:26.223 --> 00:09:29.963
GCC sees that Fibonacci is a pure function,
00:09:30.063 --> 00:09:31.867
because it returns the same value
00:09:31.868 --> 00:09:33.243
for the same arguments,
00:09:33.343 --> 00:09:35.700
so GCC chooses to do the computation
00:09:35.701 --> 00:09:36.723
at compile time
00:09:36.823 --> 00:09:39.133
and inserts the final number directly
00:09:39.134 --> 00:09:40.323
into the compiled code.
00:09:41.823 --> 00:09:43.603
It is actually great!
00:09:43.703 --> 00:09:45.400
Because it shows that nativecomp
00:09:45.401 --> 00:09:47.283
knows about pure functions,
00:09:47.383 --> 00:09:48.700
and can do all kinds of things
00:09:48.701 --> 00:09:51.203
like removing or constant-folding them.
00:09:51.303 --> 00:09:54.403
And Juicemacs just does not do that.
00:09:54.503 --> 00:09:57.367
However, we are also concerned about
00:09:57.368 --> 00:09:59.003
the things we mentioned earlier:
00:09:59.103 --> 00:10:00.900
the performance of number additions,
00:10:00.901 --> 00:10:02.983
or function calls.
00:10:03.083 --> 00:10:05.633
So, in order to let the benchmark
00:10:05.634 --> 00:10:06.863
show some extra things,
00:10:06.963 --> 00:10:08.367
we need to modify it a bit...
00:10:08.368 --> 00:10:11.323
by simply making things non-constant.
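NOTE
A minimal sketch of that kind of tweak, routing the argument through a runtime variable so the compiler can no longer fold the whole call into a constant (hypothetical; the actual modified benchmark is not shown in the captions):
  (defvar fib-input 30)                 ; value only known at run time
  (benchmark-run 10 (fib fib-input))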
00:10:11.423 --> 00:10:15.203
With that, Emacs gets much slower now.
00:10:15.303 --> 00:10:17.133
And again, let's look at what's
00:10:17.134 --> 00:10:21.083
happening behind these numbers.
00:10:21.183 --> 00:10:23.500
Similarly, with the disassemble command,
00:10:23.501 --> 00:10:25.643
we can look into the assembly.
00:10:25.743 --> 00:10:28.019
And again, we can already see
00:10:28.020 --> 00:10:29.303
what's happening here.
00:10:29.403 --> 00:10:32.083
So, Juicemacs, due to its speculative nature,
00:10:32.183 --> 00:10:35.443
supports fast paths for all three kinds of numbers.
00:10:35.543 --> 00:10:39.233
However, currently, Emacs nativecomp
00:10:39.234 --> 00:10:41.243
does not have any fast path
00:10:41.343 --> 00:10:43.433
for the operations here like additions,
00:10:43.434 --> 00:10:45.803
or subtractions, or comparisons,
00:10:45.903 --> 00:10:48.067
which is exactly what
00:10:48.068 --> 00:10:50.963
Fibonacci benchmarks are measuring.
00:10:51.063 --> 00:10:53.800
Emacs, at this time, has to call some generic,
00:10:53.801 --> 00:10:57.963
external functions for them, and this is slow.
00:11:00.063 --> 00:11:03.203
But is nativecomp really that slow?
00:11:03.303 --> 00:11:04.967
So, I also ran the same benchmark
00:11:04.968 --> 00:11:07.083
in Common Lisp, with SBCL.
00:11:07.183 --> 00:11:09.000
And nativecomp is already fast,
00:11:09.001 --> 00:11:11.003
compared to untyped SBCL.
00:11:11.103 --> 00:11:15.500
It's because SBCL also emits call instructions
00:11:15.501 --> 00:11:18.483
when it has no type information.
00:11:18.583 --> 00:11:21.700
However, once we declare the types,
00:11:21.701 --> 00:11:25.283
SBCL is able to compile a fast path for fixnums,
00:11:25.383 --> 00:11:27.467
which makes its performance on par
00:11:27.468 --> 00:11:30.683
with speculative JIT engines (that is, Juicemacs),
00:11:30.783 --> 00:11:34.763
because now both of us are on fast paths.
00:11:36.063 --> 00:11:38.400
Additionally, if we are bold enough
00:11:38.401 --> 00:11:41.203
to pass this safety zero flag to SBCL,
00:11:41.303 --> 00:11:43.700
it will remove all the slow paths
00:11:43.701 --> 00:11:44.963
and type checks,
00:11:45.063 --> 00:11:46.367
and its performance is close
00:11:46.368 --> 00:11:48.643
to what you get with C.
00:11:48.743 --> 00:11:51.299
Well, probably we don't want safety zero
00:11:51.300 --> 00:11:52.063
most of the time.
00:11:52.163 --> 00:11:55.133
But even then, if nativecomp were to
00:11:55.134 --> 00:11:57.763
get fast paths for more constructs,
00:11:57.863 --> 00:11:59.867
there certainly is quite
00:11:59.868 --> 00:12:03.563
some room for performance improvement.
00:12:04.063 --> 00:12:06.803
Let's look at some more benchmarks.
00:12:06.903 --> 00:12:08.933
For example, for this inclist,
00:12:08.934 --> 00:12:10.923
or increment-list, benchmark,
00:12:11.023 --> 00:12:14.333
Juicemacs is really slow here. Partly,
00:12:14.334 --> 00:12:17.603
that comes from the cost of boxing integers in Java.
00:12:17.703 --> 00:12:20.300
On the other hand, for Emacs nativecomp,
00:12:20.301 --> 00:12:22.043
for this particular benchmark,
00:12:22.143 --> 00:12:23.667
it actually has fast paths
00:12:23.668 --> 00:12:25.523
for all of the operations.
00:12:25.623 --> 00:12:27.723
And that's why it can be so fast,
00:12:27.823 --> 00:12:30.667
and that also shows that nativecomp
00:12:30.668 --> 00:12:33.843
has a lot of potential for improvement.
00:12:33.943 --> 00:12:35.833
There is another benchmark here
00:12:35.834 --> 00:12:37.963
that uses advices.
00:12:38.063 --> 00:12:40.500
So Emacs Lisp supports using
00:12:40.501 --> 00:12:42.203
advices to override functions
00:12:42.303 --> 00:12:44.833
by wrapping the original function and the advice
00:12:44.834 --> 00:12:47.443
function, both of them, inside a glue function.
00:12:47.543 --> 00:12:51.467
And in this benchmark, we advise the Fibonacci function
00:12:51.468 --> 00:12:54.523
to cache the first ten entries to speed up computation,
00:12:54.623 --> 00:13:00.003
as can be seen in the speed-up in the Juicemacs results.
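NOTE
A minimal sketch of such an advice (hypothetical; not the benchmark's actual code):
  (defconst fib-small [0 1 1 2 3 5 8 13 21 34])   ; fib 0..9, precomputed
  (advice-add 'fib :around
              (lambda (orig n)
                (if (< n 10)
                    (aref fib-small n)            ; cached entry, no recursion
                  (funcall orig n))))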
00:13:00.103 --> 00:13:02.900
However, it seems that nativecomp does not yet
00:13:02.901 --> 00:13:08.523
compile glue functions, and that makes advices slower.
00:13:08.623 --> 00:13:12.043
With these benchmarks, let's discuss this big question:
00:13:12.143 --> 00:13:16.563
Should GNU Emacs adopt speculative JIT compilation?
00:13:16.663 --> 00:13:18.967
Well, the hidden question is actually,
00:13:18.968 --> 00:13:21.223
is it worth it?
00:13:21.323 --> 00:13:24.163
And, my personal answer is, maybe not.
00:13:24.263 --> 00:13:28.133
The first reason is that slow paths, like floats,
00:13:28.134 --> 00:13:31.043
are actually not that frequent in Emacs.
00:13:31.143 --> 00:13:34.100
And optimizing fast paths like fixnums
00:13:34.101 --> 00:13:37.983
can already get us very good performance.
00:13:38.083 --> 00:13:40.333
And the second or main reason is that,
00:13:40.334 --> 00:13:43.163
speculative JIT is very hard.
00:13:43.263 --> 00:13:46.843
LuaJIT, for example, took a genius to build.
00:13:46.943 --> 00:13:50.967
Even with the help of GCC, we need to hand-write
00:13:50.968 --> 00:13:54.283
all the fast-path, slow-path, and switching logic.
00:13:54.383 --> 00:13:58.133
We need to find a way to deoptimize, which requires
00:13:58.134 --> 00:14:01.803
mapping machine registers back to interpreter stack.
00:14:01.903 --> 00:14:04.067
And also, speculation needs runtime info,
00:14:04.068 --> 00:14:07.323
which also costs us extra memory.
00:14:07.423 --> 00:14:10.763
Moreover, as is shown by some benchmarks above,
00:14:10.863 --> 00:14:13.333
there's some low-hanging fruit in nativecomp that
00:14:13.334 --> 00:14:17.343
might get us better performance with relatively lower effort.
00:14:17.443 --> 00:14:22.163
Compared to this, a JIT engine is a huge, huge undertaking.
00:14:22.263 --> 00:14:26.123
But, for Juicemacs, the JIT engine comes a lot cheaper,
00:14:26.223 --> 00:14:29.067
because, we are cheating by building on
00:14:29.068 --> 00:14:33.443
an existing compiler framework called Truffle.
00:14:33.543 --> 00:14:35.883
Truffle is a meta-compiler framework,
00:14:35.983 --> 00:14:37.633
which means that it lets you write
00:14:37.634 --> 00:14:40.103
an interpreter, add required annotations,
00:14:40.203 --> 00:14:42.500
and it will automatically turn the
00:14:42.501 --> 00:14:45.643
interpreter into a JIT runtime.
00:14:45.743 --> 00:14:49.083
So for example, here is a typical bytecode interpreter.
00:14:49.183 --> 00:14:51.233
After you add the required annotations,
00:14:51.234 --> 00:14:52.523
Truffle will know that,
00:14:52.623 --> 00:14:55.533
the bytecode here is constant, and it should
00:14:55.534 --> 00:14:59.123
unroll this loop here, to inline all of that bytecode.
00:14:59.223 --> 00:15:00.467
And then, when Truffle
00:15:00.468 --> 00:15:02.243
compiles the code, it knows that:
00:15:02.343 --> 00:15:05.233
the first loop iteration here does: x plus one,
00:15:05.234 --> 00:15:07.723
and the second does: return.
00:15:07.823 --> 00:15:09.533
And then it will compile all that into,
00:15:09.534 --> 00:15:11.363
return x plus 1,
00:15:11.463 --> 00:15:14.067
which is exactly what we would expect
00:15:14.068 --> 00:15:17.683
when compiling this pseudo code.
00:15:17.783 --> 00:15:21.083
Building on that, we can also easily implement speculation,
00:15:21.183 --> 00:15:24.867
by using this transferToInterpreterAndInvalidate function
00:15:24.868 --> 00:15:26.123
provided by Truffle.
00:15:26.223 --> 00:15:28.533
And Truffle will automatically turn that
00:15:28.534 --> 00:15:30.683
into deoptimization.
00:15:30.783 --> 00:15:32.700
Now, for example, when this add function
00:15:32.701 --> 00:15:35.723
is supplied with two floats,
00:15:35.823 --> 00:15:38.243
it will go through the slow path here,
00:15:38.343 --> 00:15:40.960
which might lead to a compiled slow path,
00:15:40.961 --> 00:15:43.203
or deoptimization.
00:15:43.303 --> 00:15:45.733
And going down this deoptimization route,
00:15:45.734 --> 00:15:48.223
it can then update the runtime stats.
00:15:48.323 --> 00:15:50.400
And now, when the code is compiled again,
00:15:50.401 --> 00:15:51.603
Truffle will know,
00:15:51.703 --> 00:15:54.100
that these compilation stats suggest that
00:15:54.101 --> 00:15:55.563
we have floats.
00:15:55.663 --> 00:15:58.733
And this floating point addition branch will
00:15:58.734 --> 00:16:02.603
then be incorporated into the fast path.
00:16:02.703 --> 00:16:06.003
To put it into Java code...
00:16:06.103 --> 00:16:08.723
Most operations are just as simple as this.
00:16:08.823 --> 00:16:11.033
And it supports fast paths for integers,
00:16:11.034 --> 00:16:13.963
floats, and big integers.
00:16:14.063 --> 00:16:17.133
And the simplicity of this not only saves us work,
00:16:17.134 --> 00:16:22.243
but also enables Juicemacs to explore more things more rapidly.
00:16:22.343 --> 00:16:26.483
And actually, I have done some silly explorations.
00:16:26.583 --> 00:16:30.203
For example, I tried to constant-fold more things.
00:16:30.303 --> 00:16:32.767
Many of us have an Emacs config that stays
00:16:32.768 --> 00:16:36.683
largely unchanged, at least during one Emacs session.
00:16:36.783 --> 00:16:39.667
And that means many of the global variables
00:16:39.668 --> 00:16:42.323
in ELisp are constant.
00:16:42.423 --> 00:16:44.600
And with speculation, we can
00:16:44.601 --> 00:16:46.683
speculate about the stable ones,
00:16:46.783 --> 00:16:49.563
and try to inline them as constants.
00:16:49.663 --> 00:16:51.733
And this might improve performance,
00:16:51.734 --> 00:16:53.083
or maybe not?
00:16:53.183 --> 00:16:55.367
Because, we will need a full editor
00:16:55.368 --> 00:16:58.123
to get real world data.
00:16:58.223 --> 00:17:01.733
I also tried changing cons lists to be backed
00:17:01.734 --> 00:17:05.243
by some arrays, because, maybe arrays are faster, I guess?
00:17:05.343 --> 00:17:09.033
But in the end, setcdr requires some kind of indirection,
00:17:09.034 --> 00:17:12.883
and that actually makes the performance worse.
00:17:12.983 --> 00:17:14.733
And for regular expressions,
00:17:14.734 --> 00:17:17.923
I also tried borrowing techniques from PCRE JIT,
00:17:18.023 --> 00:17:20.667
which is quite fast in itself, but it is
00:17:20.668 --> 00:17:24.163
unfortunately unsupported by the Java Truffle runtime.
00:17:24.263 --> 00:17:27.333
So, looking at these, well,
00:17:27.334 --> 00:17:30.243
explorations can fail, certainly.
00:17:30.343 --> 00:17:32.800
But, with Truffle and Java, these,
00:17:32.801 --> 00:17:34.883
for now, are not that hard to implement,
00:17:34.983 --> 00:17:37.667
and also very often, they teach us something
00:17:37.668 --> 00:17:42.363
in return, whether or not they fail.
00:17:42.463 --> 00:17:45.333
Finally, let's talk about some explorations
00:17:45.334 --> 00:17:47.883
that we might get into in the future.
00:17:47.983 --> 00:17:49.683
For the JIT engine, for example,
00:17:49.783 --> 00:17:52.633
currently I'm looking into the implementation of
00:17:52.634 --> 00:17:56.883
nativecomp to maybe reuse some of its optimizations.
00:17:56.983 --> 00:18:01.323
For the GUI, I'm very very slowly working on one.
00:18:01.423 --> 00:18:03.733
If it ever completes, I have one thing
00:18:03.734 --> 00:18:06.603
I'm really looking forward to implementing.
00:18:06.703 --> 00:18:08.900
That is, inlining widgets, or even
00:18:08.901 --> 00:18:11.763
other buffers, directly into a buffer.
00:18:11.863 --> 00:18:13.967
Well, it's because people sometimes complain
00:18:13.968 --> 00:18:16.003
about Emacs's GUI capabilities.
00:18:16.103 --> 00:18:19.767
But I personally think that supporting inlining,
00:18:19.768 --> 00:18:23.043
like a whole buffer inside another buffer as a rectangle,
00:18:23.143 --> 00:18:26.883
could get us very far in layout abilities.
00:18:26.983 --> 00:18:28.567
And this approach should also
00:18:28.568 --> 00:18:30.843
be compatible with terminals.
00:18:30.943 --> 00:18:32.933
And I really want to see how this idea
00:18:32.934 --> 00:18:36.003
plays out with Juicemacs.
00:18:36.103 --> 00:18:38.963
And of course, there's Lisp concurrency.
00:18:39.063 --> 00:18:42.167
And currently I'm thinking of a JavaScript-like,
00:18:42.168 --> 00:18:46.283
transparent, single-thread model, using Java's virtual threads.
00:18:46.383 --> 00:18:49.967
But anyway, if you are interested in JIT compilation,
00:18:49.968 --> 00:18:51.663
Truffle, or anything above,
00:18:51.763 --> 00:18:53.867
or maybe you have your own ideas,
00:18:53.868 --> 00:18:56.283
you are very welcome to reach out!
00:18:56.383 --> 00:19:00.033
Juicemacs does need to implement many more built-in functions,
00:19:00.034 --> 00:19:03.063
and any help would be very much appreciated.
00:19:03.163 --> 00:19:05.800
And I promise, it can be a very fun playground
00:19:05.801 --> 00:19:08.343
to learn about Emacs and do crazy things.
00:19:08.443 --> 00:19:10.902
Thank you!