1 files changed, 1238 insertions, 0 deletions
diff --git a/2025/captions/emacsconf-2025-juicemacs--juicemacs-exploring-speculative-jit-compilation-for-elisp-in-java--kana--main.vtt b/2025/captions/emacsconf-2025-juicemacs--juicemacs-exploring-speculative-jit-compilation-for-elisp-in-java--kana--main.vtt
new file mode 100644
index 00000000..46820e94
--- /dev/null
+++ b/2025/captions/emacsconf-2025-juicemacs--juicemacs-exploring-speculative-jit-compilation-for-elisp-in-java--kana--main.vtt
@@ -0,0 +1,1238 @@
+WEBVTT captioned by kana
+
+
+00:00:01.200 --> 00:00:02.803
+Hello! This is Kana!
+
+00:00:02.903 --> 00:00:04.367
+And today I'll be talking about
+
+00:00:04.368 --> 00:00:06.067
+<b>J</b>ust-<b>I</b>n-<b>T</b>ime compilation, or JIT,
+
+00:00:06.068 --> 00:00:07.363
+for Emacs Lisp,
+
+00:00:07.463 --> 00:00:11.163
+based on my work-in-progress Emacs clone, Juicemacs.
+
+00:00:11.263 --> 00:00:13.533
+Juicemacs aims to explore a few things
+
+00:00:13.534 --> 00:00:15.843
+that I've been wondering about for a while.
+
+00:00:15.943 --> 00:00:18.567
+For exmaple, what if we had better or even
+
+00:00:18.568 --> 00:00:21.223
+transparent concurrency in ELisp?
+
+00:00:21.323 --> 00:00:23.243
+Or, can we have a concurrent GUI?
+
+00:00:23.343 --> 00:00:26.783
+One that does not block, or is blocked by Lisp code?
+
+00:00:26.883 --> 00:00:31.067
+And finally what can JIT compilation do for ELisp?
+
+00:00:31.068 --> 00:00:34.083
+Will it provide better performance?
+
+00:00:34.183 --> 00:00:37.400
+However, a main problem with explorations
+
+00:00:37.401 --> 00:00:38.623
+in Emacs clones is that,
+
+00:00:38.723 --> 00:00:40.863
+Emacs is a whole universe.
+
+00:00:40.963 --> 00:00:43.600
+And that means, to make these explorations
+
+00:00:43.601 --> 00:00:45.383
+meaningful for Emacs users,
+
+00:00:45.483 --> 00:00:47.967
+we need to cover a lot of Emacs features,
+
+00:00:47.968 --> 00:00:50.543
+before we can ever begin.
+
+00:00:50.643 --> 00:00:53.923
+For example, one of the features of Emacs is that,
+
+00:00:54.023 --> 00:00:56.003
+it supports a lot of encodings.
+
+00:00:56.103 --> 00:00:59.267
+Let's look at this string: it can be encoded
+
+00:00:59.268 --> 00:01:03.643
+in both Unicode and Shift-JIS, a Japanese encoding system.
+
+00:01:03.743 --> 00:01:07.067
+But currently, Unicode does not have
+
+00:01:07.068 --> 00:01:09.803
+an official mapping for this "ki" (﨑) character.
+
+00:01:09.903 --> 00:01:12.767
+So when we map from Shift-JIS to Unicode,
+
+00:01:12.768 --> 00:01:14.423
+in most programming languages,
+
+00:01:14.523 --> 00:01:16.533
+you end up with something like this:
+
+00:01:16.534 --> 00:01:19.143
+it's a replacement character.
+
+00:01:19.243 --> 00:01:22.067
+But in Emacs, it actually extends
+
+00:01:22.068 --> 00:01:23.883
+the Unicode range by threefold,
+
+00:01:23.983 --> 00:01:26.833
+and uses the extra range to losslessly
+
+00:01:26.834 --> 00:01:29.483
+support characters like this.
+
+00:01:29.583 --> 00:01:31.923
+So if you want to support this feature,
+
+00:01:32.023 --> 00:01:34.033
+that basically rules out all string
+
+00:01:34.034 --> 00:01:37.243
+libraries with Unicode assumptions.
+
+00:01:37.843 --> 00:01:40.067
+For another, you need to support
+
+00:01:40.068 --> 00:01:41.883
+the regular expressions in Emacs,
+
+00:01:41.983 --> 00:01:45.023
+which are, really irregular.
+
+00:01:45.123 --> 00:01:46.900
+For example, it supports asserting
+
+00:01:46.901 --> 00:01:49.403
+about the user cursor position.
+
+00:01:49.503 --> 00:01:52.033
+And it also uses some character tables,
+
+00:01:52.034 --> 00:01:53.883
+that can be modified from Lisp code,
+
+00:01:53.983 --> 00:01:56.163
+to determine to case mappings.
+
+00:01:56.263 --> 00:01:59.567
+And all that makes it really hard, or even
+
+00:01:59.568 --> 00:02:05.123
+impossible to use any existing regexp libraries.
+
+00:02:05.223 --> 00:02:07.883
+Also, you need a functional garbage collector.
+
+00:02:07.983 --> 00:02:09.867
+You need threading primitives, because
+
+00:02:09.868 --> 00:02:12.323
+Emacs has already had some threading support.
+
+00:02:12.423 --> 00:02:14.533
+And you might want the performance of your clone
+
+00:02:14.534 --> 00:02:18.963
+to match Emacs, even with its native compilation enabled.
+
+00:02:19.063 --> 00:02:21.500
+Not to mention you also need a GUI for an editor.
+
+00:02:21.501 --> 00:02:23.543
+And so on.
+
+00:02:23.643 --> 00:02:25.633
+For Juicemacs, building on Java and
+
+00:02:25.634 --> 00:02:27.563
+a compiler framework called Truffle,
+
+00:02:27.663 --> 00:02:30.503
+helps in getting better performance;
+
+00:02:30.603 --> 00:02:32.933
+and by choosing a language with a good GC,
+
+00:02:32.934 --> 00:02:38.063
+we can actually focus more on the challenges above.
+
+00:02:38.163 --> 00:02:41.433
+Currently, Juicemacs has implemented three out of,
+
+00:02:41.434 --> 00:02:43.983
+at least four of the interpreters in Emacs.
+
+00:02:44.083 --> 00:02:46.363
+One for lisp code, one for bytecode,
+
+00:02:46.463 --> 00:02:48.567
+and one for regular expressions,
+
+00:02:48.568 --> 00:02:50.903
+all of them JIT-capable.
+
+00:02:51.003 --> 00:02:53.667
+Other than these, Emacs also has around
+
+00:02:53.668 --> 00:02:56.083
+two thousand built-in functions in C code.
+
+00:02:56.183 --> 00:02:57.333
+And Juicemacs has around
+
+00:02:57.334 --> 00:02:59.763
+four hundred of them implemented.
+
+00:02:59.863 --> 00:03:03.603
+It's not that many, but it is surprisingly enough
+
+00:03:03.703 --> 00:03:05.200
+to bootstrap Emacs and run
+
+00:03:05.201 --> 00:03:08.483
+the portable dumper, or pdump, in short.
+
+00:03:08.583 --> 00:03:11.243
+Let's have a try.
+
+00:03:11.343 --> 00:03:11.703
+
+
+00:03:11.803 --> 00:03:14.923
+So this is the binary produced by Java native image.
+
+00:03:15.023 --> 00:03:17.167
+And it's loading all the files
+
+00:03:17.168 --> 00:03:18.763
+needed for bootstrapping.
+
+00:03:18.863 --> 00:03:22.233
+Then it dumps the memory to a file to
+
+00:03:22.234 --> 00:03:24.923
+be loaded later, giving us fast startup.
+
+00:03:25.023 --> 00:03:28.723
+As we can see here, it throws some frame errors
+
+00:03:28.823 --> 00:03:31.400
+because Juicemacs doesn't have an editor UI
+
+00:03:31.401 --> 00:03:33.283
+or functional frames yet.
+
+00:03:33.383 --> 00:03:35.367
+But otherwise, it can already run
+
+00:03:35.368 --> 00:03:36.643
+quite some lisp code.
+
+00:03:36.743 --> 00:03:40.400
+For example, this code uses the benchmark library
+
+00:03:40.401 --> 00:03:44.403
+to measure the performance of this Fibonacci function.
+
+00:03:44.503 --> 00:03:47.067
+And we can see here, the JIT engine is
+
+00:03:47.068 --> 00:03:51.163
+already kicking in and makes the execution faster.
+
+00:03:51.263 --> 00:03:53.483
+In addition to that, with a bit of workaround,
+
+00:03:53.583 --> 00:03:56.467
+Juicemacs can also run some of the ERT,
+
+00:03:56.468 --> 00:04:01.043
+or, <b>E</b>macs <b>R</b>egression <b>T</b>est suite, that comes with Emacs.
+
+00:04:01.143 --> 00:04:05.823
+So... Yes, there are a bunch of test failures,
+
+00:04:05.923 --> 00:04:07.933
+which means we are not that compatible
+
+00:04:07.934 --> 00:04:09.523
+with Emacs and need more work.
+
+00:04:09.623 --> 00:04:12.803
+But the whole testing procedure runs fine,
+
+00:04:12.903 --> 00:04:14.767
+and it has proper stack traces,
+
+00:04:14.768 --> 00:04:17.803
+which is quite useful for debugging Juicemacs.
+
+00:04:17.903 --> 00:04:21.033
+So with that, a rather functional JIT runtime,
+
+00:04:21.034 --> 00:04:25.983
+let's now try look into today's topic, JIT compilation for ELisp.
+
+00:04:26.083 --> 00:04:28.533
+So, you probably know that Emacs has supported
+
+00:04:28.534 --> 00:04:32.083
+native-compilation, or nativecomp in short, for some time now.
+
+00:04:32.183 --> 00:04:35.033
+It mainly uses GCC to compile Lisp code
+
+00:04:35.034 --> 00:04:37.363
+into native code, ahead of time.
+
+00:04:37.463 --> 00:04:41.433
+And during runtime, Emacs loads those compiled files,
+
+00:04:41.434 --> 00:04:44.523
+and gets the performance of native code.
+
+00:04:44.623 --> 00:04:47.643
+However, for example, for installed packages,
+
+00:04:47.743 --> 00:04:49.059
+we might want to compile them when we
+
+00:04:49.060 --> 00:04:51.823
+actually use them instead of ahead of time.
+
+00:04:51.923 --> 00:04:53.733
+And Emacs supports this through
+
+00:04:53.734 --> 00:04:55.683
+this <i>native-comp-jit-compilation</i> flag.
+
+00:04:55.783 --> 00:04:59.767
+What it does is, during runtime, Emacs sends
+
+00:04:59.768 --> 00:05:03.203
+loaded files to external Emacs worker processes,
+
+00:05:03.303 --> 00:05:06.903
+which will then compile those files asynchronously.
+
+00:05:07.003 --> 00:05:09.043
+And when the compilation is done,
+
+00:05:09.143 --> 00:05:11.967
+the current Emacs session will load the compiled code back
+
+00:05:11.968 --> 00:05:16.323
+and improves its performance, on the fly.
+
+00:05:16.423 --> 00:05:18.643
+When you look at this procedure, however, it is,
+
+00:05:18.743 --> 00:05:21.563
+ahead-of-time compilation, done at runtime.
+
+00:05:21.663 --> 00:05:25.123
+And it is what current Emacs calls JIT compilation.
+
+00:05:25.223 --> 00:05:27.867
+But if you look at some other JIT engines,
+
+00:05:27.868 --> 00:05:31.803
+you'll see much more complex architectures.
+
+00:05:31.903 --> 00:05:34.233
+So, take luaJIT for an example,
+
+00:05:34.234 --> 00:05:36.163
+in addition to this red line here,
+
+00:05:36.263 --> 00:05:38.767
+which leads us from an interpreted state
+
+00:05:38.768 --> 00:05:40.643
+to a compiled native state,
+
+00:05:40.743 --> 00:05:42.163
+which is also what Emacs does,
+
+00:05:42.263 --> 00:05:44.333
+LuaJIT also supports going from
+
+00:05:44.334 --> 00:05:47.523
+a compiled state back to its interpreter.
+
+00:05:47.623 --> 00:05:51.483
+And this process is called "deoptimization".
+
+00:05:51.583 --> 00:05:55.300
+In contrast to its name, deoptimization here actually
+
+00:05:55.301 --> 00:05:58.563
+enables a huge category of JIT optimizations.
+
+00:05:58.663 --> 00:06:00.163
+They are called speculation.
+
+00:06:01.463 --> 00:06:04.600
+Basically, with speculation, the compiler
+
+00:06:04.601 --> 00:06:07.683
+can use runtime statistics to speculate,
+
+00:06:07.783 --> 00:06:11.443
+to make bolder assumptions in the compiled code.
+
+00:06:11.543 --> 00:06:13.983
+And when the assumptions are invalidated,
+
+00:06:14.083 --> 00:06:18.323
+the runtime deoptimizes the code, updates statistics,
+
+00:06:18.423 --> 00:06:21.133
+and then recompile the code based on new assumptions,
+
+00:06:21.134 --> 00:06:24.443
+and that will make the code more performant.
+
+00:06:24.543 --> 00:06:26.763
+Let's look at an example.
+
+00:06:28.463 --> 00:06:30.967
+So, here is a really simple function,
+
+00:06:30.968 --> 00:06:33.083
+that adds one to the input number.
+
+00:06:33.183 --> 00:06:36.167
+But in Emacs, it is not that simple,
+
+00:06:36.168 --> 00:06:38.203
+because Emacs has three categories of numbers,
+
+00:06:38.303 --> 00:06:42.700
+that is, fix numbers, or machine-word-sized integers,
+
+00:06:42.701 --> 00:06:45.603
+floating numbers, and big integers.
+
+00:06:45.703 --> 00:06:47.600
+And when we compile this, we need
+
+00:06:47.601 --> 00:06:49.363
+to handle all three cases.
+
+00:06:49.463 --> 00:06:52.600
+And if we analyze the code produced by Emacs,
+
+00:06:52.601 --> 00:06:54.683
+as is shown by this gray graph here,
+
+00:06:54.783 --> 00:06:58.083
+we can see that it has, two paths:
+
+00:06:58.183 --> 00:07:01.403
+One fast path, that does fast fix number addition;
+
+00:07:01.503 --> 00:07:03.967
+and one for slow paths, that calls out
+
+00:07:03.968 --> 00:07:06.523
+to an external plus-one function,
+
+00:07:06.623 --> 00:07:09.683
+to handle floating number and big integers.
+
+00:07:09.783 --> 00:07:13.167
+Now, if we pass integers into this function,
+
+00:07:13.168 --> 00:07:16.283
+it's pretty fast because it's on the fast path.
+
+00:07:16.383 --> 00:07:19.767
+However, if we pass in a floating number,
+
+00:07:19.768 --> 00:07:21.843
+then it has to go through the slow path,
+
+00:07:21.943 --> 00:07:25.563
+doing an extra function call, which is slow.
+
+00:07:25.663 --> 00:07:28.733
+What speculation might help here is that,
+
+00:07:28.734 --> 00:07:31.443
+it can have flexible fast paths.
+
+00:07:31.543 --> 00:07:34.563
+When we pass a floating number into this function,
+
+00:07:34.663 --> 00:07:37.400
+which currently has only fixnumbers on the fast path,
+
+00:07:37.401 --> 00:07:40.723
+it also has to go through the slow path.
+
+00:07:40.823 --> 00:07:44.567
+But the difference is that, a speculative runtime can
+
+00:07:44.568 --> 00:07:47.763
+deoptimize and recompile the code to adapt to this.
+
+00:07:47.863 --> 00:07:50.367
+And when it recompiles, it might add
+
+00:07:50.368 --> 00:07:52.643
+floating number onto the fast path,
+
+00:07:52.743 --> 00:07:55.003
+and now floating number operations are also fast.
+
+00:07:55.103 --> 00:07:58.567
+And this kind of speculation is why
+
+00:07:58.568 --> 00:08:03.603
+speculative runtime can be really fast.
+
+00:08:03.703 --> 00:08:05.723
+Let's take a look at some benchmarks.
+
+00:08:05.823 --> 00:08:09.423
+They're obtained with the <i>elisp-benchmarks</i> library on ELPA.
+
+00:08:09.523 --> 00:08:12.600
+The blue line here is for nativecomp,
+
+00:08:12.601 --> 00:08:16.043
+and these blue areas mean that nativecomp is slower.
+
+00:08:16.143 --> 00:08:19.133
+And, likewise, green areas mean that
+
+00:08:19.134 --> 00:08:20.523
+Juicemacs is slower.
+
+00:08:20.623 --> 00:08:22.867
+At a glance, the two (or four)
+
+00:08:22.868 --> 00:08:25.143
+actually seems somehow on par, to me.
+
+00:08:25.243 --> 00:08:30.383
+But, let's take a closer look at some of them.
+
+00:08:30.483 --> 00:08:32.667
+So, the first few benchmarks are the classic,
+
+00:08:32.668 --> 00:08:33.983
+Fibonacci benchmarks.
+
+00:08:34.083 --> 00:08:36.933
+We know that, the series is formed by
+
+00:08:36.934 --> 00:08:39.203
+adding the previous two numbers in the series.
+
+00:08:39.303 --> 00:08:41.700
+And looking at this expression here,
+
+00:08:41.701 --> 00:08:44.043
+Fibonacci benchmarks are quite intensive
+
+00:08:44.143 --> 00:08:46.800
+in number additions, subtractions,
+
+00:08:46.801 --> 00:08:49.103
+and function calls, if you use recursions.
+
+00:08:49.203 --> 00:08:51.000
+And it is exactly why
+
+00:08:51.001 --> 00:08:54.323
+Fibonacci series is a good benchmark.
+
+00:08:54.423 --> 00:08:57.243
+And looking at the results here... wow.
+
+00:08:57.343 --> 00:08:59.843
+Emacs nativecomp executes instantaneously.
+
+00:08:59.943 --> 00:09:04.523
+It's a total defeat for Juicemacs, seemingly.
+
+00:09:04.623 --> 00:09:08.043
+Now, if you're into benchmarks, you know something is wrong here:
+
+00:09:08.143 --> 00:09:11.683
+we are comparing the different things.
+
+00:09:11.783 --> 00:09:14.200
+So let's look under the hood
+
+00:09:14.201 --> 00:09:15.483
+and disassemble the function
+
+00:09:15.583 --> 00:09:17.567
+with this convenient Emacs command
+
+00:09:17.568 --> 00:09:19.063
+called <i>disassemble</i>...
+
+00:09:19.163 --> 00:09:23.043
+And these two lines of code is what we got.
+
+00:09:23.143 --> 00:09:24.700
+So, we already can see
+
+00:09:24.701 --> 00:09:26.123
+what's going on here:
+
+00:09:26.223 --> 00:09:29.963
+GCC sees Fibonacci is a pure function,
+
+00:09:30.063 --> 00:09:31.867
+because it returns the same value
+
+00:09:31.868 --> 00:09:33.243
+for the same arguments,
+
+00:09:33.343 --> 00:09:35.700
+so GCC chooses to do the computation
+
+00:09:35.701 --> 00:09:36.723
+at compile time
+
+00:09:36.823 --> 00:09:39.133
+and inserts the final number directly
+
+00:09:39.134 --> 00:09:40.323
+into the compiled code.
+
+00:09:41.823 --> 00:09:43.603
+It is actually great!
+
+00:09:43.703 --> 00:09:45.400
+Because it shows that nativecomp
+
+00:09:45.401 --> 00:09:47.283
+knows about pure functions,
+
+00:09:47.383 --> 00:09:48.700
+and can do all kinds of things
+
+00:09:48.701 --> 00:09:51.203
+like removing or constant-folding them.
+
+00:09:51.303 --> 00:09:54.403
+And Juicemacs just does not do that.
+
+00:09:54.503 --> 00:09:57.367
+However, we are also concerned about
+
+00:09:57.368 --> 00:09:59.003
+the things we mentioned earlier:
+
+00:09:59.103 --> 00:10:00.900
+the performance of number additions,
+
+00:10:00.901 --> 00:10:02.983
+or function calls.
+
+00:10:03.083 --> 00:10:05.633
+So, in order to let the benchmarks
+
+00:10:05.634 --> 00:10:06.863
+show some extra things,
+
+00:10:06.963 --> 00:10:08.367
+we need to modify it a bit...
+
+00:10:08.368 --> 00:10:11.323
+by simply making things non-constant.
+
+00:10:11.423 --> 00:10:15.203
+With that, Emacs gets much slower now.
+
+00:10:15.303 --> 00:10:17.133
+And again, let's look what's
+
+00:10:17.134 --> 00:10:21.083
+happening behind these numbers.
+
+00:10:21.183 --> 00:10:23.500
+Similarly, with the <i>disassemble</i> command,
+
+00:10:23.501 --> 00:10:25.643
+we can look into the assembly.
+
+00:10:25.743 --> 00:10:28.019
+And again, we can already see
+
+00:10:28.020 --> 00:10:29.303
+what's happening here.
+
+00:10:29.403 --> 00:10:32.083
+So, Juicemacs, due to its speculation nature,
+
+00:10:32.183 --> 00:10:35.443
+supports fast paths for all three kind of numbers.
+
+00:10:35.543 --> 00:10:39.233
+However, currently, Emacs nativecomp
+
+00:10:39.234 --> 00:10:41.243
+does not have any fast path
+
+00:10:41.343 --> 00:10:43.433
+for the operations here like additions,
+
+00:10:43.434 --> 00:10:45.803
+or subtractions, or comparisons,
+
+00:10:45.903 --> 00:10:48.067
+which is exactly what
+
+00:10:48.068 --> 00:10:50.963
+Fibonacci benchmarks are measuring.
+
+00:10:51.063 --> 00:10:53.800
+Emacs, at this time, has to call some generic,
+
+00:10:53.801 --> 00:10:57.963
+external functions for them, and this is slow.
+
+00:11:00.063 --> 00:11:03.203
+But is nativecomp really that slow?
+
+00:11:03.303 --> 00:11:04.967
+So, I also ran the same benchmark
+
+00:11:04.968 --> 00:11:07.083
+in Common Lisp, with SBCL.
+
+00:11:07.183 --> 00:11:09.000
+And nativecomp is already fast,
+
+00:11:09.001 --> 00:11:11.003
+compared to untyped SBCL.
+
+00:11:11.103 --> 00:11:15.500
+It's because SBCL also emits call instructions
+
+00:11:15.501 --> 00:11:18.483
+when it comes to no type info.
+
+00:11:18.583 --> 00:11:21.700
+However, once we declare the types,
+
+00:11:21.701 --> 00:11:25.283
+SBCL is able to compile a fast path for fix numbers,
+
+00:11:25.383 --> 00:11:27.467
+which makes its performance on par
+
+00:11:27.468 --> 00:11:30.683
+with speculative JIT engines (that is, Juicemacs),
+
+00:11:30.783 --> 00:11:34.763
+because, now both of us are now on fast paths.
+
+00:11:36.063 --> 00:11:38.400
+Additionally, if we are bold enough
+
+00:11:38.401 --> 00:11:41.203
+to pass this safety zero flag to SBCL,
+
+00:11:41.303 --> 00:11:43.700
+it will remove all the slow paths
+
+00:11:43.701 --> 00:11:44.963
+and type checks,
+
+00:11:45.063 --> 00:11:46.367
+and its performance is close
+
+00:11:46.368 --> 00:11:48.643
+to what you get with C.
+
+00:11:48.743 --> 00:11:51.299
+Well, probably we don't want safety zero
+
+00:11:51.300 --> 00:11:52.063
+most of the time.
+
+00:11:52.163 --> 00:11:55.133
+But even then, if nativecomp were to
+
+00:11:55.134 --> 00:11:57.763
+get fast paths for more constructs,
+
+00:11:57.863 --> 00:11:59.867
+there certainly is quite
+
+00:11:59.868 --> 00:12:03.563
+some room for performance improvement.
+
+00:12:04.063 --> 00:12:06.803
+Let's look at some more benchmarks.
+
+00:12:06.903 --> 00:12:08.933
+For example, for this inclist,
+
+00:12:08.934 --> 00:12:10.923
+or increment-list, benchmark,
+
+00:12:11.023 --> 00:12:14.333
+Juicemacs is really slow here. Partly,
+
+00:12:14.334 --> 00:12:17.603
+it comes from the cost of Java boxing integers.
+
+00:12:17.703 --> 00:12:20.300
+On the other hand, for Emacs nativecomp,
+
+00:12:20.301 --> 00:12:22.043
+for this particular benchmark,
+
+00:12:22.143 --> 00:12:23.667
+it actually has fast paths
+
+00:12:23.668 --> 00:12:25.523
+for all of the operations.
+
+00:12:25.623 --> 00:12:27.723
+And that's why it can be so fast,
+
+00:12:27.823 --> 00:12:30.667
+and that also proves the nativecomp
+
+00:12:30.668 --> 00:12:33.843
+has a lot potential for improvement.
+
+00:12:33.943 --> 00:12:35.833
+There is another benchmark here
+
+00:12:35.834 --> 00:12:37.963
+that use advices.
+
+00:12:38.063 --> 00:12:40.500
+So Emacs Lisp supports using
+
+00:12:40.501 --> 00:12:42.203
+advices to override functions
+
+00:12:42.303 --> 00:12:44.833
+by wrapping the original function, and an advice
+
+00:12:44.834 --> 00:12:47.443
+function, two of them, inside a glue function.
+
+00:12:47.543 --> 00:12:51.467
+And in this benchmark, we advice the Fibonacci function
+
+00:12:51.468 --> 00:12:54.523
+to cache the first ten entries to speed up computation,
+
+00:12:54.623 --> 00:13:00.003
+as can be seen in the speed-up in the Juicemacs results.
+
+00:13:00.103 --> 00:13:02.900
+However, it seems that nativecomp does not yet
+
+00:13:02.901 --> 00:13:08.523
+compile glue functions, and that makes advices slower.
+
+00:13:08.623 --> 00:13:12.043
+With these benchmarks, let's discuss this big question:
+
+00:13:12.143 --> 00:13:16.563
+Should GNU Emacs adopt speculative JIT compilation?
+
+00:13:16.663 --> 00:13:18.967
+Well, the hidden question is actually,
+
+00:13:18.968 --> 00:13:21.223
+is it worth it?
+
+00:13:21.323 --> 00:13:24.163
+And, my personal answer is, maybe not.
+
+00:13:24.263 --> 00:13:28.133
+The first reason is that, slow paths, like, floating numbers,
+
+00:13:28.134 --> 00:13:31.043
+are actually not that frequent in Emacs.
+
+00:13:31.143 --> 00:13:34.100
+And optimizing for fast paths like fix numbers
+
+00:13:34.101 --> 00:13:37.983
+can already get us very good performance already.
+
+00:13:38.083 --> 00:13:40.333
+And the second or main reason is that,
+
+00:13:40.334 --> 00:13:43.163
+speculative JIT is very hard.
+
+00:13:43.263 --> 00:13:46.843
+LuaJIT, for example, took a genius to build.
+
+00:13:46.943 --> 00:13:50.967
+Even with the help of GCC, we need to hand-write
+
+00:13:50.968 --> 00:13:54.283
+all those fast path or slow path or switching logic.
+
+00:13:54.383 --> 00:13:58.133
+We need to find a way to deoptimize, which requires
+
+00:13:58.134 --> 00:14:01.803
+mapping machine registers back to interpreter stack.
+
+00:14:01.903 --> 00:14:04.067
+And also, speculation needs runtime info,
+
+00:14:04.068 --> 00:14:07.323
+which also costs us extra memory.
+
+00:14:07.423 --> 00:14:10.763
+Moreover, as is shown by some benchmarks above,
+
+00:14:10.863 --> 00:14:13.333
+there's some low-hanging fruits in nativecomp that
+
+00:14:13.334 --> 00:14:17.343
+might get us better performance with relatively lower effort.
+
+00:14:17.443 --> 00:14:22.163
+Compared to this, a JIT engine is a huge, huge undertaking.
+
+00:14:22.263 --> 00:14:26.123
+But, for Juicemacs, the JIT engine comes a lot cheaper,
+
+00:14:26.223 --> 00:14:29.067
+because, we are cheating by building on
+
+00:14:29.068 --> 00:14:33.443
+an existing compiler framework called Truffle.
+
+00:14:33.543 --> 00:14:35.883
+Truffle is a meta-compiler framework,
+
+00:14:35.983 --> 00:14:37.633
+which means that it lets you write
+
+00:14:37.634 --> 00:14:40.103
+an interpreter, add required annotations,
+
+00:14:40.203 --> 00:14:42.500
+and it will automatically turn the
+
+00:14:42.501 --> 00:14:45.643
+interpreter into a JIT runtime.
+
+00:14:45.743 --> 00:14:49.083
+So for example, here is a typical bytecode interpreter.
+
+00:14:49.183 --> 00:14:51.233
+After you add the required annotations,
+
+00:14:51.234 --> 00:14:52.523
+Truffle will know that,
+
+00:14:52.623 --> 00:14:55.533
+the bytecode here is constant, and it should
+
+00:14:55.534 --> 00:14:59.123
+unroll this loop here, to inline all those bytecode.
+
+00:14:59.223 --> 00:15:00.467
+And then, when Truffle
+
+00:15:00.468 --> 00:15:02.243
+compiles the code, it knows that:
+
+00:15:02.343 --> 00:15:05.233
+the first loop here does: x plus one,
+
+00:15:05.234 --> 00:15:07.723
+and the second does: return.
+
+00:15:07.823 --> 00:15:09.533
+And then it will compile all that into,
+
+00:15:09.534 --> 00:15:11.363
+return x plus 1,
+
+00:15:11.463 --> 00:15:14.067
+which is exactly what we would expect
+
+00:15:14.068 --> 00:15:17.683
+when compiling this pseudo code.
+
+00:15:17.783 --> 00:15:21.083
+Building on that, we can also easily implement speculation,
+
+00:15:21.183 --> 00:15:24.867
+by using this <i>transferToInterpreterAndInvalidate</i> function
+
+00:15:24.868 --> 00:15:26.123
+provided by Truffle.
+
+00:15:26.223 --> 00:15:28.533
+And Truffle will automatically turn that
+
+00:15:28.534 --> 00:15:30.683
+into deoptimization.
+
+00:15:30.783 --> 00:15:32.700
+Now, for example, when this add function
+
+00:15:32.701 --> 00:15:35.723
+is supplied with, two floating numbers.
+
+00:15:35.823 --> 00:15:38.243
+It will go through the slow path here,
+
+00:15:38.343 --> 00:15:40.960
+which might lead to a compiled slow path,
+
+00:15:40.961 --> 00:15:43.203
+or deoptimization.
+
+00:15:43.303 --> 00:15:45.733
+And going this deoptimization way,
+
+00:15:45.734 --> 00:15:48.223
+it can then update the runtime stats.
+
+00:15:48.323 --> 00:15:50.400
+And now, when the code is compiled again,
+
+00:15:50.401 --> 00:15:51.603
+Truffle will know,
+
+00:15:51.703 --> 00:15:54.100
+that these compilation stats, suggests that,
+
+00:15:54.101 --> 00:15:55.563
+we have floating numbers.
+
+00:15:55.663 --> 00:15:58.733
+And this floating point addition branch will
+
+00:15:58.734 --> 00:16:02.603
+then be incorporated into the fast path.
+
+00:16:02.703 --> 00:16:06.003
+To put it into Java code...
+
+00:16:06.103 --> 00:16:08.723
+Most operations are just as simple as this.
+
+00:16:08.823 --> 00:16:11.033
+And it supports fast paths for integers,
+
+00:16:11.034 --> 00:16:13.963
+floating numbers, and big integers.
+
+00:16:14.063 --> 00:16:17.133
+And the simplicity of this not only saves us work,
+
+00:16:17.134 --> 00:16:22.243
+but also enables Juicemacs to explore more things more rapidly.
+
+00:16:22.343 --> 00:16:26.483
+And actually, I have done some silly explorations.
+
+00:16:26.583 --> 00:16:30.203
+For example, I tried to constant-fold more things.
+
+00:16:30.303 --> 00:16:32.767
+Many of us have an Emacs config that stays
+
+00:16:32.768 --> 00:16:36.683
+largely unchanged, at least during one Emacs session.
+
+00:16:36.783 --> 00:16:39.667
+And that means many of the global variables
+
+00:16:39.668 --> 00:16:42.323
+in ELisp are constant.
+
+00:16:42.423 --> 00:16:44.600
+And with speculation, we can
+
+00:16:44.601 --> 00:16:46.683
+speculate about the stable ones,
+
+00:16:46.783 --> 00:16:49.563
+and try to inline them as constants.
+
+00:16:49.663 --> 00:16:51.733
+And this might improve performance,
+
+00:16:51.734 --> 00:16:53.083
+or maybe not?
+
+00:16:53.183 --> 00:16:55.367
+Because, we will need a full editor
+
+00:16:55.368 --> 00:16:58.123
+to get real world data.
+
+00:16:58.223 --> 00:17:01.733
+I also tried changing cons lists to be backed
+
+00:17:01.734 --> 00:17:05.243
+by some arrays, because, maybe arrays are faster, I guess?
+
+00:17:05.343 --> 00:17:09.033
+But in the end, <i>setcdr</i> requires some kind of indirection,
+
+00:17:09.034 --> 00:17:12.883
+and that actually makes the performance worse.
+
+00:17:12.983 --> 00:17:14.733
+And for regular expressions,
+
+00:17:14.734 --> 00:17:17.923
+I also tried borrowing techniques from PCRE JIT,
+
+00:17:18.023 --> 00:17:20.667
+which is quite fast in itself, but it is
+
+00:17:20.668 --> 00:17:24.163
+unfortunately unsupported by Java Truffle runtime.
+
+00:17:24.263 --> 00:17:27.333
+So, looking at these, well,
+
+00:17:27.334 --> 00:17:30.243
+explorations can fail, certainly.
+
+00:17:30.343 --> 00:17:32.800
+But, with Truffle and Java, these,
+
+00:17:32.801 --> 00:17:34.883
+for now, are not that hard to implement,
+
+00:17:34.983 --> 00:17:37.667
+and also very often, they teach us something
+
+00:17:37.668 --> 00:17:42.363
+in return, whether or not they fail.
+
+00:17:42.463 --> 00:17:45.333
+Finally, let's talk about some explorations
+
+00:17:45.334 --> 00:17:47.883
+that we might get into in the future.
+
+00:17:47.983 --> 00:17:49.683
+For the JIT engine, for example,
+
+00:17:49.783 --> 00:17:52.633
+currently I'm looking into the implementation of
+
+00:17:52.634 --> 00:17:56.883
+nativecomp to maybe reuse some of its optimizations.
+
+00:17:56.983 --> 00:18:01.323
+For the GUI, I'm very very slowly working on one.
+
+00:18:01.423 --> 00:18:03.733
+If it ever completes, I have one thing
+
+00:18:03.734 --> 00:18:06.603
+I'm really looking forward to implementing.
+
+00:18:06.703 --> 00:18:08.900
+That is, inlining widgets, or even
+
+00:18:08.901 --> 00:18:11.763
+other buffers, directly into a buffer.
+
+00:18:11.863 --> 00:18:13.967
+Well, it's because, people sometimes complain
+
+00:18:13.968 --> 00:18:16.003
+about Emacs's GUI capabilities,
+
+00:18:16.103 --> 00:18:19.767
+But I personally think that supporting inlining,
+
+00:18:19.768 --> 00:18:23.043
+like a whole buffer inside another buffer as a rectangle,
+
+00:18:23.143 --> 00:18:26.883
+could get us very far in layout abilities.
+
+00:18:26.983 --> 00:18:28.567
+And this approach should also
+
+00:18:28.568 --> 00:18:30.843
+be compatible with terminals.
+
+00:18:30.943 --> 00:18:32.933
+And I really want to see how this idea
+
+00:18:32.934 --> 00:18:36.003
+plays out with Juicemacs.
+
+00:18:36.103 --> 00:18:38.963
+And of course, there's Lisp concurrency.
+
+00:18:39.063 --> 00:18:42.167
+And currently i'm thinking of a JavaScript-like,
+
+00:18:42.168 --> 00:18:46.283
+transparent, single-thread model, using Java's virtual threads.
+
+00:18:46.383 --> 00:18:49.967
+But anyway, if you are interested in JIT compilation,
+
+00:18:49.968 --> 00:18:51.663
+Truffle, or anything above,
+
+00:18:51.763 --> 00:18:53.867
+or maybe you have your own ideas,
+
+00:18:53.868 --> 00:18:56.283
+you are very welcome to reach out!
+
+00:18:56.383 --> 00:19:00.033
+Juicemacs does need to implement many more built-in functions,
+
+00:19:00.034 --> 00:19:03.063
+and any help would be very appreciated.
+
+00:19:03.163 --> 00:19:05.800
+And I promise, it can be a very fun playground
+
+00:19:05.801 --> 00:19:08.343
+to learn about Emacs and do crazy things.
+
+00:19:08.443 --> 00:19:10.902
+Thank you!