diff options
| author | EmacsConf <emacsconf-org@gnu.org> | 2025-12-06 10:15:40 -0500 |
|---|---|---|
| committer | EmacsConf <emacsconf-org@gnu.org> | 2025-12-06 10:15:40 -0500 |
| commit | 583c55a9b9a52db5b2f120fb756f2f2f2f63a7ea (patch) | |
| tree | 8860cd480b38ad599a4dd73cc90e3c509855ead6 /2025/captions/emacsconf-2025-juicemacs--juicemacs-exploring-speculative-jit-compilation-for-elisp-in-java--kana--main.vtt | |
| parent | e27ca6b3510229087d36345c920b36cf67200c51 (diff) | |
| download | emacsconf-wiki-583c55a9b9a52db5b2f120fb756f2f2f2f63a7ea.tar.xz emacsconf-wiki-583c55a9b9a52db5b2f120fb756f2f2f2f63a7ea.zip | |
Automated commit
Diffstat (limited to '2025/captions/emacsconf-2025-juicemacs--juicemacs-exploring-speculative-jit-compilation-for-elisp-in-java--kana--main.vtt')
| -rw-r--r-- | 2025/captions/emacsconf-2025-juicemacs--juicemacs-exploring-speculative-jit-compilation-for-elisp-in-java--kana--main.vtt | 1238 |
1 files changed, 1238 insertions, 0 deletions
diff --git a/2025/captions/emacsconf-2025-juicemacs--juicemacs-exploring-speculative-jit-compilation-for-elisp-in-java--kana--main.vtt b/2025/captions/emacsconf-2025-juicemacs--juicemacs-exploring-speculative-jit-compilation-for-elisp-in-java--kana--main.vtt new file mode 100644 index 00000000..46820e94 --- /dev/null +++ b/2025/captions/emacsconf-2025-juicemacs--juicemacs-exploring-speculative-jit-compilation-for-elisp-in-java--kana--main.vtt @@ -0,0 +1,1238 @@ +WEBVTT captioned by kana + + +00:00:01.200 --> 00:00:02.803 +Hello! This is Kana! + +00:00:02.903 --> 00:00:04.367 +And today I'll be talking about + +00:00:04.368 --> 00:00:06.067 +<b>J</b>ust-<b>I</b>n-<b>T</b>ime compilation, or JIT, + +00:00:06.068 --> 00:00:07.363 +for Emacs Lisp, + +00:00:07.463 --> 00:00:11.163 +based on my work-in-progress Emacs clone, Juicemacs. + +00:00:11.263 --> 00:00:13.533 +Juicemacs aims to explore a few things + +00:00:13.534 --> 00:00:15.843 +that I've been wondering about for a while. + +00:00:15.943 --> 00:00:18.567 +For exmaple, what if we had better or even + +00:00:18.568 --> 00:00:21.223 +transparent concurrency in ELisp? + +00:00:21.323 --> 00:00:23.243 +Or, can we have a concurrent GUI? + +00:00:23.343 --> 00:00:26.783 +One that does not block, or is blocked by Lisp code? + +00:00:26.883 --> 00:00:31.067 +And finally what can JIT compilation do for ELisp? + +00:00:31.068 --> 00:00:34.083 +Will it provide better performance? + +00:00:34.183 --> 00:00:37.400 +However, a main problem with explorations + +00:00:37.401 --> 00:00:38.623 +in Emacs clones is that, + +00:00:38.723 --> 00:00:40.863 +Emacs is a whole universe. + +00:00:40.963 --> 00:00:43.600 +And that means, to make these explorations + +00:00:43.601 --> 00:00:45.383 +meaningful for Emacs users, + +00:00:45.483 --> 00:00:47.967 +we need to cover a lot of Emacs features, + +00:00:47.968 --> 00:00:50.543 +before we can ever begin. + +00:00:50.643 --> 00:00:53.923 +For example, one of the features of Emacs is that, + +00:00:54.023 --> 00:00:56.003 +it supports a lot of encodings. + +00:00:56.103 --> 00:00:59.267 +Let's look at this string: it can be encoded + +00:00:59.268 --> 00:01:03.643 +in both Unicode and Shift-JIS, a Japanese encoding system. + +00:01:03.743 --> 00:01:07.067 +But currently, Unicode does not have + +00:01:07.068 --> 00:01:09.803 +an official mapping for this "ki" (﨑) character. + +00:01:09.903 --> 00:01:12.767 +So when we map from Shift-JIS to Unicode, + +00:01:12.768 --> 00:01:14.423 +in most programming languages, + +00:01:14.523 --> 00:01:16.533 +you end up with something like this: + +00:01:16.534 --> 00:01:19.143 +it's a replacement character. + +00:01:19.243 --> 00:01:22.067 +But in Emacs, it actually extends + +00:01:22.068 --> 00:01:23.883 +the Unicode range by threefold, + +00:01:23.983 --> 00:01:26.833 +and uses the extra range to losslessly + +00:01:26.834 --> 00:01:29.483 +support characters like this. + +00:01:29.583 --> 00:01:31.923 +So if you want to support this feature, + +00:01:32.023 --> 00:01:34.033 +that basically rules out all string + +00:01:34.034 --> 00:01:37.243 +libraries with Unicode assumptions. + +00:01:37.843 --> 00:01:40.067 +For another, you need to support + +00:01:40.068 --> 00:01:41.883 +the regular expressions in Emacs, + +00:01:41.983 --> 00:01:45.023 +which are, really irregular. + +00:01:45.123 --> 00:01:46.900 +For example, it supports asserting + +00:01:46.901 --> 00:01:49.403 +about the user cursor position. + +00:01:49.503 --> 00:01:52.033 +And it also uses some character tables, + +00:01:52.034 --> 00:01:53.883 +that can be modified from Lisp code, + +00:01:53.983 --> 00:01:56.163 +to determine to case mappings. + +00:01:56.263 --> 00:01:59.567 +And all that makes it really hard, or even + +00:01:59.568 --> 00:02:05.123 +impossible to use any existing regexp libraries. + +00:02:05.223 --> 00:02:07.883 +Also, you need a functional garbage collector. + +00:02:07.983 --> 00:02:09.867 +You need threading primitives, because + +00:02:09.868 --> 00:02:12.323 +Emacs has already had some threading support. + +00:02:12.423 --> 00:02:14.533 +And you might want the performance of your clone + +00:02:14.534 --> 00:02:18.963 +to match Emacs, even with its native compilation enabled. + +00:02:19.063 --> 00:02:21.500 +Not to mention you also need a GUI for an editor. + +00:02:21.501 --> 00:02:23.543 +And so on. + +00:02:23.643 --> 00:02:25.633 +For Juicemacs, building on Java and + +00:02:25.634 --> 00:02:27.563 +a compiler framework called Truffle, + +00:02:27.663 --> 00:02:30.503 +helps in getting better performance; + +00:02:30.603 --> 00:02:32.933 +and by choosing a language with a good GC, + +00:02:32.934 --> 00:02:38.063 +we can actually focus more on the challenges above. + +00:02:38.163 --> 00:02:41.433 +Currently, Juicemacs has implemented three out of, + +00:02:41.434 --> 00:02:43.983 +at least four of the interpreters in Emacs. + +00:02:44.083 --> 00:02:46.363 +One for lisp code, one for bytecode, + +00:02:46.463 --> 00:02:48.567 +and one for regular expressions, + +00:02:48.568 --> 00:02:50.903 +all of them JIT-capable. + +00:02:51.003 --> 00:02:53.667 +Other than these, Emacs also has around + +00:02:53.668 --> 00:02:56.083 +two thousand built-in functions in C code. + +00:02:56.183 --> 00:02:57.333 +And Juicemacs has around + +00:02:57.334 --> 00:02:59.763 +four hundred of them implemented. + +00:02:59.863 --> 00:03:03.603 +It's not that many, but it is surprisingly enough + +00:03:03.703 --> 00:03:05.200 +to bootstrap Emacs and run + +00:03:05.201 --> 00:03:08.483 +the portable dumper, or pdump, in short. + +00:03:08.583 --> 00:03:11.243 +Let's have a try. + +00:03:11.343 --> 00:03:11.703 + + +00:03:11.803 --> 00:03:14.923 +So this is the binary produced by Java native image. + +00:03:15.023 --> 00:03:17.167 +And it's loading all the files + +00:03:17.168 --> 00:03:18.763 +needed for bootstrapping. + +00:03:18.863 --> 00:03:22.233 +Then it dumps the memory to a file to + +00:03:22.234 --> 00:03:24.923 +be loaded later, giving us fast startup. + +00:03:25.023 --> 00:03:28.723 +As we can see here, it throws some frame errors + +00:03:28.823 --> 00:03:31.400 +because Juicemacs doesn't have an editor UI + +00:03:31.401 --> 00:03:33.283 +or functional frames yet. + +00:03:33.383 --> 00:03:35.367 +But otherwise, it can already run + +00:03:35.368 --> 00:03:36.643 +quite some lisp code. + +00:03:36.743 --> 00:03:40.400 +For example, this code uses the benchmark library + +00:03:40.401 --> 00:03:44.403 +to measure the performance of this Fibonacci function. + +00:03:44.503 --> 00:03:47.067 +And we can see here, the JIT engine is + +00:03:47.068 --> 00:03:51.163 +already kicking in and makes the execution faster. + +00:03:51.263 --> 00:03:53.483 +In addition to that, with a bit of workaround, + +00:03:53.583 --> 00:03:56.467 +Juicemacs can also run some of the ERT, + +00:03:56.468 --> 00:04:01.043 +or, <b>E</b>macs <b>R</b>egression <b>T</b>est suite, that comes with Emacs. + +00:04:01.143 --> 00:04:05.823 +So... Yes, there are a bunch of test failures, + +00:04:05.923 --> 00:04:07.933 +which means we are not that compatible + +00:04:07.934 --> 00:04:09.523 +with Emacs and need more work. + +00:04:09.623 --> 00:04:12.803 +But the whole testing procedure runs fine, + +00:04:12.903 --> 00:04:14.767 +and it has proper stack traces, + +00:04:14.768 --> 00:04:17.803 +which is quite useful for debugging Juicemacs. + +00:04:17.903 --> 00:04:21.033 +So with that, a rather functional JIT runtime, + +00:04:21.034 --> 00:04:25.983 +let's now try look into today's topic, JIT compilation for ELisp. + +00:04:26.083 --> 00:04:28.533 +So, you probably know that Emacs has supported + +00:04:28.534 --> 00:04:32.083 +native-compilation, or nativecomp in short, for some time now. + +00:04:32.183 --> 00:04:35.033 +It mainly uses GCC to compile Lisp code + +00:04:35.034 --> 00:04:37.363 +into native code, ahead of time. + +00:04:37.463 --> 00:04:41.433 +And during runtime, Emacs loads those compiled files, + +00:04:41.434 --> 00:04:44.523 +and gets the performance of native code. + +00:04:44.623 --> 00:04:47.643 +However, for example, for installed packages, + +00:04:47.743 --> 00:04:49.059 +we might want to compile them when we + +00:04:49.060 --> 00:04:51.823 +actually use them instead of ahead of time. + +00:04:51.923 --> 00:04:53.733 +And Emacs supports this through + +00:04:53.734 --> 00:04:55.683 +this <i>native-comp-jit-compilation</i> flag. + +00:04:55.783 --> 00:04:59.767 +What it does is, during runtime, Emacs sends + +00:04:59.768 --> 00:05:03.203 +loaded files to external Emacs worker processes, + +00:05:03.303 --> 00:05:06.903 +which will then compile those files asynchronously. + +00:05:07.003 --> 00:05:09.043 +And when the compilation is done, + +00:05:09.143 --> 00:05:11.967 +the current Emacs session will load the compiled code back + +00:05:11.968 --> 00:05:16.323 +and improves its performance, on the fly. + +00:05:16.423 --> 00:05:18.643 +When you look at this procedure, however, it is, + +00:05:18.743 --> 00:05:21.563 +ahead-of-time compilation, done at runtime. + +00:05:21.663 --> 00:05:25.123 +And it is what current Emacs calls JIT compilation. + +00:05:25.223 --> 00:05:27.867 +But if you look at some other JIT engines, + +00:05:27.868 --> 00:05:31.803 +you'll see much more complex architectures. + +00:05:31.903 --> 00:05:34.233 +So, take luaJIT for an example, + +00:05:34.234 --> 00:05:36.163 +in addition to this red line here, + +00:05:36.263 --> 00:05:38.767 +which leads us from an interpreted state + +00:05:38.768 --> 00:05:40.643 +to a compiled native state, + +00:05:40.743 --> 00:05:42.163 +which is also what Emacs does, + +00:05:42.263 --> 00:05:44.333 +LuaJIT also supports going from + +00:05:44.334 --> 00:05:47.523 +a compiled state back to its interpreter. + +00:05:47.623 --> 00:05:51.483 +And this process is called "deoptimization". + +00:05:51.583 --> 00:05:55.300 +In contrast to its name, deoptimization here actually + +00:05:55.301 --> 00:05:58.563 +enables a huge category of JIT optimizations. + +00:05:58.663 --> 00:06:00.163 +They are called speculation. + +00:06:01.463 --> 00:06:04.600 +Basically, with speculation, the compiler + +00:06:04.601 --> 00:06:07.683 +can use runtime statistics to speculate, + +00:06:07.783 --> 00:06:11.443 +to make bolder assumptions in the compiled code. + +00:06:11.543 --> 00:06:13.983 +And when the assumptions are invalidated, + +00:06:14.083 --> 00:06:18.323 +the runtime deoptimizes the code, updates statistics, + +00:06:18.423 --> 00:06:21.133 +and then recompile the code based on new assumptions, + +00:06:21.134 --> 00:06:24.443 +and that will make the code more performant. + +00:06:24.543 --> 00:06:26.763 +Let's look at an example. + +00:06:28.463 --> 00:06:30.967 +So, here is a really simple function, + +00:06:30.968 --> 00:06:33.083 +that adds one to the input number. + +00:06:33.183 --> 00:06:36.167 +But in Emacs, it is not that simple, + +00:06:36.168 --> 00:06:38.203 +because Emacs has three categories of numbers, + +00:06:38.303 --> 00:06:42.700 +that is, fix numbers, or machine-word-sized integers, + +00:06:42.701 --> 00:06:45.603 +floating numbers, and big integers. + +00:06:45.703 --> 00:06:47.600 +And when we compile this, we need + +00:06:47.601 --> 00:06:49.363 +to handle all three cases. + +00:06:49.463 --> 00:06:52.600 +And if we analyze the code produced by Emacs, + +00:06:52.601 --> 00:06:54.683 +as is shown by this gray graph here, + +00:06:54.783 --> 00:06:58.083 +we can see that it has, two paths: + +00:06:58.183 --> 00:07:01.403 +One fast path, that does fast fix number addition; + +00:07:01.503 --> 00:07:03.967 +and one for slow paths, that calls out + +00:07:03.968 --> 00:07:06.523 +to an external plus-one function, + +00:07:06.623 --> 00:07:09.683 +to handle floating number and big integers. + +00:07:09.783 --> 00:07:13.167 +Now, if we pass integers into this function, + +00:07:13.168 --> 00:07:16.283 +it's pretty fast because it's on the fast path. + +00:07:16.383 --> 00:07:19.767 +However, if we pass in a floating number, + +00:07:19.768 --> 00:07:21.843 +then it has to go through the slow path, + +00:07:21.943 --> 00:07:25.563 +doing an extra function call, which is slow. + +00:07:25.663 --> 00:07:28.733 +What speculation might help here is that, + +00:07:28.734 --> 00:07:31.443 +it can have flexible fast paths. + +00:07:31.543 --> 00:07:34.563 +When we pass a floating number into this function, + +00:07:34.663 --> 00:07:37.400 +which currently has only fixnumbers on the fast path, + +00:07:37.401 --> 00:07:40.723 +it also has to go through the slow path. + +00:07:40.823 --> 00:07:44.567 +But the difference is that, a speculative runtime can + +00:07:44.568 --> 00:07:47.763 +deoptimize and recompile the code to adapt to this. + +00:07:47.863 --> 00:07:50.367 +And when it recompiles, it might add + +00:07:50.368 --> 00:07:52.643 +floating number onto the fast path, + +00:07:52.743 --> 00:07:55.003 +and now floating number operations are also fast. + +00:07:55.103 --> 00:07:58.567 +And this kind of speculation is why + +00:07:58.568 --> 00:08:03.603 +speculative runtime can be really fast. + +00:08:03.703 --> 00:08:05.723 +Let's take a look at some benchmarks. + +00:08:05.823 --> 00:08:09.423 +They're obtained with the <i>elisp-benchmarks</i> library on ELPA. + +00:08:09.523 --> 00:08:12.600 +The blue line here is for nativecomp, + +00:08:12.601 --> 00:08:16.043 +and these blue areas mean that nativecomp is slower. + +00:08:16.143 --> 00:08:19.133 +And, likewise, green areas mean that + +00:08:19.134 --> 00:08:20.523 +Juicemacs is slower. + +00:08:20.623 --> 00:08:22.867 +At a glance, the two (or four) + +00:08:22.868 --> 00:08:25.143 +actually seems somehow on par, to me. + +00:08:25.243 --> 00:08:30.383 +But, let's take a closer look at some of them. + +00:08:30.483 --> 00:08:32.667 +So, the first few benchmarks are the classic, + +00:08:32.668 --> 00:08:33.983 +Fibonacci benchmarks. + +00:08:34.083 --> 00:08:36.933 +We know that, the series is formed by + +00:08:36.934 --> 00:08:39.203 +adding the previous two numbers in the series. + +00:08:39.303 --> 00:08:41.700 +And looking at this expression here, + +00:08:41.701 --> 00:08:44.043 +Fibonacci benchmarks are quite intensive + +00:08:44.143 --> 00:08:46.800 +in number additions, subtractions, + +00:08:46.801 --> 00:08:49.103 +and function calls, if you use recursions. + +00:08:49.203 --> 00:08:51.000 +And it is exactly why + +00:08:51.001 --> 00:08:54.323 +Fibonacci series is a good benchmark. + +00:08:54.423 --> 00:08:57.243 +And looking at the results here... wow. + +00:08:57.343 --> 00:08:59.843 +Emacs nativecomp executes instantaneously. + +00:08:59.943 --> 00:09:04.523 +It's a total defeat for Juicemacs, seemingly. + +00:09:04.623 --> 00:09:08.043 +Now, if you're into benchmarks, you know something is wrong here: + +00:09:08.143 --> 00:09:11.683 +we are comparing the different things. + +00:09:11.783 --> 00:09:14.200 +So let's look under the hood + +00:09:14.201 --> 00:09:15.483 +and disassemble the function + +00:09:15.583 --> 00:09:17.567 +with this convenient Emacs command + +00:09:17.568 --> 00:09:19.063 +called <i>disassemble</i>... + +00:09:19.163 --> 00:09:23.043 +And these two lines of code is what we got. + +00:09:23.143 --> 00:09:24.700 +So, we already can see + +00:09:24.701 --> 00:09:26.123 +what's going on here: + +00:09:26.223 --> 00:09:29.963 +GCC sees Fibonacci is a pure function, + +00:09:30.063 --> 00:09:31.867 +because it returns the same value + +00:09:31.868 --> 00:09:33.243 +for the same arguments, + +00:09:33.343 --> 00:09:35.700 +so GCC chooses to do the computation + +00:09:35.701 --> 00:09:36.723 +at compile time + +00:09:36.823 --> 00:09:39.133 +and inserts the final number directly + +00:09:39.134 --> 00:09:40.323 +into the compiled code. + +00:09:41.823 --> 00:09:43.603 +It is actually great! + +00:09:43.703 --> 00:09:45.400 +Because it shows that nativecomp + +00:09:45.401 --> 00:09:47.283 +knows about pure functions, + +00:09:47.383 --> 00:09:48.700 +and can do all kinds of things + +00:09:48.701 --> 00:09:51.203 +like removing or constant-folding them. + +00:09:51.303 --> 00:09:54.403 +And Juicemacs just does not do that. + +00:09:54.503 --> 00:09:57.367 +However, we are also concerned about + +00:09:57.368 --> 00:09:59.003 +the things we mentioned earlier: + +00:09:59.103 --> 00:10:00.900 +the performance of number additions, + +00:10:00.901 --> 00:10:02.983 +or function calls. + +00:10:03.083 --> 00:10:05.633 +So, in order to let the benchmarks + +00:10:05.634 --> 00:10:06.863 +show some extra things, + +00:10:06.963 --> 00:10:08.367 +we need to modify it a bit... + +00:10:08.368 --> 00:10:11.323 +by simply making things non-constant. + +00:10:11.423 --> 00:10:15.203 +With that, Emacs gets much slower now. + +00:10:15.303 --> 00:10:17.133 +And again, let's look what's + +00:10:17.134 --> 00:10:21.083 +happening behind these numbers. + +00:10:21.183 --> 00:10:23.500 +Similarly, with the <i>disassemble</i> command, + +00:10:23.501 --> 00:10:25.643 +we can look into the assembly. + +00:10:25.743 --> 00:10:28.019 +And again, we can already see + +00:10:28.020 --> 00:10:29.303 +what's happening here. + +00:10:29.403 --> 00:10:32.083 +So, Juicemacs, due to its speculation nature, + +00:10:32.183 --> 00:10:35.443 +supports fast paths for all three kind of numbers. + +00:10:35.543 --> 00:10:39.233 +However, currently, Emacs nativecomp + +00:10:39.234 --> 00:10:41.243 +does not have any fast path + +00:10:41.343 --> 00:10:43.433 +for the operations here like additions, + +00:10:43.434 --> 00:10:45.803 +or subtractions, or comparisons, + +00:10:45.903 --> 00:10:48.067 +which is exactly what + +00:10:48.068 --> 00:10:50.963 +Fibonacci benchmarks are measuring. + +00:10:51.063 --> 00:10:53.800 +Emacs, at this time, has to call some generic, + +00:10:53.801 --> 00:10:57.963 +external functions for them, and this is slow. + +00:11:00.063 --> 00:11:03.203 +But is nativecomp really that slow? + +00:11:03.303 --> 00:11:04.967 +So, I also ran the same benchmark + +00:11:04.968 --> 00:11:07.083 +in Common Lisp, with SBCL. + +00:11:07.183 --> 00:11:09.000 +And nativecomp is already fast, + +00:11:09.001 --> 00:11:11.003 +compared to untyped SBCL. + +00:11:11.103 --> 00:11:15.500 +It's because SBCL also emits call instructions + +00:11:15.501 --> 00:11:18.483 +when it comes to no type info. + +00:11:18.583 --> 00:11:21.700 +However, once we declare the types, + +00:11:21.701 --> 00:11:25.283 +SBCL is able to compile a fast path for fix numbers, + +00:11:25.383 --> 00:11:27.467 +which makes its performance on par + +00:11:27.468 --> 00:11:30.683 +with speculative JIT engines (that is, Juicemacs), + +00:11:30.783 --> 00:11:34.763 +because, now both of us are now on fast paths. + +00:11:36.063 --> 00:11:38.400 +Additionally, if we are bold enough + +00:11:38.401 --> 00:11:41.203 +to pass this safety zero flag to SBCL, + +00:11:41.303 --> 00:11:43.700 +it will remove all the slow paths + +00:11:43.701 --> 00:11:44.963 +and type checks, + +00:11:45.063 --> 00:11:46.367 +and its performance is close + +00:11:46.368 --> 00:11:48.643 +to what you get with C. + +00:11:48.743 --> 00:11:51.299 +Well, probably we don't want safety zero + +00:11:51.300 --> 00:11:52.063 +most of the time. + +00:11:52.163 --> 00:11:55.133 +But even then, if nativecomp were to + +00:11:55.134 --> 00:11:57.763 +get fast paths for more constructs, + +00:11:57.863 --> 00:11:59.867 +there certainly is quite + +00:11:59.868 --> 00:12:03.563 +some room for performance improvement. + +00:12:04.063 --> 00:12:06.803 +Let's look at some more benchmarks. + +00:12:06.903 --> 00:12:08.933 +For example, for this inclist, + +00:12:08.934 --> 00:12:10.923 +or increment-list, benchmark, + +00:12:11.023 --> 00:12:14.333 +Juicemacs is really slow here. Partly, + +00:12:14.334 --> 00:12:17.603 +it comes from the cost of Java boxing integers. + +00:12:17.703 --> 00:12:20.300 +On the other hand, for Emacs nativecomp, + +00:12:20.301 --> 00:12:22.043 +for this particular benchmark, + +00:12:22.143 --> 00:12:23.667 +it actually has fast paths + +00:12:23.668 --> 00:12:25.523 +for all of the operations. + +00:12:25.623 --> 00:12:27.723 +And that's why it can be so fast, + +00:12:27.823 --> 00:12:30.667 +and that also proves the nativecomp + +00:12:30.668 --> 00:12:33.843 +has a lot potential for improvement. + +00:12:33.943 --> 00:12:35.833 +There is another benchmark here + +00:12:35.834 --> 00:12:37.963 +that use advices. + +00:12:38.063 --> 00:12:40.500 +So Emacs Lisp supports using + +00:12:40.501 --> 00:12:42.203 +advices to override functions + +00:12:42.303 --> 00:12:44.833 +by wrapping the original function, and an advice + +00:12:44.834 --> 00:12:47.443 +function, two of them, inside a glue function. + +00:12:47.543 --> 00:12:51.467 +And in this benchmark, we advice the Fibonacci function + +00:12:51.468 --> 00:12:54.523 +to cache the first ten entries to speed up computation, + +00:12:54.623 --> 00:13:00.003 +as can be seen in the speed-up in the Juicemacs results. + +00:13:00.103 --> 00:13:02.900 +However, it seems that nativecomp does not yet + +00:13:02.901 --> 00:13:08.523 +compile glue functions, and that makes advices slower. + +00:13:08.623 --> 00:13:12.043 +With these benchmarks, let's discuss this big question: + +00:13:12.143 --> 00:13:16.563 +Should GNU Emacs adopt speculative JIT compilation? + +00:13:16.663 --> 00:13:18.967 +Well, the hidden question is actually, + +00:13:18.968 --> 00:13:21.223 +is it worth it? + +00:13:21.323 --> 00:13:24.163 +And, my personal answer is, maybe not. + +00:13:24.263 --> 00:13:28.133 +The first reason is that, slow paths, like, floating numbers, + +00:13:28.134 --> 00:13:31.043 +are actually not that frequent in Emacs. + +00:13:31.143 --> 00:13:34.100 +And optimizing for fast paths like fix numbers + +00:13:34.101 --> 00:13:37.983 +can already get us very good performance already. + +00:13:38.083 --> 00:13:40.333 +And the second or main reason is that, + +00:13:40.334 --> 00:13:43.163 +speculative JIT is very hard. + +00:13:43.263 --> 00:13:46.843 +LuaJIT, for example, took a genius to build. + +00:13:46.943 --> 00:13:50.967 +Even with the help of GCC, we need to hand-write + +00:13:50.968 --> 00:13:54.283 +all those fast path or slow path or switching logic. + +00:13:54.383 --> 00:13:58.133 +We need to find a way to deoptimize, which requires + +00:13:58.134 --> 00:14:01.803 +mapping machine registers back to interpreter stack. + +00:14:01.903 --> 00:14:04.067 +And also, speculation needs runtime info, + +00:14:04.068 --> 00:14:07.323 +which also costs us extra memory. + +00:14:07.423 --> 00:14:10.763 +Moreover, as is shown by some benchmarks above, + +00:14:10.863 --> 00:14:13.333 +there's some low-hanging fruits in nativecomp that + +00:14:13.334 --> 00:14:17.343 +might get us better performance with relatively lower effort. + +00:14:17.443 --> 00:14:22.163 +Compared to this, a JIT engine is a huge, huge undertaking. + +00:14:22.263 --> 00:14:26.123 +But, for Juicemacs, the JIT engine comes a lot cheaper, + +00:14:26.223 --> 00:14:29.067 +because, we are cheating by building on + +00:14:29.068 --> 00:14:33.443 +an existing compiler framework called Truffle. + +00:14:33.543 --> 00:14:35.883 +Truffle is a meta-compiler framework, + +00:14:35.983 --> 00:14:37.633 +which means that it lets you write + +00:14:37.634 --> 00:14:40.103 +an interpreter, add required annotations, + +00:14:40.203 --> 00:14:42.500 +and it will automatically turn the + +00:14:42.501 --> 00:14:45.643 +interpreter into a JIT runtime. + +00:14:45.743 --> 00:14:49.083 +So for example, here is a typical bytecode interpreter. + +00:14:49.183 --> 00:14:51.233 +After you add the required annotations, + +00:14:51.234 --> 00:14:52.523 +Truffle will know that, + +00:14:52.623 --> 00:14:55.533 +the bytecode here is constant, and it should + +00:14:55.534 --> 00:14:59.123 +unroll this loop here, to inline all those bytecode. + +00:14:59.223 --> 00:15:00.467 +And then, when Truffle + +00:15:00.468 --> 00:15:02.243 +compiles the code, it knows that: + +00:15:02.343 --> 00:15:05.233 +the first loop here does: x plus one, + +00:15:05.234 --> 00:15:07.723 +and the second does: return. + +00:15:07.823 --> 00:15:09.533 +And then it will compile all that into, + +00:15:09.534 --> 00:15:11.363 +return x plus 1, + +00:15:11.463 --> 00:15:14.067 +which is exactly what we would expect + +00:15:14.068 --> 00:15:17.683 +when compiling this pseudo code. + +00:15:17.783 --> 00:15:21.083 +Building on that, we can also easily implement speculation, + +00:15:21.183 --> 00:15:24.867 +by using this <i>transferToInterpreterAndInvalidate</i> function + +00:15:24.868 --> 00:15:26.123 +provided by Truffle. + +00:15:26.223 --> 00:15:28.533 +And Truffle will automatically turn that + +00:15:28.534 --> 00:15:30.683 +into deoptimization. + +00:15:30.783 --> 00:15:32.700 +Now, for example, when this add function + +00:15:32.701 --> 00:15:35.723 +is supplied with, two floating numbers. + +00:15:35.823 --> 00:15:38.243 +It will go through the slow path here, + +00:15:38.343 --> 00:15:40.960 +which might lead to a compiled slow path, + +00:15:40.961 --> 00:15:43.203 +or deoptimization. + +00:15:43.303 --> 00:15:45.733 +And going this deoptimization way, + +00:15:45.734 --> 00:15:48.223 +it can then update the runtime stats. + +00:15:48.323 --> 00:15:50.400 +And now, when the code is compiled again, + +00:15:50.401 --> 00:15:51.603 +Truffle will know, + +00:15:51.703 --> 00:15:54.100 +that these compilation stats, suggests that, + +00:15:54.101 --> 00:15:55.563 +we have floating numbers. + +00:15:55.663 --> 00:15:58.733 +And this floating point addition branch will + +00:15:58.734 --> 00:16:02.603 +then be incorporated into the fast path. + +00:16:02.703 --> 00:16:06.003 +To put it into Java code... + +00:16:06.103 --> 00:16:08.723 +Most operations are just as simple as this. + +00:16:08.823 --> 00:16:11.033 +And it supports fast paths for integers, + +00:16:11.034 --> 00:16:13.963 +floating numbers, and big integers. + +00:16:14.063 --> 00:16:17.133 +And the simplicity of this not only saves us work, + +00:16:17.134 --> 00:16:22.243 +but also enables Juicemacs to explore more things more rapidly. + +00:16:22.343 --> 00:16:26.483 +And actually, I have done some silly explorations. + +00:16:26.583 --> 00:16:30.203 +For example, I tried to constant-fold more things. + +00:16:30.303 --> 00:16:32.767 +Many of us have an Emacs config that stays + +00:16:32.768 --> 00:16:36.683 +largely unchanged, at least during one Emacs session. + +00:16:36.783 --> 00:16:39.667 +And that means many of the global variables + +00:16:39.668 --> 00:16:42.323 +in ELisp are constant. + +00:16:42.423 --> 00:16:44.600 +And with speculation, we can + +00:16:44.601 --> 00:16:46.683 +speculate about the stable ones, + +00:16:46.783 --> 00:16:49.563 +and try to inline them as constants. + +00:16:49.663 --> 00:16:51.733 +And this might improve performance, + +00:16:51.734 --> 00:16:53.083 +or maybe not? + +00:16:53.183 --> 00:16:55.367 +Because, we will need a full editor + +00:16:55.368 --> 00:16:58.123 +to get real world data. + +00:16:58.223 --> 00:17:01.733 +I also tried changing cons lists to be backed + +00:17:01.734 --> 00:17:05.243 +by some arrays, because, maybe arrays are faster, I guess? + +00:17:05.343 --> 00:17:09.033 +But in the end, <i>setcdr</i> requires some kind of indirection, + +00:17:09.034 --> 00:17:12.883 +and that actually makes the performance worse. + +00:17:12.983 --> 00:17:14.733 +And for regular expressions, + +00:17:14.734 --> 00:17:17.923 +I also tried borrowing techniques from PCRE JIT, + +00:17:18.023 --> 00:17:20.667 +which is quite fast in itself, but it is + +00:17:20.668 --> 00:17:24.163 +unfortunately unsupported by Java Truffle runtime. + +00:17:24.263 --> 00:17:27.333 +So, looking at these, well, + +00:17:27.334 --> 00:17:30.243 +explorations can fail, certainly. + +00:17:30.343 --> 00:17:32.800 +But, with Truffle and Java, these, + +00:17:32.801 --> 00:17:34.883 +for now, are not that hard to implement, + +00:17:34.983 --> 00:17:37.667 +and also very often, they teach us something + +00:17:37.668 --> 00:17:42.363 +in return, whether or not they fail. + +00:17:42.463 --> 00:17:45.333 +Finally, let's talk about some explorations + +00:17:45.334 --> 00:17:47.883 +that we might get into in the future. + +00:17:47.983 --> 00:17:49.683 +For the JIT engine, for example, + +00:17:49.783 --> 00:17:52.633 +currently I'm looking into the implementation of + +00:17:52.634 --> 00:17:56.883 +nativecomp to maybe reuse some of its optimizations. + +00:17:56.983 --> 00:18:01.323 +For the GUI, I'm very very slowly working on one. + +00:18:01.423 --> 00:18:03.733 +If it ever completes, I have one thing + +00:18:03.734 --> 00:18:06.603 +I'm really looking forward to implementing. + +00:18:06.703 --> 00:18:08.900 +That is, inlining widgets, or even + +00:18:08.901 --> 00:18:11.763 +other buffers, directly into a buffer. + +00:18:11.863 --> 00:18:13.967 +Well, it's because, people sometimes complain + +00:18:13.968 --> 00:18:16.003 +about Emacs's GUI capabilities, + +00:18:16.103 --> 00:18:19.767 +But I personally think that supporting inlining, + +00:18:19.768 --> 00:18:23.043 +like a whole buffer inside another buffer as a rectangle, + +00:18:23.143 --> 00:18:26.883 +could get us very far in layout abilities. + +00:18:26.983 --> 00:18:28.567 +And this approach should also + +00:18:28.568 --> 00:18:30.843 +be compatible with terminals. + +00:18:30.943 --> 00:18:32.933 +And I really want to see how this idea + +00:18:32.934 --> 00:18:36.003 +plays out with Juicemacs. + +00:18:36.103 --> 00:18:38.963 +And of course, there's Lisp concurrency. + +00:18:39.063 --> 00:18:42.167 +And currently i'm thinking of a JavaScript-like, + +00:18:42.168 --> 00:18:46.283 +transparent, single-thread model, using Java's virtual threads. + +00:18:46.383 --> 00:18:49.967 +But anyway, if you are interested in JIT compilation, + +00:18:49.968 --> 00:18:51.663 +Truffle, or anything above, + +00:18:51.763 --> 00:18:53.867 +or maybe you have your own ideas, + +00:18:53.868 --> 00:18:56.283 +you are very welcome to reach out! + +00:18:56.383 --> 00:19:00.033 +Juicemacs does need to implement many more built-in functions, + +00:19:00.034 --> 00:19:03.063 +and any help would be very appreciated. + +00:19:03.163 --> 00:19:05.800 +And I promise, it can be a very fun playground + +00:19:05.801 --> 00:19:08.343 +to learn about Emacs and do crazy things. + +00:19:08.443 --> 00:19:10.902 +Thank you! |
