[Captioned by kana]

Hello! This is Kana! Today I'll be talking about Just-In-Time compilation, or JIT, for Emacs Lisp, based on my work-in-progress Emacs clone, Juicemacs.

Juicemacs aims to explore a few things that I've been wondering about for a while. For example, what if we had better, or even transparent, concurrency in ELisp? Or, can we have a concurrent GUI, one that neither blocks Lisp code nor is blocked by it? And finally, what can JIT compilation do for ELisp? Will it provide better performance?

However, a major problem with explorations in Emacs clones is that Emacs is a whole universe. That means that, to make these explorations meaningful for Emacs users, we need to cover a lot of Emacs features before we can even begin.

For example, one of the features of Emacs is that it supports a lot of encodings. Let's look at this string: it can be encoded in both Unicode and Shift-JIS, a Japanese encoding system. But currently, Unicode does not have an official mapping for this "ki" (﨑) character. So when you map from Shift-JIS to Unicode in most programming languages, you end up with something like this: a replacement character. Emacs, however, actually extends the Unicode range threefold and uses the extra range to losslessly support characters like this. So if you want to support this feature, that basically rules out all string libraries built on Unicode assumptions.

For another, you need to support the regular expressions in Emacs, which are really quite irregular. For example, they can assert the position of the cursor, and they use character tables, modifiable from Lisp code, to determine case mappings. All of that makes it really hard, or even impossible, to use any existing regexp library.

Also, you need a working garbage collector. You need threading primitives, because Emacs already has some threading support.
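As a concrete aside on those first two points (this is plain GNU Emacs behavior, not something shown in the talk; you can evaluate it in a scratch buffer):

    ;; Emacs character codes extend past the Unicode maximum of #x10FFFF,
    ;; which is what lets decoders keep characters that have no clean
    ;; Unicode mapping instead of replacing them.
    (max-char)                          ; => 4194303, i.e. #x3FFFFF
    (characterp #x110000)               ; => t, a valid character beyond Unicode

    ;; Emacs regexps can assert "only at the cursor position" with \=,
    ;; something most regexp libraries have no equivalent for.
    (re-search-forward "\\=foo" nil t)  ; matches only if "foo" starts at point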
You might also want the performance of your clone to match Emacs, even with its native compilation enabled. Not to mention that you also need a GUI for an editor. And so on.

For Juicemacs, building on Java and a compiler framework called Truffle helps with getting better performance, and by choosing a language with a good GC, we can focus more on the challenges above.

Currently, Juicemacs has implemented three out of the at least four interpreters in Emacs: one for Lisp code, one for bytecode, and one for regular expressions, all of them JIT-capable. Besides these, Emacs also has around two thousand built-in functions in C, and Juicemacs has around four hundred of them implemented. That is not many, but surprisingly it is enough to bootstrap Emacs and run the portable dumper, or pdump for short. Let's have a try.

So this is the binary produced by Java Native Image, and it is loading all the files needed for bootstrapping. Then it dumps the memory to a file to be loaded later, giving us fast startup. As we can see here, it throws some frame errors, because Juicemacs doesn't have an editor UI or functional frames yet. But otherwise, it can already run quite a lot of Lisp code. For example, this code uses the benchmark library to measure the performance of this Fibonacci function, and we can see that the JIT engine is already kicking in and making execution faster.

In addition to that, with a bit of a workaround, Juicemacs can also run some of the ERT (Emacs Lisp Regression Testing) suites that come with Emacs. So... yes, there are a bunch of test failures, which means we are not that compatible with Emacs yet and need more work. But the whole testing procedure runs fine, and it produces proper stack traces, which is quite useful for debugging Juicemacs.
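The Fibonacci timing in the demo was along these lines. This is a hedged reconstruction: the function and the exact arguments shown on screen may differ, and my/fib is just a placeholder name.

    ;; Naive recursive Fibonacci plus the built-in benchmark library.
    (require 'benchmark)

    (defun my/fib (n)
      (if (< n 2)
          n
        (+ (my/fib (- n 1)) (my/fib (- n 2)))))

    ;; benchmark-run returns (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
    ;; In a JIT runtime, repeating this form should get faster once
    ;; compilation kicks in.
    (benchmark-run 10 (my/fib 25))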
So with that, a rather functional JIT runtime, let's now look into today's topic: JIT compilation for ELisp.

You probably know that Emacs has supported native compilation, or nativecomp for short, for some time now. It mainly uses GCC (through libgccjit) to compile Lisp code into native code ahead of time. At runtime, Emacs loads those compiled files and gets the performance of native code.

However, for installed packages, for example, we might want to compile them when we actually use them instead of ahead of time. Emacs supports this through the native-comp-jit-compilation flag. What it does is this: at runtime, Emacs sends loaded files to external Emacs worker processes, which then compile those files asynchronously. When the compilation is done, the current Emacs session loads the compiled code back and improves its performance on the fly.

When you look at this procedure, however, it is ahead-of-time compilation done at runtime. And that is what current Emacs calls JIT compilation. But if you look at some other JIT engines, you'll see much more complex architectures.

Take LuaJIT as an example. In addition to this red line here, which takes us from an interpreted state to a compiled native state, which is also what Emacs does, LuaJIT also supports going from a compiled state back to its interpreter. This process is called "deoptimization". In contrast to its name, deoptimization actually enables a huge category of JIT optimizations, called speculation.

Basically, with speculation, the compiler can use runtime statistics to speculate, that is, to make bolder assumptions in the compiled code. When those assumptions are invalidated, the runtime deoptimizes the code, updates the statistics, and then recompiles the code based on new assumptions, which makes the code more performant.

Let's look at an example. Here is a really simple function that adds one to the input number. But in Emacs, it is not that simple, because Emacs has three categories of numbers: fixnums (machine-word-sized integers), floating-point numbers, and big integers. When we compile this, we need to handle all three cases.
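Here is a minimal sketch of that example; the function name is made up for illustration.

    (defun my/add-one (x)
      (1+ x))

    (my/add-one 41)           ; fixnum: the intended fast path
    (my/add-one 41.0)         ; float: needs a different code path
    (my/add-one (expt 2 80))  ; bignum: needs yet another code path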
If we analyze the code produced by Emacs, as shown by the graph here, we can see that it has two paths: one fast path that does fast fixnum addition, and one slow path that calls out to an external plus-one function to handle floats and big integers.

Now, if we pass integers into this function, it is pretty fast, because we stay on the fast path. However, if we pass in a float, it has to go through the slow path, doing an extra function call, which is slow.

What speculation can offer here is flexible fast paths. When we pass a float into this function, which currently has only fixnums on its fast path, it also has to go through the slow path. But the difference is that a speculative runtime can deoptimize and recompile the code to adapt to this. When it recompiles, it can put floats onto the fast path, and now floating-point operations are fast as well. This kind of speculation is why speculative runtimes can be really fast.

Let's take a look at some benchmarks. They were obtained with the elisp-benchmarks library on ELPA. The blue line here is for nativecomp, and the blue areas mean that nativecomp is slower; likewise, the green areas mean that Juicemacs is slower. At a glance, the two (or four) of them actually seem more or less on par, to me. But let's take a closer look at some of them.

The first few benchmarks are the classic Fibonacci benchmarks. We know that the series is formed by adding the previous two numbers in the series. Looking at this expression here, Fibonacci benchmarks are quite intensive in number additions, subtractions, and, if you use recursion, function calls. That is exactly why the Fibonacci series makes a good benchmark.

And looking at the results here... wow. Emacs nativecomp executes instantaneously. Seemingly, it is a total defeat for Juicemacs. Now, if you are into benchmarks, you know something is wrong here: we are comparing different things.
So let's look under the hood and disassemble the function with this convenient Emacs command called disassemble... And these two lines of code are what we get.

We can already see what is going on here: GCC sees that Fibonacci is a pure function, because it returns the same value for the same arguments, so GCC chooses to do the computation at compile time and inserts the final number directly into the compiled code.

That is actually great! It shows that nativecomp knows about pure functions and can do all kinds of things with them, like removing them or constant-folding them. Juicemacs just does not do that. However, we are also interested in the things we mentioned earlier: the performance of number additions and function calls. So, in order to let the benchmark show those, we need to modify it a bit, simply by making things non-constant.

With that, Emacs gets much slower. And again, let's look at what is happening behind these numbers. As before, with the disassemble command, we can look into the assembly, and once more we can already see what is happening.

Juicemacs, thanks to its speculative nature, supports fast paths for all three kinds of numbers. Currently, however, Emacs nativecomp does not have any fast path for the operations here, the additions, subtractions, and comparisons, which is exactly what the Fibonacci benchmarks measure. Emacs, at this point, has to call generic external functions for them, and that is slow.
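If you want to poke at this yourself, the inspection step is just the built-in disassemble command, and the "make things non-constant" tweak can be as simple as reading the benchmark input from a variable. This is a hedged sketch reusing the my/fib placeholder from earlier; the real benchmark code differs.

    ;; Show the code the byte compiler or nativecomp produced for a function.
    (disassemble 'my/fib)

    ;; Feed the benchmark an input the compiler cannot constant-fold away.
    (defvar my/fib-input 25)
    (benchmark-run 10 (my/fib my/fib-input))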
But is nativecomp really that slow? I also ran the same benchmark in Common Lisp, with SBCL, and nativecomp is already fast compared to untyped SBCL. That is because SBCL also emits call instructions when it has no type information. However, once we declare the types, SBCL is able to compile a fast path for fixnums, which puts its performance on par with speculative JIT engines (that is, Juicemacs), because now both of us are on fast paths.

Additionally, if we are bold enough to pass the (safety 0) setting to SBCL, it removes all the slow paths and type checks, and its performance gets close to what you get with C. Well, we probably do not want safety 0 most of the time. But even so, if nativecomp were to get fast paths for more constructs, there certainly is quite some room for performance improvement.

Let's look at some more benchmarks. For this inclist, or increment-list, benchmark, Juicemacs is really slow. Part of that comes from the cost of Java boxing integers. Emacs nativecomp, on the other hand, actually has fast paths for all of the operations in this particular benchmark. That is why it can be so fast here, and it also shows that nativecomp has a lot of potential for improvement.

There is another benchmark here that uses advice. Emacs Lisp supports using advice to override functions, by wrapping the original function and an advice function, the two of them, inside a glue function. In this benchmark, we advise the Fibonacci function to cache the first ten entries to speed up computation, as can be seen from the speed-up in the Juicemacs results. However, it seems that nativecomp does not yet compile glue functions, and that makes advice slower.
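The advice in that benchmark works roughly like this. Again, this is a hedged sketch reusing the my/fib placeholder; the benchmark's real code differs.

    ;; Cache the first ten Fibonacci results, then wrap my/fib with
    ;; :around advice that consults the cache before calling the original.
    (defvar my/fib-cache (make-hash-table))
    (dotimes (i 10)
      (puthash i (my/fib i) my/fib-cache))

    (defun my/fib-cached (orig-fn n)
      (or (gethash n my/fib-cache)
          (funcall orig-fn n)))

    (advice-add 'my/fib :around #'my/fib-cached)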
With these benchmarks in hand, let's discuss the big question: should GNU Emacs adopt speculative JIT compilation? The hidden question is really: is it worth it? And my personal answer is: maybe not. The first reason is that slow paths, like floats, are actually not that frequent in Emacs, and optimizing the fast paths, like fixnums, can already get us very good performance. The second, and main, reason is that speculative JIT is very hard.

LuaJIT, for example, took a genius to build. Even with the help of GCC, we would need to hand-write all the fast-path, slow-path, and switching logic. We would need to find a way to deoptimize, which requires mapping machine registers back to the interpreter stack. And speculation needs runtime information, which also costs extra memory.

Moreover, as some of the benchmarks above show, there is low-hanging fruit in nativecomp that might get us better performance with relatively little effort. Compared to that, a JIT engine is a huge, huge undertaking.

For Juicemacs, though, the JIT engine comes a lot cheaper, because we are cheating by building on an existing compiler framework called Truffle. Truffle is a meta-compiler framework, which means that it lets you write an interpreter, add the required annotations, and it will automatically turn the interpreter into a JIT runtime.

For example, here is a typical bytecode interpreter. After you add the required annotations, Truffle knows that the bytecode here is constant and that it should unroll this loop to inline all of the bytecode. Then, when Truffle compiles the code, it knows that the first iteration does "x plus one" and the second does "return", and it compiles all of that into "return x plus 1", which is exactly what we would expect when compiling this pseudo-code.

Building on that, we can also easily implement speculation, by using the transferToInterpreterAndInvalidate function provided by Truffle, which Truffle automatically turns into deoptimization. Now, for example, when this add function is given two floats, it goes through the slow path here, which might lead to a compiled slow path, or to deoptimization. Going the deoptimization way, it can then update the runtime statistics. When the code is compiled again, Truffle knows from those statistics that we have floats, and the floating-point addition branch is then incorporated into the fast path.
To put it into Java code... most operations are just as simple as this, with fast paths for integers, floats, and big integers. The simplicity of this not only saves us work, it also lets Juicemacs explore more things, more rapidly.

And I have actually done some silly explorations. For example, I tried to constant-fold more things. Many of us have an Emacs config that stays largely unchanged, at least during one Emacs session. That means many of the global variables in ELisp are effectively constant. With speculation, we can speculate on the stable ones and try to inline them as constants. This might improve performance, or maybe not? We would need a full editor to get real-world data.

I also tried changing cons lists to be backed by arrays, because maybe arrays are faster, I guess? But in the end, setcdr requires some kind of indirection, and that actually makes performance worse. And for regular expressions, I tried borrowing techniques from PCRE's JIT, which is quite fast in itself, but that approach is unfortunately unsupported by the Java Truffle runtime.

So, looking at these: well, explorations can certainly fail. But with Truffle and Java, they are, for now, not that hard to implement, and very often they teach us something in return, whether or not they fail.
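One concrete lesson from the cons-list experiment, shown in plain ELisp rather than Juicemacs code: setcdr can point the tail of any cell at a completely different structure, which is exactly what breaks a flat array representation without an extra level of indirection.

    (let ((xs (list 1 2 3 4)))
      ;; Splice a different tail into the middle of the list.
      (setcdr (cdr xs) '(9 10))
      xs)  ; => (1 2 9 10), the old tail (3 4) is no longer part of xs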
Finally, let's talk about some explorations that we might get into in the future. For the JIT engine, I am currently looking into the implementation of nativecomp, to maybe reuse some of its optimizations. For the GUI, I am very, very slowly working on one. If it ever gets finished, there is one thing I am really looking forward to implementing: inlining widgets, or even other buffers, directly into a buffer. People sometimes complain about Emacs's GUI capabilities, but I personally think that supporting inlining, say a whole buffer inside another buffer as a rectangle, could get us very far in layout abilities. And this approach should also be compatible with terminals. I really want to see how this idea plays out with Juicemacs.

And of course, there is Lisp concurrency. Currently I am thinking of a JavaScript-like, transparent, single-threaded model, using Java's virtual threads.

But anyway, if you are interested in JIT compilation, Truffle, or anything above, or maybe you have your own ideas, you are very welcome to reach out! Juicemacs still needs many more built-in functions implemented, and any help would be very much appreciated. And I promise, it can be a very fun playground for learning about Emacs and doing crazy things.

Thank you!