WEBVTT captioned by kana
00:00:01.200 --> 00:00:02.803
Hello! This is Kana!
00:00:02.903 --> 00:00:04.367
And today I'll be talking about
00:00:04.368 --> 00:00:06.067
<b>J</b>ust-<b>I</b>n-<b>T</b>ime compilation, or JIT,
00:00:06.068 --> 00:00:07.363
for Emacs Lisp,
00:00:07.463 --> 00:00:11.163
based on my work-in-progress Emacs clone, Juicemacs.
00:00:11.263 --> 00:00:13.533
Juicemacs aims to explore a few things
00:00:13.534 --> 00:00:15.843
that I've been wondering about for a while.
00:00:15.943 --> 00:00:18.567
For example, what if we had better or even
00:00:18.568 --> 00:00:21.223
transparent concurrency in ELisp?
00:00:21.323 --> 00:00:23.243
Or, can we have a concurrent GUI?
00:00:23.343 --> 00:00:26.783
One that neither blocks, nor is blocked by, Lisp code?
00:00:26.883 --> 00:00:31.067
And finally what can JIT compilation do for ELisp?
00:00:31.068 --> 00:00:34.083
Will it provide better performance?
00:00:34.183 --> 00:00:37.400
However, a main problem with explorations
00:00:37.401 --> 00:00:38.623
in Emacs clones is that,
00:00:38.723 --> 00:00:40.863
Emacs is a whole universe.
00:00:40.963 --> 00:00:43.600
And that means, to make these explorations
00:00:43.601 --> 00:00:45.383
meaningful for Emacs users,
00:00:45.483 --> 00:00:47.967
we need to cover a lot of Emacs features,
00:00:47.968 --> 00:00:50.543
before we can ever begin.
00:00:50.643 --> 00:00:53.923
For example, one of the features of Emacs is that,
00:00:54.023 --> 00:00:56.003
it supports a lot of encodings.
00:00:56.103 --> 00:00:59.267
Let's look at this string: it can be encoded
00:00:59.268 --> 00:01:03.643
in both Unicode and Shift-JIS, a Japanese encoding system.
00:01:03.743 --> 00:01:07.067
But currently, Unicode does not have
00:01:07.068 --> 00:01:09.803
an official mapping for this "ki" (﨑) character.
00:01:09.903 --> 00:01:12.767
So when we map from Shift-JIS to Unicode,
00:01:12.768 --> 00:01:14.423
in most programming languages,
00:01:14.523 --> 00:01:16.533
you end up with something like this:
00:01:16.534 --> 00:01:19.143
it's a replacement character.
00:01:19.243 --> 00:01:22.067
But in Emacs, it actually extends
00:01:22.068 --> 00:01:23.883
the Unicode range roughly threefold,
00:01:23.983 --> 00:01:26.833
and uses the extra range to losslessly
00:01:26.834 --> 00:01:29.483
support characters like this.
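
NOTE
As a rough illustration (not shown in the talk), Emacs's internal character
space being larger than Unicode's can be checked directly from ELisp:
  (max-char)            ; => 4194303 (#x3FFFFF), vs. Unicode's maximum #x10FFFF
  (characterp #x3FFF00) ; => t, a valid Emacs character beyond the Unicode range
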
00:01:29.583 --> 00:01:31.923
So if you want to support this feature,
00:01:32.023 --> 00:01:34.033
that basically rules out all string
00:01:34.034 --> 00:01:37.243
libraries with Unicode assumptions.
00:01:37.843 --> 00:01:40.067
For another, you need to support
00:01:40.068 --> 00:01:41.883
the regular expressions in Emacs,
00:01:41.983 --> 00:01:45.023
which are, really irregular.
00:01:45.123 --> 00:01:46.900
For example, it supports asserting
00:01:46.901 --> 00:01:49.403
on the user's cursor position.
00:01:49.503 --> 00:01:52.033
And it also uses some character tables,
00:01:52.034 --> 00:01:53.883
that can be modified from Lisp code,
00:01:53.983 --> 00:01:56.163
to determine case mappings.
00:01:56.263 --> 00:01:59.567
And all that makes it really hard, or even
00:01:59.568 --> 00:02:05.123
impossible to use any existing regexp libraries.
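
NOTE
A small sketch of the two peculiarities just mentioned, assuming a recent
Emacs: "\\=" matches the empty string only at point, and case-insensitive
matching consults case tables that Lisp code can modify at runtime.
  (looking-at "\\=foo")           ; succeeds only if point is right before "foo"
  (let ((case-fold-search t))
    (string-match "ABC" "abc"))   ; => 0, folding case via the current case table
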
00:02:05.223 --> 00:02:07.883
Also, you need a functional garbage collector.
00:02:07.983 --> 00:02:09.867
You need threading primitives, because
00:02:09.868 --> 00:02:12.323
Emacs already has some threading support.
00:02:12.423 --> 00:02:14.533
And you might want the performance of your clone
00:02:14.534 --> 00:02:18.963
to match Emacs, even with its native compilation enabled.
00:02:19.063 --> 00:02:21.500
Not to mention you also need a GUI for an editor.
00:02:21.501 --> 00:02:23.543
And so on.
00:02:23.643 --> 00:02:25.633
For Juicemacs, building on Java and
00:02:25.634 --> 00:02:27.563
a compiler framework called Truffle,
00:02:27.663 --> 00:02:30.503
helps us get better performance;
00:02:30.603 --> 00:02:32.933
and by choosing a language with a good GC,
00:02:32.934 --> 00:02:38.063
we can actually focus more on the challenges above.
00:02:38.163 --> 00:02:41.433
Currently, Juicemacs has implemented three of
00:02:41.434 --> 00:02:43.983
the at least four interpreters in Emacs.
00:02:44.083 --> 00:02:46.363
One for Lisp code, one for bytecode,
00:02:46.463 --> 00:02:48.567
and one for regular expressions,
00:02:48.568 --> 00:02:50.903
all of them JIT-capable.
00:02:51.003 --> 00:02:53.667
Other than these, Emacs also has around
00:02:53.668 --> 00:02:56.083
two thousand built-in functions in C code.
00:02:56.183 --> 00:02:57.333
And Juicemacs has around
00:02:57.334 --> 00:02:59.763
four hundred of them implemented.
00:02:59.863 --> 00:03:03.603
It's not that many, but it is surprisingly enough
00:03:03.703 --> 00:03:05.200
to bootstrap Emacs and run
00:03:05.201 --> 00:03:08.483
the portable dumper, or pdump for short.
00:03:08.583 --> 00:03:11.243
Let's have a try.
00:03:11.343 --> 00:03:11.703
00:03:11.803 --> 00:03:14.923
So this is the binary produced by Java native image.
00:03:15.023 --> 00:03:17.167
And it's loading all the files
00:03:17.168 --> 00:03:18.763
needed for bootstrapping.
00:03:18.863 --> 00:03:22.233
Then it dumps the memory to a file to
00:03:22.234 --> 00:03:24.923
be loaded later, giving us fast startup.
00:03:25.023 --> 00:03:28.723
As we can see here, it throws some frame errors
00:03:28.823 --> 00:03:31.400
because Juicemacs doesn't have an editor UI
00:03:31.401 --> 00:03:33.283
or functional frames yet.
00:03:33.383 --> 00:03:35.367
But otherwise, it can already run
00:03:35.368 --> 00:03:36.643
quite a lot of Lisp code.
00:03:36.743 --> 00:03:40.400
For example, this code uses the benchmark library
00:03:40.401 --> 00:03:44.403
to measure the performance of this Fibonacci function.
00:03:44.503 --> 00:03:47.067
And we can see here, the JIT engine is
00:03:47.068 --> 00:03:51.163
already kicking in and making execution faster.
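
NOTE
A sketch of the kind of snippet being run here; `fib' stands in for whatever
function is being measured (hypothetical name), while the benchmark library
and benchmark-run come with Emacs:
  (require 'benchmark)
  (benchmark-run 10 (fib 30))   ; => (elapsed-seconds gc-count gc-elapsed)
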
00:03:51.263 --> 00:03:53.483
In addition to that, with a bit of a workaround,
00:03:53.583 --> 00:03:56.467
Juicemacs can also run some of the ERT,
00:03:56.468 --> 00:04:01.043
or, <b>E</b>macs <b>R</b>egression <b>T</b>est suite, that comes with Emacs.
00:04:01.143 --> 00:04:05.823
So... Yes, there are a bunch of test failures,
00:04:05.923 --> 00:04:07.933
which means we are not that compatible
00:04:07.934 --> 00:04:09.523
with Emacs and need more work.
00:04:09.623 --> 00:04:12.803
But the whole testing procedure runs fine,
00:04:12.903 --> 00:04:14.767
and it has proper stack traces,
00:04:14.768 --> 00:04:17.803
which is quite useful for debugging Juicemacs.
00:04:17.903 --> 00:04:21.033
So with that, a rather functional JIT runtime,
00:04:21.034 --> 00:04:25.983
let's now look into today's topic: JIT compilation for ELisp.
00:04:26.083 --> 00:04:28.533
So, you probably know that Emacs has supported
00:04:28.534 --> 00:04:32.083
native compilation, or nativecomp for short, for some time now.
00:04:32.183 --> 00:04:35.033
It mainly uses GCC to compile Lisp code
00:04:35.034 --> 00:04:37.363
into native code, ahead of time.
00:04:37.463 --> 00:04:41.433
And during runtime, Emacs loads those compiled files,
00:04:41.434 --> 00:04:44.523
and gets the performance of native code.
00:04:44.623 --> 00:04:47.643
However, for example, for installed packages,
00:04:47.743 --> 00:04:49.059
we might want to compile them when we
00:04:49.060 --> 00:04:51.823
actually use them instead of ahead of time.
00:04:51.923 --> 00:04:53.733
And Emacs supports this through
00:04:53.734 --> 00:04:55.683
this <i>native-comp-jit-compilation</i> flag.
00:04:55.783 --> 00:04:59.767
What it does is, during runtime, Emacs sends
00:04:59.768 --> 00:05:03.203
loaded files to external Emacs worker processes,
00:05:03.303 --> 00:05:06.903
which will then compile those files asynchronously.
00:05:07.003 --> 00:05:09.043
And when the compilation is done,
00:05:09.143 --> 00:05:11.967
the current Emacs session will load the compiled code back
00:05:11.968 --> 00:05:16.323
and improve its performance, on the fly.
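
NOTE
A minimal sketch of toggling this behavior, assuming Emacs 29 or later (the
option was previously called native-comp-deferred-compilation):
  (setq native-comp-jit-compilation t)   ; natively compile .elc files as they load
  (setq native-comp-jit-compilation nil) ; stay on byte code unless .eln files exist
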
00:05:16.423 --> 00:05:18.643
When you look at this procedure, however, it is,
00:05:18.743 --> 00:05:21.563
ahead-of-time compilation, done at runtime.
00:05:21.663 --> 00:05:25.123
And it is what current Emacs calls JIT compilation.
00:05:25.223 --> 00:05:27.867
But if you look at some other JIT engines,
00:05:27.868 --> 00:05:31.803
you'll see much more complex architectures.
00:05:31.903 --> 00:05:34.233
So, take LuaJIT as an example,
00:05:34.234 --> 00:05:36.163
in addition to this red line here,
00:05:36.263 --> 00:05:38.767
which leads us from an interpreted state
00:05:38.768 --> 00:05:40.643
to a compiled native state,
00:05:40.743 --> 00:05:42.163
which is also what Emacs does,
00:05:42.263 --> 00:05:44.333
LuaJIT also supports going from
00:05:44.334 --> 00:05:47.523
a compiled state back to its interpreter.
00:05:47.623 --> 00:05:51.483
And this process is called "deoptimization".
00:05:51.583 --> 00:05:55.300
Despite its name, deoptimization here actually
00:05:55.301 --> 00:05:58.563
enables a huge category of JIT optimizations.
00:05:58.663 --> 00:06:00.163
This category is called speculation.
00:06:01.463 --> 00:06:04.600
Basically, with speculation, the compiler
00:06:04.601 --> 00:06:07.683
can use runtime statistics to speculate,
00:06:07.783 --> 00:06:11.443
to make bolder assumptions in the compiled code.
00:06:11.543 --> 00:06:13.983
And when the assumptions are invalidated,
00:06:14.083 --> 00:06:18.323
the runtime deoptimizes the code, updates statistics,
00:06:18.423 --> 00:06:21.133
and then recompiles the code based on the new assumptions,
00:06:21.134 --> 00:06:24.443
and that will make the code more performant.
00:06:24.543 --> 00:06:26.763
Let's look at an example.
00:06:28.463 --> 00:06:30.967
So, here is a really simple function,
00:06:30.968 --> 00:06:33.083
that adds one to the input number.
00:06:33.183 --> 00:06:36.167
But in Emacs, it is not that simple,
00:06:36.168 --> 00:06:38.203
because Emacs has three categories of numbers,
00:06:38.303 --> 00:06:42.700
that is, fixnums, or machine-word-sized integers,
00:06:42.701 --> 00:06:45.603
floating-point numbers, and big integers.
00:06:45.703 --> 00:06:47.600
And when we compile this, we need
00:06:47.601 --> 00:06:49.363
to handle all three cases.
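
NOTE
A toy version of the function, to make the three cases concrete (the name
my/add1 is illustrative, not from the talk):
  (defun my/add1 (x) (1+ x))
  (my/add1 41)                    ; fixnum => 42
  (my/add1 1.5)                   ; float  => 2.5
  (my/add1 most-positive-fixnum)  ; overflows into a bignum
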
00:06:49.463 --> 00:06:52.600
And if we analyze the code produced by Emacs,
00:06:52.601 --> 00:06:54.683
as is shown by this gray graph here,
00:06:54.783 --> 00:06:58.083
we can see that it has, two paths:
00:06:58.183 --> 00:07:01.403
one fast path that does fast fixnum addition;
00:07:01.503 --> 00:07:03.967
and one slow path, which calls out
00:07:03.968 --> 00:07:06.523
to an external plus-one function,
00:07:06.623 --> 00:07:09.683
to handle floating-point numbers and big integers.
00:07:09.783 --> 00:07:13.167
Now, if we pass integers into this function,
00:07:13.168 --> 00:07:16.283
it's pretty fast because it's on the fast path.
00:07:16.383 --> 00:07:19.767
However, if we pass in a floating-point number,
00:07:19.768 --> 00:07:21.843
then it has to go through the slow path,
00:07:21.943 --> 00:07:25.563
doing an extra function call, which is slow.
00:07:25.663 --> 00:07:28.733
Where speculation might help here is that
00:07:28.734 --> 00:07:31.443
it can have flexible fast paths.
00:07:31.543 --> 00:07:34.563
When we pass a floating-point number into this function,
00:07:34.663 --> 00:07:37.400
which currently has only fixnums on the fast path,
00:07:37.401 --> 00:07:40.723
it also has to go through the slow path.
00:07:40.823 --> 00:07:44.567
But the difference is that, a speculative runtime can
00:07:44.568 --> 00:07:47.763
deoptimize and recompile the code to adapt to this.
00:07:47.863 --> 00:07:50.367
And when it recompiles, it might add
00:07:50.368 --> 00:07:52.643
floating-point numbers onto the fast path,
00:07:52.743 --> 00:07:55.003
and now floating-point operations are also fast.
00:07:55.103 --> 00:07:58.567
And this kind of speculation is why
00:07:58.568 --> 00:08:03.603
speculative runtimes can be really fast.
00:08:03.703 --> 00:08:05.723
Let's take a look at some benchmarks.
00:08:05.823 --> 00:08:09.423
They're obtained with the <i>elisp-benchmarks</i> library on ELPA.
00:08:09.523 --> 00:08:12.600
The blue line here is for nativecomp,
00:08:12.601 --> 00:08:16.043
and these blue areas mean that nativecomp is slower.
00:08:16.143 --> 00:08:19.133
And, likewise, green areas mean that
00:08:19.134 --> 00:08:20.523
Juicemacs is slower.
00:08:20.623 --> 00:08:22.867
At a glance, the two (or four)
00:08:22.868 --> 00:08:25.143
actually seem somewhat on par, to me.
00:08:25.243 --> 00:08:30.383
But, let's take a closer look at some of them.
00:08:30.483 --> 00:08:32.667
So, the first few benchmarks are the classic,
00:08:32.668 --> 00:08:33.983
Fibonacci benchmarks.
00:08:34.083 --> 00:08:36.933
We know that, the series is formed by
00:08:36.934 --> 00:08:39.203
adding the previous two numbers in the series.
00:08:39.303 --> 00:08:41.700
And looking at this expression here,
00:08:41.701 --> 00:08:44.043
Fibonacci benchmarks are quite intensive
00:08:44.143 --> 00:08:46.800
in number additions, subtractions,
00:08:46.801 --> 00:08:49.103
and function calls, if you use recursion.
00:08:49.203 --> 00:08:51.000
And that is exactly why
00:08:51.001 --> 00:08:54.323
the Fibonacci series is a good benchmark.
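
NOTE
The recursive form being discussed looks roughly like this (a sketch, not the
exact elisp-benchmarks code):
  (defun fib (n)
    (if (< n 2) n
      (+ (fib (- n 1)) (fib (- n 2)))))
  ;; Each call performs comparisons, subtractions, an addition,
  ;; and two more function calls -- exactly the operations measured.
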
00:08:54.423 --> 00:08:57.243
And looking at the results here... wow.
00:08:57.343 --> 00:08:59.843
Emacs nativecomp executes instantaneously.
00:08:59.943 --> 00:09:04.523
It's a total defeat for Juicemacs, seemingly.
00:09:04.623 --> 00:09:08.043
Now, if you're into benchmarks, you know something is wrong here:
00:09:08.143 --> 00:09:11.683
we are comparing different things.
00:09:11.783 --> 00:09:14.200
So let's look under the hood
00:09:14.201 --> 00:09:15.483
and disassemble the function
00:09:15.583 --> 00:09:17.567
with this convenient Emacs command
00:09:17.568 --> 00:09:19.063
called <i>disassemble</i>...
00:09:19.163 --> 00:09:23.043
And these two lines of code are what we got.
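
NOTE
For reference, disassembling a native-compiled function is just this (assuming
the benchmark function is named fib):
  M-x disassemble RET fib RET
  ;; or, from Lisp:
  (disassemble 'fib)
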
00:09:23.143 --> 00:09:24.700
So, we can already see
00:09:24.701 --> 00:09:26.123
what's going on here:
00:09:26.223 --> 00:09:29.963
GCC sees that Fibonacci is a pure function,
00:09:30.063 --> 00:09:31.867
because it returns the same value
00:09:31.868 --> 00:09:33.243
for the same arguments,
00:09:33.343 --> 00:09:35.700
so GCC chooses to do the computation
00:09:35.701 --> 00:09:36.723
at compile time
00:09:36.823 --> 00:09:39.133
and inserts the final number directly
00:09:39.134 --> 00:09:40.323
into the compiled code.
00:09:41.823 --> 00:09:43.603
It is actually great!
00:09:43.703 --> 00:09:45.400
Because it shows that nativecomp
00:09:45.401 --> 00:09:47.283
knows about pure functions,
00:09:47.383 --> 00:09:48.700
and can do all kinds of things
00:09:48.701 --> 00:09:51.203
like removing or constant-folding them.
00:09:51.303 --> 00:09:54.403
And Juicemacs just does not do that.
00:09:54.503 --> 00:09:57.367
However, we are also concerned about
00:09:57.368 --> 00:09:59.003
the things we mentioned earlier:
00:09:59.103 --> 00:10:00.900
the performance of number additions,
00:10:00.901 --> 00:10:02.983
or function calls.
00:10:03.083 --> 00:10:05.633
So, in order to let the benchmark
00:10:05.634 --> 00:10:06.863
show some extra things,
00:10:06.963 --> 00:10:08.367
we need to modify it a bit...
00:10:08.368 --> 00:10:11.323
by simply making things non-constant.
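
NOTE
One guess at the spirit of the change: hide the input behind a variable so the
compiler can no longer precompute the call (names here are illustrative):
  (defvar my/fib-input 30)
  (benchmark-run 10 (fib my/fib-input))  ; can no longer be constant-folded
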
00:10:11.423 --> 00:10:15.203
With that, Emacs gets much slower now.
00:10:15.303 --> 00:10:17.133
And again, let's look at what's
00:10:17.134 --> 00:10:21.083
happening behind these numbers.
00:10:21.183 --> 00:10:23.500
Similarly, with the <i>disassemble</i> command,
00:10:23.501 --> 00:10:25.643
we can look into the assembly.
00:10:25.743 --> 00:10:28.019
And again, we can already see
00:10:28.020 --> 00:10:29.303
what's happening here.
00:10:29.403 --> 00:10:32.083
So, Juicemacs, due to its speculative nature,
00:10:32.183 --> 00:10:35.443
supports fast paths for all three kinds of numbers.
00:10:35.543 --> 00:10:39.233
However, currently, Emacs nativecomp
00:10:39.234 --> 00:10:41.243
does not have any fast path
00:10:41.343 --> 00:10:43.433
for the operations here like additions,
00:10:43.434 --> 00:10:45.803
or subtractions, or comparisons,
00:10:45.903 --> 00:10:48.067
which is exactly what
00:10:48.068 --> 00:10:50.963
Fibonacci benchmarks are measuring.
00:10:51.063 --> 00:10:53.800
Emacs, at this time, has to call some generic,
00:10:53.801 --> 00:10:57.963
external functions for them, and this is slow.
00:11:00.063 --> 00:11:03.203
But is nativecomp really that slow?
00:11:03.303 --> 00:11:04.967
So, I also ran the same benchmark
00:11:04.968 --> 00:11:07.083
in Common Lisp, with SBCL.
00:11:07.183 --> 00:11:09.000
And nativecomp is already fast,
00:11:09.001 --> 00:11:11.003
compared to untyped SBCL.
00:11:11.103 --> 00:11:15.500
It's because SBCL also emits call instructions
00:11:15.501 --> 00:11:18.483
when it has no type info.
00:11:18.583 --> 00:11:21.700
However, once we declare the types,
00:11:21.701 --> 00:11:25.283
SBCL is able to compile a fast path for fixnums,
00:11:25.383 --> 00:11:27.467
which makes its performance on par
00:11:27.468 --> 00:11:30.683
with speculative JIT engines (that is, Juicemacs),
00:11:30.783 --> 00:11:34.763
because now both of us are on fast paths.
00:11:36.063 --> 00:11:38.400
Additionally, if we are bold enough
00:11:38.401 --> 00:11:41.203
to pass this safety zero flag to SBCL,
00:11:41.303 --> 00:11:43.700
it will remove all the slow paths
00:11:43.701 --> 00:11:44.963
and type checks,
00:11:45.063 --> 00:11:46.367
and its performance is close
00:11:46.368 --> 00:11:48.643
to what you get with C.
00:11:48.743 --> 00:11:51.299
Well, probably we don't want safety zero
00:11:51.300 --> 00:11:52.063
most of the time.
00:11:52.163 --> 00:11:55.133
But even then, if nativecomp were to
00:11:55.134 --> 00:11:57.763
get fast paths for more constructs,
00:11:57.863 --> 00:11:59.867
there certainly is quite
00:11:59.868 --> 00:12:03.563
some room for performance improvement.
00:12:04.063 --> 00:12:06.803
Let's look at some more benchmarks.
00:12:06.903 --> 00:12:08.933
For example, for this inclist,
00:12:08.934 --> 00:12:10.923
or increment-list, benchmark,
00:12:11.023 --> 00:12:14.333
Juicemacs is really slow here. Partly,
00:12:14.334 --> 00:12:17.603
it comes from the cost of Java boxing integers.
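
NOTE
Roughly what an increment-list benchmark exercises (a sketch, not the exact
elisp-benchmarks code): walk a list and bump every element in place.
  (defun my/inclist (lst)
    (let ((p lst))
      (while p
        (setcar p (1+ (car p)))
        (setq p (cdr p))))
    lst)
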
00:12:17.703 --> 00:12:20.300
On the other hand, for Emacs nativecomp,
00:12:20.301 --> 00:12:22.043
for this particular benchmark,
00:12:22.143 --> 00:12:23.667
it actually has fast paths
00:12:23.668 --> 00:12:25.523
for all of the operations.
00:12:25.623 --> 00:12:27.723
And that's why it can be so fast,
00:12:27.823 --> 00:12:30.667
and that also proves that nativecomp
00:12:30.668 --> 00:12:33.843
has a lot of potential for improvement.
00:12:33.943 --> 00:12:35.833
There is another benchmark here
00:12:35.834 --> 00:12:37.963
that uses advices.
00:12:38.063 --> 00:12:40.500
So Emacs Lisp supports using
00:12:40.501 --> 00:12:42.203
advices to override functions
00:12:42.303 --> 00:12:44.833
by wrapping the original function, and an advice
00:12:44.834 --> 00:12:47.443
function, the two of them, inside a glue function.
00:12:47.543 --> 00:12:51.467
And in this benchmark, we advise the Fibonacci function
00:12:51.468 --> 00:12:54.523
to cache the first ten entries to speed up computation,
00:12:54.623 --> 00:13:00.003
as can be seen in the speed-up in the Juicemacs results.
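
NOTE
A sketch of the idea using the standard advice machinery (the names and exact
caching scheme are illustrative, not the benchmark's actual code):
  (defconst my/fib-cache [0 1 1 2 3 5 8 13 21 34])
  (defun my/fib-cached (orig n)
    (if (< n 10) (aref my/fib-cache n) (funcall orig n)))
  (advice-add 'fib :around #'my/fib-cached)
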
00:13:00.103 --> 00:13:02.900
However, it seems that nativecomp does not yet
00:13:02.901 --> 00:13:08.523
compile glue functions, and that makes advices slower.
00:13:08.623 --> 00:13:12.043
With these benchmarks, let's discuss this big question:
00:13:12.143 --> 00:13:16.563
Should GNU Emacs adopt speculative JIT compilation?
00:13:16.663 --> 00:13:18.967
Well, the hidden question is actually,
00:13:18.968 --> 00:13:21.223
is it worth it?
00:13:21.323 --> 00:13:24.163
And, my personal answer is, maybe not.
00:13:24.263 --> 00:13:28.133
The first reason is that slow paths, like floating-point numbers,
00:13:28.134 --> 00:13:31.043
are actually not that frequent in Emacs.
00:13:31.143 --> 00:13:34.100
And optimizing for fast paths like fixnums
00:13:34.101 --> 00:13:37.983
can already get us very good performance.
00:13:38.083 --> 00:13:40.333
And the second or main reason is that,
00:13:40.334 --> 00:13:43.163
speculative JIT is very hard.
00:13:43.263 --> 00:13:46.843
LuaJIT, for example, took a genius to build.
00:13:46.943 --> 00:13:50.967
Even with the help of GCC, we need to hand-write
00:13:50.968 --> 00:13:54.283
all that fast-path, slow-path, and switching logic.
00:13:54.383 --> 00:13:58.133
We need to find a way to deoptimize, which requires
00:13:58.134 --> 00:14:01.803
mapping machine registers back to the interpreter stack.
00:14:01.903 --> 00:14:04.067
And also, speculation needs runtime info,
00:14:04.068 --> 00:14:07.323
which also costs us extra memory.
00:14:07.423 --> 00:14:10.763
Moreover, as is shown by some benchmarks above,
00:14:10.863 --> 00:14:13.333
there's some low-hanging fruit in nativecomp that
00:14:13.334 --> 00:14:17.343
might get us better performance with relatively lower effort.
00:14:17.443 --> 00:14:22.163
Compared to this, a JIT engine is a huge, huge undertaking.
00:14:22.263 --> 00:14:26.123
But, for Juicemacs, the JIT engine comes a lot cheaper,
00:14:26.223 --> 00:14:29.067
because, we are cheating by building on
00:14:29.068 --> 00:14:33.443
an existing compiler framework called Truffle.
00:14:33.543 --> 00:14:35.883
Truffle is a meta-compiler framework,
00:14:35.983 --> 00:14:37.633
which means that it lets you write
00:14:37.634 --> 00:14:40.103
an interpreter, add required annotations,
00:14:40.203 --> 00:14:42.500
and it will automatically turn the
00:14:42.501 --> 00:14:45.643
interpreter into a JIT runtime.
00:14:45.743 --> 00:14:49.083
So for example, here is a typical bytecode interpreter.
00:14:49.183 --> 00:14:51.233
After you add the required annotations,
00:14:51.234 --> 00:14:52.523
Truffle will know that,
00:14:52.623 --> 00:14:55.533
the bytecode here is constant, and it should
00:14:55.534 --> 00:14:59.123
unroll this loop here, to inline all that bytecode.
00:14:59.223 --> 00:15:00.467
And then, when Truffle
00:15:00.468 --> 00:15:02.243
compiles the code, it knows that:
00:15:02.343 --> 00:15:05.233
the first loop iteration here does: x plus one,
00:15:05.234 --> 00:15:07.723
and the second does: return.
00:15:07.823 --> 00:15:09.533
And then it will compile all that into,
00:15:09.534 --> 00:15:11.363
return x plus 1,
00:15:11.463 --> 00:15:14.067
which is exactly what we would expect
00:15:14.068 --> 00:15:17.683
when compiling this pseudo code.
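
NOTE
The talk's snippet is Java, but the same shape can be sketched in ELisp: a
dispatch loop over a constant "bytecode" vector, which partial evaluation can
unroll down to (1+ x).
  (defun my/run (code x)
    (let ((pc 0) (acc x) (result nil))
      (while (not result)
        (pcase (aref code pc)
          ('add1   (setq acc (1+ acc)))
          ('return (setq result acc)))
        (setq pc (1+ pc)))
      result))
  (my/run [add1 return] 41)   ; => 42
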
00:15:17.783 --> 00:15:21.083
Building on that, we can also easily implement speculation,
00:15:21.183 --> 00:15:24.867
by using this <i>transferToInterpreterAndInvalidate</i> function
00:15:24.868 --> 00:15:26.123
provided by Truffle.
00:15:26.223 --> 00:15:28.533
And Truffle will automatically turn that
00:15:28.534 --> 00:15:30.683
into deoptimization.
00:15:30.783 --> 00:15:32.700
Now, for example, when this add function
00:15:32.701 --> 00:15:35.723
is supplied with two floating-point numbers,
00:15:35.823 --> 00:15:38.243
it will go through the slow path here,
00:15:38.343 --> 00:15:40.960
which might lead to a compiled slow path,
00:15:40.961 --> 00:15:43.203
or deoptimization.
00:15:43.303 --> 00:15:45.733
And going this deoptimization way,
00:15:45.734 --> 00:15:48.223
it can then update the runtime stats.
00:15:48.323 --> 00:15:50.400
And now, when the code is compiled again,
00:15:50.401 --> 00:15:51.603
Truffle will know,
00:15:51.703 --> 00:15:54.100
that these compilation stats suggest that
00:15:54.101 --> 00:15:55.563
we have floating-point numbers.
00:15:55.663 --> 00:15:58.733
And this floating point addition branch will
00:15:58.734 --> 00:16:02.603
then be incorporated into the fast path.
00:16:02.703 --> 00:16:06.003
To put it into Java code...
00:16:06.103 --> 00:16:08.723
Most operations are just as simple as this.
00:16:08.823 --> 00:16:11.033
And it supports fast paths for integers,
00:16:11.034 --> 00:16:13.963
floating-point numbers, and big integers.
00:16:14.063 --> 00:16:17.133
And the simplicity of this not only saves us work,
00:16:17.134 --> 00:16:22.243
but also enables Juicemacs to explore more things more rapidly.
00:16:22.343 --> 00:16:26.483
And actually, I have done some silly explorations.
00:16:26.583 --> 00:16:30.203
For example, I tried to constant-fold more things.
00:16:30.303 --> 00:16:32.767
Many of us have an Emacs config that stays
00:16:32.768 --> 00:16:36.683
largely unchanged, at least during one Emacs session.
00:16:36.783 --> 00:16:39.667
And that means many of the global variables
00:16:39.668 --> 00:16:42.323
in ELisp are constant.
00:16:42.423 --> 00:16:44.600
And with speculation, we can
00:16:44.601 --> 00:16:46.683
speculate about the stable ones,
00:16:46.783 --> 00:16:49.563
and try to inline them as constants.
00:16:49.663 --> 00:16:51.733
And this might improve performance,
00:16:51.734 --> 00:16:53.083
or maybe not?
00:16:53.183 --> 00:16:55.367
Because, we will need a full editor
00:16:55.368 --> 00:16:58.123
to get real world data.
00:16:58.223 --> 00:17:01.733
I also tried changing cons lists to be backed
00:17:01.734 --> 00:17:05.243
by some arrays, because, maybe arrays are faster, I guess?
00:17:05.343 --> 00:17:09.033
But in the end, <i>setcdr</i> requires some kind of indirection,
00:17:09.034 --> 00:17:12.883
and that actually makes the performance worse.
00:17:12.983 --> 00:17:14.733
And for regular expressions,
00:17:14.734 --> 00:17:17.923
I also tried borrowing techniques from PCRE JIT,
00:17:18.023 --> 00:17:20.667
which is quite fast in itself, but it is
00:17:20.668 --> 00:17:24.163
unfortunately unsupported by the Java Truffle runtime.
00:17:24.263 --> 00:17:27.333
So, looking at these, well,
00:17:27.334 --> 00:17:30.243
explorations can fail, certainly.
00:17:30.343 --> 00:17:32.800
But, with Truffle and Java, these,
00:17:32.801 --> 00:17:34.883
for now, are not that hard to implement,
00:17:34.983 --> 00:17:37.667
and also very often, they teach us something
00:17:37.668 --> 00:17:42.363
in return, whether or not they fail.
00:17:42.463 --> 00:17:45.333
Finally, let's talk about some explorations
00:17:45.334 --> 00:17:47.883
that we might get into in the future.
00:17:47.983 --> 00:17:49.683
For the JIT engine, for example,
00:17:49.783 --> 00:17:52.633
currently I'm looking into the implementation of
00:17:52.634 --> 00:17:56.883
nativecomp to maybe reuse some of its optimizations.
00:17:56.983 --> 00:18:01.323
For the GUI, I'm very very slowly working on one.
00:18:01.423 --> 00:18:03.733
If it ever completes, I have one thing
00:18:03.734 --> 00:18:06.603
I'm really looking forward to implementing.
00:18:06.703 --> 00:18:08.900
That is, inlining widgets, or even
00:18:08.901 --> 00:18:11.763
other buffers, directly into a buffer.
00:18:11.863 --> 00:18:13.967
Well, it's because, people sometimes complain
00:18:13.968 --> 00:18:16.003
about Emacs's GUI capabilities,
00:18:16.103 --> 00:18:19.767
but I personally think that supporting inlining,
00:18:19.768 --> 00:18:23.043
like a whole buffer inside another buffer as a rectangle,
00:18:23.143 --> 00:18:26.883
could get us very far in layout abilities.
00:18:26.983 --> 00:18:28.567
And this approach should also
00:18:28.568 --> 00:18:30.843
be compatible with terminals.
00:18:30.943 --> 00:18:32.933
And I really want to see how this idea
00:18:32.934 --> 00:18:36.003
plays out with Juicemacs.
00:18:36.103 --> 00:18:38.963
And of course, there's Lisp concurrency.
00:18:39.063 --> 00:18:42.167
And currently I'm thinking of a JavaScript-like,
00:18:42.168 --> 00:18:46.283
transparent, single-thread model, using Java's virtual threads.
00:18:46.383 --> 00:18:49.967
But anyway, if you are interested in JIT compilation,
00:18:49.968 --> 00:18:51.663
Truffle, or anything above,
00:18:51.763 --> 00:18:53.867
or maybe you have your own ideas,
00:18:53.868 --> 00:18:56.283
you are very welcome to reach out!
00:18:56.383 --> 00:19:00.033
Juicemacs does need to implement many more built-in functions,
00:19:00.034 --> 00:19:03.063
and any help would be very much appreciated.
00:19:03.163 --> 00:19:05.800
And I promise, it can be a very fun playground
00:19:05.801 --> 00:19:08.343
to learn about Emacs and do crazy things.
00:19:08.443 --> 00:19:10.902
Thank you!