1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
|
WEBVTT captioned by sachac, checked by bhavin
NOTE Rune
00:00:00.000 --> 00:00:05.119
Hello, EmacsConf. My name is Troy Hinckley, and this is my
00:00:05.120 --> 00:00:08.759
talk on Rune, a Rust implementation in Emacs. We strive to be
00:00:08.760 --> 00:00:11.839
bug compatible with Emacs, so you can use the same Elisp.
00:00:11.840 --> 00:00:14.879
It's still a fairly early stage experimental project, and
00:00:14.880 --> 00:00:17.081
we have some basic things implemented.
NOTE The Emacs core
00:00:17.082 --> 00:00:19.946
Before I get started, I want to talk a bit more
00:00:19.947 --> 00:00:21.847
about what the core is.
00:00:21.848 --> 00:00:24.559
So the Emacs core, it includes the runtime, the interpreter,
00:00:24.560 --> 00:00:26.439
garbage collector, everything used to run the code.
00:00:26.440 --> 00:00:29.799
It includes the GUI. It includes all the data structures.
00:00:29.800 --> 00:00:31.919
If you look underneath all the Elisp data structures,
00:00:31.920 --> 00:00:33.599
there's C code underneath there,
00:00:33.600 --> 00:00:35.559
as well as the auxiliary functions
00:00:35.560 --> 00:00:39.239
of which there's about 1500. In making this talk, I don't
00:00:39.240 --> 00:00:40.919
want to give the impression that I'm saying the core is
00:00:40.920 --> 00:00:42.879
outdated or that needs to be replaced or that it can't be
00:00:42.880 --> 00:00:45.519
evolved on its own, because clearly it has continued to
00:00:45.520 --> 00:00:48.319
evolve. If we look in just the last few years, we can see that
00:00:48.320 --> 00:00:50.439
we've added native compilation, we've added tree-sitter
00:00:50.440 --> 00:00:52.759
support, we've added color emoji, and there's work right
00:00:52.760 --> 00:00:57.167
now to add a new garbage collector to Emacs as well.
NOTE Why create this?
00:00:57.168 --> 00:01:01.071
Why create this project? Emacs has a long history.
00:01:01.072 --> 00:01:04.535
It has a lot of users. It needs to support a big community.
00:01:04.536 --> 00:01:06.837
Because of that, it has to be very conservative
00:01:06.838 --> 00:01:10.321
about what things it can allow into the project.
00:01:10.322 --> 00:01:11.639
Forks like this create an
00:01:11.640 --> 00:01:15.586
opportunity to experiment and try new approaches.
00:01:15.587 --> 00:01:18.799
This is particularly a good use case for Rust because the C core,
00:01:18.800 --> 00:01:20.849
it's pretty well tested. It's been around for a long time.
00:01:20.850 --> 00:01:22.959
A lot of the bugs have been ironed out, but when you're doing a
00:01:22.960 --> 00:01:26.439
new greenfield project, it's very easy to introduce new
00:01:26.440 --> 00:01:28.774
undefined behavior and memory unsafety
00:01:28.775 --> 00:01:32.376
and stuff like that. Rust protects us from most of that,
00:01:32.377 --> 00:01:34.937
but it also gives us the ability to be fast
00:01:34.938 --> 00:01:37.883
and has a strong ecosystem behind it.
00:01:37.884 --> 00:01:40.399
Rust is also really good at multi-threading.
00:01:40.400 --> 00:01:43.399
Their phrase in the community is fearless concurrency.
00:01:43.400 --> 00:01:45.559
They should be able to write concurrent programs without
00:01:45.560 --> 00:01:49.319
having to worry about data races. It's also really high
00:01:49.320 --> 00:01:51.839
performance. It has a really good regex engine. It's known
00:01:51.840 --> 00:01:55.864
for its non-copy I/O as well.
NOTE How does this compare to other projects?
00:01:55.865 --> 00:01:57.479
How does this compare to other
00:01:57.480 --> 00:01:59.919
Rust and Emacs projects, whether there's been a couple? The
00:01:59.920 --> 00:02:02.799
first is Remacs. This project was the first. It took an
00:02:02.800 --> 00:02:05.519
outside-in approach. Basically you could take a C
00:02:05.520 --> 00:02:09.319
function and replace it with a Rust function and build it
00:02:09.320 --> 00:02:11.799
together as one executable. This is pretty easy to do
00:02:11.800 --> 00:02:14.639
because they could both talk over the C ABI. You could
00:02:14.640 --> 00:02:16.479
swap out functions once at a time. They made really good
00:02:16.480 --> 00:02:20.279
progress at first, but eventually they ran into the problem
00:02:20.280 --> 00:02:23.079
that as you get down to the really core parts of it, you can't
00:02:23.080 --> 00:02:25.919
just replace one function at a time anymore, because some of
00:02:25.920 --> 00:02:28.159
that functionality is connected to other things. Like for
00:02:28.160 --> 00:02:30.359
example, you can't replace the garbage collector without
00:02:30.360 --> 00:02:32.759
replacing the entire garbage collection system. So the
00:02:32.760 --> 00:02:36.279
progress really kind of slowed down. Another issue with it
00:02:36.280 --> 00:02:38.839
was, is that they were doing a one-to-one rewrite, so they
00:02:38.840 --> 00:02:41.079
weren't adding any new features or functionality, just
00:02:41.080 --> 00:02:43.879
taking the same code and replacing it in Rust, which doesn't
00:02:43.880 --> 00:02:46.801
add any advantages in and of itself.
00:02:46.802 --> 00:02:50.399
This spawned Emacs-NG, which was kind of the spiritual successor to
00:02:50.400 --> 00:02:52.746
Remacs, where they decided to add new functionality,
00:02:52.747 --> 00:02:55.808
the biggest one being a JavaScript runtime,
00:02:55.809 --> 00:02:58.230
as well as some new renderers to Emacs.
00:02:58.231 --> 00:03:01.314
This is no longer actively developed though.
NOTE Multi-threading
00:03:01.315 --> 00:03:04.079
In this project, one of the big focuses we have is
00:03:04.080 --> 00:03:07.559
on multi-threading. The C core itself is, everything is
00:03:07.560 --> 00:03:09.959
designed around being single-threaded, all the data
00:03:09.960 --> 00:03:13.039
structures and everything like that. Rust has a great
00:03:13.040 --> 00:03:15.719
concurrency story. In Rust, everything is intended to be
00:03:15.720 --> 00:03:18.199
multi-threaded. That doesn't mean that everything has to
00:03:18.200 --> 00:03:20.719
run on multiple threads, but you can't write something that
00:03:20.720 --> 00:03:22.719
is limited to only running in a single-threaded
00:03:22.720 --> 00:03:25.799
environment. So this makes it really easy to use all the
00:03:25.800 --> 00:03:28.039
existing packages and build something that is
00:03:28.040 --> 00:03:30.480
concurrency safe. which is what we've done here,
00:03:30.481 --> 00:03:32.440
and that was relatively easy to do.
NOTE Multi-threading elisp
00:03:32.441 --> 00:03:34.781
But adding it to Elisp is the hard part,
00:03:34.782 --> 00:03:36.502
because we've got to come up with a good model
00:03:36.503 --> 00:03:39.624
for Lisp, and Elisp is just a giant ball
00:03:39.625 --> 00:03:41.479
of mutable state. We need to find some
00:03:41.480 --> 00:03:44.566
way to tame that so we can make workable concurrency
00:03:44.567 --> 00:03:47.647
out of it. There's really two ways you can do this.
NOTE No-GIL method
00:03:47.648 --> 00:03:49.268
One is what I call the no-GIL method.
00:03:49.269 --> 00:03:51.399
This is what Python is doing, where
00:03:51.400 --> 00:03:53.919
you take all of your data structures, you make them
00:03:53.920 --> 00:03:56.439
concurrency safe, and then you just leave it up to the
00:03:56.440 --> 00:03:58.119
programmer to decide what they're going to do with it.
00:03:58.120 --> 00:04:00.467
They've got to build safe abstractions on top of that.
00:04:00.468 --> 00:04:03.086
One of the big downsides with this is that
00:04:03.087 --> 00:04:05.247
it comes with a pretty high cost.
00:04:05.248 --> 00:04:07.799
The last benchmarks I've seen is that by making
00:04:07.800 --> 00:04:10.879
everything concurrency safe in Python, single-threaded
00:04:10.880 --> 00:04:15.799
code is about 20% slower in some benchmarks.
00:04:15.800 --> 00:04:19.079
Since most code is single-threaded, this has a big
00:04:19.080 --> 00:04:21.039
performance impact for most code that isn't taking
00:04:21.040 --> 00:04:23.719
advantage of the multi-threading. The other thing is this
00:04:23.720 --> 00:04:26.279
introduces a lot of nasty concurrency bugs because you can
00:04:26.280 --> 00:04:29.039
have anything mutating any part of the data from any thread,
00:04:29.040 --> 00:04:32.637
even if you can't have memory unsafety per se.
NOTE Actors
00:04:32.638 --> 00:04:34.738
The other option is actors,
00:04:34.739 --> 00:04:36.639
which are a really known way to approach this,
00:04:36.640 --> 00:04:39.079
where you trade some of that flexibility that you get
00:04:39.080 --> 00:04:43.719
with fully concurrent for more control and. Code and
00:04:43.720 --> 00:04:45.839
functions are shared between all the different threads,
00:04:45.840 --> 00:04:50.599
but data has to be passed along channels between different
00:04:50.600 --> 00:04:51.251
actors.
NOTE Multi-threading elisp (functions)
00:04:51.252 --> 00:04:52.919
We want the functions to be shared, and this
00:04:52.920 --> 00:04:55.159
should be pretty easy because we don't mutate functions
00:04:55.160 --> 00:05:00.119
like we do data, except when we do. In Lisp, functions are
00:05:00.120 --> 00:05:03.239
just lists like anything else. So you can mutate them
00:05:03.240 --> 00:05:06.279
just like lists. Even if you're not talking about
00:05:06.280 --> 00:05:09.159
interpreted code, like if you have a native compiled
00:05:09.160 --> 00:05:11.959
function, you can still mutate the constants inside the
00:05:11.960 --> 00:05:14.839
function. For example, here we have a function returns a
00:05:14.840 --> 00:05:17.679
string. We take that string out, we mutate that string, and
00:05:17.680 --> 00:05:23.079
now the function returns a different string. In Rune, we
00:05:23.080 --> 00:05:24.999
enforce that all functions, their constants are
00:05:25.000 --> 00:05:27.199
immutable. You can't mutate the insides of a function. You
00:05:27.200 --> 00:05:29.239
can still swap out functions and redefine them, but you
00:05:29.240 --> 00:05:32.239
can't mutate the inside of a function. This enables them
00:05:32.240 --> 00:05:34.679
to be safely shared across threads.
NOTE Caveats
00:05:34.680 --> 00:05:36.400
However, there are some caveats to this.
00:05:36.401 --> 00:05:38.159
For example, some functions actually do
00:05:38.160 --> 00:05:41.439
need to mutate their own data. The example that we run into is
00:05:41.440 --> 00:05:44.839
cl-generic. It uses a method cache. So it has to be able to
00:05:44.840 --> 00:05:47.639
update that cache. In this case, we just made a special
00:05:47.640 --> 00:05:50.799
case for this particular situation, but we don't know what
00:05:50.800 --> 00:05:53.159
more of these we're gonna run into the future where this is
00:05:53.160 --> 00:05:57.089
needed behavior to be able to mutate a function.
NOTE Multi-threading elisp (data)
00:05:57.090 --> 00:05:59.810
Okay, so functions are pretty easy.
00:05:59.811 --> 00:06:00.919
They just can be shared between
00:06:00.920 --> 00:06:05.159
threads, but data can't be immutable, at least not into the
00:06:05.160 --> 00:06:08.759
model that Emacs currently has. We have two different
00:06:08.760 --> 00:06:12.039
ways to handle this. One is we require whenever you're
00:06:12.040 --> 00:06:14.399
calling some other code in a different thread, you have to
00:06:14.400 --> 00:06:17.039
send all the variables that it's going to need over to that
00:06:17.040 --> 00:06:19.159
thread. This is how you traditionally do inside actors.
00:06:19.160 --> 00:06:21.919
Any data that needs to go to a different actor needs to be sent
00:06:21.920 --> 00:06:25.519
over a channel. It's relatively easy implementation, but
00:06:25.520 --> 00:06:28.159
this is difficult in the Emacs case because everything is
00:06:28.160 --> 00:06:30.799
going to be accessing different variables. That means
00:06:30.800 --> 00:06:33.119
when you call something, you have to know ahead of time, all
00:06:33.120 --> 00:06:34.879
the different variables that are gonna be accessed inside
00:06:34.880 --> 00:06:38.248
that other thread and put those in when you call it.
NOTE Copy values to other threads on demands
00:06:38.249 --> 00:06:40.959
The other option we're using is we're copying values to the
00:06:40.960 --> 00:06:43.439
other threads on demand. If you're running a thread, it
00:06:43.440 --> 00:06:45.759
tries to look up a variable. It doesn't have any value for
00:06:45.760 --> 00:06:48.759
that variable. It will go back and ask the main thread and it
00:06:48.760 --> 00:06:50.959
will copy that value into that thread and it can continue
00:06:50.960 --> 00:06:53.399
execution. This is nice because you can just launch some
00:06:53.400 --> 00:06:55.639
code and it'll take care of handling all the data transfer
00:06:55.640 --> 00:06:57.883
for you.
NOTE Multi-threading elisp (buffers)
00:06:57.884 --> 00:07:00.359
But we don't want to be copying around is buffers,
00:07:00.360 --> 00:07:04.199
because they can be really large. In this case, we have a
00:07:04.200 --> 00:07:07.599
mutex. Each thread could only have one current buffer that
00:07:07.600 --> 00:07:12.279
it has an exclusive lock to. This comes with some
00:07:12.280 --> 00:07:16.079
trade-offs, big one being that if the user tries to access
00:07:16.080 --> 00:07:18.359
some buffer, they want to type something, and a background
00:07:18.360 --> 00:07:20.239
thread is holding onto that buffer, what do we do in that
00:07:20.240 --> 00:07:24.959
situation? And we still need to hold an exclusive lock, even
00:07:24.960 --> 00:07:26.359
if we're only going to read a buffer. If you have multiple
00:07:26.360 --> 00:07:29.159
readers, they each still need to take turns because we can't
00:07:29.160 --> 00:07:30.999
determine if at some point a thread is going to try and mutate
00:07:31.000 --> 00:07:33.879
the buffer. It has to be an exclusive lock. The other issue
00:07:33.880 --> 00:07:37.799
is buffer-locals. This is less of a implementation issue
00:07:37.800 --> 00:07:40.519
as much as it is a technical issue. Because you think about
00:07:40.520 --> 00:07:42.759
when we switch to a buffer, it has some buffer-local data and
00:07:42.760 --> 00:07:45.399
we have some thread-local data. As we go through, we're
00:07:45.400 --> 00:07:47.599
mutating everything. Those can get intertwined and
00:07:47.600 --> 00:07:49.719
pointing to each other. Then we switch away from that
00:07:49.720 --> 00:07:51.679
buffer. We need some quick way to be able to separate those
00:07:51.680 --> 00:07:54.279
out. The buffer-locals can go with the buffer-locals and
00:07:54.280 --> 00:07:56.439
the thread data can stay with thread data and make copies of
00:07:56.440 --> 00:07:58.719
anything that was pointing to the other side. But we don't
00:07:58.720 --> 00:08:02.839
have a good method to determine how to separate those two,
00:08:02.840 --> 00:08:05.359
like what data belongs to this and what data belongs to this,
00:08:05.360 --> 00:08:08.199
so that we can do that quickly. We haven't found a good
00:08:08.200 --> 00:08:09.599
solution to that yet, but it's something we're still
00:08:09.600 --> 00:08:11.902
working on.
NOTE Would this actually be useful?
00:08:11.903 --> 00:08:13.079
The question is, would this actually be
00:08:13.080 --> 00:08:15.959
useful for doing real work inside Emacs? I would say,
00:08:15.960 --> 00:08:17.959
yes, there's a lot of things you can do with this. You could
00:08:17.960 --> 00:08:20.239
handle process output in the background. You can do syntax
00:08:20.240 --> 00:08:23.479
highlighting. You can do buffer search in parallel. You can
00:08:23.480 --> 00:08:26.679
do LSP. You can do fetching your mail in the background. You
00:08:26.680 --> 00:08:29.639
can have a window manager that doesn't block your window
00:08:29.640 --> 00:08:32.319
manager when Emacs is blocked. You could do
00:08:32.320 --> 00:08:34.479
something like a file system watcher that keeps up on files
00:08:34.480 --> 00:08:37.559
without blocking Emacs. This wouldn't be so great for
00:08:37.560 --> 00:08:39.159
building concurrent data structures or operating on
00:08:39.160 --> 00:08:42.199
shared data or building your own abstractions, because of the
00:08:42.200 --> 00:08:46.039
trade-offs that we've made here. Okay. That's talking
00:08:46.040 --> 00:08:46.918
about multi-threading.
NOTE Precise garbage collection
00:08:46.919 --> 00:08:47.599
The other thing we're going to talk
00:08:47.600 --> 00:08:51.319
about is precise garbage collection. In Rune, we have a
00:08:51.320 --> 00:08:54.439
safe, precise garbage collection because of the Rust type
00:08:54.440 --> 00:08:58.119
system. Let's look at what the problem is with garbage
00:08:58.120 --> 00:09:00.479
collection in the first place. Really, the tricky part
00:09:00.480 --> 00:09:03.719
about garbage collection is rooting. How do we find out what
00:09:03.720 --> 00:09:06.159
the roots are? These are all the values that are on the
00:09:06.160 --> 00:09:08.679
stack or inside the registers. In this example here, we
00:09:08.680 --> 00:09:11.919
allocate an object. We call garbage_collect, that object's
00:09:11.920 --> 00:09:13.536
collected, and then we try and return it.
00:09:13.537 --> 00:09:16.536
It's no longer valid.
NOTE How Emacs used to deal with roots
00:09:16.537 --> 00:09:19.039
Let's look at how Emacs used to deal with this
00:09:19.040 --> 00:09:22.559
problem way back in the day. There was a system called gcpro
00:09:22.560 --> 00:09:26.319
or GC Protect, which is basically designed that every time a
00:09:26.320 --> 00:09:28.919
value needed to survive past a garbage collection point,
00:09:28.920 --> 00:09:32.359
you had to try and protect it. In order to do this, you had
00:09:32.360 --> 00:09:35.439
to declare a struct, you had to put a macro around it to root
00:09:35.440 --> 00:09:37.999
the object, and then you had to unroot it when you were done--
00:09:38.000 --> 00:09:41.559
past the garbage collection. Now the value is safe. You
00:09:41.560 --> 00:09:44.039
can see down here, I pulled these eight rules out from a
00:09:44.040 --> 00:09:46.919
really old version of the Emacs manual about all the things
00:09:46.920 --> 00:09:49.279
you had to keep track of when you were trying to use this
00:09:49.280 --> 00:09:52.319
system. All right, so there was a special handling for
00:09:52.320 --> 00:09:54.639
nested GC protects. You had to make sure the memory was
00:09:54.640 --> 00:09:58.239
initialized. You had to make sure that traps couldn't occur
00:09:58.240 --> 00:10:00.839
between allocating and when GC protect would happen. It
00:10:00.840 --> 00:10:03.319
can be tricky because you don't always know when a function
00:10:03.320 --> 00:10:06.879
that's getting called could potentially call garbage
00:10:06.880 --> 00:10:10.719
collection. So if you got something wrong, you also
00:10:10.720 --> 00:10:12.719
might not catch it for a long time because garbage
00:10:12.720 --> 00:10:15.719
collection may only get called one out of 99 times. The other
00:10:15.720 --> 00:10:18.999
99 times is just fine. That one time it happens and you
00:10:19.000 --> 00:10:22.559
can't reproduce the issue. When you do get this wrong and
00:10:22.560 --> 00:10:24.439
some, something doesn't get rooted and it gets
00:10:24.440 --> 00:10:26.319
overwritten, it generally doesn't show up right where the
00:10:26.320 --> 00:10:28.799
problem is. It gets showed up way later when you actually try
00:10:28.800 --> 00:10:31.479
and access the value and the value is invalid. You've got
00:10:31.480 --> 00:10:33.639
to track it back to where that thing did not get properly
00:10:33.640 --> 00:10:37.359
rooted. It's a huge source of bugs and very hard to
00:10:37.360 --> 00:10:38.712
maintain.
NOTE Conservative stack scanning
00:10:38.713 --> 00:10:40.119
Emacs decided to go with a different path,
00:10:40.120 --> 00:10:42.399
which we call conservative stack scanning. Basically,
00:10:42.400 --> 00:10:45.239
the garbage collector just looks at the stack and all the
00:10:45.240 --> 00:10:47.959
registers and any data inside there that looks like it could
00:10:47.960 --> 00:10:52.279
be a pointer, it treats it as a pointer. This is nice because
00:10:52.280 --> 00:10:54.711
you get really easy root tracking,
00:10:54.712 --> 00:10:56.113
but it also comes with some trade-offs,
00:10:56.114 --> 00:11:00.156
mostly that your objects are no longer movable.
NOTE Movable objects
00:11:00.157 --> 00:11:03.079
Why would we want movable objects in Emacs?
00:11:03.080 --> 00:11:05.839
There's a couple of different reasons. One is compaction.
00:11:05.840 --> 00:11:08.199
You can take all your heap, you can pack that on down because
00:11:08.200 --> 00:11:11.239
you can coalesce all your objects together. Another is that
00:11:11.240 --> 00:11:13.239
it's easy to implement generational garbage collection.
00:11:13.240 --> 00:11:16.039
You can just copy everything out of your minor heap into your
00:11:16.040 --> 00:11:21.839
older heap. Really, Emacs is kind of uniquely ideal for
00:11:21.840 --> 00:11:24.279
generational collection, because the typical way we
00:11:24.280 --> 00:11:27.799
interact with Emacs is as a series of commands. You execute
00:11:27.800 --> 00:11:29.959
some command, you'd execute the next command, you execute
00:11:29.960 --> 00:11:33.199
a command. It could be happening every key press, it could be
00:11:33.200 --> 00:11:36.759
happening with M-x. However long that command is, that is
00:11:36.760 --> 00:11:40.959
the ideal length for the minor collection generation, the
00:11:40.960 --> 00:11:43.399
first generation. Because once you're done with that
00:11:43.400 --> 00:11:45.879
generation, anything that's still existing is going to be
00:11:45.880 --> 00:11:49.079
around for a very long time. So that works out really well
00:11:49.080 --> 00:11:52.279
for Emacs. We want to make this a generational collector.
00:11:52.280 --> 00:11:56.199
The other thing is with object layout. We use a lot of lists
00:11:56.200 --> 00:12:00.559
inside Emacs Lisp. Every time you go to the cdr, you've
00:12:00.560 --> 00:12:03.039
got to be chasing a pointer around the heap and following
00:12:03.040 --> 00:12:05.439
that. That can potentially result in cache misses and
00:12:05.440 --> 00:12:08.239
all sorts of other things like that. So it can take a long
00:12:08.240 --> 00:12:12.159
time. It can be quite slow. But if you have the ability to move
00:12:12.160 --> 00:12:16.559
objects, you can just relocate an entire list and lay it out
00:12:16.560 --> 00:12:19.168
in an array right next to each other inside memory.
00:12:19.169 --> 00:12:22.479
So iterating over it is just as fast as iterating over an array.
00:12:22.480 --> 00:12:25.421
But you can only do that if you have movable objects.
00:12:25.422 --> 00:12:28.399
I'll point out here too, that with conservative stack scanning,
00:12:28.400 --> 00:12:31.599
it's not that all objects are immovable. It's only ones that
00:12:31.600 --> 00:12:35.519
are pointed to from the stack or from the registers that have
00:12:35.520 --> 00:12:38.828
to become immovable.
NOTE How Rust makes precise GC easy
00:12:38.829 --> 00:12:41.039
Let's look at how Rust makes precise
00:12:41.040 --> 00:12:44.439
garbage collection easy. Here I have some Rust code to
00:12:44.440 --> 00:12:47.279
show kind of how the lifetime system works and what we call
00:12:47.280 --> 00:12:49.879
XOR mutability, where we can only have one mutable
00:12:49.880 --> 00:12:52.879
reference or multiple immutable references to the same
00:12:52.880 --> 00:12:56.199
thing. Here we declare a vector, we take a reference to the
00:12:56.200 --> 00:12:59.199
first element of the vector, and then we mutate the vector.
00:12:59.200 --> 00:13:02.239
Now this could potentially resize the vector and move it to a
00:13:02.240 --> 00:13:04.919
different location in memory, so that reference is no
00:13:04.920 --> 00:13:07.759
longer valid. The nice thing is, Rust catches this for
00:13:07.760 --> 00:13:10.479
us. It says, hey, this is no longer valid. This reference
00:13:10.480 --> 00:13:14.519
can't survive past when you mutated it. Okay? That's
00:13:14.520 --> 00:13:17.559
exactly what we want for a garbage collector. You can see
00:13:17.560 --> 00:13:19.879
here, we take this in a garbage collection context, we
00:13:19.880 --> 00:13:23.359
create a new context object, we add an object, we call
00:13:23.360 --> 00:13:26.759
garbage_collect, then we try and access that object. It's no
00:13:26.760 --> 00:13:29.199
longer accessible, and Rust will prevent us from trying to
00:13:29.200 --> 00:13:34.839
access that variable. So, how do we solve this? We have a
00:13:34.840 --> 00:13:39.759
root macro. We declared this root macro, it lets us take the
00:13:39.760 --> 00:13:41.759
object and let it live past garbage collection, and
00:13:41.760 --> 00:13:45.319
everything works out. The nice thing is, this root macro
00:13:45.320 --> 00:13:47.799
will get dropped when it's out of scope, so we don't have to
00:13:47.800 --> 00:13:51.519
worry about the un-gc-protect step of this. Statically,
00:13:51.520 --> 00:13:55.799
Rust will verify and tell us any object that needs to be
00:13:55.800 --> 00:13:58.279
rooted. If we try and access it, it'll tell us it's invalid.
00:13:58.280 --> 00:14:00.999
We have this root macro and then we can access it. So in
00:14:01.000 --> 00:14:03.759
that way, we have safe, precise garbage collection without
00:14:03.760 --> 00:14:07.479
any chance of introducing undefined behavior, which is
00:14:07.480 --> 00:14:09.999
really, really powerful. It's really easy because the
00:14:10.000 --> 00:14:13.226
type system will catch it all for us.
NOTE Other Rust niceties: proc macro
00:14:13.227 --> 00:14:15.147
There's some other Rust niceties I want to kind of
00:14:15.148 --> 00:14:16.799
talk through that are useful, but
00:14:16.800 --> 00:14:21.079
are not, you know, star features. One is proc macros. You
00:14:21.080 --> 00:14:23.679
can see up on the top, you can see how you declare a function
00:14:23.680 --> 00:14:27.359
inside the C core. All right. You have to use the macro. You
00:14:27.360 --> 00:14:29.141
have to put the list type, the function type,
00:14:29.142 --> 00:14:30.963
the struct type, the different types of arguments
00:14:30.964 --> 00:14:33.225
or different number of arguments, the doc string,
00:14:33.226 --> 00:14:36.023
and then you can put your argument listing down inside there.
00:14:36.024 --> 00:14:37.984
On the Rust side, we just write this like we would
00:14:37.985 --> 00:14:40.044
any other Rust function. And then we put
00:14:40.045 --> 00:14:41.285
the defun proc macro on there
00:14:41.286 --> 00:14:44.186
and it takes care of everything for us behind the scenes.
00:14:44.187 --> 00:14:46.407
A couple of cool additional things we can do with this
00:14:46.408 --> 00:14:48.727
is that we don't have to make everything just an object.
00:14:48.728 --> 00:14:49.759
We can actually make things
00:14:49.760 --> 00:14:54.239
more specific types. Here we have symbols. As well as
00:14:54.240 --> 00:14:56.279
you can see subfeature, it's an optional parameter, and we
00:14:56.280 --> 00:15:00.919
just make it an option inside Rust and it automatically make
00:15:00.920 --> 00:15:03.599
it an optional inside Elisp.
00:15:03.600 --> 00:15:05.181
This makes them really easy to write.
00:15:05.182 --> 00:15:06.439
I can't take credit for this is because this is
00:15:06.440 --> 00:15:09.119
something that I saw inside Remacs and I stole from them, but
00:15:09.120 --> 00:15:11.439
it makes the functions really easy to call from each other
00:15:11.440 --> 00:15:14.559
and really easy to write as well.
NOTE sum types
00:15:14.560 --> 00:15:18.523
Another thing that's really nice is sum types.
00:15:18.524 --> 00:15:21.039
In the C core, if I wanted to get a
00:15:21.040 --> 00:15:23.759
string out of an object, I would first need to check that it's
00:15:23.760 --> 00:15:28.319
a string and then dereference it as a string. But if it's not a
00:15:28.320 --> 00:15:30.679
string, I may introduce undefined behavior. So in
00:15:30.680 --> 00:15:32.799
complicated code, I have to make sure that I have always
00:15:32.800 --> 00:15:34.959
checked what type it is before I try and dereference that
00:15:34.960 --> 00:15:37.879
type. We don't have to worry about any of that inside Rust
00:15:37.880 --> 00:15:41.319
because we can untag a value and we can use their some types,
00:15:41.320 --> 00:15:44.399
basically create an enum and we can match on what the
00:15:44.400 --> 00:15:47.639
different values can be. Then we only get out the types
00:15:47.640 --> 00:15:50.359
that are viable or are actually there. So we never
00:15:50.360 --> 00:15:52.159
accidentally get something out of an object that we didn't
00:15:52.160 --> 00:15:54.239
mean to, or dereference it as something that doesn't
00:15:54.240 --> 00:15:56.879
really exist. We can just match on it and we can get out the
00:15:56.880 --> 00:16:01.040
values that we need, which is really, really powerful.
NOTE Regex
00:16:01.041 --> 00:16:03.639
So there's some other Rust niceties as well working with here.
00:16:03.640 --> 00:16:07.799
One is the regex engine inside Rust is really fast, high
00:16:07.800 --> 00:16:10.959
performance. We are using that for the Elixir regex
00:16:10.960 --> 00:16:14.879
engine to give it high performance and worst-case
00:16:14.880 --> 00:16:16.051
guarantees.
NOTE Parsers
00:16:16.052 --> 00:16:18.599
The other is that Rust has a lot of really good
00:16:18.600 --> 00:16:21.559
parsers for things like JSON that are no copy parsers that
00:16:21.560 --> 00:16:24.719
are high performance. We can use those inside Rune as
00:16:24.720 --> 00:16:27.209
well.
NOTE Other changes: GUI first, terminal second
00:16:27.210 --> 00:16:29.439
There's a handful of other changes we're working on
00:16:29.440 --> 00:16:33.119
that are not Rust-specific, but we'd like to see. The first is
00:16:33.120 --> 00:16:36.759
being GUI first, terminal second. This means two things.
00:16:36.760 --> 00:16:40.039
First is that we have all of our key bindings. Right now
00:16:40.040 --> 00:16:43.279
inside Emacs, C-i and TAB are bound to the same key
00:16:43.280 --> 00:16:45.039
binding by default, because that's how it works inside the
00:16:45.040 --> 00:16:48.119
terminal. In the GUI, you shouldn't have that limitation.
00:16:48.120 --> 00:16:52.559
The second is that the GUI should not block when Lisp is
00:16:52.560 --> 00:16:55.199
blocked. It should be independent of that. Your GUI can
00:16:55.200 --> 00:16:58.918
still continue to operate when Lisp is running.
NOTE Off-screen cursor
00:16:58.919 --> 00:17:01.279
The other is the ability to have an off-screen cursor
00:17:01.280 --> 00:17:02.699
so that you can be typing on something,
00:17:02.700 --> 00:17:04.319
you can scroll up and down and the point
00:17:04.320 --> 00:17:06.719
doesn't have to follow you where you lose your place where
00:17:06.720 --> 00:17:09.399
you were before. You don't have to intentionally set a mark.
00:17:09.400 --> 00:17:11.199
You can just scroll and then start typing and it'll go back up
00:17:11.200 --> 00:17:13.879
to where it was before, like it works in most applications.
00:17:13.880 --> 00:17:16.304
And this can be optional.
NOTE Image flow
00:17:16.305 --> 00:17:18.079
The other is image flow. We want it
00:17:18.080 --> 00:17:20.879
so that you can easily flow images and you can have large
00:17:20.880 --> 00:17:23.159
images and scroll past them without jumping and you can flow
00:17:23.160 --> 00:17:24.439
text around images.
NOTE Testing
00:17:24.440 --> 00:17:29.799
How are we testing this project? Because there's a lot of
00:17:29.800 --> 00:17:33.159
things that you could get wrong here. One thing we're doing
00:17:33.160 --> 00:17:38.039
is we're using ERT. Emacs ships with over 7,000 built-in
00:17:38.040 --> 00:17:42.879
tests--Elisp tests. We are using this test suite to test
00:17:42.880 --> 00:17:45.079
our project as well. We can kind of use this as a dashboard
00:17:45.080 --> 00:17:47.679
of saying how close are we to getting to parity with GNU
00:17:47.680 --> 00:17:52.319
Emacs. The other thing that we have is a tool called elprop,
00:17:52.320 --> 00:17:55.279
which is an external utility that basically tests for
00:17:55.280 --> 00:17:58.719
correctness. Because really, the correctness of Rune is
00:17:58.720 --> 00:18:00.999
whatever Emacs does, because there's no official spec on
00:18:01.000 --> 00:18:04.079
how things should behave. To do this, we can go look at
00:18:04.080 --> 00:18:07.159
the Rust function signature. We know what the arguments
00:18:07.160 --> 00:18:09.319
are, we know how many they are, and we know what types they
00:18:09.320 --> 00:18:11.679
should be. Given that information, we can generate a
00:18:11.680 --> 00:18:15.279
whole bunch of random functions feeding those types in. And
00:18:15.280 --> 00:18:18.959
then we send a copy over to Emacs, we send a copy over to Rune.
00:18:18.960 --> 00:18:21.679
They each evaluate it and they return the result and we make
00:18:21.680 --> 00:18:23.519
sure the results are the same. Then you do that for
00:18:23.520 --> 00:18:26.199
thousands of different implementations of the function.
00:18:26.200 --> 00:18:29.039
And it helps us find corner cases really easy without having
00:18:29.040 --> 00:18:31.639
to handwrite a whole bunch of different cases to test things
00:18:31.640 --> 00:18:36.344
and say, where are these two functions different?
NOTE Status
00:18:36.345 --> 00:18:39.359
So the current status: we already have a multi-threaded Elixir
00:18:39.360 --> 00:18:42.999
interpreter and bytecode engine inside there. There's no
00:18:43.000 --> 00:18:45.679
actual text editor in there yet, but the primitives are
00:18:45.680 --> 00:18:48.679
there. Like you can insert text, move point around,
00:18:48.680 --> 00:18:52.039
delete text, do different things like that. But we don't
00:18:52.040 --> 00:18:53.679
have a GUI hooked up to different key bindings to actually
00:18:53.680 --> 00:18:58.159
type on. There's just a REPL to operate in. We have about
00:18:58.160 --> 00:19:01.279
250 of the 1500 built-in functions already implemented
00:19:01.280 --> 00:19:04.119
inside there. There's a lot of low-hanging fruit inside this
00:19:04.120 --> 00:19:07.246
area to still be implemented.
NOTE Next directions
00:19:07.247 --> 00:19:07.719
The next directions we're
00:19:07.720 --> 00:19:11.959
working on is we're optimizing the GC. We want to make it
00:19:11.960 --> 00:19:13.839
generational. Like I said, right now, it's just a simple
00:19:13.840 --> 00:19:17.359
semi-spaced copying GC. We want to add a proper GUI. We need
00:19:17.360 --> 00:19:19.599
to implement text properties, overlays, process and job
00:19:19.600 --> 00:19:22.738
control, all that goodness right there.
NOTE How to get involved
00:19:22.739 --> 00:19:25.378
How can you get involved? This is hosted on GitHub.
00:19:25.379 --> 00:19:26.424
You can come on over.
00:19:26.425 --> 00:19:28.639
If you have any ideas about how to implement something or
00:19:28.640 --> 00:19:30.639
something you'd like to see done, go ahead and just open an
00:19:30.640 --> 00:19:32.799
issue so we can have a discussion about it. We've had lots of
00:19:32.800 --> 00:19:34.599
interesting discussions with different people coming in
00:19:34.600 --> 00:19:37.639
to the GitHub repo. If you're interested in contributing,
00:19:37.640 --> 00:19:40.439
the easiest way is probably to run elprop, pick some
00:19:40.440 --> 00:19:43.279
function, run elprop on it. I promise it won't take long to
00:19:43.280 --> 00:19:45.639
find some issues, some discrepancy between Emacs and Rune,
00:19:45.640 --> 00:19:48.959
and that lets you dive into the Rust code and figure out, and
00:19:48.960 --> 00:19:50.879
the C code, and figure out what the difference is between the
00:19:50.880 --> 00:19:53.119
two. or come along and help implement your favorite
00:19:53.120 --> 00:19:55.679
functionality. This has been a really interesting project
00:19:55.680 --> 00:19:58.359
so far, and we've had a handful of different contributors on
00:19:58.360 --> 00:20:01.799
it who just kind of want to learn Rust or get more into
00:20:01.800 --> 00:20:06.000
systems-level programming. Thank you.
|