summaryrefslogblamecommitdiffstats
path: root/2024/captions/emacsconf-2024-rust--an-experimental-emacs-core-in-rust--troy-hinckley--main.vtt
blob: 05826fb8a19cd576ecfb18a430bfd005482cc1f5 (plain) (tree)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351






































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































                                                                   
WEBVTT captioned by sachac, checked by bhavin

NOTE Rune

00:00:00.000 --> 00:00:05.119
Hello, EmacsConf. My name is Troy Hinckley, and this is my

00:00:05.120 --> 00:00:08.759
talk on Rune, a Rust implementation in Emacs. We strive to be

00:00:08.760 --> 00:00:11.839
bug compatible with Emacs, so you can use the same Elisp.

00:00:11.840 --> 00:00:14.879
It's still a fairly early stage experimental project, and

00:00:14.880 --> 00:00:17.081
we have some basic things implemented.

NOTE The Emacs core

00:00:17.082 --> 00:00:19.946
Before I get started, I want to talk a bit more

00:00:19.947 --> 00:00:21.847
about what the core is.

00:00:21.848 --> 00:00:24.559
So the Emacs core, it includes the runtime, the interpreter,

00:00:24.560 --> 00:00:26.439
garbage collector, everything used to run the code.

00:00:26.440 --> 00:00:29.799
It includes the GUI. It includes all the data structures.

00:00:29.800 --> 00:00:31.919
If you look underneath all the Elisp data structures,

00:00:31.920 --> 00:00:33.599
there's C code underneath there,

00:00:33.600 --> 00:00:35.559
as well as the auxiliary functions

00:00:35.560 --> 00:00:39.239
of which there's about 1500. In making this talk, I don't

00:00:39.240 --> 00:00:40.919
want to give the impression that I'm saying the core is

00:00:40.920 --> 00:00:42.879
outdated or that needs to be replaced or that it can't be

00:00:42.880 --> 00:00:45.519
evolved on its own, because clearly it has continued to

00:00:45.520 --> 00:00:48.319
evolve. If we look in just the last few years, we can see that

00:00:48.320 --> 00:00:50.439
we've added native compilation, we've added tree-sitter

00:00:50.440 --> 00:00:52.759
support, we've added color emoji, and there's work right

00:00:52.760 --> 00:00:57.167
now to add a new garbage collector to Emacs as well.

NOTE Why create this?

00:00:57.168 --> 00:01:01.071
Why create this project? Emacs has a long history.

00:01:01.072 --> 00:01:04.535
It has a lot of users. It needs to support a big community.

00:01:04.536 --> 00:01:06.837
Because of that, it has to be very conservative

00:01:06.838 --> 00:01:10.321
about what things it can allow into the project.

00:01:10.322 --> 00:01:11.639
Forks like this create an

00:01:11.640 --> 00:01:15.586
opportunity to experiment and try new approaches.

00:01:15.587 --> 00:01:18.799
This is particularly a good use case for Rust because the C core,

00:01:18.800 --> 00:01:20.849
it's pretty well tested. It's been around for a long time.

00:01:20.850 --> 00:01:22.959
A lot of the bugs have been ironed out, but when you're doing a

00:01:22.960 --> 00:01:26.439
new greenfield project, it's very easy to introduce new

00:01:26.440 --> 00:01:28.774
undefined behavior and memory unsafety

00:01:28.775 --> 00:01:32.376
and stuff like that. Rust protects us from most of that,

00:01:32.377 --> 00:01:34.937
but it also gives us the ability to be fast

00:01:34.938 --> 00:01:37.883
and has a strong ecosystem behind it.

00:01:37.884 --> 00:01:40.399
Rust is also really good at multi-threading.

00:01:40.400 --> 00:01:43.399
Their phrase in the community is fearless concurrency.

00:01:43.400 --> 00:01:45.559
They should be able to write concurrent programs without

00:01:45.560 --> 00:01:49.319
having to worry about data races. It's also really high

00:01:49.320 --> 00:01:51.839
performance. It has a really good regex engine. It's known

00:01:51.840 --> 00:01:55.864
for its non-copy I/O as well.

NOTE How does this compare to other projects?

00:01:55.865 --> 00:01:57.479
How does this compare to other

00:01:57.480 --> 00:01:59.919
Rust and Emacs projects, whether there's been a couple? The

00:01:59.920 --> 00:02:02.799
first is Remacs. This project was the first. It took an

00:02:02.800 --> 00:02:05.519
outside-in approach. Basically you could take a C

00:02:05.520 --> 00:02:09.319
function and replace it with a Rust function and build it

00:02:09.320 --> 00:02:11.799
together as one executable. This is pretty easy to do

00:02:11.800 --> 00:02:14.639
because they could both talk over the C ABI. You could

00:02:14.640 --> 00:02:16.479
swap out functions once at a time. They made really good

00:02:16.480 --> 00:02:20.279
progress at first, but eventually they ran into the problem

00:02:20.280 --> 00:02:23.079
that as you get down to the really core parts of it, you can't

00:02:23.080 --> 00:02:25.919
just replace one function at a time anymore, because some of

00:02:25.920 --> 00:02:28.159
that functionality is connected to other things. Like for

00:02:28.160 --> 00:02:30.359
example, you can't replace the garbage collector without

00:02:30.360 --> 00:02:32.759
replacing the entire garbage collection system. So the

00:02:32.760 --> 00:02:36.279
progress really kind of slowed down. Another issue with it

00:02:36.280 --> 00:02:38.839
was, is that they were doing a one-to-one rewrite, so they

00:02:38.840 --> 00:02:41.079
weren't adding any new features or functionality, just

00:02:41.080 --> 00:02:43.879
taking the same code and replacing it in Rust, which doesn't

00:02:43.880 --> 00:02:46.801
add any advantages in and of itself.

00:02:46.802 --> 00:02:50.399
This spawned Emacs-NG, which was kind of the spiritual successor to

00:02:50.400 --> 00:02:52.746
Remacs, where they decided to add new functionality,

00:02:52.747 --> 00:02:55.808
the biggest one being a JavaScript runtime,

00:02:55.809 --> 00:02:58.230
as well as some new renderers to Emacs.

00:02:58.231 --> 00:03:01.314
This is no longer actively developed though.

NOTE Multi-threading

00:03:01.315 --> 00:03:04.079
In this project, one of the big focuses we have is

00:03:04.080 --> 00:03:07.559
on multi-threading. The C core itself is, everything is

00:03:07.560 --> 00:03:09.959
designed around being single-threaded, all the data

00:03:09.960 --> 00:03:13.039
structures and everything like that. Rust has a great

00:03:13.040 --> 00:03:15.719
concurrency story. In Rust, everything is intended to be

00:03:15.720 --> 00:03:18.199
multi-threaded. That doesn't mean that everything has to

00:03:18.200 --> 00:03:20.719
run on multiple threads, but you can't write something that

00:03:20.720 --> 00:03:22.719
is limited to only running in a single-threaded

00:03:22.720 --> 00:03:25.799
environment. So this makes it really easy to use all the

00:03:25.800 --> 00:03:28.039
existing packages and build something that is

00:03:28.040 --> 00:03:30.480
concurrency safe. which is what we've done here,

00:03:30.481 --> 00:03:32.440
and that was relatively easy to do.

NOTE Multi-threading elisp

00:03:32.441 --> 00:03:34.781
But adding it to Elisp is the hard part,

00:03:34.782 --> 00:03:36.502
because we've got to come up with a good model

00:03:36.503 --> 00:03:39.624
for Lisp, and Elisp is just a giant ball

00:03:39.625 --> 00:03:41.479
of mutable state. We need to find some

00:03:41.480 --> 00:03:44.566
way to tame that so we can make workable concurrency

00:03:44.567 --> 00:03:47.647
out of it. There's really two ways you can do this.

NOTE No-GIL method

00:03:47.648 --> 00:03:49.268
One is what I call the no-GIL method.

00:03:49.269 --> 00:03:51.399
This is what Python is doing, where

00:03:51.400 --> 00:03:53.919
you take all of your data structures, you make them

00:03:53.920 --> 00:03:56.439
concurrency safe, and then you just leave it up to the

00:03:56.440 --> 00:03:58.119
programmer to decide what they're going to do with it.

00:03:58.120 --> 00:04:00.467
They've got to build safe abstractions on top of that.

00:04:00.468 --> 00:04:03.086
One of the big downsides with this is that

00:04:03.087 --> 00:04:05.247
it comes with a pretty high cost.

00:04:05.248 --> 00:04:07.799
The last benchmarks I've seen is that by making

00:04:07.800 --> 00:04:10.879
everything concurrency safe in Python, single-threaded

00:04:10.880 --> 00:04:15.799
code is about 20% slower in some benchmarks.

00:04:15.800 --> 00:04:19.079
Since most code is single-threaded, this has a big

00:04:19.080 --> 00:04:21.039
performance impact for most code that isn't taking

00:04:21.040 --> 00:04:23.719
advantage of the multi-threading. The other thing is this

00:04:23.720 --> 00:04:26.279
introduces a lot of nasty concurrency bugs because you can

00:04:26.280 --> 00:04:29.039
have anything mutating any part of the data from any thread,

00:04:29.040 --> 00:04:32.637
even if you can't have memory unsafety per se.

NOTE Actors

00:04:32.638 --> 00:04:34.738
The other option is actors,

00:04:34.739 --> 00:04:36.639
which are a really known way to approach this,

00:04:36.640 --> 00:04:39.079
where you trade some of that flexibility that you get

00:04:39.080 --> 00:04:43.719
with fully concurrent for more control and. Code and

00:04:43.720 --> 00:04:45.839
functions are shared between all the different threads,

00:04:45.840 --> 00:04:50.599
but data has to be passed along channels between different

00:04:50.600 --> 00:04:51.251
actors.

NOTE Multi-threading elisp (functions)

00:04:51.252 --> 00:04:52.919
We want the functions to be shared, and this

00:04:52.920 --> 00:04:55.159
should be pretty easy because we don't mutate functions

00:04:55.160 --> 00:05:00.119
like we do data, except when we do. In Lisp, functions are

00:05:00.120 --> 00:05:03.239
just lists like anything else. So you can mutate them

00:05:03.240 --> 00:05:06.279
just like lists. Even if you're not talking about

00:05:06.280 --> 00:05:09.159
interpreted code, like if you have a native compiled

00:05:09.160 --> 00:05:11.959
function, you can still mutate the constants inside the

00:05:11.960 --> 00:05:14.839
function. For example, here we have a function returns a

00:05:14.840 --> 00:05:17.679
string. We take that string out, we mutate that string, and

00:05:17.680 --> 00:05:23.079
now the function returns a different string. In Rune, we

00:05:23.080 --> 00:05:24.999
enforce that all functions, their constants are

00:05:25.000 --> 00:05:27.199
immutable. You can't mutate the insides of a function. You

00:05:27.200 --> 00:05:29.239
can still swap out functions and redefine them, but you

00:05:29.240 --> 00:05:32.239
can't mutate the inside of a function. This enables them

00:05:32.240 --> 00:05:34.679
to be safely shared across threads.

NOTE Caveats

00:05:34.680 --> 00:05:36.400
However, there are some caveats to this.

00:05:36.401 --> 00:05:38.159
For example, some functions actually do

00:05:38.160 --> 00:05:41.439
need to mutate their own data. The example that we run into is

00:05:41.440 --> 00:05:44.839
cl-generic. It uses a method cache. So it has to be able to

00:05:44.840 --> 00:05:47.639
update that cache. In this case, we just made a special

00:05:47.640 --> 00:05:50.799
case for this particular situation, but we don't know what

00:05:50.800 --> 00:05:53.159
more of these we're gonna run into the future where this is

00:05:53.160 --> 00:05:57.089
needed behavior to be able to mutate a function.

NOTE Multi-threading elisp (data)

00:05:57.090 --> 00:05:59.810
Okay, so functions are pretty easy.

00:05:59.811 --> 00:06:00.919
They just can be shared between

00:06:00.920 --> 00:06:05.159
threads, but data can't be immutable, at least not into the

00:06:05.160 --> 00:06:08.759
model that Emacs currently has. We have two different

00:06:08.760 --> 00:06:12.039
ways to handle this. One is we require whenever you're

00:06:12.040 --> 00:06:14.399
calling some other code in a different thread, you have to

00:06:14.400 --> 00:06:17.039
send all the variables that it's going to need over to that

00:06:17.040 --> 00:06:19.159
thread. This is how you traditionally do inside actors.

00:06:19.160 --> 00:06:21.919
Any data that needs to go to a different actor needs to be sent

00:06:21.920 --> 00:06:25.519
over a channel. It's relatively easy implementation, but

00:06:25.520 --> 00:06:28.159
this is difficult in the Emacs case because everything is

00:06:28.160 --> 00:06:30.799
going to be accessing different variables. That means

00:06:30.800 --> 00:06:33.119
when you call something, you have to know ahead of time, all

00:06:33.120 --> 00:06:34.879
the different variables that are gonna be accessed inside

00:06:34.880 --> 00:06:38.248
that other thread and put those in when you call it.

NOTE Copy values to other threads on demands

00:06:38.249 --> 00:06:40.959
The other option we're using is we're copying values to the

00:06:40.960 --> 00:06:43.439
other threads on demand. If you're running a thread, it

00:06:43.440 --> 00:06:45.759
tries to look up a variable. It doesn't have any value for

00:06:45.760 --> 00:06:48.759
that variable. It will go back and ask the main thread and it

00:06:48.760 --> 00:06:50.959
will copy that value into that thread and it can continue

00:06:50.960 --> 00:06:53.399
execution. This is nice because you can just launch some

00:06:53.400 --> 00:06:55.639
code and it'll take care of handling all the data transfer

00:06:55.640 --> 00:06:57.883
for you.

NOTE Multi-threading elisp (buffers)

00:06:57.884 --> 00:07:00.359
But we don't want to be copying around is buffers,

00:07:00.360 --> 00:07:04.199
because they can be really large. In this case, we have a

00:07:04.200 --> 00:07:07.599
mutex. Each thread could only have one current buffer that

00:07:07.600 --> 00:07:12.279
it has an exclusive lock to. This comes with some

00:07:12.280 --> 00:07:16.079
trade-offs, big one being that if the user tries to access

00:07:16.080 --> 00:07:18.359
some buffer, they want to type something, and a background

00:07:18.360 --> 00:07:20.239
thread is holding onto that buffer, what do we do in that

00:07:20.240 --> 00:07:24.959
situation? And we still need to hold an exclusive lock, even

00:07:24.960 --> 00:07:26.359
if we're only going to read a buffer. If you have multiple

00:07:26.360 --> 00:07:29.159
readers, they each still need to take turns because we can't

00:07:29.160 --> 00:07:30.999
determine if at some point a thread is going to try and mutate

00:07:31.000 --> 00:07:33.879
the buffer. It has to be an exclusive lock. The other issue

00:07:33.880 --> 00:07:37.799
is buffer-locals. This is less of a implementation issue

00:07:37.800 --> 00:07:40.519
as much as it is a technical issue. Because you think about

00:07:40.520 --> 00:07:42.759
when we switch to a buffer, it has some buffer-local data and

00:07:42.760 --> 00:07:45.399
we have some thread-local data. As we go through, we're

00:07:45.400 --> 00:07:47.599
mutating everything. Those can get intertwined and

00:07:47.600 --> 00:07:49.719
pointing to each other. Then we switch away from that

00:07:49.720 --> 00:07:51.679
buffer. We need some quick way to be able to separate those

00:07:51.680 --> 00:07:54.279
out. The buffer-locals can go with the buffer-locals and

00:07:54.280 --> 00:07:56.439
the thread data can stay with thread data and make copies of

00:07:56.440 --> 00:07:58.719
anything that was pointing to the other side. But we don't

00:07:58.720 --> 00:08:02.839
have a good method to determine how to separate those two,

00:08:02.840 --> 00:08:05.359
like what data belongs to this and what data belongs to this,

00:08:05.360 --> 00:08:08.199
so that we can do that quickly. We haven't found a good

00:08:08.200 --> 00:08:09.599
solution to that yet, but it's something we're still

00:08:09.600 --> 00:08:11.902
working on.

NOTE Would this actually be useful?

00:08:11.903 --> 00:08:13.079
The question is, would this actually be

00:08:13.080 --> 00:08:15.959
useful for doing real work inside Emacs? I would say,

00:08:15.960 --> 00:08:17.959
yes, there's a lot of things you can do with this. You could

00:08:17.960 --> 00:08:20.239
handle process output in the background. You can do syntax

00:08:20.240 --> 00:08:23.479
highlighting. You can do buffer search in parallel. You can

00:08:23.480 --> 00:08:26.679
do LSP. You can do fetching your mail in the background. You

00:08:26.680 --> 00:08:29.639
can have a window manager that doesn't block your window

00:08:29.640 --> 00:08:32.319
manager when Emacs is blocked. You could do

00:08:32.320 --> 00:08:34.479
something like a file system watcher that keeps up on files

00:08:34.480 --> 00:08:37.559
without blocking Emacs. This wouldn't be so great for

00:08:37.560 --> 00:08:39.159
building concurrent data structures or operating on

00:08:39.160 --> 00:08:42.199
shared data or building your own abstractions, because of the

00:08:42.200 --> 00:08:46.039
trade-offs that we've made here. Okay. That's talking

00:08:46.040 --> 00:08:46.918
about multi-threading.

NOTE Precise garbage collection

00:08:46.919 --> 00:08:47.599
The other thing we're going to talk

00:08:47.600 --> 00:08:51.319
about is precise garbage collection. In Rune, we have a

00:08:51.320 --> 00:08:54.439
safe, precise garbage collection because of the Rust type

00:08:54.440 --> 00:08:58.119
system. Let's look at what the problem is with garbage

00:08:58.120 --> 00:09:00.479
collection in the first place. Really, the tricky part

00:09:00.480 --> 00:09:03.719
about garbage collection is rooting. How do we find out what

00:09:03.720 --> 00:09:06.159
the roots are? These are all the values that are on the

00:09:06.160 --> 00:09:08.679
stack or inside the registers. In this example here, we

00:09:08.680 --> 00:09:11.919
allocate an object. We call garbage_collect, that object's

00:09:11.920 --> 00:09:13.536
collected, and then we try and return it.

00:09:13.537 --> 00:09:16.536
It's no longer valid.

NOTE How Emacs used to deal with roots

00:09:16.537 --> 00:09:19.039
Let's look at how Emacs used to deal with this

00:09:19.040 --> 00:09:22.559
problem way back in the day. There was a system called gcpro

00:09:22.560 --> 00:09:26.319
or GC Protect, which is basically designed that every time a

00:09:26.320 --> 00:09:28.919
value needed to survive past a garbage collection point,

00:09:28.920 --> 00:09:32.359
you had to try and protect it. In order to do this, you had

00:09:32.360 --> 00:09:35.439
to declare a struct, you had to put a macro around it to root

00:09:35.440 --> 00:09:37.999
the object, and then you had to unroot it when you were done--

00:09:38.000 --> 00:09:41.559
past the garbage collection. Now the value is safe. You

00:09:41.560 --> 00:09:44.039
can see down here, I pulled these eight rules out from a

00:09:44.040 --> 00:09:46.919
really old version of the Emacs manual about all the things

00:09:46.920 --> 00:09:49.279
you had to keep track of when you were trying to use this

00:09:49.280 --> 00:09:52.319
system. All right, so there was a special handling for

00:09:52.320 --> 00:09:54.639
nested GC protects. You had to make sure the memory was

00:09:54.640 --> 00:09:58.239
initialized. You had to make sure that traps couldn't occur

00:09:58.240 --> 00:10:00.839
between allocating and when GC protect would happen. It

00:10:00.840 --> 00:10:03.319
can be tricky because you don't always know when a function

00:10:03.320 --> 00:10:06.879
that's getting called could potentially call garbage

00:10:06.880 --> 00:10:10.719
collection. So if you got something wrong, you also

00:10:10.720 --> 00:10:12.719
might not catch it for a long time because garbage

00:10:12.720 --> 00:10:15.719
collection may only get called one out of 99 times. The other

00:10:15.720 --> 00:10:18.999
99 times is just fine. That one time it happens and you

00:10:19.000 --> 00:10:22.559
can't reproduce the issue. When you do get this wrong and

00:10:22.560 --> 00:10:24.439
some, something doesn't get rooted and it gets

00:10:24.440 --> 00:10:26.319
overwritten, it generally doesn't show up right where the

00:10:26.320 --> 00:10:28.799
problem is. It gets showed up way later when you actually try

00:10:28.800 --> 00:10:31.479
and access the value and the value is invalid. You've got

00:10:31.480 --> 00:10:33.639
to track it back to where that thing did not get properly

00:10:33.640 --> 00:10:37.359
rooted. It's a huge source of bugs and very hard to

00:10:37.360 --> 00:10:38.712
maintain.

NOTE Conservative stack scanning

00:10:38.713 --> 00:10:40.119
Emacs decided to go with a different path,

00:10:40.120 --> 00:10:42.399
which we call conservative stack scanning. Basically,

00:10:42.400 --> 00:10:45.239
the garbage collector just looks at the stack and all the

00:10:45.240 --> 00:10:47.959
registers and any data inside there that looks like it could

00:10:47.960 --> 00:10:52.279
be a pointer, it treats it as a pointer. This is nice because

00:10:52.280 --> 00:10:54.711
you get really easy root tracking,

00:10:54.712 --> 00:10:56.113
but it also comes with some trade-offs,

00:10:56.114 --> 00:11:00.156
mostly that your objects are no longer movable.

NOTE Movable objects

00:11:00.157 --> 00:11:03.079
Why would we want movable objects in Emacs?

00:11:03.080 --> 00:11:05.839
There's a couple of different reasons. One is compaction.

00:11:05.840 --> 00:11:08.199
You can take all your heap, you can pack that on down because

00:11:08.200 --> 00:11:11.239
you can coalesce all your objects together. Another is that

00:11:11.240 --> 00:11:13.239
it's easy to implement generational garbage collection.

00:11:13.240 --> 00:11:16.039
You can just copy everything out of your minor heap into your

00:11:16.040 --> 00:11:21.839
older heap. Really, Emacs is kind of uniquely ideal for

00:11:21.840 --> 00:11:24.279
generational collection, because the typical way we

00:11:24.280 --> 00:11:27.799
interact with Emacs is as a series of commands. You execute

00:11:27.800 --> 00:11:29.959
some command, you'd execute the next command, you execute

00:11:29.960 --> 00:11:33.199
a command. It could be happening every key press, it could be

00:11:33.200 --> 00:11:36.759
happening with M-x. However long that command is, that is

00:11:36.760 --> 00:11:40.959
the ideal length for the minor collection generation, the

00:11:40.960 --> 00:11:43.399
first generation. Because once you're done with that

00:11:43.400 --> 00:11:45.879
generation, anything that's still existing is going to be

00:11:45.880 --> 00:11:49.079
around for a very long time. So that works out really well

00:11:49.080 --> 00:11:52.279
for Emacs. We want to make this a generational collector.

00:11:52.280 --> 00:11:56.199
The other thing is with object layout. We use a lot of lists

00:11:56.200 --> 00:12:00.559
inside Emacs Lisp. Every time you go to the cdr, you've

00:12:00.560 --> 00:12:03.039
got to be chasing a pointer around the heap and following

00:12:03.040 --> 00:12:05.439
that. That can potentially result in cache misses and

00:12:05.440 --> 00:12:08.239
all sorts of other things like that. So it can take a long

00:12:08.240 --> 00:12:12.159
time. It can be quite slow. But if you have the ability to move

00:12:12.160 --> 00:12:16.559
objects, you can just relocate an entire list and lay it out

00:12:16.560 --> 00:12:19.168
in an array right next to each other inside memory.

00:12:19.169 --> 00:12:22.479
So iterating over it is just as fast as iterating over an array.

00:12:22.480 --> 00:12:25.421
But you can only do that if you have movable objects.

00:12:25.422 --> 00:12:28.399
I'll point out here too, that with conservative stack scanning,

00:12:28.400 --> 00:12:31.599
it's not that all objects are immovable. It's only ones that

00:12:31.600 --> 00:12:35.519
are pointed to from the stack or from the registers that have

00:12:35.520 --> 00:12:38.828
to become immovable.

NOTE How Rust makes precise GC easy

00:12:38.829 --> 00:12:41.039
Let's look at how Rust makes precise

00:12:41.040 --> 00:12:44.439
garbage collection easy. Here I have some Rust code to

00:12:44.440 --> 00:12:47.279
show kind of how the lifetime system works and what we call

00:12:47.280 --> 00:12:49.879
XOR mutability, where we can only have one mutable

00:12:49.880 --> 00:12:52.879
reference or multiple immutable references to the same

00:12:52.880 --> 00:12:56.199
thing. Here we declare a vector, we take a reference to the

00:12:56.200 --> 00:12:59.199
first element of the vector, and then we mutate the vector.

00:12:59.200 --> 00:13:02.239
Now this could potentially resize the vector and move it to a

00:13:02.240 --> 00:13:04.919
different location in memory, so that reference is no

00:13:04.920 --> 00:13:07.759
longer valid. The nice thing is, Rust catches this for

00:13:07.760 --> 00:13:10.479
us. It says, hey, this is no longer valid. This reference

00:13:10.480 --> 00:13:14.519
can't survive past when you mutated it. Okay? That's

00:13:14.520 --> 00:13:17.559
exactly what we want for a garbage collector. You can see

00:13:17.560 --> 00:13:19.879
here, we take this in a garbage collection context, we

00:13:19.880 --> 00:13:23.359
create a new context object, we add an object, we call

00:13:23.360 --> 00:13:26.759
garbage_collect, then we try and access that object. It's no

00:13:26.760 --> 00:13:29.199
longer accessible, and Rust will prevent us from trying to

00:13:29.200 --> 00:13:34.839
access that variable. So, how do we solve this? We have a

00:13:34.840 --> 00:13:39.759
root macro. We declared this root macro, it lets us take the

00:13:39.760 --> 00:13:41.759
object and let it live past garbage collection, and

00:13:41.760 --> 00:13:45.319
everything works out. The nice thing is, this root macro

00:13:45.320 --> 00:13:47.799
will get dropped when it's out of scope, so we don't have to

00:13:47.800 --> 00:13:51.519
worry about the un-gc-protect step of this. Statically,

00:13:51.520 --> 00:13:55.799
Rust will verify and tell us any object that needs to be

00:13:55.800 --> 00:13:58.279
rooted. If we try and access it, it'll tell us it's invalid.

00:13:58.280 --> 00:14:00.999
We have this root macro and then we can access it. So in

00:14:01.000 --> 00:14:03.759
that way, we have safe, precise garbage collection without

00:14:03.760 --> 00:14:07.479
any chance of introducing undefined behavior, which is

00:14:07.480 --> 00:14:09.999
really, really powerful. It's really easy because the

00:14:10.000 --> 00:14:13.226
type system will catch it all for us.

NOTE Other Rust niceties: proc macro

00:14:13.227 --> 00:14:15.147
There's some other Rust niceties I want to kind of

00:14:15.148 --> 00:14:16.799
talk through that are useful, but

00:14:16.800 --> 00:14:21.079
are not, you know, star features. One is proc macros. You

00:14:21.080 --> 00:14:23.679
can see up on the top, you can see how you declare a function

00:14:23.680 --> 00:14:27.359
inside the C core. All right. You have to use the macro. You

00:14:27.360 --> 00:14:29.141
have to put the list type, the function type,

00:14:29.142 --> 00:14:30.963
the struct type, the different types of arguments

00:14:30.964 --> 00:14:33.225
or different number of arguments, the doc string,

00:14:33.226 --> 00:14:36.023
and then you can put your argument listing down inside there.

00:14:36.024 --> 00:14:37.984
On the Rust side, we just write this like we would

00:14:37.985 --> 00:14:40.044
any other Rust function. And then we put

00:14:40.045 --> 00:14:41.285
the defun proc macro on there

00:14:41.286 --> 00:14:44.186
and it takes care of everything for us behind the scenes.

00:14:44.187 --> 00:14:46.407
A couple of cool additional things we can do with this

00:14:46.408 --> 00:14:48.727
is that we don't have to make everything just an object.

00:14:48.728 --> 00:14:49.759
We can actually make things

00:14:49.760 --> 00:14:54.239
more specific types. Here we have symbols. As well as

00:14:54.240 --> 00:14:56.279
you can see subfeature, it's an optional parameter, and we

00:14:56.280 --> 00:15:00.919
just make it an option inside Rust and it automatically make

00:15:00.920 --> 00:15:03.599
it an optional inside Elisp.

00:15:03.600 --> 00:15:05.181
This makes them really easy to write.

00:15:05.182 --> 00:15:06.439
I can't take credit for this is because this is

00:15:06.440 --> 00:15:09.119
something that I saw inside Remacs and I stole from them, but

00:15:09.120 --> 00:15:11.439
it makes the functions really easy to call from each other

00:15:11.440 --> 00:15:14.559
and really easy to write as well.

NOTE sum types

00:15:14.560 --> 00:15:18.523
Another thing that's really nice is sum types.

00:15:18.524 --> 00:15:21.039
In the C core, if I wanted to get a

00:15:21.040 --> 00:15:23.759
string out of an object, I would first need to check that it's

00:15:23.760 --> 00:15:28.319
a string and then dereference it as a string. But if it's not a

00:15:28.320 --> 00:15:30.679
string, I may introduce undefined behavior. So in

00:15:30.680 --> 00:15:32.799
complicated code, I have to make sure that I have always

00:15:32.800 --> 00:15:34.959
checked what type it is before I try and dereference that

00:15:34.960 --> 00:15:37.879
type. We don't have to worry about any of that inside Rust

00:15:37.880 --> 00:15:41.319
because we can untag a value and we can use their some types,

00:15:41.320 --> 00:15:44.399
basically create an enum and we can match on what the

00:15:44.400 --> 00:15:47.639
different values can be. Then we only get out the types

00:15:47.640 --> 00:15:50.359
that are viable or are actually there. So we never

00:15:50.360 --> 00:15:52.159
accidentally get something out of an object that we didn't

00:15:52.160 --> 00:15:54.239
mean to, or dereference it as something that doesn't

00:15:54.240 --> 00:15:56.879
really exist. We can just match on it and we can get out the

00:15:56.880 --> 00:16:01.040
values that we need, which is really, really powerful.

NOTE Regex

00:16:01.041 --> 00:16:03.639
So there's some other Rust niceties as well working with here.

00:16:03.640 --> 00:16:07.799
One is the regex engine inside Rust is really fast, high

00:16:07.800 --> 00:16:10.959
performance. We are using that for the Elixir regex

00:16:10.960 --> 00:16:14.879
engine to give it high performance and worst-case

00:16:14.880 --> 00:16:16.051
guarantees.

NOTE Parsers

00:16:16.052 --> 00:16:18.599
The other is that Rust has a lot of really good

00:16:18.600 --> 00:16:21.559
parsers for things like JSON that are no copy parsers that

00:16:21.560 --> 00:16:24.719
are high performance. We can use those inside Rune as

00:16:24.720 --> 00:16:27.209
well.

NOTE Other changes: GUI first, terminal second

00:16:27.210 --> 00:16:29.439
There's a handful of other changes we're working on

00:16:29.440 --> 00:16:33.119
that are not Rust-specific, but we'd like to see. The first is

00:16:33.120 --> 00:16:36.759
being GUI first, terminal second. This means two things.

00:16:36.760 --> 00:16:40.039
First is that we have all of our key bindings. Right now

00:16:40.040 --> 00:16:43.279
inside Emacs, C-i and TAB are bound to the same key

00:16:43.280 --> 00:16:45.039
binding by default, because that's how it works inside the

00:16:45.040 --> 00:16:48.119
terminal. In the GUI, you shouldn't have that limitation.

00:16:48.120 --> 00:16:52.559
The second is that the GUI should not block when Lisp is

00:16:52.560 --> 00:16:55.199
blocked. It should be independent of that. Your GUI can

00:16:55.200 --> 00:16:58.918
still continue to operate when Lisp is running.

NOTE Off-screen cursor

00:16:58.919 --> 00:17:01.279
The other is the ability to have an off-screen cursor

00:17:01.280 --> 00:17:02.699
so that you can be typing on something,

00:17:02.700 --> 00:17:04.319
you can scroll up and down and the point

00:17:04.320 --> 00:17:06.719
doesn't have to follow you where you lose your place where

00:17:06.720 --> 00:17:09.399
you were before. You don't have to intentionally set a mark.

00:17:09.400 --> 00:17:11.199
You can just scroll and then start typing and it'll go back up

00:17:11.200 --> 00:17:13.879
to where it was before, like it works in most applications.

00:17:13.880 --> 00:17:16.304
And this can be optional.

NOTE Image flow

00:17:16.305 --> 00:17:18.079
The other is image flow. We want it

00:17:18.080 --> 00:17:20.879
so that you can easily flow images and you can have large

00:17:20.880 --> 00:17:23.159
images and scroll past them without jumping and you can flow

00:17:23.160 --> 00:17:24.439
text around images.

NOTE Testing

00:17:24.440 --> 00:17:29.799
How are we testing this project? Because there's a lot of

00:17:29.800 --> 00:17:33.159
things that you could get wrong here. One thing we're doing

00:17:33.160 --> 00:17:38.039
is we're using ERT. Emacs ships with over 7,000 built-in

00:17:38.040 --> 00:17:42.879
tests--Elisp tests. We are using this test suite to test

00:17:42.880 --> 00:17:45.079
our project as well. We can kind of use this as a dashboard

00:17:45.080 --> 00:17:47.679
of saying how close are we to getting to parity with GNU

00:17:47.680 --> 00:17:52.319
Emacs. The other thing that we have is a tool called elprop,

00:17:52.320 --> 00:17:55.279
which is an external utility that basically tests for

00:17:55.280 --> 00:17:58.719
correctness. Because really, the correctness of Rune is

00:17:58.720 --> 00:18:00.999
whatever Emacs does, because there's no official spec on

00:18:01.000 --> 00:18:04.079
how things should behave. To do this, we can go look at

00:18:04.080 --> 00:18:07.159
the Rust function signature. We know what the arguments

00:18:07.160 --> 00:18:09.319
are, we know how many they are, and we know what types they

00:18:09.320 --> 00:18:11.679
should be. Given that information, we can generate a

00:18:11.680 --> 00:18:15.279
whole bunch of random functions feeding those types in. And

00:18:15.280 --> 00:18:18.959
then we send a copy over to Emacs, we send a copy over to Rune.

00:18:18.960 --> 00:18:21.679
They each evaluate it and they return the result and we make

00:18:21.680 --> 00:18:23.519
sure the results are the same. Then you do that for

00:18:23.520 --> 00:18:26.199
thousands of different implementations of the function.

00:18:26.200 --> 00:18:29.039
And it helps us find corner cases really easy without having

00:18:29.040 --> 00:18:31.639
to handwrite a whole bunch of different cases to test things

00:18:31.640 --> 00:18:36.344
and say, where are these two functions different?

NOTE Status

00:18:36.345 --> 00:18:39.359
So the current status: we already have a multi-threaded Elixir

00:18:39.360 --> 00:18:42.999
interpreter and bytecode engine inside there. There's no

00:18:43.000 --> 00:18:45.679
actual text editor in there yet, but the primitives are

00:18:45.680 --> 00:18:48.679
there. Like you can insert text, move point around,

00:18:48.680 --> 00:18:52.039
delete text, do different things like that. But we don't

00:18:52.040 --> 00:18:53.679
have a GUI hooked up to different key bindings to actually

00:18:53.680 --> 00:18:58.159
type on. There's just a REPL to operate in. We have about

00:18:58.160 --> 00:19:01.279
250 of the 1500 built-in functions already implemented

00:19:01.280 --> 00:19:04.119
inside there. There's a lot of low-hanging fruit inside this

00:19:04.120 --> 00:19:07.246
area to still be implemented.

NOTE Next directions

00:19:07.247 --> 00:19:07.719
The next directions we're

00:19:07.720 --> 00:19:11.959
working on is we're optimizing the GC. We want to make it

00:19:11.960 --> 00:19:13.839
generational. Like I said, right now, it's just a simple

00:19:13.840 --> 00:19:17.359
semi-spaced copying GC. We want to add a proper GUI. We need

00:19:17.360 --> 00:19:19.599
to implement text properties, overlays, process and job

00:19:19.600 --> 00:19:22.738
control, all that goodness right there.

NOTE How to get involved

00:19:22.739 --> 00:19:25.378
How can you get involved? This is hosted on GitHub.

00:19:25.379 --> 00:19:26.424
You can come on over.

00:19:26.425 --> 00:19:28.639
If you have any ideas about how to implement something or

00:19:28.640 --> 00:19:30.639
something you'd like to see done, go ahead and just open an

00:19:30.640 --> 00:19:32.799
issue so we can have a discussion about it. We've had lots of

00:19:32.800 --> 00:19:34.599
interesting discussions with different people coming in

00:19:34.600 --> 00:19:37.639
to the GitHub repo. If you're interested in contributing,

00:19:37.640 --> 00:19:40.439
the easiest way is probably to run elprop, pick some

00:19:40.440 --> 00:19:43.279
function, run elprop on it. I promise it won't take long to

00:19:43.280 --> 00:19:45.639
find some issues, some discrepancy between Emacs and Rune,

00:19:45.640 --> 00:19:48.959
and that lets you dive into the Rust code and figure out, and

00:19:48.960 --> 00:19:50.879
the C code, and figure out what the difference is between the

00:19:50.880 --> 00:19:53.119
two. or come along and help implement your favorite

00:19:53.120 --> 00:19:55.679
functionality. This has been a really interesting project

00:19:55.680 --> 00:19:58.359
so far, and we've had a handful of different contributors on

00:19:58.360 --> 00:20:01.799
it who just kind of want to learn Rust or get more into

00:20:01.800 --> 00:20:06.000
systems-level programming. Thank you.