WEBVTT

00:01.120 --> 00:00:04.640
Hi. Greetings from cloudy St. Petersburg.

00:04.640 --> 00:00:06.080
My name is Dmitry Gutov. 

00:00:06.080 --> 00:00:09.280
I write Ruby by day, Emacs Lisp by night

00:00:09.280 --> 00:00:12.080
when I don't do anything else.

00:12.080 --> 00:00:14.559
You might know my work 

00:00:14.559 --> 00:00:18.160
from a number of third-party packages

00:00:18.160 --> 00:00:22.240
as well as built-in ones.

00:22.240 --> 00:00:25.599
The idea for this talk came from

00:25.599 --> 00:00:26.800
an improvement request 

00:00:26.800 --> 00:00:32.960
for the performance of grep-like commands

00:32.960 --> 00:00:35.200
using the xref interface 

00:00:35.200 --> 00:00:43.280
and its storage types, container types 

00:00:43.280 --> 00:00:47.840
for the search results,

00:47.840 --> 00:00:51.440
which complained that it's not as fast

00:00:51.440 --> 00:00:54.320
as it potentially could have been

00:54.320 --> 00:00:58.239
when there are lots of search results 

00:00:58.239 --> 00:01:01.600
coming for a given search.

01:01.600 --> 00:01:06.159
I had noticed that myself

00:01:06.159 --> 00:01:10.159
when working on it, and probably before. 

00:01:10.159 --> 00:01:16.560
My approach to optimizing Lisp code 

01:16.560 --> 00:01:21.680
has changed recently-ish,

01:21.680 --> 00:01:26.960
and I'd like to talk about it here.

01:26.960 --> 00:01:34.079
This talk is for people who already

01:34.079 --> 00:01:37.040
know how to write some Emacs Lisp 

00:01:37.040 --> 00:01:38.880
to solve their problems 

00:01:38.880 --> 00:01:43.360
and who have possibly encountered 

00:01:43.360 --> 00:01:49.200
some performance issues doing that.

01:49.200 --> 00:01:51.600
Also, if you want to contribute 

00:01:51.600 --> 00:01:54.799
some improvements to the code 

00:01:54.799 --> 00:01:56.960
that is already in Emacs 

00:01:56.960 --> 00:02:02.159
or to other packages 

00:02:02.159 --> 00:02:04.479
which are owned by somebody else, 

00:02:04.479 --> 00:02:13.520
it should also be helpful. Let's start. 

00:02:13.520 --> 00:02:18.640
First of all, about Emacs and Emacs Lisp. 

00:02:18.640 --> 00:02:21.360
It's not the fastest language.

02:21.360 --> 00:02:26.560
Let's switch to the notes. 

00:02:26.560 --> 00:02:31.440
I hope the font is big enough 

00:02:31.440 --> 00:02:36.479
to be readable in this video, I really hope.

02:36.480 --> 00:02:40.160
Emacs Lisp is not the fastest of the bunch.

00:02:40.160 --> 00:02:44.400
The garbage collector creates pauses 

00:02:44.400 --> 00:02:48.800
whenever it needs to sweep data.

02:48.800 --> 00:02:51.760
The interpreter is not the fastest one,

02:51.760 --> 00:02:54.720
even though the native compilation branch

00:02:54.720 --> 00:02:59.120
is improving on that by a factor of two

00:02:59.120 --> 00:03:04.640
in certain scenarios, which is good.

03:04.640 --> 00:03:10.560
The standard library or functions

00:03:10.560 --> 00:03:12.800
for working with collections, for example,

03:12.800 --> 00:03:17.599
are not so uniformly great.

03:17.599 --> 00:03:20.080
So there is some work to be done there.

03:20.080 --> 00:03:22.239
Maybe you can contribute

03:22.239 --> 00:03:26.640
the next improvement in there.

03:26.640 --> 00:03:33.200
And also, if your package displays stuff, 

00:03:33.200 --> 00:03:35.040
it has a visual component,

03:35.040 --> 00:03:36.959
then you might have to deal with

00:03:36.959 --> 00:03:43.120
some drawbacks of the display engine

03:43.120 --> 00:03:47.760
which might slow to a crawl 

00:03:47.760 --> 00:03:53.680
when your code has... 

00:03:53.680 --> 00:03:56.560
when you're trying to display lines 

00:03:56.560 --> 00:04:00.319
which are a little too long--

00:04:00.319 --> 00:04:03.120
trying to print a [long] line, for example--

04:03.120 --> 00:04:07.120
or you are using a lot of overlays,

04:07.120 --> 00:04:09.120
if you know what it is.

04:09.120 --> 00:04:12.640
With lots of overlays 

00:04:12.640 --> 00:04:19.839
on the same visual line, in particular.

04:19.840 --> 00:04:24.160
But okay. The first thing to understand,

00:04:24.160 --> 00:04:27.680
and I hope everybody who's done

00:04:27.680 --> 00:04:29.520
some programming in the past knows, 

00:04:29.520 --> 00:04:35.120
it's: first you write it correctly,

00:04:35.120 --> 00:04:37.360
and then you try to benchmark

04:37.360 --> 00:04:39.199
and see where the problems are, 

00:04:39.199 --> 00:04:40.720
if there are any.

04:40.720 --> 00:04:45.919
So first do it right, then do it fast.

04:45.919 --> 00:04:50.720
How do we find the hotspots, 

00:04:50.720 --> 00:04:52.400
the bottlenecks on the second step? 

00:04:52.400 --> 00:04:54.720
We try to do some profiling 

00:04:54.720 --> 00:04:58.960
or measuring how long the code takes.

04:58.960 --> 00:05:03.038
Emacs has two different profilers. 

00:05:03.039 --> 00:05:05.440
One is the native profiler,

05:05.440 --> 00:05:10.639
which as I recall was contributed by

00:05:10.639 --> 00:05:13.280
Tomohiro Matsuyama, the author 

00:05:13.280 --> 00:05:16.240
of the auto-complete package

05:16.240 --> 00:05:22.720
back in Emacs 24.3, apparently.

05:22.720 --> 00:05:24.320
It's a low overhead profiler. 

00:05:24.320 --> 00:05:31.520
It's sampling, which is good on one hand

05:31.520 --> 00:05:33.919
but might be less good on the other hand.

05:33.919 --> 00:05:35.199
Let's try using it. 

00:05:35.199 --> 00:05:40.960
So we start with profiler-start. 

00:05:40.960 --> 00:05:44.960
Let's just ask for CPU profiling here,

00:05:44.960 --> 00:05:48.320
but it can also profile memory allocations.

05:48.320 --> 00:05:56.400
Let's try a search for some string

00:05:56.400 --> 00:06:01.360
in the current project.

06:01.360 --> 00:06:04.080
Let's do it a few times, maybe three times,

00:06:04.080 --> 00:06:09.360
so the profiler has more data to work with.

06:09.360 --> 00:06:13.680
Let's call the report.

06:13.680 --> 00:06:16.960
Okay. So here we have the tree, 

00:06:16.960 --> 00:06:19.360
the execution tree with percentages

00:06:19.360 --> 00:06:22.960
of the time spent in each function.

06:22.960 --> 00:06:25.840
You can unwrap it by going up and down

00:06:25.840 --> 00:06:35.039
and pressing TAB to unwrap every element.

06:35.039 --> 00:06:52.000
This is weird.

06:52.000 --> 00:06:55.199
Okay, here we see that the actual command

00:06:55.199 --> 00:06:56.639
that was called only takes 

00:06:56.639 --> 00:06:59.599
8% of the whole runtime,

06:59.599 --> 00:07:07.759
meaning the input was...

07:07.759 --> 00:07:10.960
The command didn't take enough time

00:07:10.960 --> 00:07:13.360
for us to really dig into it. 

00:07:13.360 --> 00:07:18.960
Let's try a shorter input with more matches. 

00:07:18.960 --> 00:07:25.680
So profiler-start again, CPU.

07:25.680 --> 00:07:34.240
Let's search for list.

07:34.240 --> 00:07:40.160
Don't mind the minibuffer just yet.

07:40.160 --> 00:07:47.680
Okay. So let's look at the report.

07:47.680 --> 00:07:52.000
We can unwrap it here, and we see

00:07:52.000 --> 00:07:55.759
52% of the time was spent doing this,

07:55.759 --> 00:07:58.240
at least according to the profile. 

00:07:58.240 --> 00:08:00.960
We can unwrap it, see the description 

00:08:00.960 --> 00:08:02.800
of any function that was called 

00:08:02.800 --> 00:08:08.479
by hitting d, or even jump to it 

00:08:08.479 --> 00:08:12.960
by tapping f.

08:12.960 --> 00:08:16.160
By going down the tree,

08:16.160 --> 00:08:18.240
we unwrap it with TAB,

08:18.240 --> 00:08:22.400
you can see where time was spent.

08:22.400 --> 00:08:25.199
One of the bigger drawbacks 

00:08:25.199 --> 00:08:30.639
of this profiler is that the arithmetic

08:30.639 --> 00:08:39.200
doesn't always work out,

00:08:39.200 --> 00:08:45.760
like you might have...

08:45.760 --> 00:08:47.519
This is not a good example, 

00:08:47.519 --> 00:08:52.640
but okay, you have 14% spent in here,

00:08:52.640 --> 00:08:57.200
but when we expand this entry,

00:08:57.200 --> 00:09:01.760
we only see like 6%, 3%, and 0%. 

00:09:01.760 --> 00:09:06.640
The sum is different, sometimes even bigger than the parent.

00:09:06.640 --> 00:09:10.800
So the native profiler 

00:09:10.800 --> 00:09:13.200
can give an accurate picture 

00:09:13.200 --> 00:09:15.920
and it has little overhead, 

00:09:15.920 --> 00:09:20.399
but the specific numbers 

00:09:20.399 --> 00:09:22.959
are not very precise

09:22.959 --> 00:09:28.640
because the principle is probabilistic.

09:28.640 --> 00:09:31.199
Let's stop here.
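
NOTE
A minimal sketch of the whole native-profiler workflow from Lisp
(the same is available interactively via M-x profiler-start):
;; Start sampling CPU time; 'mem or 'cpu+mem would track allocations.
(profiler-start 'cpu)
;; ... run the commands being measured a few times ...
(profiler-report) ; pops up the tree with time percentages
(profiler-stop)   ; stop sampling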

09:31.200 --> 00:09:36.959
There is another package called elp,

09:36.959 --> 00:09:39.360
Emacs Lisp Profiler, 

00:09:39.360 --> 00:09:43.440
which is much older than that.

09:43.440 --> 00:09:47.920
It allows us to instrument

09:47.920 --> 00:09:53.600
just specific functions or a package.

09:53.600 --> 00:09:57.680
We're instrumenting the xref package here.
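
NOTE
A sketch of the elp workflow shown here, assuming xref is loaded:
(require 'elp)
(elp-instrument-package "xref") ; advises every function named xref-*
;; ... run the search ...
(elp-results)     ; table of call counts and accumulated times
(elp-restore-all) ; remove the instrumentation when done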

09:57.680 --> 00:10:01.360
It works through advice. You can see

00:10:01.360 --> 00:10:03.279
the description of one of the functions

00:10:03.279 --> 00:10:04.320
in the package, and we see 

00:10:04.320 --> 00:10:12.640
that it has had an :around advice added.

10:12.640 --> 00:10:18.000
If we run the same search,

10:18.000 --> 00:10:21.360
we can see -- when it finishes --

00:10:21.360 --> 00:10:30.399
a table with the number of times

10:30.399 --> 00:10:32.640
that every function in the package

00:10:32.640 --> 00:10:41.680
has been called, and the time

00:10:41.680 --> 00:10:48.000
which the runtime has spent inside each of them,

10:48.000 --> 00:10:55.040
sorted by impact, as it understands it.

10:55.040 --> 00:11:00.079
The main problem with this profiler is 

00:11:00.079 --> 00:11:04.160
it is slower because it adds overhead

00:11:04.160 --> 00:11:06.880
to every function call,

11:06.880 --> 00:11:11.519
so it, in the end, might give an impression

00:11:11.519 --> 00:11:19.519
that functions with lots of calls,

11:19.519 --> 00:11:21.839
which have been called a lot of times,

11:21.839 --> 00:11:26.880
are more important, 

11:26.880 --> 00:11:29.279
hotter than they actually are, 

00:11:29.279 --> 00:11:30.959
because it slows down 

00:11:30.959 --> 00:11:32.640
every such function call,

11:32.640 --> 00:11:35.279
including the subsequent calls 

00:11:35.279 --> 00:11:38.160
that these functions do 

00:11:38.160 --> 00:11:40.560
inside the same package.

11:40.560 --> 00:11:48.240
So it's a good way to analyze and drill down

11:48.240 --> 00:11:50.320
to see which functions here 

00:11:50.320 --> 00:11:51.200
take a lot of time,

00:11:51.200 --> 00:11:53.839
but just keep in mind 

00:11:53.839 --> 00:11:57.760
that sometimes you might end up 

00:11:57.760 --> 00:12:00.079
trying to optimize one of these 

00:12:00.079 --> 00:12:01.600
frequently called, usually smaller functions,

00:12:01.600 --> 00:12:09.279
and find that the result

00:12:09.279 --> 00:12:11.040
might not actually affect 

00:12:11.040 --> 00:12:18.720
the production runtime at all.

12:18.720 --> 00:12:22.240
But it's still a good tool,

00:12:22.240 --> 00:12:25.440
especially when you already know

12:25.440 --> 00:12:30.560
which set of actions you are interested in, 

00:12:30.560 --> 00:12:33.760
and which functions might be slower.

12:33.760 --> 00:12:37.120
elp allows you to instrument a package 

00:12:37.120 --> 00:12:42.720
or a set of functions. Just don't forget 

00:12:42.720 --> 00:12:49.600
to un-instrument them all afterwards, 

00:12:49.600 --> 00:12:52.000
or else your session, 

00:12:52.000 --> 00:12:55.920
your subsequent optimization efforts 

00:12:55.920 --> 00:12:59.360
might not work as well as you might

12:59.360 --> 00:13:01.359
want them to.

13:01.360 --> 00:13:04.560
And there's also this very nice, 

00:13:04.560 --> 00:13:09.600
very handy package called benchmark,

13:09.600 --> 00:13:12.480
which unfortunately 

00:13:12.480 --> 00:13:16.000
not everybody knows about.

13:16.000 --> 00:13:21.519
It turns out that a lot of 

00:13:21.519 --> 00:13:30.079
older Emacs Lisp developers,

13:30.079 --> 00:13:34.160
users, package developers usually have 

00:13:34.160 --> 00:13:38.000
some benchmarking macros of their own.

00:13:38.000 --> 00:13:42.880
But there is this package 

00:13:42.880 --> 00:13:46.160
with a perfectly usable interface

00:13:46.160 --> 00:13:49.199
which doesn't require you 

00:13:49.199 --> 00:13:51.120
to define something else,

00:13:51.120 --> 00:13:52.560
especially when you are in

00:13:52.560 --> 00:13:57.120
an emacs -Q session

13:57.120 --> 00:14:00.480
trying to benchmark your code

00:14:00.480 --> 00:14:05.440
in a bare Emacs. So it has 

00:14:05.440 --> 00:14:09.680
two main entry points, I would say.

14:09.680 --> 00:14:14.480
The first one is benchmark,

14:14.480 --> 00:14:19.199
and the second one is benchmark-progn. 

00:14:19.199 --> 00:14:20.399
benchmark is a function,

00:14:20.399 --> 00:14:25.040
and benchmark-progn is a macro.

14:25.040 --> 00:14:27.839
The first one you can use by specifying

00:14:27.839 --> 00:14:30.480
the number of iterations and the form. 

00:14:30.480 --> 00:14:36.800
I hope the minibuffer is easy to read here.

14:36.800 --> 00:14:43.360
For instance, we can take this long list

00:14:43.360 --> 00:14:44.800
and try nreverse-ing it,

00:14:44.800 --> 00:14:49.680
and we see how long that takes.
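
NOTE
A small example of the benchmark function (the list is made up):
(require 'benchmark)
(benchmark 10 '(nreverse (make-list 100000 nil)))
;; messages the elapsed time, including time spent in GC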

14:49.680 --> 00:14:54.160
Then you can do it again,

14:54.160 --> 00:14:55.600
adjust the inner code, 

00:14:55.600 --> 00:14:58.079
and then basically compare 

00:14:58.079 --> 00:14:59.519
to figure out how much

00:14:59.519 --> 00:15:01.440
time nreverse takes

00:15:01.440 --> 00:15:03.839
in this scenario.

15:03.839 --> 00:15:07.760
Or, we can take a function 

00:15:07.760 --> 00:15:10.880
which we have probably found

15:10.880 --> 00:15:12.880
using one of the previous methods 

00:15:12.880 --> 00:15:17.279
which we anticipate,

15:17.279 --> 00:15:19.440
as we understand, takes a while,

15:19.440 --> 00:15:26.720
and annotate it with a benchmark-progn form, 

00:15:26.720 --> 00:15:33.360
which... just executes the body

00:15:33.360 --> 00:15:36.880
and reports, for each and every one of them,

15:36.880 --> 00:15:41.360
how long that body execution took.

15:41.360 --> 00:15:45.360
So for instance, here we added

00:15:45.360 --> 00:15:46.959
a few message calls 

00:15:46.959 --> 00:15:49.040
inside those benchmark-progns, 

00:15:49.040 --> 00:15:53.199
so we can see how long each part

00:15:53.199 --> 00:16:03.600
of the function xref-matches-in-files 

00:16:03.600 --> 00:16:06.320
takes when it is run.
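
NOTE
A hedged sketch of such an annotation; the body here is a stand-in,
not the actual xref-matches-in-files code:
(benchmark-progn          ; messages how long its body took
  (message "parsing...")
  (sit-for 0.1))          ; placeholder for the parsing step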

16:06.320 --> 00:16:11.839
Let's try it. Let's first call

16:11.839 --> 00:16:17.199
emacs-lisp-byte-compile-and-load,

16:17.199 --> 00:16:20.720
so that we're sure that we are running

16:20.720 --> 00:16:24.079
the fastest possible version of this code,

16:24.079 --> 00:16:29.519
so when we do find the bottlenecks,

00:16:29.519 --> 00:16:31.279
we are sure that they are real

00:16:31.279 --> 00:16:36.240
and not because of the uncompiled code, 

00:16:36.240 --> 00:16:43.519
macro expansion, or the lack of 

00:16:43.519 --> 00:16:46.160
some substitutions, other substitutions

00:16:46.160 --> 00:16:50.800
that our code is relying on

16:50.800 --> 00:17:01.279
but which might not be available

17:01.279 --> 00:17:03.279
in interpreted mode,

00:17:03.279 --> 00:17:07.039
only in the compiled code.
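
NOTE
In a live session that step is simply, with the file's buffer current:
(emacs-lisp-byte-compile-and-load)
;; interactively: M-x emacs-lisp-byte-compile-and-load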

17:07.039 --> 00:17:10.160
Let's run.

17:10.160 --> 00:17:14.880
So we have this list,

17:14.880 --> 00:17:26.160
search for list,

17:26.160 --> 00:17:29.760
and the remaining time is spent during 

00:17:29.760 --> 00:17:36.720
printing of the results, which we didn't

17:36.720 --> 00:17:38.320
annotate with the benchmarking macro, 

00:17:38.320 --> 00:17:41.280
but it's still there.

17:41.280 --> 00:17:43.679
So we can see by switching to the

17:43.679 --> 00:17:49.919
*Messages* buffer, C-h e,

17:49.919 --> 00:17:52.080
that unquoting was very fast.

00:17:52.080 --> 00:17:59.280
So the search took 400 milliseconds. 

00:17:59.280 --> 00:18:02.480
It's slower than it would be

18:02.480 --> 00:18:06.080
without the video being recorded, as well

00:18:06.080 --> 00:18:09.039
as the rest of the measurements here.

18:09.039 --> 00:18:12.160
So the parsing of the results took more

00:18:12.160 --> 00:18:16.559
than that, and the object allocation

18:16.559 --> 00:18:21.919
took even more, which is unfortunate 

00:18:21.919 --> 00:18:23.919
but it's the reality 

00:18:23.919 --> 00:18:26.400
when we're dealing with a lot of

00:18:26.400 --> 00:18:29.840
search results, because, well,

18:29.840 --> 00:18:33.520
Emacs Lisp is slower than grep.

18:34.559 --> 00:18:49.200
That's just a given.

18:49.200 --> 00:18:54.400
What can be done and what had been done

00:18:54.400 --> 00:18:57.120
to improve on this?

18:57.120 --> 00:19:04.240
Well, first of all,

19:04.240 --> 00:19:06.000
let's change the question, 

00:19:06.000 --> 00:19:07.360
because this is more of 

00:19:07.360 --> 00:19:09.760
a retrospective sort of talk

19:09.760 --> 00:19:13.439
rather than talking about future plans. 

00:19:13.440 --> 00:19:19.679
What can one do to improve performance?

19:19.679 --> 00:19:24.160
Well, basically, two things: 

00:19:24.160 --> 00:19:27.919
first, you try to make sure 

00:19:27.919 --> 00:19:30.559
that you're writing less code, 

00:19:30.559 --> 00:19:37.200
so that your code does fewer things,

19:37.200 --> 00:19:40.400
fewer iterations of your operations.

19:40.400 --> 00:19:42.000
Basically, it's the realm 

00:19:42.000 --> 00:19:46.880
of choosing your algorithm well.

19:46.880 --> 00:19:54.880
But make sure it has as little complexity

00:19:54.880 --> 00:20:00.239
as you can manage reasonably.

20:00.240 --> 00:20:05.760
Another is to try to reduce memory allocations.

20:05.760 --> 00:20:13.760
That means creating fewer conses, the links

20:13.760 --> 00:20:17.919
of the lists. If you are mapping through 

00:20:17.919 --> 00:20:19.200
a list, creating a new one, 

00:20:19.200 --> 00:20:22.640
you are creating conses.

00:20:22.640 --> 00:20:26.159
New objects in memory, anyway.

20:26.159 --> 00:20:30.480
And also, when you call string operations

00:20:30.480 --> 00:20:33.679
like substring, when you create new strings,

20:33.679 --> 00:20:36.640
that's also what you do.

20:36.640 --> 00:20:39.200
When you allocate new memory, 

00:20:39.200 --> 00:20:43.600
that triggers garbage collections later,

00:20:43.600 --> 00:20:49.200
which affect the performance of your code

20:49.200 --> 00:21:01.600
quite severely. I have found that actually, 

00:21:01.600 --> 00:21:04.960
contrary to my personal intuition,

21:04.960 --> 00:21:08.240
when you try to fine-tune the code

00:21:08.240 --> 00:21:13.120
working on long lists of elements,

00:21:13.120 --> 00:21:15.520
large collections,

00:21:15.520 --> 00:21:20.159
it's even more important to try to avoid

21:20.159 --> 00:21:23.200
or reduce memory allocations

21:23.200 --> 00:21:26.400
rather than try to ensure 

00:21:26.400 --> 00:21:32.240
that less code is running,

21:32.240 --> 00:21:34.960
fewer operations are performed,

21:34.960 --> 00:21:38.080
because the garbage collector

21:38.080 --> 00:21:44.799
can hit you in the posterior quite suddenly

00:21:44.799 --> 00:21:49.520
so that you will not always...

21:49.520 --> 00:21:51.760
When you measure the impact,

00:21:51.760 --> 00:21:56.799
you might not always measure the whole of it.
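
NOTE
benchmark-run makes the garbage collector's share visible
(the body is a throwaway allocation-heavy example):
(require 'benchmark)
(benchmark-run 10 (make-list 1000000 nil))
;; returns (total-seconds gc-runs gc-seconds)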

21:56.799 --> 00:22:02.640
And sometimes even when you have

00:22:02.640 --> 00:22:06.559
a few operations, you can measure

22:06.559 --> 00:22:09.520
all three of them, but at some point,

00:22:09.520 --> 00:22:11.919
if you just reduce a lot,

22:11.919 --> 00:22:14.720
remove most of the allocations, 

00:22:14.720 --> 00:22:16.799
all three of them, the improvement

00:22:16.799 --> 00:22:21.520
might be even bigger than the total

22:21.520 --> 00:22:23.440
of three improvements 

00:22:23.440 --> 00:22:27.200
which you might have measured previously,

22:27.200 --> 00:22:29.679
separately.

22:29.679 --> 00:22:33.039
So it's something to be on the lookout for,

00:22:33.039 --> 00:22:39.520
but of course, when you pick the algorithm,

00:22:39.520 --> 00:22:45.200
it's important not to do

00:22:45.200 --> 00:22:52.158
more operations than you have to.

22:52.159 --> 00:22:56.000
We have examples of both 

00:22:56.000 --> 00:22:58.720
in the recent changes 

00:22:58.720 --> 00:23:02.559
to the xref and project packages,

23:02.559 --> 00:23:06.960
which we can examine here.

23:06.960 --> 00:23:11.200
Let's take a look at this one.

23:11.200 --> 00:23:14.880
This commit message lies a little bit

23:14.880 --> 00:23:19.039
because it's referring to 

00:23:19.039 --> 00:23:24.559
the use of assoc instead of cl-assoc,

23:24.559 --> 00:23:29.919
and the actual change incorporates that,

00:23:29.919 --> 00:23:33.440
but it also incorporates 

00:23:33.440 --> 00:23:39.440
the second part of the sentence 

00:23:39.440 --> 00:23:41.360
which actually replaced 

00:23:41.360 --> 00:23:46.960
the use of assoc in there.

23:46.960 --> 00:23:50.480
Curiously, cl-assoc was pretty slow 

00:23:50.480 --> 00:23:52.559
because not only did it create

00:23:52.559 --> 00:23:57.919
quadratic complexity in the operation

23:57.919 --> 00:23:59.919
it was also calling

00:23:59.919 --> 00:24:02.880
the Lisp function [equal]

24:02.880 --> 00:24:04.400
for every iteration, 

00:24:04.400 --> 00:24:08.880
whereas if we just use assoc there,

24:08.880 --> 00:24:10.080
which was the first version 

00:24:10.080 --> 00:24:13.919
of this change, of the improvement,

24:13.919 --> 00:24:15.760
it became already much faster, 

00:24:15.760 --> 00:24:20.640
but then we switched to a hash table,

24:20.640 --> 00:24:28.080
which turned this lookup 

00:24:28.080 --> 00:24:31.760
from O(n) complexity

24:31.760 --> 00:24:35.600
into, well, amortized constant one,

24:35.600 --> 00:24:37.600
even better.

24:37.600 --> 00:24:45.679
So, use hash tables, kids.
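
NOTE
A minimal sketch of the alist-to-hash-table move, with made-up data:
(setq my-alist '(("a" . 1) ("b" . 2)))
(cdr (assoc "b" my-alist))                      ; O(n) walk per lookup
(setq my-table (make-hash-table :test #'equal)) ; same test assoc uses
(dolist (pair my-alist)
  (puthash (car pair) (cdr pair) my-table))
(gethash "b" my-table)                          ; amortized O(1) lookup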

24:45.679 --> 00:24:52.400
Another commit here is about using 

00:24:52.400 --> 00:24:55.520
the inhibit-modification-hooks. 

00:24:55.520 --> 00:24:58.000
So, it turns out when you're printing

00:24:58.000 --> 00:25:01.679
into a buffer, even if you have already

25:01.679 --> 00:25:06.159
disabled the undo history,

25:06.159 --> 00:25:08.480
binding this variable to

25:08.480 --> 00:25:10.880
a non-nil value is pretty good

00:25:10.880 --> 00:25:15.520
because you are able to avoid running

25:15.520 --> 00:25:21.120
a number of hooks, which improves performance.
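
NOTE
A minimal sketch, assuming results are printed into a fresh buffer
(the buffer name is invented):
(with-current-buffer (get-buffer-create "*my-results*")
  (let ((buffer-undo-list t)            ; undo history disabled
        (inhibit-modification-hooks t)) ; skip the modification hooks
    (insert "lots of output")))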

25:21.120 --> 00:25:28.640
Next. This one was about moving the 

00:25:28.640 --> 00:25:32.000
file-remote-p call

25:32.000 --> 00:25:37.279
from inside the loop.

25:37.279 --> 00:25:42.039
This function is actually 

00:25:42.039 --> 00:25:49.039
surprisingly slow-ish

25:49.039 --> 00:25:52.000
for its purpose, so you don't really want

25:52.000 --> 00:25:56.159
to call it on every file name in the list

00:25:56.159 --> 00:25:59.520
when you have a lot of them,

25:59.520 --> 00:26:02.000
and especially if your code 

00:26:02.000 --> 00:26:04.640
is running in a buffer 

00:26:04.640 --> 00:26:10.640
visiting a remote file, like through TRAMP.

26:10.640 --> 00:26:15.120
You might end up trying to

00:26:15.120 --> 00:26:16.880
devise different approaches

00:26:16.880 --> 00:26:20.159
to avoid checking whether 

00:26:20.159 --> 00:26:23.279
every file name is remote

26:23.279 --> 00:26:25.360
if you're dealing with a list of file names, 

00:26:25.360 --> 00:26:34.640
so this one, take a look, be careful with it.
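
NOTE
A sketch of hoisting that call out of the loop; the helpers are
invented for illustration:
;; Slow: one file-remote-p call per file name.
(defun my-remote-files-slow (files)
  (seq-filter #'file-remote-p files))
;; Faster: decide once, from the common root directory.
(defun my-remote-files-fast (files)
  (when (file-remote-p default-directory) files))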

26:34.640 --> 00:26:43.120
with-current-buffer is a similarly slow function,

26:43.120 --> 00:26:46.880
though not as much. So it all depends on

26:46.880 --> 00:26:48.400
really how often you call it. 

00:26:48.400 --> 00:26:51.279
Sometimes you might want to 

00:26:51.279 --> 00:26:52.720
rewrite your code so that 

00:26:52.720 --> 00:26:54.720
it's called less often.

26:54.720 --> 00:26:59.760
And expand-file-name, which hits the disk.

26:59.760 --> 00:27:01.200
So if you're dealing with file names, 

00:27:01.200 --> 00:27:02.960
you might want to replace it 

00:27:02.960 --> 00:27:07.520
with concatenation at certain points.
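
NOTE
A hedged example of the substitution; the paths are illustrative,
and it is only safe when dir is already absolute and expanded:
(let ((dir "/tmp/project"))
  (expand-file-name "Makefile" dir)                  ; may hit the disk
  (concat (file-name-as-directory dir) "Makefile"))  ; plain string work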

27:07.520 --> 00:27:22.159
Okay, back to these changes later.

27:22.159 --> 00:27:28.480
This one just removed an allocation of a cons

27:28.480 --> 00:27:33.679
per match hit, and still it brought 5%

27:33.679 --> 00:27:37.039
or something like that improvement 

00:27:37.039 --> 00:27:41.120
to the whole command.

27:41.120 --> 00:27:46.000
So that's a pretty significant improvement 

00:27:46.000 --> 00:27:53.600
for a small change like that.

27:53.600 --> 00:28:01.679
Similarly, here we just made sure 

00:28:01.679 --> 00:28:09.520
to avoid a splits... no, a substring call,

28:09.520 --> 00:28:12.399
and probably an allocation of the whole

28:12.399 --> 00:28:16.320
buffer string, but in my testing,

00:28:16.320 --> 00:28:19.440
that doesn't actually matter much, 

00:28:19.440 --> 00:28:22.399
at least not so much, but a substring call

00:28:22.399 --> 00:28:28.960
per result... If we see, 

00:28:28.960 --> 00:28:33.039
since we changed to manual parsing 

00:28:33.039 --> 00:28:37.440
of the buffer, with the strings 

00:28:37.440 --> 00:28:41.440
delimited with zero bytes,

28:41.440 --> 00:28:43.840
it gave an overall improvement 

00:28:43.840 --> 00:28:47.440
of 20%, again on my machine

00:28:47.440 --> 00:28:50.720
with a pretty fast SSD, 

00:28:50.720 --> 00:28:53.679
and with a warm disk cache, of course.

28:53.679 --> 00:29:05.360
But still... Going back to this revision,

00:29:05.360 --> 00:29:09.440
it was actually quite surprising

00:29:09.440 --> 00:29:15.200
that migration to a cl-defstruct

29:15.200 --> 00:29:24.080
from eieio, the Common Lisp-inspired 

00:29:24.080 --> 00:29:26.480
object system first introduced 

00:29:26.480 --> 00:29:34.240
in the CEDET tools, 

00:29:34.240 --> 00:29:39.360
that was a bit of a surprise,

00:29:39.360 --> 00:29:44.799
because not much of my 

00:29:44.799 --> 00:29:46.080
benchmarking

00:29:46.080 --> 00:29:50.960
actually pointed at it being the problem.

29:50.960 --> 00:29:55.760
Probably because the accessors 

00:29:55.760 --> 00:29:57.520
were not the actual problem,

29:57.520 --> 00:30:00.399
like the oref macros 

00:30:00.399 --> 00:30:06.960
and the code accessing the slots,

30:06.960 --> 00:30:10.720
but the construction, 

00:30:10.720 --> 00:30:14.320
the object construction code,

30:14.320 --> 00:30:16.559
that was where most of the time 

00:30:16.559 --> 00:30:21.200
was spent unnecessarily,

30:21.200 --> 00:30:24.240
maybe doing type-checking,

30:24.240 --> 00:30:28.880
maybe some other stuff.

30:28.880 --> 00:30:36.080
So if you have lots of values

30:36.080 --> 00:30:39.120
you need to treat like objects, and do

00:30:39.120 --> 00:30:42.000
virtual dispatch on them in your package, 

00:30:42.000 --> 00:30:45.200
you might want to look into

30:45.200 --> 00:30:52.239
cl-defstruct for them.
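
NOTE
A minimal cl-defstruct sketch; the type and slots are invented
(xref's real types live in xref.el):
(require 'cl-lib)
(cl-defstruct my-match file line summary)
;; Construction is cheap compared to an eieio make-instance:
(make-my-match :file "a.el" :line 10 :summary "hit")
;; Accessors compile down to fast aref calls:
(my-match-line (make-my-match :file "a.el" :line 10 :summary "hit"))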

30:52.240 --> 00:30:54.080
Going on to the next section, 

00:30:54.080 --> 00:31:01.200
I have prepared a sort of comparison

31:01.200 --> 00:31:05.519
between cl-lib, dash, and seq, 

00:31:05.519 --> 00:31:09.360
the collection libraries we have

31:09.360 --> 00:31:11.279
available for us in Emacs. 

00:31:11.279 --> 00:31:15.760
That is, the popular ones,

31:15.760 --> 00:31:21.440
but since I'm running behind on time, 

00:31:21.440 --> 00:31:27.760
I'll probably just summarize the findings.

31:27.760 --> 00:31:31.919
First of all, seq is nice.

00:31:31.919 --> 00:31:38.480
Its generic approach is probably

31:38.480 --> 00:31:41.919
quite decent for most of the situations,

31:41.919 --> 00:31:49.840
but there are places where it could be

31:49.840 --> 00:31:53.679
optimized better, so instead of having

31:53.679 --> 00:31:56.399
quadratic performance, it could use a

31:56.399 --> 00:31:59.200
hash table, like for instance,

31:59.200 --> 00:32:02.000
dash does here--

32:02.000 --> 00:32:06.240
in dash, union or delete-dups 

00:32:06.240 --> 00:32:12.960
do in their implementation.

32:12.960 --> 00:32:16.640
Dash itself is curiously fast,

00:32:16.640 --> 00:32:20.000
at least faster than I might have expected,

32:20.000 --> 00:32:21.600
possibly because of 

00:32:21.600 --> 00:32:25.120
the implementation approach 

00:32:25.120 --> 00:32:33.200
where it uses code generation

00:32:33.200 --> 00:32:35.840
to avoid function calls,

32:35.840 --> 00:32:37.840
at least some of them,

32:37.840 --> 00:32:41.120
which is interesting.

32:41.120 --> 00:32:45.600
But since both seq and dash 

00:32:45.600 --> 00:32:49.919
avoid mutations--they don't really have

32:49.919 --> 00:32:52.399
mutating counterparts to common functions

00:32:52.399 --> 00:32:56.480
like you have with cl-remove-if, cl-delete-if,

32:56.480 --> 00:33:02.000
or just cl-remove, cl-delete,

33:02.000 --> 00:33:08.000
it still can be valuable to look into

00:33:08.000 --> 00:33:12.240
destructive versions of those functions,

33:12.240 --> 00:33:16.880
something from the core library 

00:33:16.880 --> 00:33:22.559
like delete-dups or nreverse,

33:22.559 --> 00:33:24.399
for your code when you're really trying

33:24.399 --> 00:33:29.519
to get as close to the metal 

00:33:29.519 --> 00:33:34.320
or whatever as you can, 

00:33:34.320 --> 00:33:40.080
because avoiding extra allocations

00:33:40.080 --> 00:33:43.200
can really be useful.

33:43.200 --> 00:33:46.159
You can really improve your code's performance

33:46.159 --> 00:33:54.080
if you don't do a lot of other stuff.
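
NOTE
A small illustration of the destructive counterparts; the lists are
throwaway literals built with list, so mutating them is safe:
(require 'cl-lib)
(cl-remove-if #'cl-oddp (list 1 2 3 4)) ; allocates a fresh list
(cl-delete-if #'cl-oddp (list 1 2 3 4)) ; reuses the existing conses
(delete-dups (list "a" "b" "a"))        ; destructive, from the core
(nreverse (list 1 2 3))                 ; reverses in place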

33:54.080 --> 00:34:00.080
delete-consecutive-dups is blazingly fast.

34:00.080 --> 00:34:03.919
It only requires pre-sorted strings.
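
NOTE
For example (a made-up list) -- it only drops adjacent duplicates,
hence the pre-sorting:
(delete-consecutive-dups (sort (list "b" "a" "b") #'string<))
;; returns ("a" "b")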

34:03.919 --> 00:34:08.399
What else to say...

34:08.399 --> 00:34:13.200
If you are going to read these measurements,

34:13.200 --> 00:34:15.359
make sure to keep in mind 

00:34:15.359 --> 00:34:18.000
that reverse is not free,

34:18.000 --> 00:34:20.240
so for instance, if we're looking 

00:34:20.240 --> 00:34:22.480
at this comparison 

00:34:22.480 --> 00:34:26.399
between remove and delete,

34:26.399 --> 00:34:27.919
they're using reverse 

00:34:27.919 --> 00:34:32.159
to avoid modifying the data,

34:32.159 --> 00:34:34.800
the sample data, so we don't have to

34:34.800 --> 00:34:36.720
create it every time.

34:36.720 --> 00:34:41.919
But to compare how much faster 

00:34:41.919 --> 00:34:43.760
delete is than remove,

34:43.760 --> 00:34:50.800
we need to subtract 787 milliseconds

34:50.800 --> 00:34:52.320
from here and from here 

00:34:52.320 --> 00:34:58.480
so it comes out to like 230 milliseconds 

00:34:58.480 --> 00:35:02.560
in this example, the last example, 

00:35:02.560 --> 00:35:12.000
and to 1 second, 250 milliseconds here,

00:35:12.000 --> 00:35:17.599
so the difference is 5-fold here.

35:17.599 --> 00:35:20.160
Not 2-fold.
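
NOTE
The arithmetic spelled out: subtracting the shared reverse cost of
787 ms leaves roughly 230 ms for delete and 1250 ms for remove,
and 1250 / 230 is about 5.4 -- hence 5-fold rather than 2-fold.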

35:20.160 --> 00:35:26.480
All right. With this, I'm going to

35:26.480 --> 00:35:29.520
thank you for listening, for watching

35:29.520 --> 00:35:31.920
and I'll be taking questions.

35:31.920 --> 00:35:32.920
Thank you.

00:35:32.920 --> 00:35:35.160
[captions by sachac]