WEBVTT

NOTE
Source: 2023/captions/emacsconf-2023-gc--emacsgcstats-does-garbage-collection-actually-slow-down-emacs--ihor-radchenko--main.vtt

00:00.000 --> 00:06.480
Hello everyone, my name is Ihor Radchenko, and you may know me from the Org mailing list.

00:07.440 --> 00:11.760
However, today I'm not going to talk about Org Mode. Today I'm going to talk about

00:11.760 --> 00:16.800
Emacs performance and how it's affected by its memory management code.

00:18.880 --> 00:24.720
First, I will introduce the basic concepts of Emacs memory management and what garbage

00:24.720 --> 00:32.320
collection is. Then I will show you user statistics collected from volunteer users

00:32.320 --> 00:42.080
over the last half year and I will end with some guidelines on how to tweak Emacs garbage

00:42.080 --> 00:48.640
collection customizations to optimize Emacs performance, and when it is or is not necessary

00:49.120 --> 00:56.560
to do so. Let's begin. What is garbage collection? To understand what garbage collection is, we need

00:56.560 --> 01:01.920
to realize that anything you do in Emacs is some kind of command and any command is most likely

01:01.920 --> 01:07.280
running some Elisp code, and every time you run Elisp code you most likely need to allocate certain

01:07.280 --> 01:14.160
memory in RAM and some of this memory is retained for a long time and some of this memory is

01:14.160 --> 01:20.320
transient. Of course, Emacs has to clear this transient memory from time to time to not occupy

01:20.320 --> 01:27.200
all the possible RAM in the computer. In this small example we have one global variable

01:28.480 --> 01:35.600
that is assigned a value but when assigning the value we first allocate a temporary variable

01:35.600 --> 01:41.360
and then a temporary list and only retain some part of this list in this global variable.

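NOTE
A minimal sketch of the kind of example described here, not the code from the slide (which is not in the captions). The names my/kept and tmp are hypothetical.
```elisp
(defvar my/kept nil
  "Global variable: whatever it points to stays reachable.")
(let ((tmp (list 1 2 3)))    ; temporary variable holding a fresh 3-cons list
  (setq my/kept (cdr tmp)))  ; retain only the tail (2 3) globally
;; Once the `let' exits, the binding of tmp and the first cons cell
;; (the one holding 1) are unreachable and can be collected later.
```
After evaluating this, my/kept holds (2 3), and only the first cons cell is garbage.
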
01:42.240 --> 01:51.920
In terms of memory graph we can represent this as two variable slots, one transient, one permanent

01:52.480 --> 02:01.680
and then a list of three cons cells, part of which is retained as a global variable, but part

02:01.680 --> 02:07.280
of it, namely the temporary variable symbol and the first cons cell of the list, is not used and it

02:07.840 --> 02:15.040
might be cleared at some point. So that's what Emacs does. Every now and then Emacs goes through

02:15.040 --> 02:20.320
all the memory and identifies which parts of the memory are not used and then clears them so that

02:20.320 --> 02:27.760
it can free up the RAM. This process is called garbage collection and Emacs uses a very simple

02:27.760 --> 02:33.440
and old algorithm which is called mark and sweep. This mark-and-sweep process

02:33.440 --> 02:40.880
is basically two stages. First, Emacs scans all the memory that is allocated and identifies

02:40.880 --> 02:46.320
which memory is still in use, which is linked to some variables for example, and which memory is

02:46.320 --> 02:51.600
not used anymore even though it was allocated in the past. The second stage is to clear

02:51.600 --> 02:56.240
whatever memory is no longer in use. During the process,

02:56.880 --> 03:03.920
Emacs cannot do anything else. So basically every time Emacs scans the memory it freezes up and

03:03.920 --> 03:09.840
doesn't respond to anything and if it takes too much time so that users can notice it then of

03:09.840 --> 03:18.160
course Emacs is not responsive at all. And if this garbage collection is triggered too frequently,

03:18.160 --> 03:23.760
then it's not just unresponsive every now and then; it's unresponsive almost

03:24.000 --> 03:29.840
all the time, so you cannot even type normally or run ordinary commands.

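NOTE
A toy model of the two mark-and-sweep stages, assuming memory is described as an alist from each object to the objects it references. It is only an illustration in Elisp: the real collector is implemented in C inside Emacs and differs in detail. All names here are hypothetical.
```elisp
(defun my/mark (roots graph)
  "Return the list of objects reachable from ROOTS in GRAPH.
GRAPH is an alist mapping each object to the objects it references."
  (let ((marked nil) (stack (copy-sequence roots)))
    (while stack
      (let ((obj (pop stack)))
        (unless (memq obj marked)
          (push obj marked)
          (setq stack (append (cdr (assq obj graph)) stack)))))
    marked))
(defun my/sweep (graph marked)
  "Return the objects in GRAPH that are not in MARKED, i.e. the garbage."
  (delq nil (mapcar (lambda (cell)
                      (unless (memq (car cell) marked) (car cell)))
                    graph)))
;; (let ((graph '((kept c2) (tmp c1) (c1 c2) (c2 c3) (c3))))
;;   (my/sweep graph (my/mark '(kept) graph)))
;; evaluates to (tmp c1)
```
The usage comment mirrors the earlier example: a global variable keeps the tail of a three-cons list alive, while the temporary variable and the first cons cell are swept.
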
03:32.320 --> 03:40.080
This mark and sweep algorithm takes longer the more memory Emacs uses. So basically

03:40.080 --> 03:46.480
the more buffers you open, the more packages you load, the more complex commands you run,

03:46.480 --> 03:55.840
the more memory is used and basically the longer Emacs takes to perform a single garbage collection.

04:00.560 --> 04:07.280
Of course, Emacs being Emacs, this garbage collection can be tweaked. In particular

04:07.280 --> 04:12.960
users can tweak how frequently Emacs does garbage collection using two basic variables

04:12.960 --> 04:19.840
gc-cons-threshold and gc-cons-percentage. gc-cons-threshold is the raw number of bytes

04:21.440 --> 04:27.200
Emacs needs to allocate before triggering another garbage collection, and gc-cons-percentage

04:27.200 --> 04:31.680
is similar, but it's defined in terms of a fraction of already allocated memory.

04:33.840 --> 04:41.840
If you follow various Emacs forums you may be familiar with people complaining about

04:41.840 --> 04:47.760
garbage collection and there are many many suggestions about what to do with it.

04:50.320 --> 04:52.640
Most frequently you see gc-cons-threshold

04:54.640 --> 05:01.280
recommended to be increased and a number of pre-packaged Emacs distributions like

05:01.280 --> 05:07.280
Doom Emacs do increase it. I have even seen suggestions, which are actually horrible, to

05:07.280 --> 05:11.120
disable garbage collection temporarily or for a long time.

05:14.240 --> 05:19.600
You can see this advice quite frequently, which indicates there might be some problem.

05:19.600 --> 05:26.320
However, every time one user posts about this problem, it's just one data point, and it doesn't

05:26.320 --> 05:30.000
mean that everyone actually suffers from it. It doesn't mean that everyone should do it.

05:30.720 --> 05:37.680
So in order to understand if this garbage collection is really a problem which is a

05:37.680 --> 05:48.000
common problem we do need some kind of statistics and only using the actual statistics we can

05:48.000 --> 05:54.880
understand if it should be recommended for everyone to tweak the defaults or like whether

05:54.880 --> 06:00.000
it should be recommended for certain users, or maybe Emacs devs should be asked to do

06:00.000 --> 06:08.800
something about the defaults. And what I did some time ago is exactly this. I tried to collect the

06:08.800 --> 06:18.000
user statistics. So I wrote a small package on ELPA and some users installed this package and

06:18.000 --> 06:24.080
then reported back these statistics of the garbage collection for their particular use.

06:25.360 --> 06:33.840
By now we have obtained 129 user submissions with over 1 million GC records in there.

06:35.760 --> 06:42.320
So like some of these submissions used default GC settings without any customizations.

06:42.320 --> 06:47.040
Some used increased gc-cons-threshold and gc-cons-percentage.

06:48.880 --> 06:56.640
So using this data we can try to draw some reliable conclusions on what should be done

06:56.640 --> 07:02.480
and whether anything should be done about garbage collection at the Emacs dev level or at least at the user

07:02.480 --> 07:08.240
level. Of course we need to keep in mind that there's some kind of bias because it's more

07:08.240 --> 07:13.680
likely that users who already have problems with GC, or think they have problems with GC, will report

07:14.480 --> 07:20.240
and submit the data. But anyway, having statistics is much more useful than just

07:20.240 --> 07:28.240
having anecdotal evidence from one or another Reddit post. And just one thing I will do

07:28.880 --> 07:33.280
during the rest of my presentation is that for all the statistics I will normalize

07:33.520 --> 07:41.440
user data so that every user contributes equally. For example if one user submits like 100 hours

07:41.440 --> 07:46.640
of Emacs uptime statistics and another user submits one hour of Emacs uptime, then I will

07:47.200 --> 07:49.520
anyway make it so that they contribute equally.

07:53.280 --> 07:59.280
Let's start with one of the most obvious things we can look into, which is the time it takes

07:59.360 --> 08:05.520
for a single garbage collection to complete. Here you see

08:08.240 --> 08:16.240
frequency distribution of GC duration for all the 129 users we got and

08:17.600 --> 08:26.800
you can see that most of the garbage collections are done quite quickly in less than 0.1 second

08:27.440 --> 08:33.680
and less than 0.1 second is usually just not noticeable. So even though there is garbage

08:33.680 --> 08:43.200
collection it will not interrupt the work in Emacs. However there is a fraction of users who

08:43.920 --> 08:49.680
experience garbage collection that takes 0.2, 0.3 or even half a second, which will be quite

08:49.680 --> 08:58.800
noticeable. For the purposes of this study I will consider anything that is less than 0.1

08:58.800 --> 09:06.000
second insignificant: you will not notice it, and obviously all the Emacs

09:06.000 --> 09:13.600
usage will be just normal. But if it's more than 0.1 or 0.2 seconds then it will be very noticeable

09:13.600 --> 09:20.800
and you will see that Emacs hangs for a little while, or a not so little while. In terms of numbers

09:21.360 --> 09:28.000
it's better to plot the statistics not as a distribution but as a cumulative distribution.

09:29.040 --> 09:34.080
At every point of this graph, for example here at 0.4 seconds,

09:34.480 --> 09:49.040
you see the share of users below that value: almost 90% of users have a GC duration of no more than 0.4 seconds.

09:49.040 --> 09:55.760
If we take the critical GC duration, which is

09:55.840 --> 10:02.400
0.1 second, and look at how many users exceed it, we get 56%: that is,

10:03.600 --> 10:12.880
44% of users have less than 0.1 second GC duration and the remaining 56% have more than 0.1 second.

10:13.600 --> 10:20.720
So you can see that more than half of users actually have a noticeable GC delay, so

10:20.720 --> 10:27.040
Emacs freezes for some noticeable time, and a quarter of users actually have very noticeable delays,

10:27.040 --> 10:36.640
where Emacs freezes long enough that you see an actual delay,

10:37.760 --> 10:47.600
which is quite a significant and important point. But apart from the duration of each individual GC,

10:47.600 --> 10:52.640
it is important to see how frequent it is because even if you do notice a delay

10:53.440 --> 10:59.120
even a few seconds delay it doesn't matter if it happens once during the whole Emacs session.

11:01.360 --> 11:10.720
So if you look into frequency distribution again here I plot time between

11:11.680 --> 11:17.760
subsequent garbage collections versus how frequent it is and we have very clear trend that

11:18.560 --> 11:24.560
most of the garbage collections are quite frequent: we are talking about every few seconds to a few tens

11:24.560 --> 11:32.560
of seconds. There's a few outliers which are at very round numbers like 60 seconds, 120 seconds,

11:32.560 --> 11:40.640
300 seconds. These are usually timers: you have something running on a timer, and it

11:41.440 --> 11:48.000
is a complex command that triggers garbage collection, but it's not the majority.

11:49.280 --> 11:54.000
Again to run the numbers it's better to look into cumulative distribution and see that

11:54.000 --> 11:58.160
50% of garbage collections are basically less than 10 seconds apart.

12:00.000 --> 12:07.920
And we can combine it with the previous data and look at garbage collections that happen

12:07.920 --> 12:12.960
less than 10 seconds apart and also take more than, say, 0.1 second.

12:13.680 --> 12:20.800
Then we see that one quarter of all garbage collections are noticeable and also frequent,

12:21.760 --> 12:27.840
and 9% take more than 0.2 seconds: very noticeable and also frequent. So basically

12:27.840 --> 12:34.480
this constitutes Emacs freezing: 9% of all garbage collections freeze Emacs. Of course,

12:35.360 --> 12:42.960
if you remember, there is a bias, but 9% is quite a significant number. So garbage collection can

12:42.960 --> 12:47.280
really slow things down, not for everyone but for a significant fraction of users.

12:49.440 --> 12:57.440
Another thing I'd like to look into is what I call agglomerated GCs. What I mean by agglomerated

12:57.440 --> 13:02.720
is when you have one garbage collection and then another garbage collection immediately after it. So

13:03.680 --> 13:09.840
in terms of numbers I took every subsequent garbage collection which is either immediately

13:09.840 --> 13:16.000
after or no more than one second after the previous one. So from the users' point of view,

13:16.960 --> 13:22.880
multiple garbage collections add up into one giant garbage collection.

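NOTE
A sketch of how such runs could be counted, assuming we have a sorted list of GC timestamps in seconds and measure the gap from one timestamp to the next. The function name and the data layout are hypothetical, not taken from the talk's analysis code.
```elisp
(defun my/agglomerate (gc-times)
  "Group sorted GC-TIMES (seconds) into runs with gaps of at most 1 s.
Return the list of run lengths."
  (let ((runs nil) (len 0) (prev nil))
    (dolist (time gc-times)
      (if (and prev (<= (- time prev) 1.0))
          (setq len (1+ len))          ; still inside the current run
        (when (> len 0) (push len runs))
        (setq len 1))                  ; start a new run
      (setq prev time))
    (when (> len 0) (push len runs))
    (nreverse runs)))
;; (my/agglomerate '(0.0 0.4 1.2 9.0 30.0 30.5)) evaluates to (3 1 2)
```
A run length of 1 is an isolated collection; anything larger is what the talk calls agglomerated.
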
13:23.440 --> 13:29.440
And if you look into numbers of how many agglomerated garbage collections there are

13:29.440 --> 13:35.360
you can see even numbers over 100. So 100 garbage collections going one after another.

13:36.720 --> 13:42.560
Even if each garbage collection takes only 0.1 second, 100 of them

13:43.280 --> 13:50.480
total 10 seconds. It's like Emacs hanging forever. Runs of 10 are also significant.

13:50.480 --> 13:58.160
So again, it would be very annoying to run into such a thing. How frequently does it happen? Again we

13:58.160 --> 14:04.400
can plot the cumulative distribution, and we see that about 19 percent of all garbage

14:04.400 --> 14:13.680
collections come at least two together, and 8 percent in runs of more than 10. You may think, oh,

14:13.680 --> 14:17.840
each garbage collection is not taking much time but when you have 10 of them yeah that becomes a

14:17.840 --> 14:32.560
problem. Another thing is to address something that some people complain about, which is that

14:33.680 --> 14:42.320
the longer you use Emacs, the slower Emacs becomes. Of course it may be caused by garbage collection, and

14:42.720 --> 14:50.000
I wanted to look into how garbage collection time and other statistics, other parameters

14:50.880 --> 14:58.880
are evolving over time. And what I can see here is a cumulative distribution of GC duration

14:59.680 --> 15:04.720
for the first 10 minutes of Emacs uptime, the first 100 minutes, and the first 1000 minutes.

15:05.520 --> 15:13.840
And if you look closer then you see that each individual garbage collection on average

15:15.440 --> 15:24.000
takes longer as you use Emacs longer. However, the increase is small, maybe 10 percent.

15:24.000 --> 15:33.040
Basically, garbage collection slows Emacs down more as you use Emacs more,

15:33.680 --> 15:40.320
but not much. So if you do see Emacs getting slower and slower over time,

15:40.960 --> 15:46.960
it's probably not really garbage collection because it doesn't change too much. And if you

15:46.960 --> 15:52.720
look at the time between individual garbage collections, you see that the time actually

15:52.720 --> 15:58.880
increases as you use Emacs longer which makes sense because initially like first few minutes

15:58.880 --> 16:04.720
you have all kinds of packages and code loading, and then later everything is

16:04.720 --> 16:12.560
loaded and things become more stable. So the conclusion on this part is that

16:13.520 --> 16:18.480
if Emacs becomes slower in a long session it's probably not caused by garbage collection.

16:20.320 --> 16:27.760
And one word of warning of course is that it's all nice and all when I present the statistics

16:27.760 --> 16:32.800
but it's only an average and if you are an actual user like here is one example

16:34.080 --> 16:39.920
which shows a total garbage collection time like accumulated together over Emacs uptime

16:40.880 --> 16:45.360
and you see different lines which correspond to different sessions of one user

16:46.800 --> 16:51.360
and you see they are wildly different like one time there is almost no garbage collection

16:52.240 --> 16:57.840
another time you see more garbage collection, probably because Emacs is used more heavily or with a

16:57.840 --> 17:04.560
different pattern of usage and even during a single Emacs session you see a different slope

17:04.560 --> 17:10.560
of this curve which means that sometimes garbage collection is infrequent and sometimes it's much

17:10.560 --> 17:16.000
more frequent, so it's probably much more noticeable at one time and less noticeable at another.

17:16.000 --> 17:23.360
So if you think about these statistics of course they only represent an average usage

17:23.360 --> 17:26.240
but sometimes it can get worse sometimes it can get better.

17:30.320 --> 17:35.600
The last parameter I'd like to talk about is garbage collection during Emacs init.

17:36.960 --> 17:42.320
Basically, if you think about what happens during Emacs init, when Emacs is just starting up,

17:42.320 --> 17:46.720
then whatever garbage collection happens there, whether once or several times,

17:46.720 --> 17:50.640
it all contributes to Emacs taking longer to start.

17:53.200 --> 18:00.640
And again we can look into the statistic and see what is the total GC duration after Emacs init

18:01.840 --> 18:10.240
and we see that for 50% of all the submissions garbage collection adds more than one second

18:10.240 --> 18:17.760
to Emacs init time, and for 20% of users it's an extra three seconds of Emacs start time, which is

18:17.760 --> 18:22.640
very significant especially for people who are used to Vim which can start in like a fraction

18:22.640 --> 18:27.200
of a second. And this is garbage collection alone; garbage collection is

18:27.200 --> 18:31.760
not everything Emacs does during startup, so the rest adds even more to the load.

18:33.680 --> 18:39.280
Okay that's all nice and all but what can we do about these statistics can we draw any

18:39.280 --> 18:46.000
conclusions and the answer is of course like the most important conclusion here is that

18:46.720 --> 18:52.320
yes garbage collection can slow down Emacs at least for some people and what to do about it

18:53.360 --> 18:58.720
there are two variables which you can tweak: gc-cons-threshold and gc-cons-percentage,

18:58.720 --> 19:06.400
and having the statistics I can at least look a little bit into what is the effect of

19:06.400 --> 19:12.400
increasing these variables. Most people just increase gc-cons-threshold,

19:13.760 --> 19:17.040
and in all the submissions where it was changed, people increased it;

19:17.680 --> 19:20.880
it doesn't make much sense to decrease it and make things worse.

19:24.560 --> 19:31.280
Of course, for these statistics the exact values of these increased thresholds

19:31.680 --> 19:36.320
are not always the same but at least we can look into some trends

19:38.640 --> 19:48.480
so first and obvious thing we can observe is when we compare the standard gc settings

19:49.120 --> 19:57.680
standard thresholds and increased thresholds for time between subsequent gcs and as one may expect

19:57.680 --> 20:03.440
if you increase the threshold Emacs will do garbage collection less frequently so the spacing

20:03.440 --> 20:10.080
between garbage collection increases okay the only thing is that if garbage collection is

20:10.080 --> 20:16.800
less frequent then each individual garbage collection becomes longer so if you think about

20:16.800 --> 20:24.240
increasing garbage collection thresholds, be prepared that each individual Emacs

20:24.240 --> 20:33.040
freeze will take longer. This is one caveat. When we talk about these agglomerated GCs, which

20:33.040 --> 20:40.160
come one after another: if you increase the threshold sufficiently, then wherever

20:40.160 --> 20:46.880
garbage collections used to run one after another, we can now make it so that they are actually

20:46.880 --> 20:53.840
separated so like you don't see one giant freeze caused by like 10 gcs in a row instead you can

20:53.840 --> 21:00.880
make it so that they are separated and in statistics it's very clear that the number of

21:00.880 --> 21:06.560
agglomerated garbage collections decreases dramatically when you increase the thresholds

21:07.920 --> 21:11.600
it's particularly evident when we look into startup time

21:13.520 --> 21:19.680
if you look at gc duration during Emacs startup and if we look into what happens when you

21:19.680 --> 21:25.680
increase the thresholds it's very clear that Emacs startup becomes faster when you increase GC

21:25.680 --> 21:37.120
thresholds. So that's all for the actual user statistics, and now let's try to arrive at some actual

21:37.120 --> 21:44.480
recommendations on what numbers to set. Before we start, let me explain a little bit about

21:44.480 --> 21:48.720
the difference between these two variables, gc-cons-threshold and gc-cons-percentage.

21:49.440 --> 21:55.120
so if you think about Emacs memory like there's a certain memory allocated by Emacs

21:56.000 --> 22:00.000
and then as you run commands and keep using Emacs there is more memory allocated,

22:01.360 --> 22:07.120
and Emacs decides when to do garbage collection according to these two variables. Actually, what

22:07.120 --> 22:12.880
it does is choose the larger one. Say you are late in an Emacs session and you have a lot of

22:12.880 --> 22:18.960
Emacs memory allocated. Then you have gc-cons-percentage, which is a percentage of the already

22:18.960 --> 22:27.360
allocated memory, and that percentage is probably going to be the larger one, because more

22:28.800 --> 22:34.480
memory means that a percentage of it is larger, so you have a larger number

22:35.040 --> 22:41.680
given by gc-cons-percentage. So in this scenario, when the Emacs session has

22:42.240 --> 22:46.880
already been running for a long time and there is a lot of memory allocated, you have

22:49.600 --> 22:54.240
gc-cons-percentage controlling the garbage collection, while early in Emacs, when there is not much

22:54.240 --> 23:00.240
memory allocated and Emacs is just starting up, gc-cons-threshold controls how frequently

23:00.240 --> 23:06.160
garbage collection happens, because smaller allocated memory means its percentage will be a

23:06.160 --> 23:14.080
small number. In terms of default values, gc-cons-threshold is 800 kilobytes

23:14.800 --> 23:24.080
and gc-cons-percentage is 0.1, that is 10 percent, so gc-cons-percentage becomes larger than that threshold

23:24.080 --> 23:30.480
when you have more than eight megabytes of allocated memory by Emacs which is quite early

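NOTE
A simplified model of the rule described here (the larger of the two limits wins), assuming the heap size in bytes is known. The function name is hypothetical and the real trigger logic lives in the Emacs C code.
```elisp
(defun my/gc-trigger-bytes (heap-bytes)
  "Approximate bytes of new allocation before the next GC for HEAP-BYTES in use."
  (max gc-cons-threshold (* gc-cons-percentage heap-bytes)))
;; With the defaults (800000 and 0.1) the two terms are equal at
;; 800000 / 0.1 = 8000000 bytes, the eight megabytes mentioned here.
```
Below that heap size gc-cons-threshold decides; above it gc-cons-percentage does.
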
23:30.480 --> 23:37.040
and it will probably hold just during the startup. Once you start using your Emacs

23:37.040 --> 23:42.080
and once you load all the histories and all kinds of buffers, it's probably going to take

23:42.080 --> 23:52.320
much more than eight megabytes. Now that we understand this, we can draw certain

23:52.320 --> 24:00.960
recommendations about tweaking the gc thresholds so first of all I need to emphasize that

24:01.760 --> 24:07.840
any time you increase a GC threshold, the individual garbage collection time increases, so it's not

24:07.840 --> 24:12.320
free at all. If you don't have problems with garbage collection, and about half of the users

24:12.320 --> 24:19.360
don't have much of a problem, you don't need to tweak anything. Only when GC is frequent and slow,

24:19.360 --> 24:27.040
when Emacs is really freezing frequently, should you consider increasing the GC thresholds.

24:28.240 --> 24:35.040
In particular, I recommend increasing gc-cons-percentage, because that's what mostly

24:35.040 --> 24:43.600
controls GC when Emacs is running for a long session. As for the numbers,

24:43.600 --> 24:48.640
we can estimate the effect of these numbers. For example, if you have the default value of

24:48.640 --> 24:54.720
0.1 for gc-cons-percentage, which is 10 percent, and then double it,

24:55.760 --> 25:02.880
obviously you get half as many GCs, but it will come at the cost of an extra 10 percent GC time,

25:02.880 --> 25:09.840
and if you increase it 10 times, you can expect 10x less frequent GCs but almost twice

25:09.840 --> 25:16.880
longer individual garbage collection time so probably you want to set the number closer to 0.1

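NOTE
One way to reproduce these rough estimates. The model is an assumption, not something stated in the talk: a single GC takes time proportional to the heap size at the moment it runs, and the heap has grown by gc-cons-percentage since the previous GC. The function name is hypothetical.
```elisp
(defun my/relative-gc-time (percentage)
  "Estimated single-GC time relative to the default `gc-cons-percentage' of 0.1."
  (/ (+ 1.0 percentage) (+ 1.0 0.1)))
;; (my/relative-gc-time 0.2) is about 1.09, i.e. roughly 10 percent longer;
;; (my/relative-gc-time 1.0) is about 1.82, i.e. almost twice as long.
```
Under this model the frequency gain is linear in the setting while the per-GC cost grows slowly, which matches the figures quoted here.
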
25:19.520 --> 25:29.280
Another part of the users may actually try to optimize Emacs startup time, which is quite a frequent

25:29.280 --> 25:37.200
problem. In this case it's probably better to increase gc-cons-threshold, but not too much.

25:37.200 --> 25:42.640
first of all it makes sense to check whether garbage collection is a problem at all during

25:43.520 --> 25:48.160
startup, and there are two variables which can show what is happening with

25:49.120 --> 25:54.800
garbage collection. gcs-done is a variable that shows

25:57.520 --> 26:02.560
the number of garbage collections triggered so far: when you check the value

26:02.560 --> 26:08.320
right after you start Emacs, you will see that number. And there is the gc-elapsed variable,

26:09.280 --> 26:15.440
which gives you the number of seconds Emacs has spent doing garbage collection. This is

26:15.440 --> 26:20.800
probably the most important variable, and if you see it's large, then you may consider tweaking the thresholds.

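NOTE
A small sketch for checking both variables right after startup, assuming it is placed in the init file. gcs-done, gc-elapsed and emacs-startup-hook are standard Emacs names; the message text is just an example.
```elisp
(add-hook 'emacs-startup-hook
          (lambda ()
            ;; gcs-done is an integer count, gc-elapsed is seconds as a float.
            (message "Startup: %d GCs took %.2f s in total"
                     gcs-done gc-elapsed)))
```
If the reported time is a small fraction of a second, garbage collection is probably not what slows startup down.
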
26:20.800 --> 26:30.000
for the Emacs startup we can estimate some bounds because in the statistics I never saw anything

26:30.000 --> 26:35.600
that is more than 10 seconds extra, and even 10 seconds is probably a really, really hard

26:35.600 --> 26:45.280
upper bound. Say you want to decrease the GC contribution by an order of magnitude,

26:45.920 --> 26:52.080
or two orders of magnitude as a really hard top estimate. Then it

26:52.080 --> 27:00.080
corresponds to 80 megabytes for gc-cons-threshold, and probably much less, so there's no point

27:00.080 --> 27:06.880
setting it to a few hundred megabytes of course there's one caveat which is important to keep in

27:06.880 --> 27:16.800
mind though that increasing the gc thresholds is not just increasing individual gc time

27:16.800 --> 27:23.600
there's also an actual real impact on the RAM usage so like if you increase gc threshold

27:24.400 --> 27:29.600
it increases the RAM usage of Emacs and you shouldn't think that like okay I increased

27:30.480 --> 27:37.200
the threshold by like 100 megabytes then 100 megabytes extra RAM usage doesn't matter

27:37.200 --> 27:44.480
it's not 100 megabytes because less frequent garbage collection means it will lead to

27:44.480 --> 27:51.680
memory fragmentation so in practice if you increase the thresholds to tens or hundreds

27:51.680 --> 27:58.240
of megabytes we are talking about gigabytes extra RAM usage for me personally when I tried to

27:58.240 --> 28:05.200
play with GC thresholds, I have seen Emacs taking two gigabytes, compared to several times less

28:05.760 --> 28:12.240
with default settings. So it's not free at all, and only when you either have a lot of

28:12.240 --> 28:19.440
free RAM and you don't care, or when your Emacs is really slow, may you need to consider

28:19.440 --> 28:24.160
tweaking these defaults. So again, don't tweak the defaults if you don't really have a problem.

28:24.800 --> 28:31.360
And of course this RAM problem is a big, big deal for Emacs devs, because from

28:32.960 --> 28:38.400
the point of view of a single user, you most likely have a normal laptop or PC with a lot of

28:38.400 --> 28:45.760
RAM you don't care about these things too much but Emacs in general can run on like

28:46.320 --> 28:53.200
all kinds of machines including low-end machines with very limited RAM and anytime Emacs developers

28:53.280 --> 29:00.320
consider increasing the defaults for garbage collection, they always have to consider that

29:00.320 --> 29:06.800
if they increase them too much, Emacs may just stop running on certain platforms.

29:09.840 --> 29:15.600
so that's a very big consideration in terms of the global defaults for everyone

29:16.320 --> 29:24.560
although I would say that it might be relatively safe to increase gc-cons-threshold

29:24.560 --> 29:29.600
because it mostly affects startup and during startup it's probably not the peak usage of

29:30.560 --> 29:38.160
Emacs; as Emacs runs for longer, that is probably where most of the RAM will be used.

29:38.720 --> 29:43.920
On the other hand, gc-cons-percentage is much more debatable because it has pros and cons:

29:43.920 --> 29:48.880
it will increase the RAM usage it will increase the individual GC time so

29:50.240 --> 29:56.560
if we consider changing it, it's much more tricky, and we have to discuss it and probably measure the impact

29:56.560 --> 30:06.080
on users. A final note from the point of view of Emacs development is that

30:06.480 --> 30:11.440
this simple mark-and-sweep algorithm is like a very old and not the state-of-the-art algorithm

30:13.040 --> 30:16.960
there are variants of garbage collection that are like totally non-blocking

30:18.000 --> 30:22.720
so Emacs just doesn't have to freeze during the garbage collection or there are variants

30:22.720 --> 30:27.440
of garbage collection algorithm that do not scan all the memory, just a fraction of it,

30:28.640 --> 30:35.520
and scan another fraction less frequently so there are actually ways just to

30:36.480 --> 30:39.680
change the garbage collection algorithm to make things much faster

30:40.400 --> 30:47.280
Of course, compared to just changing the values of variables, this

30:47.280 --> 30:52.000
is much more tricky, and someone has to implement it. Obviously it would be nice if someone implements

30:52.000 --> 30:58.720
it but so far it's not happening so yeah it would be nice but maybe not not so quickly

30:59.600 --> 31:02.080
there is more chance to change the defaults here

31:02.240 --> 31:05.680
to conclude let me reiterate the most important points

31:06.640 --> 31:12.400
so from the point of view of users, you need to understand that yes, garbage collection may be

31:12.400 --> 31:20.480
a problem but not for everyone so like you should only think about changing the variables when you

31:20.480 --> 31:28.240
really know that garbage collection is the problem for you. So if you have

31:28.400 --> 31:34.000
slow Emacs startup and you know that it's caused by garbage collection, which you can check with the

31:34.000 --> 31:41.520
gc-elapsed variable, then you may increase gc-cons-threshold to a few tens of megabytes,

31:41.520 --> 31:48.160
not more; it doesn't make sense to increase it much more. And if you really have major problems

31:48.160 --> 31:56.080
with Emacs being laggy, then you can increase gc-cons-percentage to 0.2 or 0.3, maybe.

31:56.080 --> 32:02.640
1 is probably overkill. But do watch your Emacs RAM usage; it may be really impacted.

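NOTE
A sketch of what these user-level recommendations could look like in an init file. The exact numbers are illustrative choices within the ranges mentioned in the talk, not values the talk prescribes.
```elisp
;; Only worth setting if gc-elapsed shows that GC noticeably slows startup.
(setq gc-cons-threshold (* 32 1000 1000)) ; a few tens of megabytes, not more
;; Only if Emacs freezes often in long sessions (the default is 0.1):
;; (setq gc-cons-percentage 0.2)
```
Both settings trade memory for fewer collections, so it is worth checking RAM usage after changing them.
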
32:04.160 --> 32:12.400
for Emacs developers I'd like to emphasize that there is a real problem with garbage collection

32:12.400 --> 32:22.720
and nine percent of all the garbage collection data points we have correspond to really slow,

32:22.720 --> 32:27.920
noticeable Emacs freezing that is also really frequent: less than 10 seconds apart.

32:30.000 --> 32:35.120
I'd say that it's really worth increasing gc-cons-threshold, at least during startup,

32:36.400 --> 32:41.440
because it really impacts the Emacs startup time making Emacs startup much faster

32:42.400 --> 32:48.560
ideally we need to reimplement the garbage collection algorithm of course it's not easy

32:48.560 --> 32:56.880
but it would be really nice. For the gc-cons-percentage default, it's hard to say. We may

32:56.880 --> 33:03.040
consider changing it, but it's up for discussion, and we probably need to be conservative here.

33:04.320 --> 33:11.280
So we have come to the end of my talk. This presentation and all the data will be available

33:11.280 --> 33:21.760
publicly, and you can reproduce all the statistics graphs if you wish. Thank you for your attention.