Hi. Greetings from cloudy St. Petersburg. My name is Dmitry Gutov. I write Ruby by day and Emacs Lisp by night, when I don't do anything else. You might know my work from a number of third-party packages as well as built-in ones.

The idea for this talk came out of an improvement request for the performance of grep-like commands using the xref interface and its storage types, the container types for search results, complaining that it's not as fast as it potentially could be when a given search returns lots of results. I noticed myself, while working on that and probably before, that my approach to optimizing Lisp code has changed recently-ish, and I'd like to talk about it here.

This talk is for people who already know how to write some Emacs Lisp to solve their problems and who have possibly encountered some performance issues doing that.
Also, if you want to contribute improvements to the code that is already in Emacs, or to other packages which are owned by somebody else, it should be helpful too. Let's start.

First of all, about Emacs and Emacs Lisp: it's not the fastest language. Let's switch to the notes. I hope the font is big enough to be readable on this video, I really hope so. Emacs Lisp is not the fastest of the bunch. The garbage collector creates pauses whenever it needs to sweep data. The interpreter is not the fastest one, even though the native compilation branch improves on that by about a factor of two in certain scenarios, which is good. The standard library, the functions for working with collections, for example, is not uniformly great, so there is some work to be done there. Maybe you can contribute the next improvement. And also, if your package displays stuff, if it has a visual component, then you might have to deal with some drawbacks of the display engine, which might slow to a crawl when your code...
...when you're trying to display lines which are a little too long (trying to print a long line, for example), or when you are using a lot of overlays, if you know what those are, with lots of overlays on the same visual line in particular.

But okay. The first thing to understand, and I hope everybody who has done some programming in the past knows this: first you write the code correctly, and then you benchmark and see where the problems are, if there are any. So first do it right, then do it fast.

How do we find the hotspots, the bottlenecks, in that second step? We do some profiling, or measure how long the code takes. Emacs has two different profilers. One is the native profiler, which, as I recall, was contributed by Tomohiro Matsuyama, the author of the autocomplete package, back in Emacs 24.3. It's a low-overhead, sampling profiler, which is good on one hand but might be less good on the other. Let's try using it. We start with profiler-start. Let's just ask for CPU profiling here, but it can also profile memory allocations.
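The workflow just described uses the stock native-profiler commands, roughly like this:

```elisp
;; Sample CPU usage while the slow command runs, then show the report.
(profiler-start 'cpu)   ; 'mem or 'cpu+mem would also sample allocations
;; ... run the command being investigated, ideally a few times ...
(profiler-report)       ; tree of functions with time percentages
(profiler-stop)
```

In the report buffer, moving point and pressing TAB unfolds each entry of the tree.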
Let's try a search for some string in the current project. Let's do it a few times, maybe three times, so the profiler has more data to work with. Then let's call the report. Okay. So here we have the execution tree, with the percentage of time spent in each function. You can navigate it by going up and down and pressing TAB to unfold each element. This is weird. Okay, here we see that the actual command that was called only takes 8% of the whole runtime, meaning the command didn't run long enough for us to really dig into it.

Let's try a shorter input with more matches. So profiler-start again, CPU. Let's search for "list". Don't mind the minibuffer just yet. Okay, let's look at the report. We can unfold it here, and we see that 52% of the time was spent doing this, at least according to the profiler. We can unfold an entry, see the description of any function that was called by hitting d, or even jump to its definition by pressing f. By going down the tree, unfolding entries with TAB, you can see where the time was spent.
One of the bigger drawbacks of this profiler is that the arithmetic doesn't always add up. This is not a great example, but okay: you have 14% spent in here, but when we expand this entry, we only see something like 6%, 3%, and 0%. The children sum to a different number, sometimes even a bigger one. So the native profiler can give an accurate overall picture, and it has little overhead, but the specific numbers are not very precise, because its principle of operation is probabilistic. Let's stop it here.

There is another package called elp, the Emacs Lisp Profiler, which is much older. It allows us to instrument just specific functions, or a whole package. We're instrumenting the xref package here. It works through advice: if you look at the description of one of the functions in the package, you can see that it has had an :around advice added. If we run the same search, when it finishes we can see the table of numbers: how many times every function in the package has been called, and the time the runtime has spent inside each of them,
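A minimal elp session along the lines shown in the demo:

```elisp
;; Instrument every function in the xref package, run the command
;; under test, then show per-function call counts and timings.
(require 'elp)
(elp-instrument-package "xref")
;; ... run the search ...
(elp-results)       ; table of calls, total time, average time
(elp-restore-all)   ; remove the instrumentation advice when done
```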
sorted by impact, as the profiler understands it. The main problem with this profiler is that it is slower, because it adds overhead to every function call. In the end it might give the impression that functions which have been called a lot of times are more important, hotter, than they actually are, because it slows down every such call, including the nested calls those functions make inside the same package. So it's a good way to analyze and drill down to see which functions take a lot of time, but keep in mind that you might end up trying to optimize one of those smaller, frequently called functions and find that the result doesn't actually affect the production runtime at all. It's still a good tool, though, especially when you already know which set of actions you are interested in and which functions might be slow. elp allows you to instrument a package or a set of functions.
Just don't forget to un-instrument them all afterwards, or else your session, and your subsequent optimization efforts, might not work as well as you'd want.

And there's also this very nice, very handy package called benchmark, which unfortunately not everybody knows about. It turns out that a lot of long-time Emacs Lisp users and package developers have benchmarking macros of their own, but this package comes with a perfectly usable interface which doesn't require you to define anything yourself, which is especially convenient when you are in an emacs -Q session trying to benchmark your code in a bare Emacs. It has two main entry points, I would say: the first is benchmark, and the second is benchmark-progn. benchmark is a function, and benchmark-progn is a macro. The first one you use by specifying the number of iterations and the form. I hope the minibuffer is easy to read here. For instance, we can take this long list and try nreverse-ing it, and we see how long that takes.
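In the spirit of the demo, with number-sequence standing in for the long list on screen:

```elisp
;; `benchmark' takes a repetition count and a form, and reports the
;; elapsed time (including time spent in GC) in the echo area.
(benchmark 100 '(nreverse (number-sequence 1 10000)))

;; `benchmark-progn' wraps a body, runs it once, and reports how long
;; that body execution took, returning the body's value.
(benchmark-progn
  (dotimes (_ 100)
    (nreverse (number-sequence 1 10000))))
```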
Then you can adjust the inner code and compare, to figure out how much time nreverse takes in this scenario. Or we can take a function, which we have probably found using one of the previous methods and which we understand takes a while, and annotate it with benchmark-progn forms, each of which just executes its body and reports how long that body execution took. For instance, here we added a few message calls inside those benchmark-progns, so we can see how long each part of the function xref-matches-in-files takes when it is run. Let's try it.
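A sketch of that annotation style, with hypothetical stub functions standing in for the real search and parsing phases of xref-matches-in-files:

```elisp
(defun my/run-search () (number-sequence 1 100000))  ; stub for the search
(defun my/parse-hits (hits) (length hits))           ; stub for the parsing

(defun my/search-and-parse ()
  "Each `benchmark-progn' reports its body's elapsed time via `message'."
  (let (hits)
    (benchmark-progn                 ; phase 1: run the search
      (setq hits (my/run-search)))
    (benchmark-progn                 ; phase 2: parse the results
      (my/parse-hits hits))))
```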
Let's first call emacs-lisp-byte-compile-and-load, so that we're sure we are running the fastest possible version of this code. Then, when we do find the bottlenecks, we know they are real and not an artifact of uncompiled code: macro expansion, or the lack of substitutions that our code relies on but which might not be available in interpreted mode, only in compiled code. Let's run. So we have this list, search for "list", and the remaining time is spent printing the results, which we didn't annotate with the benchmarking macro, but it's still there. We can see, by switching to the *Messages* buffer with C-h e, that unquoting was very fast, and the search took 400 milliseconds. That's slower than it would be without the video being recorded, as are the rest of the measurements here.
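The same preparation step can be done from Lisp rather than via M-x (the path here is hypothetical):

```elisp
;; Compile first so the measurements reflect byte-compiled code,
;; not interpreter overhead or repeated macro expansion.
(byte-compile-file "~/src/my-package.el")  ; hypothetical file
(load "~/src/my-package.elc")
```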
The parsing of the results took more than that, and the object allocation took even more, which is unfortunate, but that's the reality when we're dealing with a lot of search results, because, well, Emacs Lisp is slower than grep. That's just a given.

What can be done, and what has been done, to improve on this? Well, first of all, let's change the question, because this is more of a retrospective talk than one about future plans. What can one do to improve performance? Basically, two things. First, you try to make sure that you're running less code, that your code does fewer things, fewer iterations of your operations. This is the realm of choosing your algorithm well: make sure it has as little complexity as you can reasonably manage. The other is to reduce memory allocations: creating conses, the links of a list, for example. If you are mapping over a list and creating a new one, you are creating conses, new objects in memory anyway.
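To illustrate the allocation angle with a toy example (both functions are illustrative, not from xref): mapping over a list to combine its elements allocates a whole intermediate list, while a plain loop allocates nothing.

```elisp
(defun my/sum-squares-mapcar (xs)
  "Allocates a fresh list of the same length just to feed `apply'."
  (apply #'+ (mapcar (lambda (x) (* x x)) xs)))

(defun my/sum-squares-dolist (xs)
  "Same result, no intermediate conses."
  (let ((total 0))
    (dolist (x xs total)
      (setq total (+ total (* x x))))))
```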
And also, when you call string operations like substring, when you create new strings, that's what you're doing too. Allocating new memory triggers garbage collections later, which affect the performance of your code quite severely. I have found, actually contrary to my personal intuition, that when you fine-tune code working on long lists, on large collections, it's even more important to avoid or reduce memory allocations than to ensure that less code runs or fewer operations are performed, because the garbage collector can hit you in the posterior quite suddenly, so that when you measure the impact of a change, you might not be measuring the whole of it. Sometimes, even when you have a few separate optimizations and can measure all three of them, if you then remove most of the allocations across all three at once, the improvement might be bigger than the total of the three improvements you measured previously, separately.
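One way to see how much of a timing is garbage collection is benchmark-run, which returns the elapsed time together with the number of GCs and the time they took:

```elisp
;; Returns (ELAPSED-SECONDS GC-COUNT GC-SECONDS); a large third element
;; means allocations, not raw computation, dominate the measurement.
(benchmark-run 10
  (mapcar #'1+ (number-sequence 1 100000)))
```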
So that's something to be on the lookout for. But of course, when you pick the algorithm, it's still important not to do more operations than you have to.

We have examples of both in the recent changes to the xref and project packages, which we can examine here. Let's take a look at this one. This commit message lies a little bit, because it refers to the use of assoc instead of cl-assoc, and the actual change incorporates that, but it also incorporates the second part of the sentence, which then replaced the use of assoc there. Curiously, cl-assoc was pretty slow: not only did the operation have quadratic complexity, it was also calling the Lisp function equal for every iteration, whereas just using assoc there, which was the first version of this improvement, already made it much faster. But then switching to a hash table, which turns this lookup from O(n) complexity into an amortized constant one, is even better. So, use hash tables, kids.
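A minimal sketch of that alist-to-hash-table move (the data here is made up):

```elisp
(let ((pairs '(("a.el" . 1) ("b.el" . 2) ("c.el" . 3)))
      (table (make-hash-table :test #'equal)))
  ;; Alist lookup with `assoc': O(n) per call, but using the fast
  ;; C-level `equal' rather than calling out to Lisp per element.
  (assoc "b.el" pairs)               ; => ("b.el" . 2)
  ;; Build the hash table once; each `gethash' is then amortized O(1).
  (dolist (pair pairs)
    (puthash (car pair) (cdr pair) table))
  (gethash "b.el" table))            ; => 2
```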
Another commit here is about using inhibit-modification-hooks. It turns out that when you're printing into a buffer, even if you have already disabled the undo history, binding this variable to a non-nil value is pretty good, because you avoid running a number of hooks, which improves performance.

Next. This one was about moving the file-remote-p call out of the loop. This function is actually surprisingly slow-ish for its purpose, so you don't really want to call it on every file name in a list when you have a lot of them, especially if your code is running in a buffer visiting a remote file, for example through TRAMP. You might end up devising different approaches to avoid checking whether every file name is remote when you're dealing with a list of file names, so take a look and be careful with this one.

with-current-buffer is similarly slower than you might expect, but not by as much, so it all depends on how often you call it. Sometimes you might want to rewrite your code so that it's called less often.
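A sketch of that kind of batch insertion (the buffer name is arbitrary):

```elisp
;; Binding `inhibit-modification-hooks' (and disabling undo recording)
;; while printing many lines avoids running hooks on every insertion.
(with-current-buffer (get-buffer-create "*results*")
  (let ((inhibit-modification-hooks t)
        (buffer-undo-list t))
    (dolist (line '("first match" "second match" "third match"))
      (insert line "\n"))))
```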
And expand-file-name hits the disk, so if you're dealing with file names, you might want to replace it with concatenation at certain points.

Okay, back to these changes. This one just removed one cons allocation per match hit, and still it brought something like a 5% improvement to the whole command. That's a pretty significant improvement for a small change like that. Similarly, here we just made sure to avoid a substring call, and probably an allocation of the whole buffer string, though in my testing the latter doesn't actually matter that much. But avoiding a substring call per result... Since we changed to manually parsing the buffer, with the strings delimited by zero bytes, it gave an overall improvement of 20%, again on my machine with a pretty fast SSD, and with a warm disk cache, of course. But still...
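Both of those tips, calling file-remote-p once per batch rather than once per file, and preferring cheap concat over expand-file-name for the common local case, can be sketched with a hypothetical helper:

```elisp
(defun my/absolute-names (dir files)
  "Return FILES made absolute against DIR (assumed to end in a slash).
Hypothetical helper: `file-remote-p' is consulted once for the whole
batch instead of once per file, and local names use `concat' instead
of the slower `expand-file-name'."
  (let ((remote (file-remote-p dir)))
    (mapcar (lambda (file)
              (if remote
                  (expand-file-name file dir)  ; TRAMP-aware slow path
                (concat dir file)))            ; local fast path
            files)))
```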
Going back to this revision: it was actually quite surprising that the migration to cl-defstruct from EIEIO, the Common Lisp-inspired object system first introduced with the CEDET tools, made such a difference, because not much of my benchmarking had actually pointed at it being the problem. Probably because the accessors, the oref macros and the code accessing the slots, were not the actual problem; it was the object construction code where most of the time was spent unnecessarily, maybe doing type-checking, maybe some other stuff. So if your package has lots of values that you need to treat like objects, with virtual dispatch on them, you might want to look into cl-defstruct for them.

Going on to the next section: I have prepared a comparison between cl-lib, dash, and seq, the collection libraries we have available in Emacs, at least the popular ones. But since I'm running behind on time, I'll probably just summarize the findings. First of all, seq is nice.
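A cl-defstruct declaration along those lines (the type and slots here are illustrative, not the real xref ones):

```elisp
(require 'cl-lib)

;; `cl-defstruct' generates a constructor and accessors whose object
;; construction is much cheaper than that of EIEIO classes.
(cl-defstruct (my-location (:constructor my-location-create))
  file line column)

(let ((loc (my-location-create :file "foo.el" :line 10 :column 2)))
  (my-location-file loc))   ; => "foo.el"
```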
Its generic approach is probably quite decent for most situations, but there are places where it could be optimized better: instead of having quadratic performance, it could use a hash table, like, for instance, dash does in its union and delete-dups counterparts. dash itself is curiously fast, at least faster than I might have expected, possibly because of its implementation approach, where it uses code generation to avoid at least some of the function calls, which is interesting. But since both seq and dash avoid mutations, they don't really have mutating counterparts to the common functions, like you have with cl-remove-if and cl-delete-if, or just cl-remove and cl-delete, it can still be valuable to look into the destructive versions of those functions, or something from the core library like delete-dups or nreverse, when you're really trying to get as close to the metal as you can, because avoiding the extra allocations can really be useful.
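For instance, the destructive core functions reuse the cons cells of their input instead of copying the list:

```elisp
;; `remove' copies the list; `delete' may reuse and rechain its cells.
;; Only use the destructive form on lists your code owns.
(let ((xs (list 1 2 3 2 4)))
  (setq xs (delete 2 xs))
  xs)                        ; => (1 3 4)

;; Likewise `nreverse' reverses in place where `reverse' copies.
(nreverse (list 3 2 1))      ; => (1 2 3)
```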
You can really improve performance that way, if you don't do a lot of other stuff around it. delete-consecutive-dups is blazingly fast; it only requires the strings to be sorted beforehand.

What else to say... If you are going to read these measurements, keep in mind that reverse is not free. For instance, in this comparison between remove and delete, the forms use reverse to avoid modifying the sample data, so we don't have to recreate it every time. But to compare how much faster delete is than remove, we need to subtract the 787 milliseconds of the reverse from both numbers, so it comes out to something like 230 milliseconds in the last example versus 1 second 250 milliseconds here, so the difference is 5-fold, not 2-fold.

All right. With this, I'm going to thank you for listening, for watching, and I'll be taking questions. Thank you.

[captions by sachac]