From 130b7acf09f06ee90f7e12137f151ddc639c89ad Mon Sep 17 00:00:00 2001 From: Sacha Chua Date: Sun, 28 Nov 2021 09:15:23 -0500 Subject: Update --- ...ing-emacs-lisp-code--dmitry-gutov--chapters.vtt | 31 + ...imizing-emacs-lisp-code--dmitry-gutov--main.vtt | 1528 ++++++++++++++++++++ 2021/info/faster-schedule.md | 4 +- 2021/schedule-details.md | 2 +- 4 files changed, 1562 insertions(+), 3 deletions(-) create mode 100644 2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--chapters.vtt create mode 100644 2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--main.vtt diff --git a/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--chapters.vtt b/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--chapters.vtt new file mode 100644 index 00000000..9aa57433 --- /dev/null +++ b/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--chapters.vtt @@ -0,0 +1,31 @@ +WEBVTT + +00:00:01.120 --> 00:02:36.479 +Introduction + +00:02:36.480 --> 00:04:19.839 +Emacs Lisp is a little old + +00:04:19.840 --> 00:05:03.038 +Benchmark then optimize, not vice versa + +00:05:03.039 --> 00:09:31.199 +profiler-start + +00:09:31.200 --> 00:13:01.359 +elp - Emacs Lisp Profiler + +00:13:01.360 --> 00:19:13.439 +benchmark + +00:19:13.440 --> 00:20:00.239 +Write less code + +00:20:00.240 --> 00:22:52.158 +Reduce allocations + +00:22:52.159 --> 00:30:52.239 +Recent optimizations in Xref + +00:30:52.240 --> 00:30:53.240 +cl-lib, dash, and seq diff --git a/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--main.vtt b/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--main.vtt new file mode 100644 index 00000000..057b85f5 --- /dev/null +++ b/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--main.vtt @@ -0,0 +1,1528 @@ +WEBVTT + +00:01.120 --> 00:00:04.640 +Hi. Greetings from the cloudy St. Petersburg. + +00:04.640 --> 00:00:06.080 +My name is Dmitry Gutov. + +00:00:06.080 --> 00:00:09.280 +I write Ruby by day, Emacs Lisp by night + +00:00:09.280 --> 00:00:12.080 +when I don't do anything else. + +00:12.080 --> 00:00:14.559 +You might know my work + +00:00:14.559 --> 00:00:18.160 +from a number of third-party packages + +00:00:18.160 --> 00:00:22.240 +as well as built-in ones. + +00:22.240 --> 00:00:25.599 +The idea for this talk came out from + +00:25.599 --> 00:00:26.800 +an improvement request + +00:00:26.800 --> 00:00:32.960 +for the performance of grep-like commands + +00:32.960 --> 00:00:35.200 +using the xref interface + +00:00:35.200 --> 00:00:43.280 +and its storage types, container types + +00:00:43.280 --> 00:00:47.840 +for the search results, + +00:47.840 --> 00:00:51.440 +complaining that it's not as fast + +00:00:51.440 --> 00:00:54.320 +as it potentially could have been + +00:54.320 --> 00:00:58.239 +when there are lots of search results + +00:00:58.239 --> 00:01:01.600 +coming for a given search. + +01:01.600 --> 00:01:06.159 +I have noticed myself that + +00:01:06.159 --> 00:01:10.159 +when working on it, and probably before. + +00:01:10.159 --> 00:01:16.560 +My approach to optimizing Lisp code + +01:16.560 --> 00:01:21.680 +has changed recently-ish, + +01:21.680 --> 00:01:26.960 +and I'd like to talk about it here. + +01:26.960 --> 00:01:34.079 +This talk is for people who already + +01:34.079 --> 00:01:37.040 +know how to write some Emacs Lisp + +00:01:37.040 --> 00:01:38.880 +to solve their problems + +00:01:38.880 --> 00:01:43.360 +and who have possibly encountered + +00:01:43.360 --> 00:01:49.200 +some performance issues doing that. + +01:49.200 --> 00:01:51.600 +Also, if you want to contribute + +00:01:51.600 --> 00:01:54.799 +some improvements to the code + +00:01:54.799 --> 00:01:56.960 +that is already in Emacs + +00:01:56.960 --> 00:02:02.159 +or to other packages + +00:02:02.159 --> 00:02:04.479 +which are owned by somebody else, + +00:02:04.479 --> 00:02:13.520 +it should also be helpful. Let's start. + +00:02:13.520 --> 00:02:18.640 +First of all, about Emacs and Emacs Lisp. + +00:02:18.640 --> 00:02:21.360 +It's not the fastest language. + +02:21.360 --> 00:02:26.560 +Let's switch to the notes. + +00:02:26.560 --> 00:02:31.440 +I hope the font is big enough + +00:02:31.440 --> 00:02:36.479 +to be readable on this video, really hope. + +02:36.480 --> 00:02:40.160 +Emacs Lisp is not the fastest of the bunch. + +00:02:40.160 --> 00:02:44.400 +The garbage collector creates pauses + +00:02:44.400 --> 00:02:48.800 +whenever it needs to sweep data. + +02:48.800 --> 00:02:51.760 +The interpreter is not the fastest one, + +02:51.760 --> 00:02:54.720 +even though the native compilation branch + +00:02:54.720 --> 00:02:59.120 +is improving on that by twice + +00:02:59.120 --> 00:03:04.640 +in certain scenarios, which is good. + +03:04.640 --> 00:03:10.560 +The standard library or functions + +00:03:10.560 --> 00:03:12.800 +for working with collections, for example, + +03:12.800 --> 00:03:17.599 +are not so uniformly great. + +03:17.599 --> 00:03:20.080 +So there is some work to be done there. + +03:20.080 --> 00:03:22.239 +Maybe you can contribute + +03:22.239 --> 00:03:26.640 +the next improvement in there. + +03:26.640 --> 00:03:33.200 +And also, if your package displays stuff, + +00:03:33.200 --> 00:03:35.040 +it has a visual component, + +03:35.040 --> 00:03:36.959 +then you might have to deal with + +00:03:36.959 --> 00:03:43.120 +some drawbacks of the display engine + +03:43.120 --> 00:03:47.760 +which might slow to a crawl + +00:03:47.760 --> 00:03:53.680 +when your code has... + +00:03:53.680 --> 00:03:56.560 +when you're trying to display lines + +00:03:56.560 --> 00:04:00.319 +which are a little too long-- + +00:04:00.319 --> 00:04:03.120 +trying to print a [long] line, for example-- + +04:03.120 --> 00:04:07.120 +or you are using a lot of overlays, + +04:07.120 --> 00:04:09.120 +if you know what it is. + +04:09.120 --> 00:04:12.640 +With lots of overlays + +00:04:12.640 --> 00:04:19.839 +on the same visual line, in particular. + +04:19.840 --> 00:04:24.160 +But okay. The first thing to understand, + +00:04:24.160 --> 00:04:27.680 +and I hope everybody who's done + +00:04:27.680 --> 00:04:29.520 +some programming in the past knows, + +00:04:29.520 --> 00:04:35.120 +it's: first you write correctly, + +00:04:35.120 --> 00:04:37.360 +and then you try to benchmark. + +04:37.360 --> 00:04:39.199 +and see where the problems are, + +00:04:39.199 --> 00:04:40.720 +if there are any. + +04:40.720 --> 00:04:45.919 +So first do it right, then do it fast. + +04:45.919 --> 00:04:50.720 +How do we find the hotspots, + +00:04:50.720 --> 00:04:52.400 +the bottlenecks on the second step? + +00:04:52.400 --> 00:04:54.720 +We try to do some profiling + +00:04:54.720 --> 00:04:58.960 +or measuring how long the code takes. + +04:58.960 --> 00:05:03.038 +Emacs has two different profilers. + +00:05:03.039 --> 00:05:05.440 +One is the native profiler, + +05:05.440 --> 00:05:10.639 +which as I recall was contributed by + +00:05:10.639 --> 00:05:13.280 +Tomohiro Matsuyama, the author + +00:05:13.280 --> 00:05:16.240 +of the autocomplete package + +05:16.240 --> 00:05:22.720 +back in Emacs 24.3, apparently. + +05:22.720 --> 00:05:24.320 +It's a low overhead profiler. + +00:05:24.320 --> 00:05:31.520 +It's sampling, which is good on one hand + +05:31.520 --> 00:05:33.919 +but might be less good on the other hand. + +05:33.919 --> 00:05:35.199 +Let's try using it. + +00:05:35.199 --> 00:05:40.960 +So we start with profiler-start. + +00:05:40.960 --> 00:05:44.960 +Let's just ask for CPU profiling here, + +00:05:44.960 --> 00:05:48.320 +but it also can profile memory locations. + +05:48.320 --> 00:05:56.400 +Let's try a search for some string + +00:05:56.400 --> 00:06:01.360 +in the current project. + +06:01.360 --> 00:06:04.080 +Let's do it a few times, maybe three times, + +00:06:04.080 --> 00:06:09.360 +so the profiler has more data to work with. + +06:09.360 --> 00:06:13.680 +Let's call the report. + +06:13.680 --> 00:06:16.960 +Okay. So here we have the tree, + +00:06:16.960 --> 00:06:19.360 +the execution tree with percentages + +00:06:19.360 --> 00:06:22.960 +of the time spent in each function. + +06:22.960 --> 00:06:25.840 +You can unwrap it by going up and down + +00:06:25.840 --> 00:06:35.039 +and pressing TAB to unwrap every element. + +06:35.039 --> 00:06:52.000 +This is weird. + +06:52.000 --> 00:06:55.199 +Okay, here we see that the actual command + +00:06:55.199 --> 00:06:56.639 +that was called only takes + +00:06:56.639 --> 00:06:59.599 +8% of the whole runtime, + +06:59.599 --> 00:07:07.759 +meaning the input was... + +07:07.759 --> 00:07:10.960 +The command took not enough time + +00:07:10.960 --> 00:07:13.360 +for us to really dig into it. + +00:07:13.360 --> 00:07:18.960 +Let's try a shorter input with more matches. + +00:07:18.960 --> 00:07:25.680 +So profiler-start again, CPU. + +07:25.680 --> 00:07:34.240 +Let's search for list. + +07:34.240 --> 00:07:40.160 +Don't mind the minibuffer just yet. + +07:40.160 --> 00:07:47.680 +Okay. So let's look at the report. + +07:47.680 --> 00:07:52.000 +We can unwrap it here, and we see + +00:07:52.000 --> 00:07:55.759 +52% of the time was spent doing this, + +07:55.759 --> 00:07:58.240 +at least according to the profile. + +00:07:58.240 --> 00:08:00.960 +We can unwrap it, see the description + +00:08:00.960 --> 00:08:02.800 +of any function that was called + +00:08:02.800 --> 00:08:08.479 +by hitting d, or even jump to it + +00:08:08.479 --> 00:08:12.960 +by tapping f. + +08:12.960 --> 00:08:16.160 +By going down the stream, + +08:16.160 --> 00:08:18.240 +we unwrap it with TAB, + +08:18.240 --> 00:08:22.400 +you can see where time was spent. + +08:22.400 --> 00:08:25.199 +One of the bigger drawbacks + +00:08:25.199 --> 00:08:30.639 +of this profiler is the arithmetics + +08:30.639 --> 00:08:39.200 +don't always work out, + +00:08:39.200 --> 00:08:45.760 +like you might have... + +08:45.760 --> 00:08:47.519 +This is not a good example, + +00:08:47.519 --> 00:08:52.640 +but okay, you have 14% spent in here, + +00:08:52.640 --> 00:08:57.200 +but when we expand this entry, + +00:08:57.200 --> 00:09:01.760 +we only see like 6%, 3%, and 0%. + +00:09:01.760 --> 00:09:06.640 +Different sum, sometimes even bigger than that. + +00:09:06.640 --> 00:09:10.800 +So the native profiler + +00:09:10.800 --> 00:09:13.200 +can give an accurate picture + +00:09:13.200 --> 00:09:15.920 +and it has little overhead, + +00:09:15.920 --> 00:09:20.399 +but the specific numbers + +00:09:20.399 --> 00:09:22.959 +are not very precise + +09:22.959 --> 00:09:28.640 +because the principle is probabilistic. + +09:28.640 --> 00:09:31.199 +Let's stop here. + +09:31.200 --> 00:09:36.959 +There is another package called elp, + +09:36.959 --> 00:09:39.360 +Emacs Lisp Profiler, + +00:09:39.360 --> 00:09:43.440 +which is much older than that. + +09:43.440 --> 00:09:47.920 +It allows us to instrument + +09:47.920 --> 00:09:53.600 +just specific functions or a package. + +09:53.600 --> 00:09:57.680 +We're instrumenting the xref package here. + +09:57.680 --> 00:10:01.360 +It works through advice. You can see + +00:10:01.360 --> 00:10:03.279 +the description of one of the functions + +00:10:03.279 --> 00:10:04.320 +in the package, and we see + +00:10:04.320 --> 00:10:12.640 +that it has had :around device added. + +10:12.640 --> 00:10:18.000 +If we run the same search, + +10:18.000 --> 00:10:21.360 +we can see that -- when it finishes -- + +00:10:21.360 --> 00:10:30.399 +the table of all the numbers, + +10:30.399 --> 00:10:32.640 +that every function in the package + +00:10:32.640 --> 00:10:41.680 +has been called, and the times + +00:10:41.680 --> 00:10:48.000 +which the runtime has spent inside of them. + +10:48.000 --> 00:10:55.040 +sorted by impact, as it understands that. + +10:55.040 --> 00:11:00.079 +The main problem with this profiler is + +00:11:00.079 --> 00:11:04.160 +it is slower because it adds overhead + +00:11:04.160 --> 00:11:06.880 +to every function call, + +11:06.880 --> 00:11:11.519 +so it, in the end, might give an impression + +00:11:11.519 --> 00:11:19.519 +that functions with lots of calls, + +11:19.519 --> 00:11:21.839 +which have been called a lot of times, + +11:21.839 --> 00:11:26.880 +are more important, + +11:26.880 --> 00:11:29.279 +hotter than they actually are, + +00:11:29.279 --> 00:11:30.959 +because it slows down + +00:11:30.959 --> 00:11:32.640 +every such function call, + +11:32.640 --> 00:11:35.279 +including the subsequent calls + +00:11:35.279 --> 00:11:38.160 +that these functions do + +00:11:38.160 --> 00:11:40.560 +inside the same package. + +11:40.560 --> 00:11:48.240 +So it's a good way to analyze and drill down + +11:48.240 --> 00:11:50.320 +to see which functions here + +00:11:50.320 --> 00:11:51.200 +take a lot of time, + +00:11:51.200 --> 00:11:53.839 +but just keep in mind + +00:11:53.839 --> 00:11:57.760 +that sometimes you might end up + +00:11:57.760 --> 00:12:00.079 +trying to optimize one of these + +00:12:00.079 --> 00:12:01.600 +smaller functions called + +00:12:01.600 --> 00:12:09.279 +or usually smaller, and that the result + +00:12:09.279 --> 00:12:11.040 +might not actually affect + +00:12:11.040 --> 00:12:18.720 +the production runtime at all. + +12:18.720 --> 00:12:22.240 +But it's still a good tool, + +00:12:22.240 --> 00:12:25.440 +especially when you already know + +12:25.440 --> 00:12:30.560 +which set of actions you are interested in, + +00:12:30.560 --> 00:12:33.760 +and which functions might be slower. + +12:33.760 --> 00:12:37.120 +elp allows you to instrument a package + +00:12:37.120 --> 00:12:42.720 +or a set of functions. Just don't forget + +00:12:42.720 --> 00:12:49.600 +to un-instrument them all afterwards, + +00:12:49.600 --> 00:12:52.000 +or else your session, + +00:12:52.000 --> 00:12:55.920 +your subsequent optimization efforts + +00:12:55.920 --> 00:12:59.360 +might not work as well as you might + +12:59.360 --> 00:13:01.359 +want to work. + +13:01.360 --> 00:13:04.560 +And there's also this very nice, + +00:13:04.560 --> 00:13:09.600 +very handy package called benchmark, + +13:09.600 --> 00:13:12.480 +which unfortunately + +00:13:12.480 --> 00:13:16.000 +not everybody knows about. + +13:16.000 --> 00:13:21.519 +It turns out that a lot of + +00:13:21.519 --> 00:13:30.079 +older Emacs Lisp developers, + +13:30.079 --> 00:13:34.160 +users, package developers usually have + +00:13:34.160 --> 00:13:38.000 +some benchmarking macros of their own. + +00:13:38.000 --> 00:13:42.880 +But there is this package + +00:13:42.880 --> 00:13:46.160 +with perfectly usable interface + +00:13:46.160 --> 00:13:49.199 +which doesn't require you + +00:13:49.199 --> 00:13:51.120 +to define something else, + +00:13:51.120 --> 00:13:52.560 +especially when you are in + +00:13:52.560 --> 00:13:57.120 +an emacs -Q session + +13:57.120 --> 00:14:00.480 +trying to benchmark your code + +00:14:00.480 --> 00:14:05.440 +in a bare Emacs. So it has + +00:14:05.440 --> 00:14:09.680 +two main endpoints, I would say. + +14:09.680 --> 00:14:14.480 +First one is the benchmark macro, + +14:14.480 --> 00:14:19.199 +and the second one is benchmark-progn. + +00:14:19.199 --> 00:14:20.399 +benchmark is a function, + +00:14:20.399 --> 00:14:25.040 +and benchmark-progn is a macro. + +14:25.040 --> 00:14:27.839 +The first one you can use by specifying + +00:14:27.839 --> 00:14:30.480 +the number of iterations and the form. + +00:14:30.480 --> 00:14:36.800 +I hope the minibuffer is easy to read here. + +14:36.800 --> 00:14:43.360 +For instance, we can take this long list + +00:14:43.360 --> 00:14:44.800 +and try nreverse-ing it, + +00:14:44.800 --> 00:14:49.680 +and we see how long that takes. + +14:49.680 --> 00:14:54.160 +Then you can do it, + +14:54.160 --> 00:14:55.600 +adjust the inner code, + +00:14:55.600 --> 00:14:58.079 +and then basically compare + +00:14:58.079 --> 00:14:59.519 +to figure out how much + +00:14:59.519 --> 00:15:01.440 +and how long nreverse takes + +00:15:01.440 --> 00:15:03.839 +in this scenario. + +15:03.839 --> 00:15:07.760 +Or, we can take a function + +00:15:07.760 --> 00:15:10.880 +which we have probably found + +15:10.880 --> 00:15:12.880 +using one of the previous methods + +00:15:12.880 --> 00:15:17.279 +which we anticipate that, + +15:17.279 --> 00:15:19.440 +as we understand, it takes a while, + +15:19.440 --> 00:15:26.720 +and annotate it with a benchmark-progn form, + +00:15:26.720 --> 00:15:33.360 +which... just execute the body + +00:15:33.360 --> 00:15:36.880 +and report, each and every one of them, + +15:36.880 --> 00:15:41.360 +how long that body execution took. + +15:41.360 --> 00:15:45.360 +So for instance, here we added + +00:15:45.360 --> 00:15:46.959 +a few message calls + +00:15:46.959 --> 00:15:49.040 +inside those benchmark-progns, + +00:15:49.040 --> 00:15:53.199 +so we can see how long each part + +00:15:53.199 --> 00:16:03.600 +of the function xref-matches-in-files + +00:16:03.600 --> 00:16:06.320 +takes when it is run. + +16:06.320 --> 00:16:11.839 +Let's try it. Let's first call + +16:11.839 --> 00:16:17.199 +emacs-lisp-byte-compile-and-load, + +16:17.199 --> 00:16:20.720 +so that we're sure that we are running + +16:20.720 --> 00:16:24.079 +the fastest possible version of this code, + +16:24.079 --> 00:16:29.519 +so when we do find the bottlenecks, + +00:16:29.519 --> 00:16:31.279 +we are sure that they are real + +00:16:31.279 --> 00:16:36.240 +and not because of the uncompiled code, + +00:16:36.240 --> 00:16:43.519 +macro expansion, or the lack of + +00:16:43.519 --> 00:16:46.160 +some substitutions, other substitutions + +00:16:46.160 --> 00:16:50.800 +that our code is relying on + +16:50.800 --> 00:17:01.279 +but which might not be available + +17:01.279 --> 00:17:03.279 +in the interpreter mode + +00:17:03.279 --> 00:17:07.039 +just in the compiled code. + +17:07.039 --> 00:17:10.160 +Let's run. + +17:10.160 --> 00:17:14.880 +So we have this list, + +17:14.880 --> 00:17:26.160 +search for list, + +17:26.160 --> 00:17:29.760 +and the remaining time is spent during + +00:17:29.760 --> 00:17:36.720 +printing of the results, which we didn't + +17:36.720 --> 00:17:38.320 +annotate with the benchmarking macro, + +00:17:38.320 --> 00:17:41.280 +but it's still there. + +17:41.280 --> 00:17:43.679 +So we can see by switching to the + +17:43.679 --> 00:17:49.919 +*Messages* buffer, C-h e, + +17:49.919 --> 00:17:52.080 +that unquoting was very fast. + +00:17:52.080 --> 00:17:59.280 +So the search took 400 milliseconds. + +00:17:59.280 --> 00:18:02.480 +It's slower than it would be + +18:02.480 --> 00:18:06.080 +without the video being recorded, as well + +00:18:06.080 --> 00:18:09.039 +as the rest of the measurements here. + +18:09.039 --> 00:18:12.160 +So the parsing of the results took more + +00:18:12.160 --> 00:18:16.559 +than that, and the object allocation + +18:16.559 --> 00:18:21.919 +took even more, which is unfortunate + +00:18:21.919 --> 00:18:23.919 +but it's the reality + +00:18:23.919 --> 00:18:26.400 +when we're dealing with a lot of + +00:18:26.400 --> 00:18:29.840 +search results, because, well, + +18:29.840 --> 00:18:33.520 +Emacs Lisp is slower than grep. + +18:34.559 --> 00:18:49.200 +That's just a given. + +18:49.200 --> 00:18:54.400 +What can be done and what had been done + +00:18:54.400 --> 00:18:57.120 +to improve on this? + +18:57.120 --> 00:19:04.240 +Well, first of all, + +19:04.240 --> 00:19:06.000 +let's change the question, + +00:19:06.000 --> 00:19:07.360 +because this is more of + +00:19:07.360 --> 00:19:09.760 +a retrospective sort of talk + +19:09.760 --> 00:19:13.439 +rather than talking about future plans. + +00:19:13.440 --> 00:19:19.679 +What can one do to improve performance? + +19:19.679 --> 00:19:24.160 +Well, basically, two things: + +00:19:24.160 --> 00:19:27.919 +first, you try to make sure + +00:19:27.919 --> 00:19:30.559 +that you're writing less code, + +00:19:30.559 --> 00:19:37.200 +then your code does fewer things, + +19:37.200 --> 00:19:40.400 +Fewer iterations of your operations. + +19:40.400 --> 00:19:42.000 +Basically, it's the realm + +00:19:42.000 --> 00:19:46.880 +of choosing your algorithm well. + +19:46.880 --> 00:19:54.880 +But make sure it's as little complexity + +00:19:54.880 --> 00:20:00.239 +as you can manage reasonably. + +20:00.240 --> 00:20:05.760 +Another is to try to reduce memory allocations. + +20:05.760 --> 00:20:13.760 +It's creating conses, the links of the list, + +20:13.760 --> 00:20:17.919 +of the lists. If you are mapping through + +00:20:17.919 --> 00:20:19.200 +a list, creating a new one, + +00:20:19.200 --> 00:20:22.640 +you are creating conses. + +00:20:22.640 --> 00:20:26.159 +New objects in memory, anyway. + +20:26.159 --> 00:20:30.480 +And also, when you call string operations + +00:20:30.480 --> 00:20:33.679 +like substring, when you create new strings, + +20:33.679 --> 00:20:36.640 +that's also what you do. + +20:36.640 --> 00:20:39.200 +When you allocate new memory, + +00:20:39.200 --> 00:20:43.600 +that triggers garbage collections later, + +00:20:43.600 --> 00:20:49.200 +which affect the performance of your code + +20:49.200 --> 00:21:01.600 +quite severely. I have found that actually, + +00:21:01.600 --> 00:21:04.960 +contrary to my personal intuition, + +21:04.960 --> 00:21:08.240 +when you try to fine-tune the code + +00:21:08.240 --> 00:21:13.120 +working on long lists of elements, + +00:21:13.120 --> 00:21:15.520 +long, long collection, large collections, + +00:21:15.520 --> 00:21:20.159 +it's even more important to try to avoid + +21:20.159 --> 00:21:23.200 +or reduce memory allocations + +21:23.200 --> 00:21:26.400 +rather than try to ensure + +00:21:26.400 --> 00:21:32.240 +that less code is running, + +21:32.240 --> 00:21:34.960 +less operations are performed, + +21:34.960 --> 00:21:38.080 +because the garbage collector + +21:38.080 --> 00:21:44.799 +can hit you in the posterior quite suddenly + +00:21:44.799 --> 00:21:49.520 +that you will not always... + +21:49.520 --> 00:21:51.760 +When you measure the input impact, + +00:21:51.760 --> 00:21:56.799 +you might not always measure the whole of it. + +21:56.799 --> 00:22:02.640 +And sometimes even when you have + +00:22:02.640 --> 00:22:06.559 +a few operations, you can measure + +22:06.559 --> 00:22:09.520 +all three of them, but at some point, + +00:22:09.520 --> 00:22:11.919 +if you just reduce a lot, + +22:11.919 --> 00:22:14.720 +remove most of the allocations, + +00:22:14.720 --> 00:22:16.799 +all three of them, the improvement + +00:22:16.799 --> 00:22:21.520 +might be even bigger than the total + +22:21.520 --> 00:22:23.440 +of three improvements + +00:22:23.440 --> 00:22:27.200 +which you might have measured previously, + +22:27.200 --> 00:22:29.679 +like separately. + +22:29.679 --> 00:22:33.039 +So it's something to be on the lookout for, + +00:22:33.039 --> 00:22:39.520 +but of course, when you pick the algorithm, + +00:22:39.520 --> 00:22:45.200 +it's important not to do + +00:22:45.200 --> 00:22:52.158 +more operations than you have to. + +22:52.159 --> 00:22:56.000 +We have examples of both + +00:22:56.000 --> 00:22:58.720 +in the recent changes + +00:22:58.720 --> 00:23:02.559 +to the xref and project packages, + +23:02.559 --> 00:23:06.960 +which we can examine here. + +23:06.960 --> 00:23:11.200 +Let's take a look at this one. + +23:11.200 --> 00:23:14.880 +This commit message lies a little bit + +23:14.880 --> 00:23:19.039 +because it's referring to + +00:23:19.039 --> 00:23:24.559 +the use of assoc instead of cl-assoc, + +23:24.559 --> 00:23:29.919 +and the actual change incorporates that, + +00:23:29.919 --> 00:23:33.440 +but it also incorporates + +00:23:33.440 --> 00:23:39.440 +the second part of the sentence + +00:23:39.440 --> 00:23:41.360 +which actually replaced + +00:23:41.360 --> 00:23:46.960 +the use of assoc in there. + +23:46.960 --> 00:23:50.480 +Curiously, cl-assoc was pretty slow + +00:23:50.480 --> 00:23:52.559 +because not only it created + +00:23:52.559 --> 00:23:57.919 +quadratic complexity in the operation + +23:57.919 --> 00:23:59.919 +and it was also calling + +00:23:59.919 --> 00:24:02.880 +the Lisp function [equal] + +24:02.880 --> 00:24:04.400 +for every iteration, + +00:24:04.400 --> 00:24:08.880 +whereas if we just use assoc there, + +24:08.880 --> 00:24:10.080 +which was the first version + +00:24:10.080 --> 00:24:13.919 +of this change, of the improvement, + +24:13.919 --> 00:24:15.760 +it became already much faster, + +00:24:15.760 --> 00:24:20.640 +but then switching to a hash table + +24:20.640 --> 00:24:28.080 +which turned this lookup + +00:24:28.080 --> 00:24:31.760 +from O(n) complexity + +24:31.760 --> 00:24:35.600 +into, well, amortized constant one, + +24:35.600 --> 00:24:37.600 +even better. + +24:37.600 --> 00:24:45.679 +So, use hash tables, kids. + +24:45.679 --> 00:24:52.400 +Another commit here is about using + +00:24:52.400 --> 00:24:55.520 +the inhibit-modification-hooks. + +00:24:55.520 --> 00:24:58.000 +So, turns out when you're printing + +00:24:58.000 --> 00:25:01.679 +into a buffer, even if you have already + +25:01.679 --> 00:25:06.159 +disabled the undo history, + +25:06.159 --> 00:25:08.480 +binding this variable to + +25:08.480 --> 00:25:10.880 +a non-null value is pretty good + +00:25:10.880 --> 00:25:15.520 +because you are able to avoid running + +25:15.520 --> 00:25:21.120 +a number of hooks, which improves performance. + +25:21.120 --> 00:25:28.640 +Next. This one was about moving the + +00:25:28.640 --> 00:25:32.000 +file-remote-p call + +25:32.000 --> 00:25:37.279 +from inside the loop. + +25:37.279 --> 00:25:42.039 +This function is actually + +00:25:42.039 --> 00:25:49.039 +surprisingly slow-ish for the goal, + +25:49.039 --> 00:25:52.000 +for its purpose, so you don't really want + +25:52.000 --> 00:25:56.159 +to call it on every file name in the list + +00:25:56.159 --> 00:25:59.520 +when you have a lot of them, + +25:59.520 --> 00:26:02.000 +and especially if your code + +00:26:02.000 --> 00:26:04.640 +is running in a buffer + +00:26:04.640 --> 00:26:10.640 +visiting a remote file, like through TRAMP. + +26:10.640 --> 00:26:15.120 +You might end up trying to + +00:26:15.120 --> 00:26:16.880 +devise different approaches + +00:26:16.880 --> 00:26:20.159 +to avoid checking whether + +00:26:20.159 --> 00:26:23.279 +every file name is remote + +26:23.279 --> 00:26:25.360 +if you're dealing with a list of file names, + +00:26:25.360 --> 00:26:34.640 +so this one, take a look, be careful with it. + +26:34.640 --> 00:26:43.120 +A similar, slower function, with-current-buffer, + +26:43.120 --> 00:26:46.880 +but not so much. So it all depends on + +26:46.880 --> 00:26:48.400 +really how often you call it. + +00:26:48.400 --> 00:26:51.279 +Sometimes you might want to + +00:26:51.279 --> 00:26:52.720 +rewrite your code so that + +00:26:52.720 --> 00:26:54.720 +it's called less often. + +26:54.720 --> 00:26:59.760 +And expand-file-name, which hits the disk. + +26:59.760 --> 00:27:01.200 +So if you're dealing with file names, + +00:27:01.200 --> 00:27:02.960 +you might want to replace it + +00:27:02.960 --> 00:27:07.520 +with concatenation at certain points. + +27:07.520 --> 00:27:22.159 +Okay, back to these changes later. + +27:22.159 --> 00:27:28.480 +This one just removed a location of a cons + +27:28.480 --> 00:27:33.679 +per match hit, and still it brought 5% + +27:33.679 --> 00:27:37.039 +or something like that improvement + +00:27:37.039 --> 00:27:41.120 +to the whole command. + +27:41.120 --> 00:27:46.000 +So that's a pretty significant improvement + +00:27:46.000 --> 00:27:53.600 +for a small change like that. + +27:53.600 --> 00:28:01.679 +Similarly, here we just made sure + +00:28:01.679 --> 00:28:09.520 +to avoid a splits... no, a substring call, + +28:09.520 --> 00:28:12.399 +and probably an allocation of the whole + +28:12.399 --> 00:28:16.320 +buffer string, but in my testing, + +00:28:16.320 --> 00:28:19.440 +that doesn't actually matter much, + +00:28:19.440 --> 00:28:22.399 +at least so much, but a substring call + +00:28:22.399 --> 00:28:28.960 +per result... If we see, + +00:28:28.960 --> 00:28:33.039 +since we changed to manual parsing + +00:28:33.039 --> 00:28:37.440 +of the buffer, with the strings + +00:28:37.440 --> 00:28:41.440 +delimited with zero bytes, + +28:41.440 --> 00:28:43.840 +it gave an overall improvement + +00:28:43.840 --> 00:28:47.440 +of 20%, again on my machine + +00:28:47.440 --> 00:28:50.720 +with a pretty fast SSD, + +00:28:50.720 --> 00:28:53.679 +and with a warm disk cache, of course. + +28:53.679 --> 00:29:05.360 +But still... Going back to this revision, + +00:29:05.360 --> 00:29:09.440 +it was actually quite surprising + +00:29:09.440 --> 00:29:15.200 +that migration to a cl-defstruct + +29:15.200 --> 00:29:24.080 +from eieio, the Common Lisp-inspired + +00:29:24.080 --> 00:29:26.480 +object system first introduced + +00:29:26.480 --> 00:29:34.240 +in the CEDET tools, + +00:29:34.240 --> 00:29:39.360 +that was a bit of a surprise, + +00:29:39.360 --> 00:29:44.799 +because not much of my + +00:29:44.799 --> 00:29:46.080 +benchmark benchmarking + +00:29:46.080 --> 00:29:50.960 +actually pointed at it being the problem. + +29:50.960 --> 00:29:55.760 +Probably because the accessors + +00:29:55.760 --> 00:29:57.520 +were not the actual problem, + +29:57.520 --> 00:30:00.399 +like the oref macros + +00:30:00.399 --> 00:30:06.960 +and the code accessing the slots, + +30:06.960 --> 00:30:10.720 +but the construction, + +00:30:10.720 --> 00:30:14.320 +the object construction code, + +30:14.320 --> 00:30:16.559 +that was where most of the time + +00:30:16.559 --> 00:30:21.200 +was spent unnecessarily, + +30:21.200 --> 00:30:24.240 +maybe doing type-checking, + +30:24.240 --> 00:30:28.880 +maybe some other stuff. + +30:28.880 --> 00:30:36.080 +So if you have lots of values, + +30:36.080 --> 00:30:39.120 +you need to treat like objects in... and + +00:30:39.120 --> 00:30:42.000 +virtual dispatch on them in your package, + +00:30:42.000 --> 00:30:45.200 +you might want to look into + +30:45.200 --> 00:30:52.239 +cl-defstruct for them. + +30:52.240 --> 00:30:54.080 +Going on to the next section, + +00:30:54.080 --> 00:31:01.200 +I have prepared the sort of comparison + +31:01.200 --> 00:31:05.519 +between cl-lib, dash, and seq, + +00:31:05.519 --> 00:31:09.360 +the collection libraries we have + +31:09.360 --> 00:31:11.279 +available for us in Emacs. + +00:31:11.279 --> 00:31:15.760 +That is the popular ones, + +31:15.760 --> 00:31:21.440 +but since I'm running behind on time, + +00:31:21.440 --> 00:31:27.760 +I'll probably just summarize the findings. + +31:27.760 --> 00:31:31.919 +First of all, seq is nice. + +00:31:31.919 --> 00:31:38.480 +Its generic approach is probably + +31:38.480 --> 00:31:41.919 +quite decent for most of the situations, + +31:41.919 --> 00:31:49.840 +but there are places where it could be + +31:49.840 --> 00:31:53.679 +optimized better, so instead of having + +31:53.679 --> 00:31:56.399 +quadratic performance, it could use a + +31:56.399 --> 00:31:59.200 +hash table, like for instance, + +31:59.200 --> 00:32:02.000 +dash does here-- + +32:02.000 --> 00:32:06.240 +in dash, union or delete-dups + +00:32:06.240 --> 00:32:12.960 +does in its implementation. + +32:12.960 --> 00:32:16.640 +The dash itself is curiously fast, + +00:32:16.640 --> 00:32:20.000 +at least faster than I might have expected, + +32:20.000 --> 00:32:21.600 +possibly because of + +00:32:21.600 --> 00:32:25.120 +the implementation approach + +00:32:25.120 --> 00:32:33.200 +where it uses code generation + +00:32:33.200 --> 00:32:35.840 +to avoid function calls, + +32:35.840 --> 00:32:37.840 +at least some of them, + +32:37.840 --> 00:32:41.120 +which is interesting. + +32:41.120 --> 00:32:45.600 +But since both seq and dash + +00:32:45.600 --> 00:32:49.919 +avoid mutations--they don't really have + +32:49.919 --> 00:32:52.399 +mutating counterparts to common functions + +00:32:52.399 --> 00:32:56.480 +like you have with cl-remove-if, cl-delete-if, + +32:56.480 --> 00:33:02.000 +or just cl-remove, cl-delete, + +33:02.000 --> 00:33:08.000 +it still can be valuable to look into + +00:33:08.000 --> 00:33:12.240 +destructive versions of those functions, + +33:12.240 --> 00:33:16.880 +something from the core library + +00:33:16.880 --> 00:33:22.559 +like delete-dups or nreverse, + +33:22.559 --> 00:33:24.399 +for your code when you're really trying + +33:24.399 --> 00:33:29.519 +to get as close to the metal + +00:33:29.519 --> 00:33:34.320 +or whatever as you can, + +00:33:34.320 --> 00:33:40.080 +because avoiding extra allocations, + +00:33:40.080 --> 00:33:43.200 +it can really be useful. + +33:43.200 --> 00:33:46.159 +You can really improve their performance + +33:46.159 --> 00:33:54.080 +if you don't do a lot of other stuff. + +33:54.080 --> 00:34:00.080 +delete-consecutive-dups is blazing faster. + +34:00.080 --> 00:34:03.919 +It only requires pre-sorted strings. + +34:03.919 --> 00:34:08.399 +What else to say... + +34:08.399 --> 00:34:13.200 +If you are going to read these measurements, + +34:13.200 --> 00:34:15.359 +make sure to keep in mind + +00:34:15.359 --> 00:34:18.000 +that reverse is not free, + +34:18.000 --> 00:34:20.240 +so for instance, if we're looking + +00:34:20.240 --> 00:34:22.480 +at this comparison + +00:34:22.480 --> 00:34:26.399 +between remove and delete, for instance, + +34:26.399 --> 00:34:27.919 +they're using reverse + +00:34:27.919 --> 00:34:32.159 +to avoid modifying the data, + +34:32.159 --> 00:34:34.800 +the sample data, so we don't have to + +34:34.800 --> 00:34:36.720 +create it every time. + +34:36.720 --> 00:34:41.919 +But to compare how much faster + +00:34:41.919 --> 00:34:43.760 +delete is than remove, + +34:43.760 --> 00:34:50.800 +we need to subtract 787 milliseconds + +34:50.800 --> 00:34:52.320 +from here and from here + +00:34:52.320 --> 00:34:58.480 +so it comes out to like 230 milliseconds + +00:34:58.480 --> 00:35:02.560 +in this example, the last example, + +00:35:02.560 --> 00:35:12.000 +and 100 to 1 second, 250 milliseconds here, + +00:35:12.000 --> 00:35:17.599 +so the difference is 5-fold here. + +35:17.599 --> 00:35:20.160 +Not 2-fold. + +35:20.160 --> 00:35:26.480 +All right. With this, I'm going to + +35:26.480 --> 00:35:29.520 +thank you for listening, for watching + +35:29.520 --> 00:35:31.920 +and I'll be taking questions. + +35:31.920 --> 00:35:32.920 +Thank you. + +00:35:32.920 --> 00:35:35.160 +[captions by sachac] diff --git a/2021/info/faster-schedule.md b/2021/info/faster-schedule.md index 60af8009..56227f90 100644 --- a/2021/info/faster-schedule.md +++ b/2021/info/faster-schedule.md @@ -2,13 +2,13 @@ Q&A: live Status: Now playing -Duration: 35:35 +Duration: 10135:34
+
    # Description diff --git a/2021/schedule-details.md b/2021/schedule-details.md index e0f75ff5..17846eae 100644 --- a/2021/schedule-details.md +++ b/2021/schedule-details.md @@ -1,4 +1,4 @@ -
    45 talks total: 37 captioned (480 min), 6 waiting for captions (129 min)
    +
    45 talks total: 37 captioned (10579 min), 6 waiting for captions (129 min)
    -- cgit v1.2.3
    StatusStartTitleSpeaker(s)
    Saturday morning
    ~ 9:00 AMOpening remarks
    done~ 9:19 AMEmacs News HighlightsSacha Chua