Diffstat (limited to '2021/captions')
-rw-r--r--  2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--chapters.vtt    31
-rw-r--r--  2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--main.vtt      1528
2 files changed, 1559 insertions(+), 0 deletions(-)
diff --git a/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--chapters.vtt b/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--chapters.vtt
new file mode 100644
index 00000000..9aa57433
--- /dev/null
+++ b/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--chapters.vtt
@@ -0,0 +1,31 @@
+WEBVTT
+
+00:00:01.120 --> 00:02:36.479
+Introduction
+
+00:02:36.480 --> 00:04:19.839
+Emacs Lisp is a little old
+
+00:04:19.840 --> 00:05:03.038
+Benchmark then optimize, not vice versa
+
+00:05:03.039 --> 00:09:31.199
+profiler-start
+
+00:09:31.200 --> 00:13:01.359
+elp - Emacs Lisp Profiler
+
+00:13:01.360 --> 00:19:13.439
+benchmark
+
+00:19:13.440 --> 00:20:00.239
+Write less code
+
+00:20:00.240 --> 00:22:52.158
+Reduce allocations
+
+00:22:52.159 --> 00:30:52.239
+Recent optimizations in Xref
+
+00:30:52.240 --> 00:30:53.240
+cl-lib, dash, and seq
diff --git a/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--main.vtt b/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--main.vtt
new file mode 100644
index 00000000..057b85f5
--- /dev/null
+++ b/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--main.vtt
@@ -0,0 +1,1528 @@
+WEBVTT
+
+00:01.120 --> 00:00:04.640
+Hi. Greetings from cloudy St. Petersburg.
+
+00:04.640 --> 00:00:06.080
+My name is Dmitry Gutov.
+
+00:00:06.080 --> 00:00:09.280
+I write Ruby by day, Emacs Lisp by night
+
+00:00:09.280 --> 00:00:12.080
+when I don't do anything else.
+
+00:12.080 --> 00:00:14.559
+You might know my work
+
+00:00:14.559 --> 00:00:18.160
+from a number of third-party packages
+
+00:00:18.160 --> 00:00:22.240
+as well as built-in ones.
+
+00:22.240 --> 00:00:25.599
+The idea for this talk came from
+
+00:25.599 --> 00:00:26.800
+an improvement request
+
+00:00:26.800 --> 00:00:32.960
+for the performance of grep-like commands
+
+00:32.960 --> 00:00:35.200
+using the xref interface
+
+00:00:35.200 --> 00:00:43.280
+and its storage types, container types
+
+00:00:43.280 --> 00:00:47.840
+for the search results,
+
+00:47.840 --> 00:00:51.440
+complaining that it's not as fast
+
+00:00:51.440 --> 00:00:54.320
+as it potentially could have been
+
+00:54.320 --> 00:00:58.239
+when there are lots of search results
+
+00:00:58.239 --> 00:01:01.600
+coming for a given search.
+
+01:01.600 --> 00:01:06.159
+I have noticed myself that
+
+00:01:06.159 --> 00:01:10.159
+when working on it, and probably before.
+
+00:01:10.159 --> 00:01:16.560
+My approach to optimizing Lisp code
+
+01:16.560 --> 00:01:21.680
+has changed recently-ish,
+
+01:21.680 --> 00:01:26.960
+and I'd like to talk about it here.
+
+01:26.960 --> 00:01:34.079
+This talk is for people who already
+
+01:34.079 --> 00:01:37.040
+know how to write some Emacs Lisp
+
+00:01:37.040 --> 00:01:38.880
+to solve their problems
+
+00:01:38.880 --> 00:01:43.360
+and who have possibly encountered
+
+00:01:43.360 --> 00:01:49.200
+some performance issues doing that.
+
+01:49.200 --> 00:01:51.600
+Also, if you want to contribute
+
+00:01:51.600 --> 00:01:54.799
+some improvements to the code
+
+00:01:54.799 --> 00:01:56.960
+that is already in Emacs
+
+00:01:56.960 --> 00:02:02.159
+or to other packages
+
+00:02:02.159 --> 00:02:04.479
+which are owned by somebody else,
+
+00:02:04.479 --> 00:02:13.520
+it should also be helpful. Let's start.
+
+00:02:13.520 --> 00:02:18.640
+First of all, about Emacs and Emacs Lisp.
+
+00:02:18.640 --> 00:02:21.360
+It's not the fastest language.
+
+02:21.360 --> 00:02:26.560
+Let's switch to the notes.
+
+00:02:26.560 --> 00:02:31.440
+I hope the font is big enough
+
+00:02:31.440 --> 00:02:36.479
+to be readable on this video, really hope.
+
+02:36.480 --> 00:02:40.160
+Emacs Lisp is not the fastest of the bunch.
+
+00:02:40.160 --> 00:02:44.400
+The garbage collector creates pauses
+
+00:02:44.400 --> 00:02:48.800
+whenever it needs to sweep data.
+
+02:48.800 --> 00:02:51.760
+The interpreter is not the fastest one,
+
+02:51.760 --> 00:02:54.720
+even though the native compilation branch
+
+00:02:54.720 --> 00:02:59.120
+is improving on that twofold
+
+00:02:59.120 --> 00:03:04.640
+in certain scenarios, which is good.
+
+03:04.640 --> 00:03:10.560
+The standard library or functions
+
+00:03:10.560 --> 00:03:12.800
+for working with collections, for example,
+
+03:12.800 --> 00:03:17.599
+are not so uniformly great.
+
+03:17.599 --> 00:03:20.080
+So there is some work to be done there.
+
+03:20.080 --> 00:03:22.239
+Maybe you can contribute
+
+03:22.239 --> 00:03:26.640
+the next improvement in there.
+
+03:26.640 --> 00:03:33.200
+And also, if your package displays stuff,
+
+00:03:33.200 --> 00:03:35.040
+it has a visual component,
+
+03:35.040 --> 00:03:36.959
+then you might have to deal with
+
+00:03:36.959 --> 00:03:43.120
+some drawbacks of the display engine
+
+03:43.120 --> 00:03:47.760
+which might slow to a crawl
+
+00:03:47.760 --> 00:03:53.680
+when your code has...
+
+00:03:53.680 --> 00:03:56.560
+when you're trying to display lines
+
+00:03:56.560 --> 00:04:00.319
+which are a little too long--
+
+00:04:00.319 --> 00:04:03.120
+trying to print a [long] line, for example--
+
+04:03.120 --> 00:04:07.120
+or you are using a lot of overlays,
+
+04:07.120 --> 00:04:09.120
+if you know what it is.
+
+04:09.120 --> 00:04:12.640
+With lots of overlays
+
+00:04:12.640 --> 00:04:19.839
+on the same visual line, in particular.
+
+04:19.840 --> 00:04:24.160
+But okay. The first thing to understand,
+
+00:04:24.160 --> 00:04:27.680
+and I hope everybody who's done
+
+00:04:27.680 --> 00:04:29.520
+some programming in the past knows,
+
+00:04:29.520 --> 00:04:35.120
+it's: first you write it correctly,
+
+00:04:35.120 --> 00:04:37.360
+and then you try to benchmark,
+
+04:37.360 --> 00:04:39.199
+and see where the problems are,
+
+00:04:39.199 --> 00:04:40.720
+if there are any.
+
+04:40.720 --> 00:04:45.919
+So first do it right, then do it fast.
+
+04:45.919 --> 00:04:50.720
+How do we find the hotspots,
+
+00:04:50.720 --> 00:04:52.400
+the bottlenecks on the second step?
+
+00:04:52.400 --> 00:04:54.720
+We try to do some profiling
+
+00:04:54.720 --> 00:04:58.960
+or measuring how long the code takes.
+
+04:58.960 --> 00:05:03.038
+Emacs has two different profilers.
+
+00:05:03.039 --> 00:05:05.440
+One is the native profiler,
+
+05:05.440 --> 00:05:10.639
+which as I recall was contributed by
+
+00:05:10.639 --> 00:05:13.280
+Tomohiro Matsuyama, the author
+
+00:05:13.280 --> 00:05:16.240
+of the auto-complete package
+
+05:16.240 --> 00:05:22.720
+back in Emacs 24.3, apparently.
+
+05:22.720 --> 00:05:24.320
+It's a low overhead profiler.
+
+00:05:24.320 --> 00:05:31.520
+It's sampling, which is good on one hand
+
+05:31.520 --> 00:05:33.919
+but might be less good on the other hand.
+
+05:33.919 --> 00:05:35.199
+Let's try using it.
+
+00:05:35.199 --> 00:05:40.960
+So we start with profiler-start.
+
+00:05:40.960 --> 00:05:44.960
+Let's just ask for CPU profiling here,
+
+00:05:44.960 --> 00:05:48.320
+but it also can profile memory allocations.
+
+05:48.320 --> 00:05:56.400
+Let's try a search for some string
+
+00:05:56.400 --> 00:06:01.360
+in the current project.
+
+06:01.360 --> 00:06:04.080
+Let's do it a few times, maybe three times,
+
+00:06:04.080 --> 00:06:09.360
+so the profiler has more data to work with.
+
+06:09.360 --> 00:06:13.680
+Let's call the report.
+
+06:13.680 --> 00:06:16.960
+Okay. So here we have the tree,
+
+00:06:16.960 --> 00:06:19.360
+the execution tree with percentages
+
+00:06:19.360 --> 00:06:22.960
+of the time spent in each function.
+
+06:22.960 --> 00:06:25.840
+You can unwrap it by going up and down
+
+00:06:25.840 --> 00:06:35.039
+and pressing TAB to unwrap every element.
+
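A minimal sketch of the workflow being demonstrated, using the native profiler's real entry points:

```elisp
;; Native (sampling) profiler, available since Emacs 24.3.
(profiler-start 'cpu)   ; 'mem or 'cpu+mem also profiles allocations
;; ... run the command you want to measure, ideally a few times ...
(profiler-report)       ; tree of functions; TAB unwraps an entry
(profiler-stop)
```
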
+06:35.039 --> 00:06:52.000
+This is weird.
+
+06:52.000 --> 00:06:55.199
+Okay, here we see that the actual command
+
+00:06:55.199 --> 00:06:56.639
+that was called only takes
+
+00:06:56.639 --> 00:06:59.599
+8% of the whole runtime,
+
+06:59.599 --> 00:07:07.759
+meaning the input was...
+
+07:07.759 --> 00:07:10.960
+The command didn't take enough time
+
+00:07:10.960 --> 00:07:13.360
+for us to really dig into it.
+
+00:07:13.360 --> 00:07:18.960
+Let's try a shorter input with more matches.
+
+00:07:18.960 --> 00:07:25.680
+So profiler-start again, CPU.
+
+07:25.680 --> 00:07:34.240
+Let's search for list.
+
+07:34.240 --> 00:07:40.160
+Don't mind the minibuffer just yet.
+
+07:40.160 --> 00:07:47.680
+Okay. So let's look at the report.
+
+07:47.680 --> 00:07:52.000
+We can unwrap it here, and we see
+
+00:07:52.000 --> 00:07:55.759
+52% of the time was spent doing this,
+
+07:55.759 --> 00:07:58.240
+at least according to the profile.
+
+00:07:58.240 --> 00:08:00.960
+We can unwrap it, see the description
+
+00:08:00.960 --> 00:08:02.800
+of any function that was called
+
+00:08:02.800 --> 00:08:08.479
+by hitting d, or even jump to it
+
+00:08:08.479 --> 00:08:12.960
+by tapping f.
+
+08:12.960 --> 00:08:16.160
+By going down the tree,
+
+08:16.160 --> 00:08:18.240
+we unwrap it with TAB,
+
+08:18.240 --> 00:08:22.400
+you can see where time was spent.
+
+08:22.400 --> 00:08:25.199
+One of the bigger drawbacks
+
+00:08:25.199 --> 00:08:30.639
+of this profiler is that the arithmetic
+
+08:30.639 --> 00:08:39.200
+doesn't always work out,
+
+00:08:39.200 --> 00:08:45.760
+like you might have...
+
+08:45.760 --> 00:08:47.519
+This is not a good example,
+
+00:08:47.519 --> 00:08:52.640
+but okay, you have 14% spent in here,
+
+00:08:52.640 --> 00:08:57.200
+but when we expand this entry,
+
+00:08:57.200 --> 00:09:01.760
+we only see like 6%, 3%, and 0%.
+
+00:09:01.760 --> 00:09:06.640
+The sum is different, sometimes even bigger than that.
+
+00:09:06.640 --> 00:09:10.800
+So the native profiler
+
+00:09:10.800 --> 00:09:13.200
+can give an accurate picture
+
+00:09:13.200 --> 00:09:15.920
+and it has little overhead,
+
+00:09:15.920 --> 00:09:20.399
+but the specific numbers
+
+00:09:20.399 --> 00:09:22.959
+are not very precise
+
+09:22.959 --> 00:09:28.640
+because the principle is probabilistic.
+
+09:28.640 --> 00:09:31.199
+Let's stop here.
+
+09:31.200 --> 00:09:36.959
+There is another package called elp,
+
+09:36.959 --> 00:09:39.360
+Emacs Lisp Profiler,
+
+00:09:39.360 --> 00:09:43.440
+which is much older than that.
+
+09:43.440 --> 00:09:47.920
+It allows us to instrument
+
+09:47.920 --> 00:09:53.600
+just specific functions or a package.
+
+09:53.600 --> 00:09:57.680
+We're instrumenting the xref package here.
+
+09:57.680 --> 00:10:01.360
+It works through advice. You can see
+
+00:10:01.360 --> 00:10:03.279
+the description of one of the functions
+
+00:10:03.279 --> 00:10:04.320
+in the package, and we see
+
+00:10:04.320 --> 00:10:12.640
+that it has had :around advice added.
+
+10:12.640 --> 00:10:18.000
+If we run the same search,
+
+10:18.000 --> 00:10:21.360
+we can see that -- when it finishes --
+
+00:10:21.360 --> 00:10:30.399
+the table of all the numbers,
+
+10:30.399 --> 00:10:32.640
+how many times every function in the package
+
+00:10:32.640 --> 00:10:41.680
+has been called, and the times
+
+00:10:41.680 --> 00:10:48.000
+which the runtime has spent inside of them,
+
+10:48.000 --> 00:10:55.040
+sorted by impact, as it understands that.
+
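The elp session above, condensed into a sketch (all four entry points are real elp commands):

```elisp
;; Advise every function whose name starts with "xref", as in the demo.
(elp-instrument-package "xref")
;; ... run the search a few times ...
(elp-results)       ; table of call counts and times, sorted by impact
(elp-restore-all)   ; remove the instrumentation when you are done
```
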
+10:55.040 --> 00:11:00.079
+The main problem with this profiler is
+
+00:11:00.079 --> 00:11:04.160
+it is slower because it adds overhead
+
+00:11:04.160 --> 00:11:06.880
+to every function call,
+
+11:06.880 --> 00:11:11.519
+so it, in the end, might give an impression
+
+00:11:11.519 --> 00:11:19.519
+that functions with lots of calls,
+
+11:19.519 --> 00:11:21.839
+which have been called a lot of times,
+
+11:21.839 --> 00:11:26.880
+are more important,
+
+11:26.880 --> 00:11:29.279
+hotter than they actually are,
+
+00:11:29.279 --> 00:11:30.959
+because it slows down
+
+00:11:30.959 --> 00:11:32.640
+every such function call,
+
+11:32.640 --> 00:11:35.279
+including the subsequent calls
+
+00:11:35.279 --> 00:11:38.160
+that these functions do
+
+00:11:38.160 --> 00:11:40.560
+inside the same package.
+
+11:40.560 --> 00:11:48.240
+So it's a good way to analyze and drill down
+
+11:48.240 --> 00:11:50.320
+to see which functions here
+
+00:11:50.320 --> 00:11:51.200
+take a lot of time,
+
+00:11:51.200 --> 00:11:53.839
+but just keep in mind
+
+00:11:53.839 --> 00:11:57.760
+that sometimes you might end up
+
+00:11:57.760 --> 00:12:00.079
+trying to optimize one of these
+
+00:12:00.079 --> 00:12:01.600
+often-called, usually smaller functions,
+
+00:12:01.600 --> 00:12:09.279
+and the result
+
+00:12:09.279 --> 00:12:11.040
+might not actually affect
+
+00:12:11.040 --> 00:12:18.720
+the production runtime at all.
+
+12:18.720 --> 00:12:22.240
+But it's still a good tool,
+
+00:12:22.240 --> 00:12:25.440
+especially when you already know
+
+12:25.440 --> 00:12:30.560
+which set of actions you are interested in,
+
+00:12:30.560 --> 00:12:33.760
+and which functions might be slower.
+
+12:33.760 --> 00:12:37.120
+elp allows you to instrument a package
+
+00:12:37.120 --> 00:12:42.720
+or a set of functions. Just don't forget
+
+00:12:42.720 --> 00:12:49.600
+to un-instrument them all afterwards,
+
+00:12:49.600 --> 00:12:52.000
+or else your session,
+
+00:12:52.000 --> 00:12:55.920
+your subsequent optimization efforts
+
+00:12:55.920 --> 00:12:59.360
+might not work as well as you might
+
+12:59.360 --> 00:13:01.359
+want them to.
+
+13:01.360 --> 00:13:04.560
+And there's also this very nice,
+
+00:13:04.560 --> 00:13:09.600
+very handy package called benchmark,
+
+13:09.600 --> 00:13:12.480
+which unfortunately
+
+00:13:12.480 --> 00:13:16.000
+not everybody knows about.
+
+13:16.000 --> 00:13:21.519
+It turns out that a lot of
+
+00:13:21.519 --> 00:13:30.079
+older Emacs Lisp developers,
+
+13:30.079 --> 00:13:34.160
+users, package developers usually have
+
+00:13:34.160 --> 00:13:38.000
+some benchmarking macros of their own.
+
+00:13:38.000 --> 00:13:42.880
+But there is this package
+
+00:13:42.880 --> 00:13:46.160
+with a perfectly usable interface
+
+00:13:46.160 --> 00:13:49.199
+which doesn't require you
+
+00:13:49.199 --> 00:13:51.120
+to define something else,
+
+00:13:51.120 --> 00:13:52.560
+especially when you are in
+
+00:13:52.560 --> 00:13:57.120
+an emacs -Q session
+
+13:57.120 --> 00:14:00.480
+trying to benchmark your code
+
+00:14:00.480 --> 00:14:05.440
+in a bare Emacs. So it has
+
+00:14:05.440 --> 00:14:09.680
+two main entry points, I would say.
+
+14:09.680 --> 00:14:14.480
+First one is the benchmark macro,
+
+14:14.480 --> 00:14:19.199
+and the second one is benchmark-progn.
+
+00:14:19.199 --> 00:14:20.399
+benchmark is a function,
+
+00:14:20.399 --> 00:14:25.040
+and benchmark-progn is a macro.
+
+14:25.040 --> 00:14:27.839
+The first one you can use by specifying
+
+00:14:27.839 --> 00:14:30.480
+the number of iterations and the form.
+
+00:14:30.480 --> 00:14:36.800
+I hope the minibuffer is easy to read here.
+
+14:36.800 --> 00:14:43.360
+For instance, we can take this long list
+
+00:14:43.360 --> 00:14:44.800
+and try nreverse-ing it,
+
+00:14:44.800 --> 00:14:49.680
+and we see how long that takes.
+
+14:49.680 --> 00:14:54.160
+Then you can do it,
+
+14:54.160 --> 00:14:55.600
+adjust the inner code,
+
+00:14:55.600 --> 00:14:58.079
+and then basically compare
+
+00:14:58.079 --> 00:14:59.519
+to figure out how much
+
+00:14:59.519 --> 00:15:01.440
+and how long nreverse takes
+
+00:15:01.440 --> 00:15:03.839
+in this scenario.
+
+15:03.839 --> 00:15:07.760
+Or, we can take a function
+
+00:15:07.760 --> 00:15:10.880
+which we have probably found
+
+15:10.880 --> 00:15:12.880
+using one of the previous methods
+
+00:15:12.880 --> 00:15:17.279
+which we anticipate that,
+
+15:17.279 --> 00:15:19.440
+as we understand, it takes a while,
+
+15:19.440 --> 00:15:26.720
+and annotate it with a benchmark-progn form,
+
+00:15:26.720 --> 00:15:33.360
+which... just executes the body
+
+00:15:33.360 --> 00:15:36.880
+and reports, for each and every one of them,
+
+15:36.880 --> 00:15:41.360
+how long that body execution took.
+
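Both entry points in one hedged sketch; the forms being timed are arbitrary examples, not the ones from the talk:

```elisp
;; `benchmark' is a function: run FORM N times and message the timing.
(benchmark 100 '(nreverse (number-sequence 1 10000)))

;; `benchmark-progn' is a macro: run the body once, message how long it
;; took, and return the body's value.
(benchmark-progn
  (mapcar #'1+ (number-sequence 1 100000)))
```
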
+15:41.360 --> 00:15:45.360
+So for instance, here we added
+
+00:15:45.360 --> 00:15:46.959
+a few message calls
+
+00:15:46.959 --> 00:15:49.040
+inside those benchmark-progns,
+
+00:15:49.040 --> 00:15:53.199
+so we can see how long each part
+
+00:15:53.199 --> 00:16:03.600
+of the function xref-matches-in-files
+
+00:16:03.600 --> 00:16:06.320
+takes when it is run.
+
+16:06.320 --> 00:16:11.839
+Let's try it. Let's first call
+
+16:11.839 --> 00:16:17.199
+emacs-lisp-byte-compile-and-load,
+
+16:17.199 --> 00:16:20.720
+so that we're sure that we are running
+
+16:20.720 --> 00:16:24.079
+the fastest possible version of this code,
+
+16:24.079 --> 00:16:29.519
+so when we do find the bottlenecks,
+
+00:16:29.519 --> 00:16:31.279
+we are sure that they are real
+
+00:16:31.279 --> 00:16:36.240
+and not because of the uncompiled code,
+
+00:16:36.240 --> 00:16:43.519
+macro expansion, or the lack of
+
+00:16:43.519 --> 00:16:46.160
+some substitutions, other substitutions
+
+00:16:46.160 --> 00:16:50.800
+that our code is relying on
+
+16:50.800 --> 00:17:01.279
+but which might not be available
+
+17:01.279 --> 00:17:03.279
+in the interpreter mode
+
+00:17:03.279 --> 00:17:07.039
+only in the compiled code.
+
+17:07.039 --> 00:17:10.160
+Let's run.
+
+17:10.160 --> 00:17:14.880
+So we have this list,
+
+17:14.880 --> 00:17:26.160
+search for list,
+
+17:26.160 --> 00:17:29.760
+and the remaining time is spent during
+
+00:17:29.760 --> 00:17:36.720
+printing of the results, which we didn't
+
+17:36.720 --> 00:17:38.320
+annotate with the benchmarking macro,
+
+00:17:38.320 --> 00:17:41.280
+but it's still there.
+
+17:41.280 --> 00:17:43.679
+So we can see by switching to the
+
+17:43.679 --> 00:17:49.919
+*Messages* buffer, C-h e,
+
+17:49.919 --> 00:17:52.080
+that unquoting was very fast.
+
+00:17:52.080 --> 00:17:59.280
+So the search took 400 milliseconds.
+
+00:17:59.280 --> 00:18:02.480
+It's slower than it would be
+
+18:02.480 --> 00:18:06.080
+without the video being recorded, as well
+
+00:18:06.080 --> 00:18:09.039
+as the rest of the measurements here.
+
+18:09.039 --> 00:18:12.160
+So the parsing of the results took more
+
+00:18:12.160 --> 00:18:16.559
+than that, and the object allocation
+
+18:16.559 --> 00:18:21.919
+took even more, which is unfortunate
+
+00:18:21.919 --> 00:18:23.919
+but it's the reality
+
+00:18:23.919 --> 00:18:26.400
+when we're dealing with a lot of
+
+00:18:26.400 --> 00:18:29.840
+search results, because, well,
+
+18:29.840 --> 00:18:33.520
+Emacs Lisp is slower than grep.
+
+18:34.559 --> 00:18:49.200
+That's just a given.
+
+18:49.200 --> 00:18:54.400
+What can be done and what has been done
+
+00:18:54.400 --> 00:18:57.120
+to improve on this?
+
+18:57.120 --> 00:19:04.240
+Well, first of all,
+
+19:04.240 --> 00:19:06.000
+let's change the question,
+
+00:19:06.000 --> 00:19:07.360
+because this is more of
+
+00:19:07.360 --> 00:19:09.760
+a retrospective sort of talk
+
+19:09.760 --> 00:19:13.439
+rather than talking about future plans.
+
+00:19:13.440 --> 00:19:19.679
+What can one do to improve performance?
+
+19:19.679 --> 00:19:24.160
+Well, basically, two things:
+
+00:19:24.160 --> 00:19:27.919
+first, you try to make sure
+
+00:19:27.919 --> 00:19:30.559
+that you're writing less code,
+
+00:19:30.559 --> 00:19:37.200
+so that your code does fewer things,
+
+19:37.200 --> 00:19:40.400
+fewer iterations of your operations.
+
+19:40.400 --> 00:19:42.000
+Basically, it's the realm
+
+00:19:42.000 --> 00:19:46.880
+of choosing your algorithm well.
+
+19:46.880 --> 00:19:54.880
+But make sure it has as little complexity
+
+00:19:54.880 --> 00:20:00.239
+as you can reasonably manage.
+
+20:00.240 --> 00:20:05.760
+Another is to try to reduce memory allocations.
+
+20:05.760 --> 00:20:13.760
+It's creating conses, the links
+
+20:13.760 --> 00:20:17.919
+of the lists. If you are mapping through
+
+00:20:17.919 --> 00:20:19.200
+a list, creating a new one,
+
+00:20:19.200 --> 00:20:22.640
+you are creating conses.
+
+00:20:22.640 --> 00:20:26.159
+New objects in memory, anyway.
+
+20:26.159 --> 00:20:30.480
+And also, when you call string operations
+
+00:20:30.480 --> 00:20:33.679
+like substring, when you create new strings,
+
+20:33.679 --> 00:20:36.640
+that's also what you do.
+
+20:36.640 --> 00:20:39.200
+When you allocate new memory,
+
+00:20:39.200 --> 00:20:43.600
+that triggers garbage collections later,
+
+00:20:43.600 --> 00:20:49.200
+which affect the performance of your code
+
+20:49.200 --> 00:21:01.600
+quite severely. I have found that actually,
+
+00:21:01.600 --> 00:21:04.960
+contrary to my personal intuition,
+
+21:04.960 --> 00:21:08.240
+when you try to fine-tune the code
+
+00:21:08.240 --> 00:21:13.120
+working on long lists of elements,
+
+00:21:13.120 --> 00:21:15.520
+long, long collection, large collections,
+
+00:21:15.520 --> 00:21:20.159
+it's even more important to try to avoid
+
+21:20.159 --> 00:21:23.200
+or reduce memory allocations
+
+21:23.200 --> 00:21:26.400
+rather than try to ensure
+
+00:21:26.400 --> 00:21:32.240
+that less code is running,
+
+21:32.240 --> 00:21:34.960
+less operations are performed,
+
+21:34.960 --> 00:21:38.080
+because the garbage collector
+
+21:38.080 --> 00:21:44.799
+can hit you in the posterior quite suddenly
+
+00:21:44.799 --> 00:21:49.520
+so that you will not always...
+
+21:49.520 --> 00:21:51.760
+When you measure the impact,
+
+00:21:51.760 --> 00:21:56.799
+you might not always measure the whole of it.
+
+21:56.799 --> 00:22:02.640
+And sometimes even when you have
+
+00:22:02.640 --> 00:22:06.559
+a few operations, you can measure
+
+22:06.559 --> 00:22:09.520
+all three of them, but at some point,
+
+00:22:09.520 --> 00:22:11.919
+if you just reduce a lot,
+
+22:11.919 --> 00:22:14.720
+remove most of the allocations,
+
+00:22:14.720 --> 00:22:16.799
+all three of them, the improvement
+
+00:22:16.799 --> 00:22:21.520
+might be even bigger than the total
+
+22:21.520 --> 00:22:23.440
+of three improvements
+
+00:22:23.440 --> 00:22:27.200
+which you might have measured previously,
+
+22:27.200 --> 00:22:29.679
+like separately.
+
+22:29.679 --> 00:22:33.039
+So it's something to be on the lookout for,
+
+00:22:33.039 --> 00:22:39.520
+but of course, when you pick the algorithm,
+
+00:22:39.520 --> 00:22:45.200
+it's important not to do
+
+00:22:45.200 --> 00:22:52.158
+more operations than you have to.
+
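One common allocation-conscious pattern, as a sketch rather than code from the talk:

```elisp
;; Building a result list with one cons per element:
(let (acc)
  (dolist (x (number-sequence 1 1000))
    (push (* x x) acc))   ; O(1) per element, allocates a single cons
  (nreverse acc))         ; destructive reverse: reuses the same conses
;; Compare with calling (append acc (list (* x x))) in the loop body,
;; which re-copies the whole accumulator on every iteration.
```
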
+22:52.159 --> 00:22:56.000
+We have examples of both
+
+00:22:56.000 --> 00:22:58.720
+in the recent changes
+
+00:22:58.720 --> 00:23:02.559
+to the xref and project packages,
+
+23:02.559 --> 00:23:06.960
+which we can examine here.
+
+23:06.960 --> 00:23:11.200
+Let's take a look at this one.
+
+23:11.200 --> 00:23:14.880
+This commit message lies a little bit
+
+23:14.880 --> 00:23:19.039
+because it's referring to
+
+00:23:19.039 --> 00:23:24.559
+the use of assoc instead of cl-assoc,
+
+23:24.559 --> 00:23:29.919
+and the actual change incorporates that,
+
+00:23:29.919 --> 00:23:33.440
+but it also incorporates
+
+00:23:33.440 --> 00:23:39.440
+the second part of the sentence
+
+00:23:39.440 --> 00:23:41.360
+which actually replaced
+
+00:23:41.360 --> 00:23:46.960
+the use of assoc in there.
+
+23:46.960 --> 00:23:50.480
+Curiously, cl-assoc was pretty slow
+
+00:23:50.480 --> 00:23:52.559
+because not only did it create
+
+00:23:52.559 --> 00:23:57.919
+quadratic complexity in the operation
+
+23:57.919 --> 00:23:59.919
+it was also calling
+
+00:23:59.919 --> 00:24:02.880
+the Lisp function [equal]
+
+24:02.880 --> 00:24:04.400
+for every iteration,
+
+00:24:04.400 --> 00:24:08.880
+whereas if we just use assoc there,
+
+24:08.880 --> 00:24:10.080
+which was the first version
+
+00:24:10.080 --> 00:24:13.919
+of this change, of the improvement,
+
+24:13.919 --> 00:24:15.760
+it became already much faster,
+
+00:24:15.760 --> 00:24:20.640
+but then switching to a hash table
+
+24:20.640 --> 00:24:28.080
+which turned this lookup
+
+00:24:28.080 --> 00:24:31.760
+from O(n) complexity
+
+24:31.760 --> 00:24:35.600
+into, well, an amortized constant one,
+
+24:35.600 --> 00:24:37.600
+even better.
+
+24:37.600 --> 00:24:45.679
+So, use hash tables, kids.
+
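The shape of that change, sketched with illustrative data rather than the identifiers from the commit:

```elisp
;; Alist lookup walks the list: O(n) per call.
(let ((alist '(("a.el" . 1) ("b.el" . 2))))
  (cdr (assoc "b.el" alist)))

;; A hash table with an `equal' test makes each lookup amortized O(1).
(let ((table (make-hash-table :test #'equal)))
  (puthash "a.el" 1 table)
  (puthash "b.el" 2 table)
  (gethash "b.el" table))
```
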
+24:45.679 --> 00:24:52.400
+Another commit here is about using
+
+00:24:52.400 --> 00:24:55.520
+the inhibit-modification-hooks.
+
+00:24:55.520 --> 00:24:58.000
+So, turns out when you're printing
+
+00:24:58.000 --> 00:25:01.679
+into a buffer, even if you have already
+
+25:01.679 --> 00:25:06.159
+disabled the undo history,
+
+25:06.159 --> 00:25:08.480
+binding this variable to
+
+25:08.480 --> 00:25:10.880
+a non-nil value is pretty good
+
+00:25:10.880 --> 00:25:15.520
+because you are able to avoid running
+
+25:15.520 --> 00:25:21.120
+a number of hooks, which improves performance.
+
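A sketch of the pattern; the buffer name and loop body are illustrative, not the xref code:

```elisp
(with-current-buffer (get-buffer-create "*results*")
  (let ((inhibit-modification-hooks t)  ; skip the change hooks
        (buffer-undo-list t))           ; undo recording stays disabled
    (dolist (hit '("foo.el:1:match" "bar.el:2:match"))
      (insert hit "\n"))))
```
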
+25:21.120 --> 00:25:28.640
+Next. This one was about moving the
+
+00:25:28.640 --> 00:25:32.000
+file-remote-p call
+
+25:32.000 --> 00:25:37.279
+from inside the loop.
+
+25:37.279 --> 00:25:42.039
+This function is actually
+
+00:25:42.039 --> 00:25:49.039
+surprisingly slow-ish for the goal,
+
+25:49.039 --> 00:25:52.000
+for its purpose, so you don't really want
+
+25:52.000 --> 00:25:56.159
+to call it on every file name in the list
+
+00:25:56.159 --> 00:25:59.520
+when you have a lot of them,
+
+25:59.520 --> 00:26:02.000
+and especially if your code
+
+00:26:02.000 --> 00:26:04.640
+is running in a buffer
+
+00:26:04.640 --> 00:26:10.640
+visiting a remote file, like through TRAMP.
+
+26:10.640 --> 00:26:15.120
+You might end up trying to
+
+00:26:15.120 --> 00:26:16.880
+devise different approaches
+
+00:26:16.880 --> 00:26:20.159
+to avoid checking whether
+
+00:26:20.159 --> 00:26:23.279
+every file name is remote
+
+26:23.279 --> 00:26:25.360
+if you're dealing with a list of file names,
+
+00:26:25.360 --> 00:26:34.640
+so this one, take a look, be careful with it.
+
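The gist of that commit, sketched with a placeholder loop body:

```elisp
;; Before: one file-remote-p call per file name inside the loop.
;; After: ask once about the common root, outside the loop.
(let ((remote (file-remote-p default-directory))
      (files (directory-files default-directory t "\\.el\\'")))
  (dolist (file files)
    ;; consult `remote' here instead of calling (file-remote-p file)
    (message "%s%s" (if remote "remote: " "") file)))
```
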
+26:34.640 --> 00:26:43.120
+A similar, slower function: with-current-buffer,
+
+26:43.120 --> 00:26:46.880
+but not so much. So it all depends on
+
+26:46.880 --> 00:26:48.400
+really how often you call it.
+
+00:26:48.400 --> 00:26:51.279
+Sometimes you might want to
+
+00:26:51.279 --> 00:26:52.720
+rewrite your code so that
+
+00:26:52.720 --> 00:26:54.720
+it's called less often.
+
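The same hoisting idea applied to with-current-buffer, as a hedged sketch:

```elisp
;; Instead of entering the buffer once per item:
;;   (dolist (line lines) (with-current-buffer buf (insert line "\n")))
;; enter it once and loop inside:
(let ((buf (get-buffer-create "*out*"))
      (lines '("one" "two" "three")))
  (with-current-buffer buf
    (dolist (line lines)
      (insert line "\n"))))
```
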
+26:54.720 --> 00:26:59.760
+And expand-file-name, which hits the disk.
+
+26:59.760 --> 00:27:01.200
+So if you're dealing with file names,
+
+00:27:01.200 --> 00:27:02.960
+you might want to replace it
+
+00:27:02.960 --> 00:27:07.520
+with concatenation at certain points.
+
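A sketch of that substitution; the cost difference is the speaker's observation:

```elisp
;; expand-file-name normalizes the name and may consult file name
;; handlers, which costs more than plain string work:
(expand-file-name "lisp/subr.el" source-directory)
;; When the directory is already absolute and clean, concatenation
;; is just one string allocation:
(concat (file-name-as-directory source-directory) "lisp/subr.el")
```
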
+27:07.520 --> 00:27:22.159
+Okay, back to these changes later.
+
+27:22.159 --> 00:27:28.480
+This one just removed an allocation of a cons
+
+27:28.480 --> 00:27:33.679
+per match hit, and still it brought a 5%
+
+27:33.679 --> 00:27:37.039
+or so improvement
+
+00:27:37.039 --> 00:27:41.120
+to the whole command.
+
+27:41.120 --> 00:27:46.000
+So that's a pretty significant improvement
+
+00:27:46.000 --> 00:27:53.600
+for a small change like that.
+
+27:53.600 --> 00:28:01.679
+Similarly, here we just made sure
+
+00:28:01.679 --> 00:28:09.520
+to avoid a splits... no, a substring call,
+
+28:09.520 --> 00:28:12.399
+and probably an allocation of the whole
+
+28:12.399 --> 00:28:16.320
+buffer string, but in my testing,
+
+00:28:16.320 --> 00:28:19.440
+that doesn't actually matter much,
+
+00:28:19.440 --> 00:28:22.399
+at least so much, but a substring call
+
+00:28:22.399 --> 00:28:28.960
+per result... If we see,
+
+00:28:28.960 --> 00:28:33.039
+since we changed to manual parsing
+
+00:28:33.039 --> 00:28:37.440
+of the buffer, with the strings
+
+00:28:37.440 --> 00:28:41.440
+delimited with zero bytes,
+
+28:41.440 --> 00:28:43.840
+it gave an overall improvement
+
+00:28:43.840 --> 00:28:47.440
+of 20%, again on my machine
+
+00:28:47.440 --> 00:28:50.720
+with a pretty fast SSD,
+
+00:28:50.720 --> 00:28:53.679
+and with a warm disk cache, of course.
+
+28:53.679 --> 00:29:05.360
+But still... Going back to this revision,
+
+00:29:05.360 --> 00:29:09.440
+it was actually quite surprising
+
+00:29:09.440 --> 00:29:15.200
+that migration to a cl-defstruct
+
+29:15.200 --> 00:29:24.080
+from eieio, the Common Lisp-inspired
+
+00:29:24.080 --> 00:29:26.480
+object system first introduced
+
+00:29:26.480 --> 00:29:34.240
+in the CEDET tools,
+
+00:29:34.240 --> 00:29:39.360
+that was a bit of a surprise,
+
+00:29:39.360 --> 00:29:44.799
+because not much of my
+
+00:29:44.799 --> 00:29:46.080
+benchmarking
+
+00:29:46.080 --> 00:29:50.960
+actually pointed at it being the problem.
+
+29:50.960 --> 00:29:55.760
+Probably because the accessors
+
+00:29:55.760 --> 00:29:57.520
+were not the actual problem,
+
+29:57.520 --> 00:30:00.399
+like the oref macros
+
+00:30:00.399 --> 00:30:06.960
+and the code accessing the slots,
+
+30:06.960 --> 00:30:10.720
+but the construction,
+
+00:30:10.720 --> 00:30:14.320
+the object construction code,
+
+30:14.320 --> 00:30:16.559
+that was where most of the time
+
+00:30:16.559 --> 00:30:21.200
+was spent unnecessarily,
+
+30:21.200 --> 00:30:24.240
+maybe doing type-checking,
+
+30:24.240 --> 00:30:28.880
+maybe some other stuff.
+
+30:28.880 --> 00:30:36.080
+So if you have lots of values,
+
+30:36.080 --> 00:30:39.120
+you need to treat like objects, and do
+
+00:30:39.120 --> 00:30:42.000
+virtual dispatch on them in your package,
+
+00:30:42.000 --> 00:30:45.200
+you might want to look into
+
+30:45.200 --> 00:30:52.239
+cl-defstruct for them.
+
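The migration modeled in miniature; the struct below is an illustration in the spirit of that commit, not its exact code:

```elisp
(cl-defstruct (my-location    ; hypothetical name
               (:constructor my-make-location (file line column)))
  file line column)

;; Construction is a plain record allocation, with no EIEIO dispatch
;; or type machinery on the way, and accessors are equally direct:
(my-location-line (my-make-location "foo.el" 10 0))  ; => 10
```
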
+30:52.240 --> 00:30:54.080
+Going on to the next section,
+
+00:30:54.080 --> 00:31:01.200
+I have prepared a sort of comparison
+
+31:01.200 --> 00:31:05.519
+between cl-lib, dash, and seq,
+
+00:31:05.519 --> 00:31:09.360
+the collection libraries we have
+
+31:09.360 --> 00:31:11.279
+available for us in Emacs.
+
+00:31:11.279 --> 00:31:15.760
+That is, the popular ones,
+
+31:15.760 --> 00:31:21.440
+but since I'm running behind on time,
+
+00:31:21.440 --> 00:31:27.760
+I'll probably just summarize the findings.
+
+31:27.760 --> 00:31:31.919
+First of all, seq is nice.
+
+00:31:31.919 --> 00:31:38.480
+Its generic approach is probably
+
+31:38.480 --> 00:31:41.919
+quite decent for most of the situations,
+
+31:41.919 --> 00:31:49.840
+but there are places where it could be
+
+31:49.840 --> 00:31:53.679
+optimized better, so instead of having
+
+31:53.679 --> 00:31:56.399
+quadratic performance, it could use a
+
+31:56.399 --> 00:31:59.200
+hash table, like for instance,
+
+31:59.200 --> 00:32:02.000
+dash does here--
+
+32:02.000 --> 00:32:06.240
+in -union, or as delete-dups
+
+00:32:06.240 --> 00:32:12.960
+does in its implementation.
+
+32:12.960 --> 00:32:16.640
+dash itself is curiously fast,
+
+00:32:16.640 --> 00:32:20.000
+at least faster than I might have expected,
+
+32:20.000 --> 00:32:21.600
+possibly because of
+
+00:32:21.600 --> 00:32:25.120
+the implementation approach
+
+00:32:25.120 --> 00:32:33.200
+where it uses code generation
+
+00:32:33.200 --> 00:32:35.840
+to avoid function calls,
+
+32:35.840 --> 00:32:37.840
+at least some of them,
+
+32:37.840 --> 00:32:41.120
+which is interesting.
+
+32:41.120 --> 00:32:45.600
+But since both seq and dash
+
+00:32:45.600 --> 00:32:49.919
+avoid mutations--they don't really have
+
+32:49.919 --> 00:32:52.399
+mutating counterparts to common functions
+
+00:32:52.399 --> 00:32:56.480
+like you have with cl-remove-if, cl-delete-if,
+
+32:56.480 --> 00:33:02.000
+or just cl-remove, cl-delete,
+
+33:02.000 --> 00:33:08.000
+it still can be valuable to look into
+
+00:33:08.000 --> 00:33:12.240
+destructive versions of those functions,
+
+33:12.240 --> 00:33:16.880
+something from the core library
+
+00:33:16.880 --> 00:33:22.559
+like delete-dups or nreverse,
+
+33:22.559 --> 00:33:24.399
+for your code when you're really trying
+
+33:24.399 --> 00:33:29.519
+to get as close to the metal
+
+00:33:29.519 --> 00:33:34.320
+or whatever as you can,
+
+00:33:34.320 --> 00:33:40.080
+because avoiding extra allocations
+
+00:33:40.080 --> 00:33:43.200
+can really be useful.
+
+33:43.200 --> 00:33:46.159
+You can really improve their performance
+
+33:46.159 --> 00:33:54.080
+if you don't do a lot of other stuff.
+
+33:54.080 --> 00:34:00.080
+delete-consecutive-dups is blazing fast.
+
+34:00.080 --> 00:34:03.919
+It only requires pre-sorted strings.
+
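Since it only removes adjacent duplicates, sort first; a small sketch:

```elisp
;; `sort' is destructive too, hence the copy of the literal list.
(delete-consecutive-dups
 (sort (copy-sequence '("b" "a" "b" "a")) #'string<))  ; => ("a" "b")
```
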
+34:03.919 --> 00:34:08.399
+What else to say...
+
+34:08.399 --> 00:34:13.200
+If you are going to read these measurements,
+
+34:13.200 --> 00:34:15.359
+make sure to keep in mind
+
+00:34:15.359 --> 00:34:18.000
+that reverse is not free,
+
+34:18.000 --> 00:34:20.240
+so for instance, if we're looking
+
+00:34:20.240 --> 00:34:22.480
+at this comparison
+
+00:34:22.480 --> 00:34:26.399
+between remove and delete, for instance,
+
+34:26.399 --> 00:34:27.919
+they're using reverse
+
+00:34:27.919 --> 00:34:32.159
+to avoid modifying the data,
+
+34:32.159 --> 00:34:34.800
+the sample data, so we don't have to
+
+34:34.800 --> 00:34:36.720
+create it every time.
+
+34:36.720 --> 00:34:41.919
+But to compare how much faster
+
+00:34:41.919 --> 00:34:43.760
+delete is than remove,
+
+34:43.760 --> 00:34:50.800
+we need to subtract 787 milliseconds
+
+34:50.800 --> 00:34:52.320
+from here and from here
+
+00:34:52.320 --> 00:34:58.480
+so it comes out to like 230 milliseconds
+
+00:34:58.480 --> 00:35:02.560
+in this example, the last example,
+
+00:35:02.560 --> 00:35:12.000
+and 1 second 250 milliseconds here,
+
+00:35:12.000 --> 00:35:17.599
+so the difference is 5-fold here.
+
+35:17.599 --> 00:35:20.160
+Not 2-fold.
+
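The subtraction described above, restated as a sketch (the timings will differ per machine):

```elisp
;; `reverse' copies the list, so its cost is part of every sample.
;; Measure it alone, then subtract that baseline before comparing.
(setq xs (make-list 1000000 "x"))
(benchmark-run 10 (reverse xs))               ; baseline: the copy only
(benchmark-run 10 (remove "x" (reverse xs)))  ; non-destructive removal
(benchmark-run 10 (delete "x" (reverse xs)))  ; destructive: reuses conses
```
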
+35:20.160 --> 00:35:26.480
+All right. With this, I'm going to
+
+35:26.480 --> 00:35:29.520
+thank you for listening, for watching
+
+35:29.520 --> 00:35:31.920
+and I'll be taking questions.
+
+35:31.920 --> 00:35:32.920
+Thank you.
+
+00:35:32.920 --> 00:35:35.160
+[captions by sachac]