WEBVTT
00:01.120 --> 00:00:04.640
Hi. Greetings from cloudy St. Petersburg.
00:04.640 --> 00:00:06.080
My name is Dmitry Gutov.
00:00:06.080 --> 00:00:09.280
I write Ruby by day, Emacs Lisp by night
00:00:09.280 --> 00:00:12.080
when I don't do anything else.
00:12.080 --> 00:00:14.559
You might know my work
00:00:14.559 --> 00:00:18.160
from a number of third-party packages
00:00:18.160 --> 00:00:22.240
as well as built-in ones.
00:22.240 --> 00:00:25.599
The idea for this talk came out of
00:25.599 --> 00:00:26.800
an improvement request
00:00:26.800 --> 00:00:32.960
for the performance of grep-like commands
00:32.960 --> 00:00:35.200
using the xref interface
00:00:35.200 --> 00:00:43.280
and its storage types, container types
00:00:43.280 --> 00:00:47.840
for the search results,
00:47.840 --> 00:00:51.440
complaining that it's not as fast
00:00:51.440 --> 00:00:54.320
as it potentially could have been
00:54.320 --> 00:00:58.239
when there are lots of search results
00:00:58.239 --> 00:01:01.600
coming for a given search.
01:01.600 --> 00:01:06.159
I had noticed that myself
00:01:06.159 --> 00:01:10.159
when working on it, and probably before.
00:01:10.159 --> 00:01:16.560
My approach to optimizing Lisp code
01:16.560 --> 00:01:21.680
has changed recently-ish,
01:21.680 --> 00:01:26.960
and I'd like to talk about it here.
01:26.960 --> 00:01:34.079
This talk is for people who already
01:34.079 --> 00:01:37.040
know how to write some Emacs Lisp
00:01:37.040 --> 00:01:38.880
to solve their problems
00:01:38.880 --> 00:01:43.360
and who have possibly encountered
00:01:43.360 --> 00:01:49.200
some performance issues doing that.
01:49.200 --> 00:01:51.600
Also, if you want to contribute
00:01:51.600 --> 00:01:54.799
some improvements to the code
00:01:54.799 --> 00:01:56.960
that is already in Emacs
00:01:56.960 --> 00:02:02.159
or to other packages
00:02:02.159 --> 00:02:04.479
which are owned by somebody else,
00:02:04.479 --> 00:02:13.520
it should also be helpful. Let's start.
00:02:13.520 --> 00:02:18.640
First of all, about Emacs and Emacs Lisp.
00:02:18.640 --> 00:02:21.360
It's not the fastest language.
02:21.360 --> 00:02:26.560
Let's switch to the notes.
00:02:26.560 --> 00:02:31.440
I hope the font is big enough
00:02:31.440 --> 00:02:36.479
to be readable on this video. I really hope so.
02:36.480 --> 00:02:40.160
Emacs Lisp is not the fastest of the bunch.
00:02:40.160 --> 00:02:44.400
The garbage collector creates pauses
00:02:44.400 --> 00:02:48.800
whenever it needs to sweep data.
02:48.800 --> 00:02:51.760
The interpreter is not the fastest one,
02:51.760 --> 00:02:54.720
even though the native compilation branch
00:02:54.720 --> 00:02:59.120
is improving on that twofold
00:02:59.120 --> 00:03:04.640
in certain scenarios, which is good.
03:04.640 --> 00:03:10.560
The standard library or functions
00:03:10.560 --> 00:03:12.800
for working with collections, for example,
03:12.800 --> 00:03:17.599
are not so uniformly great.
03:17.599 --> 00:03:20.080
So there is some work to be done there.
03:20.080 --> 00:03:22.239
Maybe you can contribute
03:22.239 --> 00:03:26.640
the next improvement in there.
03:26.640 --> 00:03:33.200
And also, if your package displays stuff,
00:03:33.200 --> 00:03:35.040
it has a visual component,
03:35.040 --> 00:03:36.959
then you might have to deal with
00:03:36.959 --> 00:03:43.120
some drawbacks of the display engine
03:43.120 --> 00:03:47.760
which might slow to a crawl
00:03:47.760 --> 00:03:53.680
when your code has...
00:03:53.680 --> 00:03:56.560
when you're trying to display lines
00:03:56.560 --> 00:04:00.319
which are a little too long--
00:04:00.319 --> 00:04:03.120
trying to print a [long] line, for example--
04:03.120 --> 00:04:07.120
or you are using a lot of overlays,
04:07.120 --> 00:04:09.120
if you know what it is.
04:09.120 --> 00:04:12.640
With lots of overlays
00:04:12.640 --> 00:04:19.839
on the same visual line, in particular.
04:19.840 --> 00:04:24.160
But okay. The first thing to understand,
00:04:24.160 --> 00:04:27.680
and I hope everybody who's done
00:04:27.680 --> 00:04:29.520
some programming in the past knows,
00:04:29.520 --> 00:04:35.120
is this: first you write it correctly,
00:04:35.120 --> 00:04:37.360
and then you try to benchmark
04:37.360 --> 00:04:39.199
and see where the problems are,
00:04:39.199 --> 00:04:40.720
if there are any.
04:40.720 --> 00:04:45.919
So first do it right, then do it fast.
04:45.919 --> 00:04:50.720
How do we find the hotspots,
00:04:50.720 --> 00:04:52.400
the bottlenecks on the second step?
00:04:52.400 --> 00:04:54.720
We try to do some profiling
00:04:54.720 --> 00:04:58.960
or measuring how long the code takes.
04:58.960 --> 00:05:03.038
Emacs has two different profilers.
00:05:03.039 --> 00:05:05.440
One is the native profiler,
05:05.440 --> 00:05:10.639
which as I recall was contributed by
00:05:10.639 --> 00:05:13.280
Tomohiro Matsuyama, the author
00:05:13.280 --> 00:05:16.240
of the auto-complete package
05:16.240 --> 00:05:22.720
back in Emacs 24.3, apparently.
05:22.720 --> 00:05:24.320
It's a low overhead profiler.
00:05:24.320 --> 00:05:31.520
It's sampling, which is good on one hand
05:31.520 --> 00:05:33.919
but might be less good on the other hand.
05:33.919 --> 00:05:35.199
Let's try using it.
00:05:35.199 --> 00:05:40.960
So we start with profiler-start.
00:05:40.960 --> 00:05:44.960
Let's just ask for CPU profiling here,
00:05:44.960 --> 00:05:48.320
but it can also profile memory allocations.
05:48.320 --> 00:05:56.400
Let's try a search for some string
00:05:56.400 --> 00:06:01.360
in the current project.
06:01.360 --> 00:06:04.080
Let's do it a few times, maybe three times,
00:06:04.080 --> 00:06:09.360
so the profiler has more data to work with.
06:09.360 --> 00:06:13.680
Let's call the report.
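NOTE
A minimal sketch of the profiler workflow being demonstrated here;
the same calls can be made via M-x or from Lisp:

  (profiler-start 'cpu)   ; or 'mem, or 'cpu+mem
  ;; ... run the command under test a few times ...
  (profiler-report)       ; show the call tree with percentages
  (profiler-stop)         ; stop sampling when done
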
06:13.680 --> 00:06:16.960
Okay. So here we have the tree,
00:06:16.960 --> 00:06:19.360
the execution tree with percentages
00:06:19.360 --> 00:06:22.960
of the time spent in each function.
06:22.960 --> 00:06:25.840
You can unwrap it by going up and down
00:06:25.840 --> 00:06:35.039
and pressing TAB to unwrap every element.
06:35.039 --> 00:06:52.000
This is weird.
06:52.000 --> 00:06:55.199
Okay, here we see that the actual command
00:06:55.199 --> 00:06:56.639
that was called only takes
00:06:56.639 --> 00:06:59.599
8% of the whole runtime,
06:59.599 --> 00:07:07.759
meaning the input was...
07:07.759 --> 00:07:10.960
The command didn't take enough time
00:07:10.960 --> 00:07:13.360
for us to really dig into it.
00:07:13.360 --> 00:07:18.960
Let's try a shorter input with more matches.
00:07:18.960 --> 00:07:25.680
So profiler-start again, CPU.
07:25.680 --> 00:07:34.240
Let's search for list.
07:34.240 --> 00:07:40.160
Don't mind the minibuffer just yet.
07:40.160 --> 00:07:47.680
Okay. So let's look at the report.
07:47.680 --> 00:07:52.000
We can unwrap it here, and we see
00:07:52.000 --> 00:07:55.759
52% of the time was spent doing this,
07:55.759 --> 00:07:58.240
at least according to the profile.
00:07:58.240 --> 00:08:00.960
We can unwrap it, see the description
00:08:00.960 --> 00:08:02.800
of any function that was called
00:08:02.800 --> 00:08:08.479
by hitting d, or even jump to it
00:08:08.479 --> 00:08:12.960
by tapping f.
08:12.960 --> 00:08:16.160
By going down the tree,
08:16.160 --> 00:08:18.240
unwrapping it with TAB,
08:18.240 --> 00:08:22.400
you can see where time was spent.
08:22.400 --> 00:08:25.199
One of the bigger drawbacks
00:08:25.199 --> 00:08:30.639
of this profiler is that the arithmetic
08:30.639 --> 00:08:39.200
doesn't always work out,
00:08:39.200 --> 00:08:45.760
like you might have...
08:45.760 --> 00:08:47.519
This is not a good example,
00:08:47.519 --> 00:08:52.640
but okay, you have 14% spent in here,
00:08:52.640 --> 00:08:57.200
but when we expand this entry,
00:08:57.200 --> 00:09:01.760
we only see like 6%, 3%, and 0%.
00:09:01.760 --> 00:09:06.640
A different sum, sometimes even bigger than the parent's.
00:09:06.640 --> 00:09:10.800
So the native profiler
00:09:10.800 --> 00:09:13.200
can give an accurate picture
00:09:13.200 --> 00:09:15.920
and it has little overhead,
00:09:15.920 --> 00:09:20.399
but the specific numbers
00:09:20.399 --> 00:09:22.959
are not very precise
09:22.959 --> 00:09:28.640
because the principle is probabilistic.
09:28.640 --> 00:09:31.199
Let's stop here.
09:31.200 --> 00:09:36.959
There is another package called elp,
09:36.959 --> 00:09:39.360
Emacs Lisp Profiler,
00:09:39.360 --> 00:09:43.440
which is much older than that.
09:43.440 --> 00:09:47.920
It allows us to instrument
09:47.920 --> 00:09:53.600
just specific functions or a package.
09:53.600 --> 00:09:57.680
We're instrumenting the xref package here.
09:57.680 --> 00:10:01.360
It works through advice. You can see
00:10:01.360 --> 00:10:03.279
the description of one of the functions
00:10:03.279 --> 00:10:04.320
in the package, and we see
00:10:04.320 --> 00:10:12.640
that it has had :around advice added.
10:12.640 --> 00:10:18.000
If we run the same search,
10:18.000 --> 00:10:21.360
we can see -- when it finishes --
00:10:21.360 --> 00:10:30.399
a table with the number of times
10:30.399 --> 00:10:32.640
every function in the package
00:10:32.640 --> 00:10:41.680
has been called, and the time
00:10:41.680 --> 00:10:48.000
the runtime has spent inside each of them,
10:48.000 --> 00:10:55.040
sorted by impact, as it understands it.
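NOTE
A sketch of the elp session shown here; "xref" is the package prefix
from the demo, and elp-restore-all removes the advice afterwards:

  (elp-instrument-package "xref")  ; advise every xref-* function
  ;; ... run the search again ...
  (elp-results)                    ; call counts, total and average times
  (elp-restore-all)                ; un-instrument everything
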
10:55.040 --> 00:11:00.079
The main problem with this profiler is
00:11:00.079 --> 00:11:04.160
it is slower because it adds overhead
00:11:04.160 --> 00:11:06.880
to every function call,
11:06.880 --> 00:11:11.519
so it, in the end, might give an impression
00:11:11.519 --> 00:11:19.519
that functions with lots of calls,
11:19.519 --> 00:11:21.839
which have been called a lot of times,
11:21.839 --> 00:11:26.880
are more important,
11:26.880 --> 00:11:29.279
hotter than they actually are,
00:11:29.279 --> 00:11:30.959
because it slows down
00:11:30.959 --> 00:11:32.640
every such function call,
11:32.640 --> 00:11:35.279
including the subsequent calls
00:11:35.279 --> 00:11:38.160
that these functions do
00:11:38.160 --> 00:11:40.560
inside the same package.
11:40.560 --> 00:11:48.240
So it's a good way to analyze and drill down
11:48.240 --> 00:11:50.320
to see which functions here
00:11:50.320 --> 00:11:51.200
take a lot of time,
00:11:51.200 --> 00:11:53.839
but just keep in mind
00:11:53.839 --> 00:11:57.760
that sometimes you might end up
00:11:57.760 --> 00:12:00.079
trying to optimize one of these
00:12:00.079 --> 00:12:01.600
smaller, frequently-called functions,
00:12:01.600 --> 00:12:09.279
and the result
00:12:09.279 --> 00:12:11.040
might not actually affect
00:12:11.040 --> 00:12:18.720
the production runtime at all.
12:18.720 --> 00:12:22.240
But it's still a good tool,
00:12:22.240 --> 00:12:25.440
especially when you already know
12:25.440 --> 00:12:30.560
which set of actions you are interested in,
00:12:30.560 --> 00:12:33.760
and which functions might be slower.
12:33.760 --> 00:12:37.120
elp allows you to instrument a package
00:12:37.120 --> 00:12:42.720
or a set of functions. Just don't forget
00:12:42.720 --> 00:12:49.600
to un-instrument them all afterwards,
00:12:49.600 --> 00:12:52.000
or else your session,
00:12:52.000 --> 00:12:55.920
your subsequent optimization efforts
00:12:55.920 --> 00:12:59.360
might not work as well as you might
12:59.360 --> 00:13:01.359
want them to.
13:01.360 --> 00:13:04.560
And there's also this very nice,
00:13:04.560 --> 00:13:09.600
very handy package called benchmark,
13:09.600 --> 00:13:12.480
which unfortunately
00:13:12.480 --> 00:13:16.000
not everybody knows about.
13:16.000 --> 00:13:21.519
It turns out that a lot of
00:13:21.519 --> 00:13:30.079
older Emacs Lisp developers,
13:30.079 --> 00:13:34.160
users, package developers usually have
00:13:34.160 --> 00:13:38.000
some benchmarking macros of their own.
00:13:38.000 --> 00:13:42.880
But there is this package
00:13:42.880 --> 00:13:46.160
with a perfectly usable interface
00:13:46.160 --> 00:13:49.199
which doesn't require you
00:13:49.199 --> 00:13:51.120
to define something else,
00:13:51.120 --> 00:13:52.560
especially when you are in
00:13:52.560 --> 00:13:57.120
an emacs -Q session
13:57.120 --> 00:14:00.480
trying to benchmark your code
00:14:00.480 --> 00:14:05.440
in a bare Emacs. So it has
00:14:05.440 --> 00:14:09.680
two main endpoints, I would say.
14:09.680 --> 00:14:14.480
The first one is the benchmark function,
14:14.480 --> 00:14:19.199
and the second one is benchmark-progn.
00:14:19.199 --> 00:14:20.399
benchmark is a function,
00:14:20.399 --> 00:14:25.040
and benchmark-progn is a macro.
14:25.040 --> 00:14:27.839
The first one you can use by specifying
00:14:27.839 --> 00:14:30.480
the number of iterations and the form.
00:14:30.480 --> 00:14:36.800
I hope the minibuffer is easy to read here.
14:36.800 --> 00:14:43.360
For instance, we can take this long list
00:14:43.360 --> 00:14:44.800
and try nreverse-ing it,
00:14:44.800 --> 00:14:49.680
and we see how long that takes.
14:49.680 --> 00:14:54.160
Then you can do it,
14:54.160 --> 00:14:55.600
adjust the inner code,
00:14:55.600 --> 00:14:58.079
and then basically compare
00:14:58.079 --> 00:14:59.519
to figure out how much time
00:14:59.519 --> 00:15:01.440
nreverse takes
00:15:01.440 --> 00:15:03.839
in this scenario.
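NOTE
A sketch of that kind of measurement; the list and the repetition
count are illustrative, and the output format is approximate:

  (benchmark 10 '(nreverse (number-sequence 1 100000)))
  ;; => "Elapsed time: 0.041s (0.020s in 1 GCs)"  ; illustrative numbers
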
15:03.839 --> 00:15:07.760
Or, we can take a function
00:15:07.760 --> 00:15:10.880
which we have probably found
15:10.880 --> 00:15:12.880
using one of the previous methods
00:15:12.880 --> 00:15:17.279
which we anticipate,
15:17.279 --> 00:15:19.440
as we understand, takes a while,
15:19.440 --> 00:15:26.720
and annotate it with a benchmark-progn form,
00:15:26.720 --> 00:15:33.360
which... just executes the body
00:15:26.720 --> 00:15:33.360
and reports, for each and every one of them,
15:36.880 --> 00:15:41.360
how long that body execution took.
15:41.360 --> 00:15:45.360
So for instance, here we added
00:15:45.360 --> 00:15:46.959
a few message calls
00:15:46.959 --> 00:15:49.040
inside those benchmark-progns,
00:15:49.040 --> 00:15:53.199
so we can see how long each part
00:15:53.199 --> 00:16:03.600
of the function xref-matches-in-files
00:16:03.600 --> 00:16:06.320
takes when it is run.
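NOTE
A sketch of annotating one phase that way; my-parse-hits is a
hypothetical stand-in for one section of xref-matches-in-files:

  (benchmark-progn           ; messages how long its body took
    (message "parsing...")   ; label the section in *Messages*
    (my-parse-hits buffer))
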
16:06.320 --> 00:16:11.839
Let's try it. Let's first call
16:11.839 --> 00:16:17.199
emacs-lisp-byte-compile-and-load,
16:17.199 --> 00:16:20.720
so that we're sure that we are running
16:20.720 --> 00:16:24.079
the fastest possible version of this code,
16:24.079 --> 00:16:29.519
so when we do find the bottlenecks,
00:16:29.519 --> 00:16:31.279
we are sure that they are real
00:16:31.279 --> 00:16:36.240
and not because of the uncompiled code,
00:16:36.240 --> 00:16:43.519
macro expansion, or the lack of
00:16:43.519 --> 00:16:46.160
some substitutions, other substitutions
00:16:46.160 --> 00:16:50.800
that our code is relying on
16:50.800 --> 00:17:01.279
but which might not be available
17:01.279 --> 00:17:03.279
in the interpreted mode,
00:17:03.279 --> 00:17:07.039
only in the compiled code.
17:07.039 --> 00:17:10.160
Let's run.
17:10.160 --> 00:17:14.880
So we have this list,
17:14.880 --> 00:17:26.160
search for list,
17:26.160 --> 00:17:29.760
and the remaining time is spent during
00:17:29.760 --> 00:17:36.720
printing of the results, which we didn't
17:36.720 --> 00:17:38.320
annotate with the benchmarking macro,
00:17:38.320 --> 00:17:41.280
but it's still there.
17:41.280 --> 00:17:43.679
So we can see by switching to the
17:43.679 --> 00:17:49.919
*Messages* buffer, C-h e,
17:49.919 --> 00:17:52.080
that unquoting was very fast.
00:17:52.080 --> 00:17:59.280
So the search took 400 milliseconds.
00:17:59.280 --> 00:18:02.480
It's slower than it would be
18:02.480 --> 00:18:06.080
without the video being recorded, as well
00:18:06.080 --> 00:18:09.039
as the rest of the measurements here.
18:09.039 --> 00:18:12.160
So the parsing of the results took more
00:18:12.160 --> 00:18:16.559
than that, and the object allocation
18:16.559 --> 00:18:21.919
took even more, which is unfortunate
00:18:21.919 --> 00:18:23.919
but it's the reality
00:18:23.919 --> 00:18:26.400
when we're dealing with a lot of
00:18:26.400 --> 00:18:29.840
search results, because, well,
18:29.840 --> 00:18:33.520
Emacs Lisp is slower than grep.
18:34.559 --> 00:18:49.200
That's just a given.
18:49.200 --> 00:18:54.400
What can be done and what had been done
00:18:54.400 --> 00:18:57.120
to improve on this?
18:57.120 --> 00:19:04.240
Well, first of all,
19:04.240 --> 00:19:06.000
let's change the question,
00:19:06.000 --> 00:19:07.360
because this is more of
00:19:07.360 --> 00:19:09.760
a retrospective sort of talk
19:09.760 --> 00:19:13.439
rather than talking about future plans.
00:19:13.440 --> 00:19:19.679
What can one do to improve performance?
19:19.679 --> 00:19:24.160
Well, basically, two things:
00:19:24.160 --> 00:19:27.919
first, you try to make sure
00:19:27.919 --> 00:19:30.559
that you're running less code,
00:19:30.559 --> 00:19:37.200
that your code does fewer things,
19:37.200 --> 00:19:40.400
fewer iterations of your operations.
19:40.400 --> 00:19:42.000
Basically, it's the realm
00:19:42.000 --> 00:19:46.880
of choosing your algorithm well.
19:46.880 --> 00:19:54.880
But make sure it has as little complexity
00:19:54.880 --> 00:20:00.239
as you can reasonably manage.
20:00.240 --> 00:20:05.760
Another is to try to reduce memory allocations.
20:05.760 --> 00:20:13.760
Allocation means creating conses, the links
20:13.760 --> 00:20:17.919
of the lists. If you are mapping through
00:20:17.919 --> 00:20:19.200
a list, creating a new one,
00:20:19.200 --> 00:20:22.640
you are creating conses.
00:20:22.640 --> 00:20:26.159
New objects in memory, anyway.
20:26.159 --> 00:20:30.480
And also, when you call string operations
00:20:30.480 --> 00:20:33.679
like substring, when you create new strings,
20:33.679 --> 00:20:36.640
that's also what you do.
20:36.640 --> 00:20:39.200
When you allocate new memory,
00:20:39.200 --> 00:20:43.600
that triggers garbage collections later,
00:20:43.600 --> 00:20:49.200
which affect the performance of your code
20:49.200 --> 00:21:01.600
quite severely. I have found that actually,
00:21:01.600 --> 00:21:04.960
contrary to my personal intuition,
21:04.960 --> 00:21:08.240
when you try to fine-tune the code
00:21:08.240 --> 00:21:13.120
working on long lists of elements,
00:21:13.120 --> 00:21:15.520
long, large collections,
00:21:15.520 --> 00:21:20.159
it's even more important to try to avoid
21:20.159 --> 00:21:23.200
or reduce memory allocations
21:23.200 --> 00:21:26.400
rather than try to ensure
00:21:26.400 --> 00:21:32.240
that less code is running,
21:32.240 --> 00:21:34.960
fewer operations are performed,
21:34.960 --> 00:21:38.080
because the garbage collector
21:38.080 --> 00:21:44.799
can hit you in the posterior quite suddenly,
00:21:44.799 --> 00:21:49.520
so that you will not always...
21:49.520 --> 00:21:51.760
When you measure the individual impact,
00:21:51.760 --> 00:21:56.799
you might not always measure the whole of it.
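NOTE
A way to see the garbage collector's share directly: benchmark-run,
from the same benchmark library, returns it alongside the elapsed
time; the numbers below are illustrative:

  (benchmark-run 10 (mapcar #'1+ (number-sequence 1 100000)))
  ;; => (0.637 10 0.512)  ; (total-secs gc-count secs-in-gc)
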
21:56.799 --> 00:22:02.640
And sometimes even when you have
00:22:02.640 --> 00:22:06.559
a few operations, you can measure
22:06.559 --> 00:22:09.520
all three of them, but at some point,
00:22:09.520 --> 00:22:11.919
if you just reduce a lot,
22:11.919 --> 00:22:14.720
remove most of the allocations,
00:22:14.720 --> 00:22:16.799
all three of them, the improvement
00:22:16.799 --> 00:22:21.520
might be even bigger than the total
22:21.520 --> 00:22:23.440
of three improvements
00:22:23.440 --> 00:22:27.200
which you might have measured previously,
22:27.200 --> 00:22:29.679
like separately.
22:29.679 --> 00:22:33.039
So it's something to be on the lookout for,
00:22:33.039 --> 00:22:39.520
but of course, when you pick the algorithm,
00:22:39.520 --> 00:22:45.200
it's important not to do
00:22:45.200 --> 00:22:52.158
more operations than you have to.
22:52.159 --> 00:22:56.000
We have examples of both
00:22:56.000 --> 00:22:58.720
in the recent changes
00:22:58.720 --> 00:23:02.559
to the xref and project packages,
23:02.559 --> 00:23:06.960
which we can examine here.
23:06.960 --> 00:23:11.200
Let's take a look at this one.
23:11.200 --> 00:23:14.880
This commit message lies a little bit
23:14.880 --> 00:23:19.039
because it's referring to
00:23:19.039 --> 00:23:24.559
the use of assoc instead of cl-assoc,
23:24.559 --> 00:23:29.919
and the actual change incorporates that,
00:23:29.919 --> 00:23:33.440
but it also incorporates
00:23:33.440 --> 00:23:39.440
the second part of the sentence
00:23:39.440 --> 00:23:41.360
which actually replaced
00:23:41.360 --> 00:23:46.960
the use of assoc in there.
23:46.960 --> 00:23:50.480
Curiously, cl-assoc was pretty slow
00:23:50.480 --> 00:23:52.559
because not only did it create
00:23:52.559 --> 00:23:57.919
quadratic complexity in the operation,
23:57.919 --> 00:23:59.919
it was also calling
00:23:59.919 --> 00:24:02.880
the Lisp function [equal]
24:02.880 --> 00:24:04.400
for every iteration,
00:24:04.400 --> 00:24:08.880
whereas if we just use assoc there,
24:08.880 --> 00:24:10.080
which was the first version
00:24:10.080 --> 00:24:13.919
of this change, of the improvement,
24:13.919 --> 00:24:15.760
it already became much faster,
00:24:15.760 --> 00:24:20.640
but then we switched to a hash table,
24:20.640 --> 00:24:28.080
which turned this lookup
00:24:28.080 --> 00:24:31.760
from O(n) complexity
24:31.760 --> 00:24:35.600
into, well, an amortized constant one.
24:35.600 --> 00:24:37.600
Even better.
24:37.600 --> 00:24:45.679
So, use hash tables, kids.
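NOTE
A minimal sketch of that kind of change; the keys and values are
illustrative:

  ;; (assoc key alist) scans the list: O(n) per lookup.
  ;; A hash table makes the same lookup amortized O(1):
  (let ((table (make-hash-table :test #'equal)))
    (puthash "file-a.el" 'group-1 table)
    (gethash "file-a.el" table))  ; => group-1
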
24:45.679 --> 00:24:52.400
Another commit here is about using
00:24:52.400 --> 00:24:55.520
the inhibit-modification-hooks.
00:24:55.520 --> 00:24:58.000
So, turns out when you're printing
00:24:58.000 --> 00:25:01.679
into a buffer, even if you have already
25:01.679 --> 00:25:06.159
disabled the undo history,
25:06.159 --> 00:25:08.480
binding this variable to
25:08.480 --> 00:25:10.880
a non-nil value is pretty good
00:25:10.880 --> 00:25:15.520
because you are able to avoid running
25:15.520 --> 00:25:21.120
a number of hooks, which improves performance.
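NOTE
A sketch of that binding around buffer output; the buffer name and
the inserted text are stand-ins:

  (with-current-buffer (get-buffer-create "*results*")
    (let ((buffer-undo-list t)             ; undo recording already off
          (inhibit-modification-hooks t))  ; skip the change hooks
      (insert "match line 1\nmatch line 2\n")))
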
25:21.120 --> 00:25:28.640
Next. This one was about moving the
00:25:28.640 --> 00:25:32.000
file-remote-p call
25:32.000 --> 00:25:37.279
from inside the loop.
25:37.279 --> 00:25:42.039
This function is actually
00:25:42.039 --> 00:25:49.039
surprisingly slow-ish for the goal,
25:49.039 --> 00:25:52.000
for its purpose, so you don't really want
25:52.000 --> 00:25:56.159
to call it on every file name in the list
00:25:56.159 --> 00:25:59.520
when you have a lot of them,
25:59.520 --> 00:26:02.000
and especially if your code
00:26:02.000 --> 00:26:04.640
is running in a buffer
00:26:04.640 --> 00:26:10.640
visiting a remote file, like through TRAMP.
26:10.640 --> 00:26:15.120
You might end up trying to
00:26:15.120 --> 00:26:16.880
devise different approaches
00:26:16.880 --> 00:26:20.159
to avoid checking whether
00:26:20.159 --> 00:26:23.279
every file name is remote
26:23.279 --> 00:26:25.360
if you're dealing with a list of file names,
00:26:25.360 --> 00:26:34.640
so this one, take a look, be careful with it.
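NOTE
The gist of hoisting that call; the file list is illustrative, and we
assume all of the names share one root, so a single check suffices:

  (let* ((files '("/ssh:host:/tmp/a.el" "/ssh:host:/tmp/b.el"))
         (remote (file-remote-p (car files))))  ; once, outside the loop
    (dolist (file files)
      (message "%s is %s" file (if remote "remote" "local"))))
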
26:34.640 --> 00:26:43.120
with-current-buffer is similarly slow,
26:43.120 --> 00:26:46.880
but not by as much. So it all depends on
26:46.880 --> 00:26:48.400
how often you call it, really.
00:26:48.400 --> 00:26:51.279
Sometimes you might want to
00:26:51.279 --> 00:26:52.720
rewrite your code so that
00:26:52.720 --> 00:26:54.720
it's called less often.
26:54.720 --> 00:26:59.760
And expand-file-name, which hits the disk.
26:59.760 --> 00:27:01.200
So if you're dealing with file names,
00:27:01.200 --> 00:27:02.960
you might want to replace it
00:27:02.960 --> 00:27:07.520
with concatenation at certain points.
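NOTE
A sketch of that substitution; the names are illustrative:

  (let ((dir "/home/user/project/") (file "lisp/xref.el"))
    (expand-file-name file dir)  ; canonicalizes, comparatively costly
    ;; When DIR is known absolute and slash-terminated,
    ;; concatenation produces the same string much more cheaply:
    (concat dir file))
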
27:07.520 --> 00:27:22.159
Okay, back to these changes later.
27:22.159 --> 00:27:28.480
This one just removed an allocation of a cons
27:28.480 --> 00:27:33.679
per match hit, and still it brought a 5%
27:33.679 --> 00:27:37.039
or so improvement
00:27:37.039 --> 00:27:41.120
to the whole command.
27:41.120 --> 00:27:46.000
So that's a pretty significant improvement
00:27:46.000 --> 00:27:53.600
for a small change like that.
27:53.600 --> 00:28:01.679
Similarly, here we just made sure
00:28:01.679 --> 00:28:09.520
to avoid a splits... no, a substring call,
28:09.520 --> 00:28:12.399
and probably an allocation of the whole
28:12.399 --> 00:28:16.320
buffer string, but in my testing,
00:28:16.320 --> 00:28:19.440
that doesn't actually matter much,
00:28:19.440 --> 00:28:22.399
at least not so much, but a substring call
00:28:22.399 --> 00:28:28.960
per result... If we see,
00:28:28.960 --> 00:28:33.039
since we changed to manual parsing
00:28:33.039 --> 00:28:37.440
of the buffer, with the strings
00:28:37.440 --> 00:28:41.440
delimited with zero bytes,
28:41.440 --> 00:28:43.840
it gave an overall improvement
00:28:43.840 --> 00:28:47.440
of 20%, again on my machine
00:28:47.440 --> 00:28:50.720
with a pretty fast SSD,
00:28:50.720 --> 00:28:53.679
and with a warm disk cache, of course.
28:53.679 --> 00:29:05.360
But still... Going back to this revision,
00:29:05.360 --> 00:29:09.440
it was actually quite surprising
00:29:09.440 --> 00:29:15.200
that migration to a cl-defstruct
29:15.200 --> 00:29:24.080
from eieio, the Common Lisp-inspired
00:29:24.080 --> 00:29:26.480
object system first introduced
00:29:26.480 --> 00:29:34.240
in the CEDET tools,
00:29:34.240 --> 00:29:39.360
that was a bit of a surprise,
00:29:39.360 --> 00:29:44.799
because not much of my
00:29:44.799 --> 00:29:46.080
benchmarking
00:29:46.080 --> 00:29:50.960
actually pointed at it being the problem.
29:50.960 --> 00:29:55.760
Probably because the accessors
00:29:55.760 --> 00:29:57.520
were not the actual problem,
29:57.520 --> 00:30:00.399
like the oref macros
00:30:00.399 --> 00:30:06.960
and the code accessing the slots,
30:06.960 --> 00:30:10.720
but the construction,
00:30:10.720 --> 00:30:14.320
the object construction code,
30:14.320 --> 00:30:16.559
that was where most of the time
00:30:16.559 --> 00:30:21.200
was spent unnecessarily,
30:21.200 --> 00:30:24.240
maybe doing type-checking,
30:24.240 --> 00:30:28.880
maybe some other stuff.
30:28.880 --> 00:30:36.080
So if you have lots of values
30:36.080 --> 00:30:39.120
you need to treat like objects, and do
00:30:39.120 --> 00:30:42.000
virtual dispatch on them in your package,
00:30:42.000 --> 00:30:45.200
you might want to look into
30:45.200 --> 00:30:52.239
cl-defstruct for them.
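NOTE
Roughly the shape of such a migration; my-location is a hypothetical
stand-in for the xref location types:

  ;; before (EIEIO): (defclass my-location () ((file ...) (line ...)))
  ;; after: a struct with a cheap positional constructor:
  (require 'cl-lib)
  (cl-defstruct (my-location
                 (:constructor my-make-location (file line)))
    file line)
  (my-location-file (my-make-location "xref.el" 42))  ; => "xref.el"
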
30:52.240 --> 00:30:54.080
Going on to the next section,
00:30:54.080 --> 00:31:01.200
I have prepared a sort of comparison
31:01.200 --> 00:31:05.519
between cl-lib, dash, and seq,
00:31:05.519 --> 00:31:09.360
the collection libraries we have
31:09.360 --> 00:31:11.279
available for us in Emacs.
00:31:11.279 --> 00:31:15.760
That is, the popular ones,
31:15.760 --> 00:31:21.440
but since I'm running behind on time,
00:31:21.440 --> 00:31:27.760
I'll probably just summarize the findings.
31:27.760 --> 00:31:31.919
First of all, seq is nice.
00:31:31.919 --> 00:31:38.480
Its generic approach is probably
31:38.480 --> 00:31:41.919
quite decent for most of the situations,
31:41.919 --> 00:31:49.840
but there are places where it could be
31:49.840 --> 00:31:53.679
optimized better, so instead of having
31:53.679 --> 00:31:56.399
quadratic performance, it could use a
31:56.399 --> 00:31:59.200
hash table, like, for instance,
31:59.200 --> 00:32:02.000
dash does here--
32:02.000 --> 00:32:06.240
in -union-- or like delete-dups
00:32:06.240 --> 00:32:12.960
does in its implementation.
32:12.960 --> 00:32:16.640
dash itself is curiously fast,
00:32:16.640 --> 00:32:20.000
at least faster than I might have expected,
32:20.000 --> 00:32:21.600
possibly because of
00:32:21.600 --> 00:32:25.120
the implementation approach
00:32:25.120 --> 00:32:33.200
where it uses code generation
00:32:33.200 --> 00:32:35.840
to avoid function calls,
32:35.840 --> 00:32:37.840
at least some of them,
32:37.840 --> 00:32:41.120
which is interesting.
32:41.120 --> 00:32:45.600
But since both seq and dash
00:32:45.600 --> 00:32:49.919
avoid mutations--they don't really have
32:49.919 --> 00:32:52.399
mutating counterparts to common functions
00:32:52.399 --> 00:32:56.480
like you have with cl-remove-if, cl-delete-if,
32:56.480 --> 00:33:02.000
or just cl-remove, cl-delete,
33:02.000 --> 00:33:08.000
it still can be valuable to look into
00:33:08.000 --> 00:33:12.240
destructive versions of those functions,
33:12.240 --> 00:33:16.880
something from the core library
00:33:16.880 --> 00:33:22.559
like delete-dups or nreverse,
33:22.559 --> 00:33:24.399
for your code when you're really trying
33:24.399 --> 00:33:29.519
to get as close to the metal
00:33:29.519 --> 00:33:34.320
or whatever as you can,
00:33:34.320 --> 00:33:40.080
because avoiding extra allocations
00:33:40.080 --> 00:33:43.200
can really be useful.
33:43.200 --> 00:33:46.159
You can really improve their performance
33:46.159 --> 00:33:54.080
if you don't do a lot of other stuff.
33:54.080 --> 00:34:00.080
delete-consecutive-dups is blazingly fast.
34:00.080 --> 00:34:03.919
It only requires the strings to be pre-sorted.
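NOTE
A sketch of reaching for the destructive variants; xs is illustrative
and gets modified in place:

  (require 'cl-lib)
  (let ((xs (mapcar #'number-to-string (number-sequence 1 1000))))
    ;; cl-delete-if reuses the existing conses, unlike cl-remove-if:
    (setq xs (cl-delete-if (lambda (s) (equal s "500")) xs))
    (setq xs (sort xs #'string<))
    ;; needs sorted input; much cheaper than delete-dups:
    (delete-consecutive-dups xs))
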
34:03.919 --> 00:34:08.399
What else to say...
34:08.399 --> 00:34:13.200
If you are going to read these measurements,
34:13.200 --> 00:34:15.359
make sure to keep in mind
00:34:15.359 --> 00:34:18.000
that reverse is not free,
34:18.000 --> 00:34:20.240
so for instance, if we're looking
00:34:20.240 --> 00:34:22.480
at this comparison
00:34:22.480 --> 00:34:26.399
between remove and delete,
34:26.399 --> 00:34:27.919
they're using reverse
00:34:27.919 --> 00:34:32.159
to avoid modifying the data,
34:32.159 --> 00:34:34.800
the sample data, so we don't have to
34:34.800 --> 00:34:36.720
create it every time.
34:36.720 --> 00:34:41.919
But to compare how much faster
00:34:41.919 --> 00:34:43.760
delete is than remove,
34:43.760 --> 00:34:50.800
we need to subtract 787 milliseconds
34:50.800 --> 00:34:52.320
from here and from here
00:34:52.320 --> 00:34:58.480
so it comes out to like 230 milliseconds
00:34:58.480 --> 00:35:02.560
in this example, the last example,
00:35:02.560 --> 00:35:12.000
and about 1 second 250 milliseconds here,
00:35:12.000 --> 00:35:17.599
so the difference is 5-fold here.
35:17.599 --> 00:35:20.160
Not 2-fold.
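NOTE
The arithmetic spelled out, using the shared 787 ms reverse baseline
from the measurements above:

  remove: total - 787 ms = ~1250 ms of actual work
  delete: total - 787 ms =  ~230 ms of actual work
  1250 / 230 = ~5, so 5-fold, not the ~2-fold the raw totals suggest.
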
35:20.160 --> 00:35:26.480
All right. With this, I'm going to
35:26.480 --> 00:35:29.520
thank you for listening, for watching
35:29.520 --> 00:35:31.920
and I'll be taking questions.
35:31.920 --> 00:35:32.920
Thank you.
00:35:32.920 --> 00:35:35.160
[captions by sachac]