path: root/2021/captions/emacsconf-2021-faster--optimizing-emacs-lisp-code--dmitry-gutov--main.vtt



WEBVTT

00:01.120 --> 00:00:04.640
Hi. Greetings from the cloudy St. Petersburg.

00:04.640 --> 00:00:06.080
My name is Dmitry Gutov. 

00:00:06.080 --> 00:00:09.280
I write Ruby by day, Emacs Lisp by night

00:00:09.280 --> 00:00:12.080
when I don't do anything else.

00:12.080 --> 00:00:14.559
You might know my work 

00:00:14.559 --> 00:00:18.160
from a number of third-party packages

00:00:18.160 --> 00:00:22.240
as well as built-in ones.

00:22.240 --> 00:00:25.599
The idea for this talk came out from

00:25.599 --> 00:00:26.800
an improvement request 

00:00:26.800 --> 00:00:32.960
for the performance of grep-like commands

00:32.960 --> 00:00:35.200
using the xref interface 

00:00:35.200 --> 00:00:43.280
and its storage types, container types 

00:00:43.280 --> 00:00:47.840
for the search results,

00:47.840 --> 00:00:51.440
complaining that it's not as fast

00:00:51.440 --> 00:00:54.320
as it potentially could have been

00:54.320 --> 00:00:58.239
when there are lots of search results 

00:00:58.239 --> 00:01:01.600
coming for a given search.

01:01.600 --> 00:01:06.159
I have noticed myself that 

00:01:06.159 --> 00:01:10.159
when working on it, and probably before. 

00:01:10.159 --> 00:01:16.560
My approach to optimizing Lisp code 

01:16.560 --> 00:01:21.680
has changed recently-ish,

01:21.680 --> 00:01:26.960
and I'd like to talk about it here.

01:26.960 --> 00:01:34.079
This talk is for people who already

01:34.079 --> 00:01:37.040
know how to write some Emacs Lisp 

00:01:37.040 --> 00:01:38.880
to solve their problems 

00:01:38.880 --> 00:01:43.360
and who have possibly encountered 

00:01:43.360 --> 00:01:49.200
some performance issues doing that.

01:49.200 --> 00:01:51.600
Also, if you want to contribute 

00:01:51.600 --> 00:01:54.799
some improvements to the code 

00:01:54.799 --> 00:01:56.960
that is already in Emacs 

00:01:56.960 --> 00:02:02.159
or to other packages 

00:02:02.159 --> 00:02:04.479
which are owned by somebody else, 

00:02:04.479 --> 00:02:13.520
it should also be helpful. Let's start. 

00:02:13.520 --> 00:02:18.640
First of all, about Emacs and Emacs Lisp. 

00:02:18.640 --> 00:02:21.360
It's not the fastest language.

02:21.360 --> 00:02:26.560
Let's switch to the notes. 

00:02:26.560 --> 00:02:31.440
I hope the font is big enough 

00:02:31.440 --> 00:02:36.479
to be readable on this video, really hope.

02:36.480 --> 00:02:40.160
Emacs Lisp is not the fastest of the bunch.

00:02:40.160 --> 00:02:44.400
The garbage collector creates pauses 

00:02:44.400 --> 00:02:48.800
whenever it needs to sweep data.

02:48.800 --> 00:02:51.760
The interpreter is not the fastest one,

02:51.760 --> 00:02:54.720
even though the native compilation branch

00:02:54.720 --> 00:02:59.120
is improving on that by twice

00:02:59.120 --> 00:03:04.640
in certain scenarios, which is good.

03:04.640 --> 00:03:10.560
The standard library or functions

00:03:10.560 --> 00:03:12.800
for working with collections, for example,

03:12.800 --> 00:03:17.599
are not so uniformly great.

03:17.599 --> 00:03:20.080
So there is some work to be done there.

03:20.080 --> 00:03:22.239
Maybe you can contribute

03:22.239 --> 00:03:26.640
the next improvement in there.

03:26.640 --> 00:03:33.200
And also, if your package displays stuff, 

00:03:33.200 --> 00:03:35.040
it has a visual component,

03:35.040 --> 00:03:36.959
then you might have to deal with

00:03:36.959 --> 00:03:43.120
some drawbacks of the display engine

03:43.120 --> 00:03:47.760
which might slow to a crawl 

00:03:47.760 --> 00:03:53.680
when your code has... 

00:03:53.680 --> 00:03:56.560
when you're trying to display lines 

00:03:56.560 --> 00:04:00.319
which are a little too long--

00:04:00.319 --> 00:04:03.120
trying to print a [long] line, for example--

04:03.120 --> 00:04:07.120
or you are using a lot of overlays,

04:07.120 --> 00:04:09.120
if you know what it is.

04:09.120 --> 00:04:12.640
With lots of overlays 

00:04:12.640 --> 00:04:19.839
on the same visual line, in particular.

04:19.840 --> 00:04:24.160
But okay. The first thing to understand,

00:04:24.160 --> 00:04:27.680
and I hope everybody who's done

00:04:27.680 --> 00:04:29.520
some programming in the past knows, 

00:04:29.520 --> 00:04:35.120
it's: first you write correctly, 

00:04:35.120 --> 00:04:37.360
and then you try to benchmark.

04:37.360 --> 00:04:39.199
and see where the problems are, 

00:04:39.199 --> 00:04:40.720
if there are any.

04:40.720 --> 00:04:45.919
So first do it right, then do it fast.

04:45.919 --> 00:04:50.720
How do we find the hotspots, 

00:04:50.720 --> 00:04:52.400
the bottlenecks on the second step? 

00:04:52.400 --> 00:04:54.720
We try to do some profiling 

00:04:54.720 --> 00:04:58.960
or measuring how long the code takes.

04:58.960 --> 00:05:03.038
Emacs has two different profilers. 

00:05:03.039 --> 00:05:05.440
One is the native profiler,

05:05.440 --> 00:05:10.639
which as I recall was contributed by

00:05:10.639 --> 00:05:13.280
Tomohiro Matsuyama, the author 

00:05:13.280 --> 00:05:16.240
of the autocomplete package

05:16.240 --> 00:05:22.720
back in Emacs 24.3, apparently.

05:22.720 --> 00:05:24.320
It's a low overhead profiler. 

00:05:24.320 --> 00:05:31.520
It's sampling, which is good on one hand

05:31.520 --> 00:05:33.919
but might be less good on the other hand.

05:33.919 --> 00:05:35.199
Let's try using it. 

00:05:35.199 --> 00:05:40.960
So we start with profiler-start. 

00:05:40.960 --> 00:05:44.960
Let's just ask for CPU profiling here,

00:05:44.960 --> 00:05:48.320
but it also can profile memory locations.

05:48.320 --> 00:05:56.400
Let's try a search for some string

00:05:56.400 --> 00:06:01.360
in the current project.

06:01.360 --> 00:06:04.080
Let's do it a few times, maybe three times,

00:06:04.080 --> 00:06:09.360
so the profiler has more data to work with.

06:09.360 --> 00:06:13.680
Let's call the report.

06:13.680 --> 00:06:16.960
Okay. So here we have the tree, 

00:06:16.960 --> 00:06:19.360
the execution tree with percentages

00:06:19.360 --> 00:06:22.960
of the time spent in each function.

06:22.960 --> 00:06:25.840
You can unwrap it by going up and down

00:06:25.840 --> 00:06:35.039
and pressing TAB to unwrap every element.

06:35.039 --> 00:06:52.000
This is weird.

06:52.000 --> 00:06:55.199
Okay, here we see that the actual command

00:06:55.199 --> 00:06:56.639
that was called only takes 

00:06:56.639 --> 00:06:59.599
8% of the whole runtime,

06:59.599 --> 00:07:07.759
meaning the input was...

07:07.759 --> 00:07:10.960
The command took not enough time 

00:07:10.960 --> 00:07:13.360
for us to really dig into it. 

00:07:13.360 --> 00:07:18.960
Let's try a shorter input with more matches. 

00:07:18.960 --> 00:07:25.680
So profiler-start again, CPU.

07:25.680 --> 00:07:34.240
Let's search for list.

07:34.240 --> 00:07:40.160
Don't mind the minibuffer just yet.

07:40.160 --> 00:07:47.680
Okay. So let's look at the report.

07:47.680 --> 00:07:52.000
We can unwrap it here, and we see

00:07:52.000 --> 00:07:55.759
52% of the time was spent doing this,

07:55.759 --> 00:07:58.240
at least according to the profile. 

00:07:58.240 --> 00:08:00.960
We can unwrap it, see the description 

00:08:00.960 --> 00:08:02.800
of any function that was called 

00:08:02.800 --> 00:08:08.479
by hitting d, or even jump to it 

00:08:08.479 --> 00:08:12.960
by tapping f.

08:12.960 --> 00:08:16.160
By going down the stream,

08:16.160 --> 00:08:18.240
we unwrap it with TAB,

08:18.240 --> 00:08:22.400
you can see where time was spent.

08:22.400 --> 00:08:25.199
One of the bigger drawbacks 

00:08:25.199 --> 00:08:30.639
of this profiler is the arithmetics

08:30.639 --> 00:08:39.200
don't always work out, 

00:08:39.200 --> 00:08:45.760
like you might have...

08:45.760 --> 00:08:47.519
This is not a good example, 

00:08:47.519 --> 00:08:52.640
but okay, you have 14% spent in here,

00:08:52.640 --> 00:08:57.200
but when we expand this entry,

00:08:57.200 --> 00:09:01.760
we only see like 6%, 3%, and 0%. 

00:09:01.760 --> 00:09:06.640
Different sum, sometimes even bigger than that.

00:09:06.640 --> 00:09:10.800
So the native profiler 

00:09:10.800 --> 00:09:13.200
can give an accurate picture 

00:09:13.200 --> 00:09:15.920
and it has little overhead, 

00:09:15.920 --> 00:09:20.399
but the specific numbers 

00:09:20.399 --> 00:09:22.959
are not very precise

09:22.959 --> 00:09:28.640
because the principle is probabilistic.

09:28.640 --> 00:09:31.199
Let's stop here.

09:31.200 --> 00:09:36.959
There is another package called elp,

09:36.959 --> 00:09:39.360
Emacs Lisp Profiler, 

00:09:39.360 --> 00:09:43.440
which is much older than that.

09:43.440 --> 00:09:47.920
It allows us to instrument

09:47.920 --> 00:09:53.600
just specific functions or a package.

09:53.600 --> 00:09:57.680
We're instrumenting the xref package here.

09:57.680 --> 00:10:01.360
It works through advice. You can see

00:10:01.360 --> 00:10:03.279
the description of one of the functions

00:10:03.279 --> 00:10:04.320
in the package, and we see 

00:10:04.320 --> 00:10:12.640
that it has had :around device added.

10:12.640 --> 00:10:18.000
If we run the same search,

10:18.000 --> 00:10:21.360
we can see that -- when it finishes --

00:10:21.360 --> 00:10:30.399
the table of all the numbers,

10:30.399 --> 00:10:32.640
that every function in the package 

00:10:32.640 --> 00:10:41.680
has been called, and the times 

00:10:41.680 --> 00:10:48.000
which the runtime has spent inside of them.

10:48.000 --> 00:10:55.040
sorted by impact, as it understands that.

10:55.040 --> 00:11:00.079
The main problem with this profiler is 

00:11:00.079 --> 00:11:04.160
it is slower because it adds overhead

00:11:04.160 --> 00:11:06.880
to every function call,

11:06.880 --> 00:11:11.519
so it, in the end, might give an impression

00:11:11.519 --> 00:11:19.519
that functions with lots of calls,

11:19.519 --> 00:11:21.839
which have been called a lot of times,

11:21.839 --> 00:11:26.880
are more important, 

11:26.880 --> 00:11:29.279
hotter than they actually are, 

00:11:29.279 --> 00:11:30.959
because it slows down 

00:11:30.959 --> 00:11:32.640
every such function call,

11:32.640 --> 00:11:35.279
including the subsequent calls 

00:11:35.279 --> 00:11:38.160
that these functions do 

00:11:38.160 --> 00:11:40.560
inside the same package.

11:40.560 --> 00:11:48.240
So it's a good way to analyze and drill down

11:48.240 --> 00:11:50.320
to see which functions here 

00:11:50.320 --> 00:11:51.200
take a lot of time,

00:11:51.200 --> 00:11:53.839
but just keep in mind 

00:11:53.839 --> 00:11:57.760
that sometimes you might end up 

00:11:57.760 --> 00:12:00.079
trying to optimize one of these 

00:12:00.079 --> 00:12:01.600
smaller functions called 

00:12:01.600 --> 00:12:09.279
or usually smaller, and that the result

00:12:09.279 --> 00:12:11.040
might not actually affect 

00:12:11.040 --> 00:12:18.720
the production runtime at all.

12:18.720 --> 00:12:22.240
But it's still a good tool,

00:12:22.240 --> 00:12:25.440
especially when you already know

12:25.440 --> 00:12:30.560
which set of actions you are interested in, 

00:12:30.560 --> 00:12:33.760
and which functions might be slower.

12:33.760 --> 00:12:37.120
elp allows you to instrument a package 

00:12:37.120 --> 00:12:42.720
or a set of functions. Just don't forget 

00:12:42.720 --> 00:12:49.600
to un-instrument them all afterwards, 

00:12:49.600 --> 00:12:52.000
or else your session, 

00:12:52.000 --> 00:12:55.920
your subsequent optimization efforts 

00:12:55.920 --> 00:12:59.360
might not work as well as you might

12:59.360 --> 00:13:01.359
want to work.

13:01.360 --> 00:13:04.560
And there's also this very nice, 

00:13:04.560 --> 00:13:09.600
very handy package called benchmark,

13:09.600 --> 00:13:12.480
which unfortunately 

00:13:12.480 --> 00:13:16.000
not everybody knows about.

13:16.000 --> 00:13:21.519
It turns out that a lot of 

00:13:21.519 --> 00:13:30.079
older Emacs Lisp developers,

13:30.079 --> 00:13:34.160
users, package developers usually have 

00:13:34.160 --> 00:13:38.000
some benchmarking macros of their own.

00:13:38.000 --> 00:13:42.880
But there is this package 

00:13:42.880 --> 00:13:46.160
with perfectly usable interface 

00:13:46.160 --> 00:13:49.199
which doesn't require you 

00:13:49.199 --> 00:13:51.120
to define something else,

00:13:51.120 --> 00:13:52.560
especially when you are in

00:13:52.560 --> 00:13:57.120
an emacs -Q session

13:57.120 --> 00:14:00.480
trying to benchmark your code

00:14:00.480 --> 00:14:05.440
in a bare Emacs. So it has 

00:14:05.440 --> 00:14:09.680
two main endpoints, I would say.

14:09.680 --> 00:14:14.480
First one is the benchmark macro,

14:14.480 --> 00:14:19.199
and the second one is benchmark-progn. 

00:14:19.199 --> 00:14:20.399
benchmark is a function,

00:14:20.399 --> 00:14:25.040
and benchmark-progn is a macro.

14:25.040 --> 00:14:27.839
The first one you can use by specifying

00:14:27.839 --> 00:14:30.480
the number of iterations and the form. 

00:14:30.480 --> 00:14:36.800
I hope the minibuffer is easy to read here.

14:36.800 --> 00:14:43.360
For instance, we can take this long list

00:14:43.360 --> 00:14:44.800
and try nreverse-ing it,

00:14:44.800 --> 00:14:49.680
and we see how long that takes.

14:49.680 --> 00:14:54.160
Then you can do it,

14:54.160 --> 00:14:55.600
adjust the inner code, 

00:14:55.600 --> 00:14:58.079
and then basically compare 

00:14:58.079 --> 00:14:59.519
to figure out how much

00:14:59.519 --> 00:15:01.440
and how long nreverse takes 

00:15:01.440 --> 00:15:03.839
in this scenario.

15:03.839 --> 00:15:07.760
Or, we can take a function 

00:15:07.760 --> 00:15:10.880
which we have probably found

15:10.880 --> 00:15:12.880
using one of the previous methods 

00:15:12.880 --> 00:15:17.279
which we anticipate that,

15:17.279 --> 00:15:19.440
as we understand, it takes a while,

15:19.440 --> 00:15:26.720
and annotate it with a benchmark-progn form, 

00:15:26.720 --> 00:15:33.360
which... just execute the body 

00:15:33.360 --> 00:15:36.880
and report, each and every one of them,

15:36.880 --> 00:15:41.360
how long that body execution took.

15:41.360 --> 00:15:45.360
So for instance, here we added

00:15:45.360 --> 00:15:46.959
a few message calls 

00:15:46.959 --> 00:15:49.040
inside those benchmark-progns, 

00:15:49.040 --> 00:15:53.199
so we can see how long each part

00:15:53.199 --> 00:16:03.600
of the function xref-matches-in-files 

00:16:03.600 --> 00:16:06.320
takes when it is run.

16:06.320 --> 00:16:11.839
Let's try it. Let's first call

16:11.839 --> 00:16:17.199
emacs-lisp-byte-compile-and-load,

16:17.199 --> 00:16:20.720
so that we're sure that we are running

16:20.720 --> 00:16:24.079
the fastest possible version of this code,

16:24.079 --> 00:16:29.519
so when we do find the bottlenecks,

00:16:29.519 --> 00:16:31.279
we are sure that they are real

00:16:31.279 --> 00:16:36.240
and not because of the uncompiled code, 

00:16:36.240 --> 00:16:43.519
macro expansion, or the lack of 

00:16:43.519 --> 00:16:46.160
some substitutions, other substitutions

00:16:46.160 --> 00:16:50.800
that our code is relying on

16:50.800 --> 00:17:01.279
but which might not be available

17:01.279 --> 00:17:03.279
in the interpreter mode 

00:17:03.279 --> 00:17:07.039
just in the compiled code.

17:07.039 --> 00:17:10.160
Let's run.

17:10.160 --> 00:17:14.880
So we have this list,

17:14.880 --> 00:17:26.160
search for list,

17:26.160 --> 00:17:29.760
and the remaining time is spent during 

00:17:29.760 --> 00:17:36.720
printing of the results, which we didn't

17:36.720 --> 00:17:38.320
annotate with the benchmarking macro, 

00:17:38.320 --> 00:17:41.280
but it's still there.

17:41.280 --> 00:17:43.679
So we can see by switching to the

17:43.679 --> 00:17:49.919
*Messages* buffer, C-h e,

17:49.919 --> 00:17:52.080
that unquoting was very fast.

00:17:52.080 --> 00:17:59.280
So the search took 400 milliseconds. 

00:17:59.280 --> 00:18:02.480
It's slower than it would be

18:02.480 --> 00:18:06.080
without the video being recorded, as well

00:18:06.080 --> 00:18:09.039
as the rest of the measurements here.

18:09.039 --> 00:18:12.160
So the parsing of the results took more

00:18:12.160 --> 00:18:16.559
than that, and the object allocation

18:16.559 --> 00:18:21.919
took even more, which is unfortunate 

00:18:21.919 --> 00:18:23.919
but it's the reality 

00:18:23.919 --> 00:18:26.400
when we're dealing with a lot of

00:18:26.400 --> 00:18:29.840
search results, because, well,

18:29.840 --> 00:18:33.520
Emacs Lisp is slower than grep.

18:34.559 --> 00:18:49.200
That's just a given.

18:49.200 --> 00:18:54.400
What can be done and what had been done

00:18:54.400 --> 00:18:57.120
to improve on this?

18:57.120 --> 00:19:04.240
Well, first of all,

19:04.240 --> 00:19:06.000
let's change the question, 

00:19:06.000 --> 00:19:07.360
because this is more of 

00:19:07.360 --> 00:19:09.760
a retrospective sort of talk

19:09.760 --> 00:19:13.439
rather than talking about future plans. 

00:19:13.440 --> 00:19:19.679
What can one do to improve performance?

19:19.679 --> 00:19:24.160
Well, basically, two things: 

00:19:24.160 --> 00:19:27.919
first, you try to make sure 

00:19:27.919 --> 00:19:30.559
that you're writing less code, 

00:19:30.559 --> 00:19:37.200
then your code does fewer things,

19:37.200 --> 00:19:40.400
Fewer iterations of your operations.

19:40.400 --> 00:19:42.000
Basically, it's the realm 

00:19:42.000 --> 00:19:46.880
of choosing your algorithm well.

19:46.880 --> 00:19:54.880
But make sure it's as little complexity 

00:19:54.880 --> 00:20:00.239
as you can manage reasonably.

20:00.240 --> 00:20:05.760
Another is to try to reduce memory allocations.

20:05.760 --> 00:20:13.760
It's creating conses, the links of the list,

20:13.760 --> 00:20:17.919
of the lists. If you are mapping through 

00:20:17.919 --> 00:20:19.200
a list, creating a new one, 

00:20:19.200 --> 00:20:22.640
you are creating conses.

00:20:22.640 --> 00:20:26.159
New objects in memory, anyway.

20:26.159 --> 00:20:30.480
And also, when you call string operations

00:20:30.480 --> 00:20:33.679
like substring, when you create new strings,

20:33.679 --> 00:20:36.640
that's also what you do.

20:36.640 --> 00:20:39.200
When you allocate new memory, 

00:20:39.200 --> 00:20:43.600
that triggers garbage collections later,

00:20:43.600 --> 00:20:49.200
which affect the performance of your code

20:49.200 --> 00:21:01.600
quite severely. I have found that actually, 

00:21:01.600 --> 00:21:04.960
contrary to my personal intuition,

21:04.960 --> 00:21:08.240
when you try to fine-tune the code

00:21:08.240 --> 00:21:13.120
working on long lists of elements,

00:21:13.120 --> 00:21:15.520
long, long collection,  large collections, 

00:21:15.520 --> 00:21:20.159
it's even more important to try to avoid

21:20.159 --> 00:21:23.200
or reduce memory allocations

21:23.200 --> 00:21:26.400
rather than try to ensure 

00:21:26.400 --> 00:21:32.240
that less code is running,

21:32.240 --> 00:21:34.960
less operations are performed,

21:34.960 --> 00:21:38.080
because the garbage collector

21:38.080 --> 00:21:44.799
can hit you in the posterior quite suddenly

00:21:44.799 --> 00:21:49.520
that you will not always...

21:49.520 --> 00:21:51.760
When you measure the input impact,

00:21:51.760 --> 00:21:56.799
you might not always measure the whole of it.

21:56.799 --> 00:22:02.640
And sometimes even when you have

00:22:02.640 --> 00:22:06.559
a few operations, you can measure

22:06.559 --> 00:22:09.520
all three of them, but at some point,

00:22:09.520 --> 00:22:11.919
if you just reduce a lot,

22:11.919 --> 00:22:14.720
remove most of the allocations, 

00:22:14.720 --> 00:22:16.799
all three of them, the improvement

00:22:16.799 --> 00:22:21.520
might be even bigger than the total

22:21.520 --> 00:22:23.440
of three improvements 

00:22:23.440 --> 00:22:27.200
which you might have measured previously,

22:27.200 --> 00:22:29.679
like separately.

22:29.679 --> 00:22:33.039
So it's something to be on the lookout for,

00:22:33.039 --> 00:22:39.520
but of course, when you pick the algorithm,

00:22:39.520 --> 00:22:45.200
it's important not to do

00:22:45.200 --> 00:22:52.158
more operations than you have to.

22:52.159 --> 00:22:56.000
We have examples of both 

00:22:56.000 --> 00:22:58.720
in the recent changes 

00:22:58.720 --> 00:23:02.559
to the xref and project packages,

23:02.559 --> 00:23:06.960
which we can examine here.

23:06.960 --> 00:23:11.200
Let's take a look at this one.

23:11.200 --> 00:23:14.880
This commit message lies a little bit

23:14.880 --> 00:23:19.039
because it's referring to 

00:23:19.039 --> 00:23:24.559
the use of assoc instead of cl-assoc,

23:24.559 --> 00:23:29.919
and the actual change incorporates that,

00:23:29.919 --> 00:23:33.440
but it also incorporates 

00:23:33.440 --> 00:23:39.440
the second part of the sentence 

00:23:39.440 --> 00:23:41.360
which actually replaced 

00:23:41.360 --> 00:23:46.960
the use of assoc in there.

23:46.960 --> 00:23:50.480
Curiously, cl-assoc was pretty slow 

00:23:50.480 --> 00:23:52.559
because not only it created 

00:23:52.559 --> 00:23:57.919
quadratic complexity in the operation

23:57.919 --> 00:23:59.919
and it was also calling 

00:23:59.919 --> 00:24:02.880
the Lisp function [equal]

24:02.880 --> 00:24:04.400
for every iteration, 

00:24:04.400 --> 00:24:08.880
whereas if we just use assoc there,

24:08.880 --> 00:24:10.080
which was the first version 

00:24:10.080 --> 00:24:13.919
of this change, of the improvement,

24:13.919 --> 00:24:15.760
it became already much faster, 

00:24:15.760 --> 00:24:20.640
but then switching to a hash table

24:20.640 --> 00:24:28.080
which turned this lookup 

00:24:28.080 --> 00:24:31.760
from O(n) complexity

24:31.760 --> 00:24:35.600
into, well, amortized constant one,

24:35.600 --> 00:24:37.600
even better.

24:37.600 --> 00:24:45.679
So, use hash tables, kids.

24:45.679 --> 00:24:52.400
Another commit here is about using 

00:24:52.400 --> 00:24:55.520
the inhibit-modification-hooks. 

00:24:55.520 --> 00:24:58.000
So, turns out when you're printing 

00:24:58.000 --> 00:25:01.679
into a buffer, even if you have already

25:01.679 --> 00:25:06.159
disabled the undo history,

25:06.159 --> 00:25:08.480
binding this variable to

25:08.480 --> 00:25:10.880
a non-null value is pretty good 

00:25:10.880 --> 00:25:15.520
because you are able to avoid running

25:15.520 --> 00:25:21.120
a number of hooks, which improves performance.

25:21.120 --> 00:25:28.640
Next. This one was about moving the 

00:25:28.640 --> 00:25:32.000
file-remote-p call

25:32.000 --> 00:25:37.279
from inside the loop.

25:37.279 --> 00:25:42.039
This function is actually 

00:25:42.039 --> 00:25:49.039
surprisingly slow-ish for the goal,

25:49.039 --> 00:25:52.000
for its purpose, so you don't really want

25:52.000 --> 00:25:56.159
to call it on every file name in the list

00:25:56.159 --> 00:25:59.520
when you have a lot of them,

25:59.520 --> 00:26:02.000
and especially if your code 

00:26:02.000 --> 00:26:04.640
is running in a buffer 

00:26:04.640 --> 00:26:10.640
visiting a remote file, like through TRAMP.

26:10.640 --> 00:26:15.120
You might end up trying to

00:26:15.120 --> 00:26:16.880
devise different approaches

00:26:16.880 --> 00:26:20.159
to avoid checking whether 

00:26:20.159 --> 00:26:23.279
every file name is remote

26:23.279 --> 00:26:25.360
if you're dealing with a list of file names, 

00:26:25.360 --> 00:26:34.640
so this one, take a look, be careful with it.

26:34.640 --> 00:26:43.120
A similar, slower function, with-current-buffer,

26:43.120 --> 00:26:46.880
but not so much. So it all depends on

26:46.880 --> 00:26:48.400
really how often you call it. 

00:26:48.400 --> 00:26:51.279
Sometimes you might want to 

00:26:51.279 --> 00:26:52.720
rewrite your code so that 

00:26:52.720 --> 00:26:54.720
it's called less often.

26:54.720 --> 00:26:59.760
And expand-file-name, which hits the disk.

26:59.760 --> 00:27:01.200
So if you're dealing with file names, 

00:27:01.200 --> 00:27:02.960
you might want to replace it 

00:27:02.960 --> 00:27:07.520
with concatenation at certain points.

27:07.520 --> 00:27:22.159
Okay, back to these changes later.

27:22.159 --> 00:27:28.480
This one just removed a location of a cons

27:28.480 --> 00:27:33.679
per match hit, and still it brought 5%

27:33.679 --> 00:27:37.039
or something like that improvement 

00:27:37.039 --> 00:27:41.120
to the whole command.

27:41.120 --> 00:27:46.000
So that's a pretty significant improvement 

00:27:46.000 --> 00:27:53.600
for a small change like that.

27:53.600 --> 00:28:01.679
Similarly, here we just made sure 

00:28:01.679 --> 00:28:09.520
to avoid a splits... no, a substring call,

28:09.520 --> 00:28:12.399
and probably an allocation of the whole

28:12.399 --> 00:28:16.320
buffer string, but in my testing,

00:28:16.320 --> 00:28:19.440
that doesn't actually matter much, 

00:28:19.440 --> 00:28:22.399
at least so much, but a substring call

00:28:22.399 --> 00:28:28.960
per result... If we see, 

00:28:28.960 --> 00:28:33.039
since we changed to manual parsing 

00:28:33.039 --> 00:28:37.440
of the buffer, with the strings 

00:28:37.440 --> 00:28:41.440
delimited with zero bytes,

28:41.440 --> 00:28:43.840
it gave an overall improvement 

00:28:43.840 --> 00:28:47.440
of 20%, again on my machine

00:28:47.440 --> 00:28:50.720
with a pretty fast SSD, 

00:28:50.720 --> 00:28:53.679
and with a warm disk cache, of course.

28:53.679 --> 00:29:05.360
But still... Going back to this revision,

00:29:05.360 --> 00:29:09.440
it was actually quite surprising

00:29:09.440 --> 00:29:15.200
that migration to a cl-defstruct

29:15.200 --> 00:29:24.080
from eieio, the Common Lisp-inspired 

00:29:24.080 --> 00:29:26.480
object system first introduced 

00:29:26.480 --> 00:29:34.240
in the CEDET tools, 

00:29:34.240 --> 00:29:39.360
that was a bit of a surprise,

00:29:39.360 --> 00:29:44.799
because not much of my 

00:29:44.799 --> 00:29:46.080
benchmark benchmarking 

00:29:46.080 --> 00:29:50.960
actually pointed at it being the problem.

29:50.960 --> 00:29:55.760
Probably because the accessors 

00:29:55.760 --> 00:29:57.520
were not the actual problem,

29:57.520 --> 00:30:00.399
like the oref macros 

00:30:00.399 --> 00:30:06.960
and the code accessing the slots,

30:06.960 --> 00:30:10.720
but the construction, 

00:30:10.720 --> 00:30:14.320
the object construction code,

30:14.320 --> 00:30:16.559
that was where most of the time 

00:30:16.559 --> 00:30:21.200
was spent unnecessarily,

30:21.200 --> 00:30:24.240
maybe doing type-checking,

30:24.240 --> 00:30:28.880
maybe some other stuff.

30:28.880 --> 00:30:36.080
So if you have lots of values,

30:36.080 --> 00:30:39.120
you need to treat like objects in... and

00:30:39.120 --> 00:30:42.000
virtual dispatch on them in your package, 

00:30:42.000 --> 00:30:45.200
you might want to look into

30:45.200 --> 00:30:52.239
cl-defstruct for them.

30:52.240 --> 00:30:54.080
Going on to the next section, 

00:30:54.080 --> 00:31:01.200
I have prepared the sort of comparison 

31:01.200 --> 00:31:05.519
between cl-lib, dash, and seq, 

00:31:05.519 --> 00:31:09.360
the collection libraries we have

31:09.360 --> 00:31:11.279
available for us in Emacs. 

00:31:11.279 --> 00:31:15.760
That is the popular ones,

31:15.760 --> 00:31:21.440
but since I'm running behind on time, 

00:31:21.440 --> 00:31:27.760
I'll probably just summarize the findings.

31:27.760 --> 00:31:31.919
First of all, seq is nice.

00:31:31.919 --> 00:31:38.480
Its generic approach is probably

31:38.480 --> 00:31:41.919
quite decent for most of the situations,

31:41.919 --> 00:31:49.840
but there are places where it could be

31:49.840 --> 00:31:53.679
optimized better, so instead of having

31:53.679 --> 00:31:56.399
quadratic performance, it could use a

31:56.399 --> 00:31:59.200
hash table, like for instance,

31:59.200 --> 00:32:02.000
dash does here--

32:02.000 --> 00:32:06.240
in dash, union or delete-dups 

00:32:06.240 --> 00:32:12.960
does in its implementation.

32:12.960 --> 00:32:16.640
The dash itself is curiously fast,

00:32:16.640 --> 00:32:20.000
at least faster than I might have expected,

32:20.000 --> 00:32:21.600
possibly because of 

00:32:21.600 --> 00:32:25.120
the implementation approach 

00:32:25.120 --> 00:32:33.200
where it uses code generation

00:32:33.200 --> 00:32:35.840
to avoid function calls,

32:35.840 --> 00:32:37.840
at least some of them,

32:37.840 --> 00:32:41.120
which is interesting.

32:41.120 --> 00:32:45.600
But since both seq and dash 

00:32:45.600 --> 00:32:49.919
avoid mutations--they don't really have

32:49.919 --> 00:32:52.399
mutating counterparts to common functions

00:32:52.399 --> 00:32:56.480
like you have with cl-remove-if, cl-delete-if,

32:56.480 --> 00:33:02.000
or just cl-remove, cl-delete,

33:02.000 --> 00:33:08.000
it still can be valuable to look into

00:33:08.000 --> 00:33:12.240
destructive versions of those functions,

33:12.240 --> 00:33:16.880
something from the core library 

00:33:16.880 --> 00:33:22.559
like delete-dups or nreverse,

33:22.559 --> 00:33:24.399
for your code when you're really trying

33:24.399 --> 00:33:29.519
to get as close to the metal 

00:33:29.519 --> 00:33:34.320
or whatever as you can, 

00:33:34.320 --> 00:33:40.080
because avoiding extra allocations, 

00:33:40.080 --> 00:33:43.200
it can really be useful.

33:43.200 --> 00:33:46.159
You can really improve their performance

33:46.159 --> 00:33:54.080
if you don't do a lot of other stuff.

33:54.080 --> 00:34:00.080
delete-consecutive-dups is blazing faster.

34:00.080 --> 00:34:03.919
It only requires pre-sorted strings.

34:03.919 --> 00:34:08.399
What else to say...

34:08.399 --> 00:34:13.200
If you are going to read these measurements,

34:13.200 --> 00:34:15.359
make sure to keep in mind 

00:34:15.359 --> 00:34:18.000
that reverse is not free,

34:18.000 --> 00:34:20.240
so for instance, if we're looking 

00:34:20.240 --> 00:34:22.480
at this comparison 

00:34:22.480 --> 00:34:26.399
between remove and delete, for instance,

34:26.399 --> 00:34:27.919
they're using reverse 

00:34:27.919 --> 00:34:32.159
to avoid modifying the data,

34:32.159 --> 00:34:34.800
the sample data, so we don't have to

34:34.800 --> 00:34:36.720
create it every time.

34:36.720 --> 00:34:41.919
But to compare how much faster 

00:34:41.919 --> 00:34:43.760
delete is than remove,

34:43.760 --> 00:34:50.800
we need to subtract 787 milliseconds

34:50.800 --> 00:34:52.320
from here and from here 

00:34:52.320 --> 00:34:58.480
so it comes out to like 230 milliseconds 

00:34:58.480 --> 00:35:02.560
in this example, the last example, 

00:35:02.560 --> 00:35:12.000
and 100 to 1 second, 250 milliseconds here,

00:35:12.000 --> 00:35:17.599
so the difference is 5-fold here.

35:17.599 --> 00:35:20.160
Not 2-fold.

35:20.160 --> 00:35:26.480
All right. With this, I'm going to

35:26.480 --> 00:35:29.520
thank you for listening, for watching

35:29.520 --> 00:35:31.920
and I'll be taking questions.

35:31.920 --> 00:35:32.920
Thank you.

00:35:32.920 --> 00:35:35.160
[captions by sachac]