path: root/2023/captions/emacsconf-2023-test--what-i-learned-by-writing-test-cases-for-gnu-hyperbole--mats-lidell--main.vtt



WEBVTT captioned by sachac, checked by sachac

NOTE Introduction

00:00:03.120 --> 00:00:07.439
Hi everyone! I'm Mats Lidell.

00:00:07.440 --> 00:00:09.879
I'm going to talk about my journey

00:00:09.880 --> 00:00:12.480
writing test cases for GNU Hyperbole

00:00:12.481 --> 00:00:19.399
and what I learned on the way.

00:00:19.400 --> 00:00:24.079
So, why write tests for GNU Hyperbole?

00:00:24.080 --> 00:00:25.679
There is some background.

00:00:25.680 --> 00:00:27.959
I'm the co-maintainer of GNU Hyperbole

00:00:27.960 --> 00:00:33.479
together with Bob Weiner. Bob is the author of the package.

00:00:33.480 --> 00:00:34.680
The package is available through

00:00:34.681 --> 00:00:38.799
the Emacs package manager and GNU ELPA

00:00:38.800 --> 00:00:42.599
if you would want to try it out.

00:00:42.600 --> 00:00:46.359
The package has some age. I think it dates back to

00:00:46.360 --> 00:00:50.119
a first release around 1993, which is also

00:00:50.120 --> 00:00:54.799
when I got in contact with the package the first time.

00:00:54.800 --> 00:00:58.239
I was a user of the package for many years.

00:00:58.240 --> 00:01:03.119
Later, I became the maintainer of the package for the FSF.

00:01:03.120 --> 00:01:04.679
That was although I did not have

00:01:04.680 --> 00:01:09.039
much knowledge of Emacs Lisp,

00:01:09.040 --> 00:01:12.679
and I still have a lot to learn.

00:01:12.680 --> 00:01:15.959
A few years ago, we started to work actively on the package,

00:01:15.960 --> 00:01:20.839
with setting up goals and having meetings.

00:01:20.840 --> 00:01:24.959
So my starting point is that I had experience

00:01:24.960 --> 00:01:27.439
with test automation from development

00:01:27.440 --> 00:01:30.599
in C++, Java and Python

00:01:30.600 --> 00:01:37.239
using different xUnit frameworks like CppUnit and JUnit.

00:01:37.240 --> 00:01:40.039
That was in my daytime work where

00:01:40.040 --> 00:01:41.959
the technique of using pull requests

00:01:41.960 --> 00:01:46.719
with changes backed up by tests was the daily routine.

00:01:46.720 --> 00:01:49.199
It was really a requirement for a change to go in

00:01:49.200 --> 00:01:52.159
to have supporting test cases.

00:01:52.160 --> 00:01:58.559
I believe, a quite common setup and requirement these days.

00:01:58.560 --> 00:02:02.039
I also had been an Emacs user for many years,

00:02:02.040 --> 00:02:04.279
but with focus on being a user.

00:02:04.280 --> 00:02:09.839
So as I mentioned, I have limited Emacs Lisp knowledge.

00:02:09.840 --> 00:02:11.359
When we decided to start

00:02:11.360 --> 00:02:13.959
to work actively on Hyperbole again,

00:02:13.960 --> 00:02:15.519
it was natural for me to look into

00:02:15.520 --> 00:02:18.679
raising the quality by adding unit tests.

00:02:18.680 --> 00:02:20.679
This also goes hand in hand

00:02:20.680 --> 00:02:25.239
with running these regularly as part of a build process.

00:02:25.240 --> 00:02:28.439
All in all, following the current best practice

00:02:28.440 --> 00:02:31.359
of software development.

00:02:31.360 --> 00:02:36.479
But since Hyperbole had no tests at all,

00:02:36.480 --> 00:02:38.719
it would not be enough just to add tests

00:02:38.720 --> 00:02:41.799
for new or changed functionality.

00:02:41.800 --> 00:02:44.639
We wanted to add it even broader; ideally, everywhere.

00:02:44.640 --> 00:02:48.399
So work started with adding tests here and there

00:02:48.400 --> 00:02:52.039
based on our gut feeling where it would be most useful.

00:02:52.040 --> 00:02:55.799
This work is still ongoing.

00:02:55.800 --> 00:02:58.119
So this is where my journey starts

00:02:58.120 --> 00:03:00.759
with much functionality to test,

00:03:00.760 --> 00:03:03.359
no knowledge of what testing frameworks existed,

00:03:03.360 --> 00:03:11.159
and not really knowing a lot about Emacs Lisp at all.

NOTE ERT: Emacs Lisp Regression Testing

00:03:11.160 --> 00:03:13.799
Luckily there is a package for writing tests in Emacs.

00:03:13.800 --> 00:03:17.919
It is called ERT: Emacs Lisp Regression Testing.

00:03:17.920 --> 00:03:20.959
It contains both support for defining tests and running them.

00:03:20.960 --> 00:03:24.639
Defining a test is done with the macro `ert-deftest`.

00:03:24.640 --> 00:03:28.919
In its simplest form, a test has a name, a doc string, and a body.

00:03:28.920 --> 00:03:31.439
The doc string is where you typically can give

00:03:31.440 --> 00:03:33.799
a detailed description of the test

00:03:33.800 --> 00:03:35.559
and has space for more info

00:03:35.560 --> 00:03:42.279
than what can be given in the test name.

00:03:42.280 --> 00:03:45.239
The body is where all the interesting things happen.

00:03:45.240 --> 00:03:51.959
It is here you prepare the test, run it and verify the outcome.

00:03:51.960 --> 00:03:54.239
Schematically, it looks like this.

00:03:54.240 --> 00:04:00.239
You have the ert-deftest, you have the test name,

00:04:00.240 --> 00:04:02.799
and the doc string, and then the body.

00:04:02.800 --> 00:04:06.559
It is in the body where everything interesting happens.

00:04:06.560 --> 00:04:09.759
The test is prepared, the function under test is executed,

00:04:09.760 --> 00:04:13.119
and the outcome of the test is evaluated.

00:04:13.120 --> 00:04:14.359
Did the test succeed or not?

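NOTE Example sketch (the slide is not part of the captions)
A minimal sketch of the schematic layout described above: `ert-deftest`,
a test name, a doc string, and a body that prepares, runs and verifies.
The test name and the data are hypothetical, chosen only for illustration.
```elisp
(require 'ert)
;; Hypothetical test: the name and the list under test are illustrative only.
(ert-deftest example-schematic-test ()
  "Doc string: a longer description of what the test verifies."
  ;; Prepare the test.
  (let ((input (list 3 1 2)))
    ;; Run the function under test (`sort') and verify the outcome.
    (should (equal (sort input #'<) '(1 2 3)))))
```
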
NOTE Assertions with `should`

00:04:14.360 --> 00:04:18.479
The verification of a test is performed with

00:04:18.480 --> 00:04:21.479
one or more so-called assertions.

00:04:21.480 --> 00:04:24.999
In ERT, they are implemented

00:04:25.000 --> 00:04:26.599
with the macro `should`

00:04:26.600 --> 00:04:33.559
together with a set of related macros.

00:04:33.560 --> 00:04:35.519
`should` takes a form as argument,

00:04:35.520 --> 00:04:37.839
and if the form evaluates to nil,

00:04:37.840 --> 00:04:48.580
the test has failed. So let's look at an example.

00:04:48.581 --> 00:04:51.919
This simple test verifies that the function `+`

00:04:51.920 --> 00:04:56.919
can add the numbers 2 and 3 and get the result 5.

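NOTE Example sketch (the slide is not part of the captions)
A test like the one described above might look as follows. The name
`example-0` is a guess based on what is said later in the talk. The second
test is an added illustration of two of the related macros, `should-not`
and `should-error`, and is not taken from the talk.
```elisp
(require 'ert)
(ert-deftest example-0 ()
  "Verify that `+' adds 2 and 3 and returns 5."
  (should (= (+ 2 3) 5)))
;; Illustration only: assertions for a nil result and for a signaled error.
(ert-deftest example-related-macros ()
  "Illustrate `should-not' and `should-error'."
  (should-not (= (+ 2 3) 6))
  (should-error (/ 1 0) :type 'arith-error))
```
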
NOTE Running a test case

00:04:56.920 --> 00:05:01.959
So now we have defined a test case. How do we run it?

00:05:01.960 --> 00:05:03.919
The ERT package has the function (or

00:05:03.920 --> 00:05:09.519
rather convenience alias) `ert`. It takes a test selector.

00:05:09.520 --> 00:05:19.759
The test name works as a selector for running just one test.

00:05:19.760 --> 00:05:27.900
So here we have the example. Let's evaluate it.

00:05:27.901 --> 00:05:34.519
We define it and then we run it using ERT.

00:05:34.520 --> 00:05:42.399
As you see, we get prompted for a test selector

00:05:42.400 --> 00:05:46.319
but we only have one test case defined at the moment.

00:05:46.320 --> 00:05:55.919
It's the example 0. So let's hit RET.

00:05:55.920 --> 00:05:58.959
As you see here, we get some output

00:05:58.960 --> 00:06:01.359
describing what we have just done.

00:06:01.360 --> 00:06:04.839
There is one test case it has passed, zero failed,

00:06:04.840 --> 00:06:07.839
zero skipped, total 1 of 1 test case

00:06:07.840 --> 00:06:14.439
and some time stamps for the execution.

00:06:14.440 --> 00:06:18.519
We also see this green mark here indicating one test case

00:06:18.520 --> 00:06:23.039
and that it was successful.

00:06:23.040 --> 00:06:29.659
For inspecting the test, we can hit the letter `l`

00:06:29.660 --> 00:06:32.839
which shows all the `should` forms

00:06:32.840 --> 00:06:37.779
that were executed during this test case.

00:06:37.780 --> 00:06:39.919
So here we see that we have the `should`,

00:06:39.920 --> 00:06:47.999
one `should` executed, and we see the form equals to 2,

00:06:48.000 --> 00:06:49.799
and it was 5 equals to 5.

00:06:49.800 --> 00:06:54.559
So a good example of a successful test case.

NOTE Debug a test

00:06:54.560 --> 00:06:57.919
So now we've seen how we can run a test case.

00:06:57.920 --> 00:07:03.799
Can we debug it? Yes. For debugging a test case,

00:07:03.800 --> 00:07:07.939
the `ert-deftest` can be set up using `edebug-defun`,

00:07:07.940 --> 00:07:10.319
just as a function or macro is set up

00:07:10.320 --> 00:07:18.819
or instrumented for debugging. So let's try that.

00:07:18.820 --> 00:07:24.119
So we try `edebug-defun` here.

00:07:24.120 --> 00:07:28.279
Now it's instrumented for debugging.

00:07:28.280 --> 00:07:35.659
And we run it, `ert`, and we're inside the debugger,

00:07:35.660 --> 00:07:40.679
and we can inspect here what's happening.

00:07:40.680 --> 00:07:46.960
Step through it and yes it succeeded just as before.

NOTE Commercial break: Hyperbole

00:07:50.380 --> 00:07:56.879
It's time for a commercial break!

00:07:56.880 --> 00:08:00.079
Hyperbole itself can help with running tests

00:08:00.080 --> 00:08:03.639
and also help with running them in debug mode.

00:08:03.640 --> 00:08:08.519
That is because Hyperbole identifies the `ert-deftest`

00:08:08.520 --> 00:08:12.679
as an implicit button. An implicit button is basically

00:08:12.680 --> 00:08:13.759
a string or pattern

00:08:13.760 --> 00:08:16.799
that Hyperbole has assigned some meaning to.

00:08:16.800 --> 00:08:19.959
For the string `ert-deftest`, it is to run the test case.

00:08:19.960 --> 00:08:24.559
You activate the button with the Action Key.

00:08:24.560 --> 00:08:27.079
The standard binding is the middle mouse button,

00:08:27.080 --> 00:08:33.040
or from the keyboard, M-RET.

00:08:33.041 --> 00:08:34.799
So let's try that.

00:08:34.800 --> 00:08:42.219
We move the cursor here and then we type M-RET.

00:08:42.220 --> 00:08:47.959
And boom, the test case was executed.

00:08:47.960 --> 00:08:54.479
And to run it in debug mode we type C-u M-RET

00:08:54.480 --> 00:08:57.719
to get the Assist Key, and then we're in the debugger.

00:08:57.720 --> 00:09:10.479
So that's pretty useful and convenient.

NOTE Instrument function on the fly

00:09:10.480 --> 00:09:13.719
A related useful feature here is the step-in functionality

00:09:13.720 --> 00:09:16.399
bound to the letter `i` in `edebug-mode`.

00:09:16.400 --> 00:09:18.119
It allows you to step into a function

00:09:18.120 --> 00:09:20.479
and continue debugging from there.

00:09:20.480 --> 00:09:22.839
For the cases where your test does not do what you want,

00:09:22.840 --> 00:09:25.119
looking at what happens in the function under test

00:09:25.120 --> 00:09:37.259
can be really useful. Let's try that with another example.

00:09:37.260 --> 00:09:43.359
So here we have two helper functions, one `f1-add`,

00:09:43.360 --> 00:09:47.439
that uses the built-in `+` function

00:09:47.440 --> 00:09:52.239
and then we have `my-add` that uses that function.

00:09:52.240 --> 00:09:59.399
So we're going to test `my-add`.

00:09:59.400 --> 00:10:02.919
And then let's run this.

00:10:02.920 --> 00:10:05.959
Let's run this using Hyperbole in debug mode

00:10:05.960 --> 00:10:10.079
C-u M-RET. We're in the debugger again,

00:10:10.080 --> 00:10:15.639
and let's step up front to my function under test

00:10:15.640 --> 00:10:19.359
and then press `i` for getting it instrumented

00:10:19.360 --> 00:10:23.019
and going into it for debugging.

00:10:23.020 --> 00:10:25.139
And here we can inspect that it's getting

00:10:25.140 --> 00:10:26.559
the arguments 1 and 3,

00:10:26.560 --> 00:10:30.999
and it returns the result 4 as expected.

00:10:31.000 --> 00:10:39.119
And yes, of course, our test case will then succeed.

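NOTE Example sketch (the slide is not part of the captions)
The two helper functions and the test described above could be written
like this. The names `f1-add` and `my-add` come from the talk; the bodies
and the test name are assumptions.
```elisp
(require 'ert)
(defun f1-add (a b)
  "Add A and B using the built-in `+'."
  (+ a b))
(defun my-add (a b)
  "Add A and B by delegating to `f1-add'."
  (f1-add a b))
;; Hypothetical test name; the arguments 1 and 3 are the ones seen in the debugger.
(ert-deftest my-add-test ()
  "Verify that `my-add' adds 1 and 3 and returns 4."
  (should (= (my-add 1 3) 4)))
```
Instrumenting the test with `edebug-defun` and pressing `i` at the call
to `my-add` steps into it, as shown in the demo.
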
NOTE Mocking

00:10:39.120 --> 00:10:41.839
The next tool in our toolbox is mocking.

00:10:41.840 --> 00:10:46.239
Mocking is needed when we want to simulate the response

00:10:46.240 --> 00:10:49.279
from a function used by the function under test.

00:10:49.280 --> 00:10:53.139
That is the implementation of the function.

00:10:53.140 --> 00:10:56.119
This could be for various reasons.

00:10:56.120 --> 00:11:00.879
One example could be because it would be hard or impossible

00:11:00.880 --> 00:11:04.199
in the test setup to get the behavior you want to test for,

00:11:04.200 --> 00:11:06.279
like an external error case.

00:11:06.280 --> 00:11:08.679
But the mock can also be used to verify

00:11:08.680 --> 00:11:11.619
that the function is called with a specific argument.

00:11:11.620 --> 00:11:14.559
We can view it as a way to isolate the function under test

00:11:14.560 --> 00:11:16.719
from its dependencies.

00:11:16.720 --> 00:11:18.959
So in order to test the function in isolation,

00:11:18.960 --> 00:11:22.079
we need to cut out any dependencies to external behavior.

00:11:22.080 --> 00:11:25.839
Most obvious would be dependencies to external resources,

00:11:25.840 --> 00:11:27.639
such as web pages. As an example:

00:11:27.640 --> 00:11:30.639
Hyperbole contains functionality to link you to

00:11:30.640 --> 00:11:34.239
social media resources and other resources on the net.

00:11:34.240 --> 00:11:37.899
Testing that would require the test system to call out

00:11:37.900 --> 00:11:39.639
to the social media resources

00:11:39.640 --> 00:11:43.539
and would depend on it being available, etc.

00:11:43.540 --> 00:11:45.479
Nothing technically stops a test case

00:11:45.480 --> 00:11:47.239
from depending on the external resources,

00:11:47.240 --> 00:11:51.319
but it would, if nothing else, be flaky or slow.

00:11:51.320 --> 00:11:53.759
It could be part of an end-to-end suite

00:11:53.760 --> 00:11:57.179
where we want to test that it works all the way.

00:11:57.180 --> 00:11:59.719
In this case, we want to look at the isolated case

00:11:59.720 --> 00:12:04.099
that can be run with no dependency on external resources.

00:12:04.100 --> 00:12:06.679
What you want to do is to replace the function with a mock

00:12:06.680 --> 00:12:10.339
that behaves as the real function would do.

00:12:10.340 --> 00:12:11.639
The package I have found

00:12:11.640 --> 00:12:14.319
and have used for mocking is `el-mock`.

00:12:14.320 --> 00:12:21.839
The workhorse in this package is the `with-mock` macro.

00:12:21.840 --> 00:12:26.519
It looks like this: `with-mock` followed by a body.

00:12:26.520 --> 00:12:30.439
In the execution of the body, stubs and mocks

00:12:30.440 --> 00:12:32.899
defined in the body are respected.

00:12:32.900 --> 00:12:39.199
Let's look at some examples to make that clearer.

00:12:39.200 --> 00:12:42.079
In this case, we have the macro `with-mock`.

00:12:42.080 --> 00:12:43.959
It works so that the expression

00:12:43.960 --> 00:12:48.639
`stub + => 10` is interpreted

00:12:48.640 --> 00:12:51.919
so that the function `+` will be replaced with the stub.

00:12:51.920 --> 00:12:56.779
The stub will return 10 regardless of how it is called.

00:12:56.780 --> 00:12:58.119
Note that the stub function

00:12:58.120 --> 00:13:00.199
does not have to be called at this level

00:13:00.200 --> 00:13:02.799
but could be called at any level in the call chain.

00:13:02.800 --> 00:13:07.479
By knowing how the function under test is implemented

00:13:07.480 --> 00:13:09.319
and how the implementation works,

00:13:09.320 --> 00:13:11.959
you can find function calls you want to mock

00:13:11.960 --> 00:13:14.999
to force certain behavior that you want to test,

00:13:15.000 --> 00:13:18.999
or to avoid calls to external resources, slow calls, etc.

00:13:19.000 --> 00:13:21.959
Simply isolate the function under test

00:13:21.960 --> 00:13:26.119
and simulate its environment.

00:13:26.120 --> 00:13:28.639
Mock is a little bit more sophisticated

00:13:28.640 --> 00:13:30.079
and depends on the arguments

00:13:30.080 --> 00:13:31.479
that the mock function is called with.

00:13:31.480 --> 00:13:33.847
Or more precisely, it is checked

00:13:33.848 --> 00:13:35.519
after the `with-mock` clause

00:13:35.520 --> 00:13:38.079
that the arguments match the arguments it was called with

00:13:38.080 --> 00:13:39.759
or even if it was called at all.

00:13:39.760 --> 00:13:41.839
If it is called with other arguments

00:13:41.840 --> 00:13:43.719
there will be an error,

00:13:43.720 --> 00:13:46.479
and if it's not called, it is also an error.

00:13:46.480 --> 00:13:48.359
So this way, we are sure that the function

00:13:48.360 --> 00:13:51.319
we expected to be called actually was called.

00:13:51.320 --> 00:13:53.399
An important piece of the testing.

00:13:53.400 --> 00:13:56.239
So we are sure that the mock we have provided

00:13:56.240 --> 00:14:03.999
actually is triggered by the test case.

00:14:04.000 --> 00:14:08.159
So here we have an example of `with-mock`

00:14:08.160 --> 00:14:18.879
where the `f1-add` function is mocked,

00:14:18.880 --> 00:14:21.999
so that if it's called with 2 and 3 as arguments,

00:14:22.000 --> 00:14:24.919
it will return 10. Then we have a test case

00:14:24.920 --> 00:14:27.999
where we try the `my-add` function,

00:14:28.000 --> 00:14:30.319
as you might remember, and call that with 2 and 3

00:14:30.320 --> 00:14:32.799
and see that it should also then return 10

00:14:32.800 --> 00:14:41.239
because it's using `f1-add`.

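NOTE Example sketch (the slide is not part of the captions)
A sketch of the stub and mock forms described above, assuming the
third-party `el-mock` package is installed. The helper bodies and the test
names are assumptions; `f1-add` is stubbed here rather than `+`.
```elisp
(require 'ert)
(require 'el-mock)  ; third-party package, assumed to be installed
(defun f1-add (a b) "Add A and B." (+ a b))
(defun my-add (a b) "Add A and B via `f1-add'." (f1-add a b))
(ert-deftest my-add-stub-test ()
  "A stub returns its value regardless of the arguments."
  (with-mock
    (stub f1-add => 10)
    (should (= (my-add 1 1) 10))))
(ert-deftest my-add-mock-test ()
  "A mock also checks that it was called, and with these arguments."
  (with-mock
    (mock (f1-add 2 3) => 10)
    (should (= (my-add 2 3) 10))))
```
If `my-add` never called `f1-add`, or called it with other arguments,
the mock test would fail when the `with-mock` form finishes.
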
NOTE cl-letf

00:14:41.240 --> 00:14:44.559
Moving over to `cl-letf`.

00:14:44.560 --> 00:14:47.679
On rare occasions, the limitations of `el-mock` mean

00:14:47.680 --> 00:14:50.239
you would want to implement a full-fledged function

00:14:50.240 --> 00:14:52.979
to be used under test.

00:14:52.980 --> 00:14:55.439
Then the macro `cl-letf` can be useful.

00:14:55.440 --> 00:14:57.879
However, you need to handle the case yourself

00:14:57.880 --> 00:15:00.099
if the function was not called.

00:15:00.100 --> 00:15:03.519
Looking through the test cases where I have used `cl-letf`,

00:15:03.520 --> 00:15:06.119
I think most can be implemented using plain mocking.

00:15:06.120 --> 00:15:11.239
The cases left are where the args to the mock might be different

00:15:11.240 --> 00:15:13.739
due to environment issues.

00:15:13.740 --> 00:15:24.099
In that case, a static mock will not work.

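NOTE Example sketch (not from the talk)
A sketch of replacing a function with `cl-letf` and tracking by hand that
the replacement was called. The helpers and the test name are assumptions
reused from the earlier `my-add` example.
```elisp
(require 'ert)
(require 'cl-lib)
(defun f1-add (a b) "Add A and B." (+ a b))
(defun my-add (a b) "Add A and B via `f1-add'." (f1-add a b))
(ert-deftest my-add-cl-letf-test ()
  "Replace `f1-add' with a full function and verify it was called."
  (let ((called nil))
    (cl-letf (((symbol-function 'f1-add)
               (lambda (a b)
                 (setq called t)
                 ;; Accept any numbers, since the exact args may vary.
                 (should (and (numberp a) (numberp b)))
                 10)))
      (should (= (my-add 2 3) 10)))
    ;; Unlike `mock', nothing checks this automatically.
    (should called)))
```
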
NOTE Hooks

00:15:24.100 --> 00:15:30.719
Another trick is for functions that use hooks.

00:15:30.720 --> 00:15:35.639
You can overload or replace the hooks to do the testing.

00:15:35.640 --> 00:15:40.759
So you can use the hook function just to do the verification

00:15:40.760 --> 00:15:43.119
and not do anything useful in the hook.

00:15:43.120 --> 00:15:45.079
Also, here you need to be careful

00:15:45.080 --> 00:15:55.719
to make sure the test handler is called and nothing else.

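NOTE Example sketch (not from the talk)
A sketch of using a hook purely for verification. The hook
`my-example-hook` and the function `my-example-command` are hypothetical
stand-ins for a function under test that runs a hook.
```elisp
(require 'ert)
(defvar my-example-hook nil
  "Hypothetical hook run by `my-example-command'.")
(defun my-example-command ()
  "Hypothetical function under test that runs `my-example-hook'."
  (run-hooks 'my-example-hook))
(ert-deftest my-example-hook-test ()
  "Verify that the test handler, and nothing else, runs exactly once."
  (let* ((calls 0)
         ;; Rebind the hook so only the test handler is on it.
         (my-example-hook (list (lambda () (setq calls (1+ calls))))))
    (my-example-command)
    (should (= calls 1))))
```
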
NOTE Side effects and initial buffer state

00:15:55.720 --> 00:15:57.679
So far we have been talking about testing

00:15:57.680 --> 00:15:59.039
what the function returns.

00:15:59.040 --> 00:16:01.119
In the best of worlds, we have a pure function

00:16:01.120 --> 00:16:02.959
that only depends on its arguments

00:16:02.960 --> 00:16:04.939
and produces no side effects.

00:16:04.940 --> 00:16:06.899
Many operations produce side effects

00:16:06.900 --> 00:16:09.479
or operate on the contents of buffers

00:16:09.480 --> 00:16:12.379
such as writing a message in the message buffer,

00:16:12.380 --> 00:16:15.659
changing the state of a buffer, moving point, etc.

00:16:15.660 --> 00:16:18.859
Hyperbole is not an exception. Quite the contrary.

00:16:18.860 --> 00:16:20.839
Many of the functions creating links

00:16:20.840 --> 00:16:24.420
are just about updating buffers.

00:16:24.421 --> 00:16:28.559
This poses a special problem for tests.

00:16:28.560 --> 00:16:29.839
The test gets longer

00:16:29.840 --> 00:16:31.919
since you need to create buffers and files,

00:16:31.920 --> 00:16:33.279
initialize the contents.

00:16:33.280 --> 00:16:35.159
Verifying the outcome becomes trickier

00:16:35.160 --> 00:16:39.019
since you need to make sure you look at the right place.

00:16:39.020 --> 00:16:41.039
At the end of the test, you need to clean up,

00:16:41.040 --> 00:16:43.439
both for not leaving a lot of garbage

00:16:43.440 --> 00:16:45.279
in buffers and files around,

00:16:45.280 --> 00:16:48.479
and even worse, not cause later tests

00:16:48.480 --> 00:16:50.959
to depend on the leftovers from the other tests.

00:16:50.960 --> 00:16:53.079
Here are some functions and variables

00:16:53.080 --> 00:17:05.099
I have found useful for this.

NOTE with-temp-buffer

00:17:05.100 --> 00:17:09.199
For creating tests: `with-temp-buffer`:

00:17:09.200 --> 00:17:11.919
it provides you a temp buffer that you visit,

00:17:11.920 --> 00:17:13.719
and afterwards, there is no need to clean up.

00:17:13.720 --> 00:17:16.519
This is the first choice if that is all you need.

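NOTE Example sketch (not from the talk)
A sketch of a test that works on buffer contents inside
`with-temp-buffer`, so no cleanup is needed. The test name and the choice
of `upcase-word` as the function under test are illustrative assumptions.
```elisp
(require 'ert)
(ert-deftest temp-buffer-upcase-test ()
  "Modify buffer contents and verify both the text and point."
  (with-temp-buffer
    ;; Prepare the initial buffer state.
    (insert "hello world")
    (goto-char (point-min))
    ;; Run the function under test.
    (upcase-word 1)
    ;; Verify the side effects.
    (should (string= (buffer-string) "HELLO world"))
    (should (= (point) 6))))
```
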
NOTE make-temp-file

00:17:16.520 --> 00:17:20.519
`make-temp-file`: If you need a file,

00:17:20.520 --> 00:17:21.959
this is the function to use.

00:17:21.960 --> 00:17:24.279
It creates a temp file or a directory.

00:17:24.280 --> 00:17:26.959
The file can be filled with initial contents.

00:17:26.960 --> 00:17:31.019
This needs to be cleaned up after a test.

00:17:31.020 --> 00:17:33.287
Moving on to verifying and debugging:

NOTE buffer-string

00:17:33.288 --> 00:17:38.247
`buffer-string`: returns the full contents

00:17:38.248 --> 00:17:39.499
of the buffer as a string.

00:17:39.500 --> 00:17:41.399
That can sound a bit voluminous,

00:17:41.400 --> 00:17:46.139
but since tests are normally small, this often works well.

00:17:46.140 --> 00:17:48.439
I have in particular found good use of comparing

00:17:48.440 --> 00:17:50.399
the contents of buffers with the empty string.

00:17:50.400 --> 00:17:53.359
That would give an error, but as we have seen

00:17:53.360 --> 00:17:56.079
with the output produced by the `should` assertion,

00:17:56.080 --> 00:17:58.079
this is almost like a print statement

00:17:58.080 --> 00:18:01.199
and can be compared with the good old technique

00:18:01.200 --> 00:18:04.399
of debugging with print statements.

00:18:04.400 --> 00:18:06.247
There might be other ways to do the same

00:18:06.248 --> 00:18:09.919
as we saw with debugging.

NOTE buffer-name

00:18:09.920 --> 00:18:13.719
`buffer-name`: Getting the buffer name is good

00:18:13.720 --> 00:18:16.239
to verify what buffer we are looking at.

00:18:16.240 --> 00:18:18.359
I often found it useful to check

00:18:18.360 --> 00:18:21.119
that my assumptions on what buffer I am acting on

00:18:21.120 --> 00:18:23.399
are correct by adding `should` clauses

00:18:23.400 --> 00:18:25.399
in the middle of the test execution

00:18:25.400 --> 00:18:27.399
or after preparing the test input.

00:18:27.400 --> 00:18:31.679
Sometimes Emacs can switch buffers in strange ways,

00:18:31.680 --> 00:18:34.199
maybe because the test case is badly written,

00:18:34.200 --> 00:18:37.239
and making sure your assumptions are correct

00:18:37.240 --> 00:18:40.339
is a good sanity check.

00:18:40.340 --> 00:18:42.239
Even the ERT package does

00:18:42.240 --> 00:18:44.879
some buffer and window manipulation for its reporting

00:18:44.880 --> 00:18:47.487
that I have not fully learned how to master,

00:18:47.488 --> 00:18:51.979
so assertions for checking the sanity of the test are good.

NOTE major-mode

00:18:51.980 --> 00:18:55.679
Finally, `major-mode`: Verify the buffer has the proper mode.

00:18:55.680 --> 00:19:02.679
Can also be very useful and is a good sanity check.

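NOTE Example sketch (not from the talk)
A sketch of sanity-check assertions using `buffer-name`, `major-mode` and
`buffer-string`. The test name and the use of `text-mode` are assumptions
made for illustration.
```elisp
(require 'ert)
(ert-deftest buffer-sanity-checks-test ()
  "Check assumptions about the current buffer before the real assertions."
  (with-temp-buffer
    (text-mode)
    (insert "some text")
    ;; Are we really in the temporary buffer?
    (should (string-prefix-p " *temp*" (buffer-name)))
    ;; Does the buffer have the mode we expect?
    (should (eq major-mode 'text-mode))
    ;; Finally, check the contents.
    (should (string= (buffer-string) "some text"))))
```
Temporarily comparing `(buffer-string)` with "" makes the failing `should`
report the actual contents, which is the print-statement trick mentioned
earlier.
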
NOTE unwind-protect

00:19:02.680 --> 00:19:06.599
Finally, cleaning up. `unwind-protect`.

00:19:06.600 --> 00:19:09.039
The tool for cleaning up is the `unwind-protect` form

00:19:09.040 --> 00:19:12.479
which ensures that the unwind forms

00:19:12.480 --> 00:19:15.439
always are executed regardless of the outcome of the body.

00:19:15.440 --> 00:19:20.419
So if your test fails, you are sure the cleanup is executed.

00:19:20.420 --> 00:19:22.759
Let's look at `unwind-protect` together with

00:19:22.760 --> 00:19:30.519
the temporary file example. Many tests look like this.

00:19:30.520 --> 00:19:35.279
You create some resource, you call `unwind-protect`,

00:19:35.280 --> 00:19:42.759
you do the test, and then afterwards you do the cleanup.

00:19:42.760 --> 00:19:46.359
The cleanup for a file and a buffer is so common,

00:19:46.360 --> 00:19:50.999
so I have created a helper for that.

00:19:51.000 --> 00:19:56.559
It looks like this.

00:19:56.560 --> 00:19:59.179
The trick with the `buffer-modified` flag

00:19:59.180 --> 00:20:00.719
is to avoid getting prompted

00:20:00.720 --> 00:20:03.219
for killing a buffer that is not saved.

00:20:03.220 --> 00:20:05.439
The test buffers are often in the state

00:20:05.440 --> 00:20:15.099
where they have not been saved but modified.

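NOTE Example sketch (the slides are not part of the captions)
A sketch of the pattern described above: create a temp file, run the test
inside `unwind-protect`, and clean up afterwards. The helper name
`my-test-delete-file-and-buffer` is hypothetical and is not claimed to be
the helper used in Hyperbole.
```elisp
(require 'ert)
(defun my-test-delete-file-and-buffer (file)
  "Hypothetical helper: kill the buffer visiting FILE, then delete FILE."
  (let ((buf (find-buffer-visiting file)))
    (when buf
      (with-current-buffer buf
        ;; Clear the modified flag so `kill-buffer' does not prompt.
        (set-buffer-modified-p nil)
        (kill-buffer))))
  (delete-file file))
(ert-deftest temp-file-test ()
  "Create a temp file, modify its buffer, and always clean up."
  (let ((file (make-temp-file "my-test" nil ".txt" "initial contents")))
    (unwind-protect
        (progn
          (find-file file)
          (should (string= (buffer-string) "initial contents"))
          ;; Leave the buffer modified but unsaved, as tests often do.
          (insert "more "))
      (my-test-delete-file-and-buffer file))))
```
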
NOTE Input, with-simulated-input

00:20:15.100 --> 00:20:19.679
Another problem for tests is input.

00:20:19.680 --> 00:20:21.559
In the middle of execution a function

00:20:21.560 --> 00:20:24.039
might want to have some interaction with the user.

00:20:24.040 --> 00:20:26.959
Testing this poses a problem, not only in that

00:20:26.960 --> 00:20:31.199
the input matters, but also as how even to get the test case

00:20:31.200 --> 00:20:34.079
to recognize the input!?

00:20:34.080 --> 00:20:36.039
Ideally the tests are run in batch mode,

00:20:36.040 --> 00:20:38.919
which in some sense means no user interaction.

00:20:38.920 --> 00:20:42.999
In batch mode, there is no event loop running.

00:20:43.000 --> 00:20:47.179
Fortunately, there is a package `with-simulated-input`

00:20:47.180 --> 00:20:53.259
that gets you around these issues.

00:20:53.260 --> 00:20:55.399
This is a macro that allows us

00:20:55.400 --> 00:20:56.999
to define a set of characters

00:20:57.000 --> 00:20:59.079
that will be read by the function under test,

00:20:59.080 --> 00:21:02.579
and all of this works in batch mode. It looks like this.

00:21:02.580 --> 00:21:04.159
We have `with-simulated-input`,

00:21:04.160 --> 00:21:09.839
and then a string of characters, and then a body.

00:21:09.840 --> 00:21:11.647
The form takes a string of keys

00:21:11.648 --> 00:21:13.119
and runs the rest of the body,

00:21:13.120 --> 00:21:15.439
and if there is input required,

00:21:15.440 --> 00:21:18.119
it is picked from the string of keys.

00:21:18.120 --> 00:21:20.421
In our example, the `read-string` call

00:21:20.422 --> 00:21:21.719
will read up until RET,

00:21:21.720 --> 00:21:26.119
and then return the characters read.

00:21:26.120 --> 00:21:29.639
As you see in the example, space needs to be provided

00:21:29.640 --> 00:21:38.459
by the string SPC, as return by the string RET.

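NOTE Example sketch (the slide is not part of the captions)
A sketch of the `read-string` example described above, assuming the
third-party `with-simulated-input` package is installed. The test name,
the prompt and the input text are assumptions.
```elisp
(require 'ert)
(require 'with-simulated-input)  ; third-party package, assumed to be installed
(ert-deftest read-string-simulated-input-test ()
  "Feed keys to `read-string' without a user at the keyboard."
  (should (string= (with-simulated-input "hello SPC world RET"
                     (read-string "Say something: "))
                   "hello world")))
```
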
NOTE Running all tests

00:21:38.460 --> 00:21:40.799
So now we have seen ways to create test cases

00:21:40.800 --> 00:21:43.219
and even make it possible to run some of them

00:21:43.220 --> 00:21:44.679
that have I/O in batch mode.

00:21:44.680 --> 00:21:47.279
But the initial goal was to run them all at once.

00:21:47.280 --> 00:21:48.919
How do you do that?

00:21:48.920 --> 00:21:51.759
Let's go back to the `ert` command.

00:21:51.760 --> 00:21:53.799
It prompts for a test selector.

00:21:53.800 --> 00:21:56.279
If we give it the selector `t`,

00:21:56.280 --> 00:21:59.259
it will run all tests we have currently defined.

00:21:59.260 --> 00:22:05.779
Let's try that with the subset of the Hyperbole tests.

00:22:05.780 --> 00:22:09.559
Here is the test folder in the Hyperbole directory.

00:22:09.560 --> 00:22:18.819
Let's go up here and load all the demo tests.

00:22:18.820 --> 00:22:21.207
And then try to run `ert`.

00:22:21.208 --> 00:22:26.119
Now we see that we have a bunch of test cases.

00:22:26.120 --> 00:22:27.919
We can run them all individually,

00:22:27.920 --> 00:22:31.719
but we can run them with `t` instead.

00:22:31.720 --> 00:22:35.459
We will run them all at once.

00:22:35.460 --> 00:22:51.419
So now, ERT is executing all our test cases.

00:22:51.420 --> 00:22:57.079
So here we have a nice green display

00:22:57.080 --> 00:23:03.219
with all the test cases.

NOTE Batch mode

00:23:03.220 --> 00:23:08.159
So that was fine, but we were still running it manually

00:23:08.160 --> 00:23:11.980
by calling `ert`. How could we run it from the command line?

00:23:17.180 --> 00:23:21.499
ERT comes with functions for running it in batch mode.

00:23:21.500 --> 00:23:25.639
For Hyperbole, we use `make` for repetitive tasks.

00:23:25.640 --> 00:23:27.119
So we have a make target

00:23:27.120 --> 00:23:29.279
that uses the ERT batch functionality,

00:23:29.280 --> 00:23:33.259
and this is the line from the Makefile.

00:23:33.260 --> 00:23:35.479
This is a bit detailed,

00:23:35.480 --> 00:23:37.539
but you see that we have a part here

00:23:37.540 --> 00:23:40.779
where we load the test dependencies.

00:23:40.780 --> 00:23:43.520
For getting the packages

00:23:43.521 --> 00:23:48.459
such as `el-mock` and `with-simulated-input` etc. loaded.

00:23:48.460 --> 00:23:53.559
We also have... I also want to point out here the call to

00:23:53.560 --> 00:23:58.159
or the setting of `auto-save-default` to `nil`

00:23:58.160 --> 00:24:02.439
to get rid of the prompt for excessive backup files

00:24:02.440 --> 00:24:05.059
that can pile up after running the tests a few times.

NOTE Skipping tests

00:24:05.060 --> 00:24:06.879
Even with the help of simulated input,

00:24:06.880 --> 00:24:08.919
not all tests can be run in batch mode.

00:24:08.920 --> 00:24:10.559
They would simply not work there

00:24:10.560 --> 00:24:12.439
and have to be run in an interactive Emacs

00:24:12.440 --> 00:24:14.179
with the running event loop.

00:24:14.180 --> 00:24:17.919
One trick still to be able to use batch mode for automation

00:24:17.920 --> 00:24:20.319
is to put a guard at the top of each test case

00:24:20.320 --> 00:24:22.559
as the first thing to be executed,

00:24:22.560 --> 00:24:25.719
so that it kicks in before anything else and stops Emacs

00:24:25.720 --> 00:24:27.199
from trying to run the test case.

00:24:27.200 --> 00:24:35.519
Now, it looks like this: `(skip-unless (not noninteractive))`.

00:24:35.520 --> 00:24:38.639
So when ERT sees that the test should be skipped, it skips it

00:24:38.640 --> 00:24:40.439
and makes a note of that,

00:24:40.440 --> 00:24:44.579
so you will see how many tests have been skipped.

00:24:44.580 --> 00:24:47.559
Too bad. We have a number of test cases defined,

00:24:47.560 --> 00:24:51.359
and to run them, we need to run them manually. Well sort of.

00:24:51.360 --> 00:24:53.807
Not being able to run all tests easily

00:24:53.808 --> 00:24:58.419
is a bit counterproductive

00:24:58.420 --> 00:25:00.999
since our goal is to run all tests.

00:25:01.000 --> 00:25:04.719
There is however no ERT function to run tests in batch mode

00:25:04.720 --> 00:25:06.779
with an interactive Emacs.

00:25:06.780 --> 00:25:08.479
The closest I have got is either

00:25:08.480 --> 00:25:10.079
to start the Emacs from the command line

00:25:10.080 --> 00:25:12.439
calling the ert function as we just have seen,

00:25:12.440 --> 00:25:14.799
and then killing it manually when done;

00:25:14.800 --> 00:25:19.599
or add a function to extract the contents of the ERT buffer

00:25:19.600 --> 00:25:24.599
when done and echo it to standard output.

00:25:24.600 --> 00:25:27.800
This is how it looks in the Makefile

00:25:27.801 --> 00:25:31.207
to get the behavior of cut and paste,

00:25:31.208 --> 00:25:34.580
getting the ERT output into a file

00:25:34.581 --> 00:25:36.239
so we can then kill Emacs

00:25:36.240 --> 00:25:44.799
and spit out the content of the ERT buffer.

00:25:44.800 --> 00:25:47.739
One final word here is that

00:25:47.740 --> 00:25:54.559
when you run this in a continuous integration pipeline,

00:25:54.560 --> 00:25:59.399
you might not have a TTY for getting Emacs to start,

00:25:59.400 --> 00:26:03.200
and that is then another problem

00:26:03.201 --> 00:26:05.160
with getting the interactive mode.

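NOTE Example sketch (not from the talk)
A sketch of the `skip-unless` guard described in the section on skipping
tests, placed as the first form in the test body. The test name and the
assertion are hypothetical placeholders for a test that needs an
interactive Emacs.
```elisp
(require 'ert)
(ert-deftest interactive-only-test ()
  "Hypothetical test that needs a running event loop."
  ;; Guard first: in batch mode `noninteractive' is non-nil, so ERT skips.
  (skip-unless (not noninteractive))
  (should (eq (selected-window) (frame-selected-window))))
```
In batch mode ERT reports this test as skipped instead of running it.
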
NOTE Conclusion

00:26:08.460 --> 00:26:11.120
We have reached the end of the talk.

00:26:11.121 --> 00:26:14.159
If you have any new ideas

00:26:14.160 --> 00:26:16.759
or have some suggestions for improvements,

00:26:16.760 --> 00:26:18.239
feel free to reach out

00:26:18.240 --> 00:26:21.100
because I am still on the learning curve of writing,

00:26:21.101 --> 00:26:25.299
how to write good test cases.

00:26:25.300 --> 00:26:27.639
If you look at the test cases we have in Hyperbole

00:26:27.640 --> 00:26:29.799
and you think they might contradict what I am saying here,

00:26:29.800 --> 00:26:32.579
it is OK. It is probably right.

00:26:32.580 --> 00:26:34.599
I have changed the style as I go

00:26:34.600 --> 00:26:36.639
and we have not yet refactored all tests

00:26:36.640 --> 00:26:38.579
to benefit from new designs.

00:26:38.580 --> 00:26:40.599
That is also the beauty of the test case.

00:26:40.600 --> 00:26:43.319
As long as it serves its purpose, it is not terrible

00:26:43.320 --> 00:26:47.799
if it is not optimal or not having the best style.

00:26:47.800 --> 00:26:55.240
And yes, thanks for listening. Bye.