WEBVTT
00:01.360 --> 00:04.080
Hi. So I'm going to talk today
00:04.180 --> 00:10.000
about a fun rewrite I did of the BinDat package.
00:10.000 --> 00:12.400
I call this Turbo BinDat.
00:12.400 --> 00:14.001
Actually, the package hasn't changed name,
00:14.101 --> 00:16.801
it's just that the result happens to be faster.
00:16.901 --> 00:19.521
The point was not to make it faster though,
00:19.621 --> 00:22.241
and the point was not to make you understand
00:22.341 --> 00:23.440
that data is not code.
00:23.540 --> 00:27.120
It's just one more experience I've had
00:27.120 --> 00:31.280
where I've seen that treating code as data
00:31.381 --> 00:33.522
is not always a good idea.
00:33.622 --> 00:36.162
It's important to keep the difference.
00:36.162 --> 00:38.880
So let's get started.
00:38.881 --> 00:40.642
So what is BinDat anyway?
00:40.742 --> 00:43.602
Here's just the overview of basically
00:43.602 --> 00:44.962
what I'm going to present.
00:45.062 --> 00:47.842
So I'm first going to present BinDat itself
00:47.843 --> 00:49.039
for those who don't know it,
00:49.039 --> 00:51.923
which is probably the majority of you.
00:51.923 --> 00:55.363
Then I'm going to talk about the actual problems
00:55.363 --> 00:58.882
that I encountered with this package
00:58.882 --> 01:01.843
that motivated me to rewrite it.
01:01.843 --> 01:05.043
Most of them were lack of flexibility,
01:05.044 --> 01:09.924
and some of it was just poor behavior
01:09.924 --> 01:13.364
with respect to scoping and variables,
01:13.364 --> 01:16.324
which of course, you know, is bad --
01:16.424 --> 01:20.724
basically uses of eval or, "eval is evil."
01:20.724 --> 01:24.884
Then I'm going to talk about the new design --
01:24.985 --> 01:28.005
how I redesigned it
01:28.105 --> 01:31.365
to make it both simpler and more flexible,
01:31.365 --> 01:32.965
and where the key idea was
01:33.065 --> 01:35.205
to expose code as code
01:35.305 --> 01:37.525
instead of having it as data,
01:37.625 --> 01:39.605
and so here the distinction between the two
01:39.706 --> 01:44.085
is important and made things simpler.
01:44.085 --> 01:46.405
I tried to keep efficiency in mind,
01:46.405 --> 01:52.405
which resulted in some of the aspects of the design
01:52.505 --> 01:54.886
which are not completely satisfactory,
01:54.886 --> 01:57.046
but the result is actually fairly efficient.
01:57.146 --> 01:59.286
Even though it was not the main motivation,
01:59.287 --> 02:02.967
it was one of the nice outcomes.
02:02.967 --> 02:06.006
And then I'm going to present some examples.
02:06.007 --> 02:08.167
So first: what is BinDat?
02:08.267 --> 02:10.567
Oh actually, rather than present THIS,
02:10.667 --> 02:12.407
I'm going to go straight to the code,
02:12.507 --> 02:14.246
because BinDat actually had
02:14.346 --> 02:16.647
an introduction which was fairly legible.
02:16.748 --> 02:21.128
So here we go: this is the old BinDat from Emacs 27
02:21.128 --> 02:23.448
and the commentary starts by explaining
02:23.448 --> 02:25.848
what is BinDat? Basically BinDat is a package
02:25.948 --> 02:30.247
that lets you parse and unparse
02:30.247 --> 02:31.527
basically binary data.
02:31.627 --> 02:34.648
The intent is to have typically network data
02:34.749 --> 02:35.849
or something like this.
02:35.949 --> 02:38.328
So assuming you have network data,
02:38.328 --> 02:41.528
presented or defined
02:41.628 --> 02:44.569
with some kind of C-style structs, typically,
02:44.669 --> 02:46.009
or something along these lines.
02:46.109 --> 02:49.120
So you presumably start with documentation
02:49.120 --> 02:52.809
that presents something like those structs here,
02:52.810 --> 02:57.130
and you want to be able to generate such packets
02:57.230 --> 03:00.249
and read such packets,
03:00.349 --> 03:02.090
so the way you do it is
03:02.190 --> 03:04.570
you rewrite those specifications
03:04.670 --> 03:06.010
into the BinDat syntax.
03:06.110 --> 03:07.529
So here's the BinDat syntax
03:07.529 --> 03:10.490
for the previous specification.
03:10.491 --> 03:11.610
So here, for example,
03:11.610 --> 03:16.970
you see the case for a data packet
03:16.970 --> 03:20.411
which will have a 'type' field which is a byte
03:20.411 --> 03:24.091
(an unsigned 8-bit entity),
03:24.091 --> 03:26.411
then an 'opcode' which is also a byte,
03:26.411 --> 03:30.731
then a 'length' which is a 16-bit unsigned integer
03:30.732 --> 03:34.092
in little endian order,
03:34.092 --> 03:38.732
and then some 'id' for this entry, which is
03:38.732 --> 03:43.531
8 bytes containing a zero-terminated string,
03:43.531 --> 03:47.531
and then the actual data, basically the payload,
03:47.532 --> 03:51.453
which is in this case a vector of bytes,
03:51.453 --> 03:54.812
('bytes' here doesn't need to be specified)
03:54.812 --> 03:58.172
and here we specify the length of this vector.
03:58.172 --> 03:59.773
This 'length' here
03:59.773 --> 04:02.252
happens to be actually the name of THIS field,
04:02.252 --> 04:03.853
so the length of the data
04:03.854 --> 04:06.574
is specified by the 'length' field here,
04:06.574 --> 04:08.574
and BinDat will understand this part,
04:08.574 --> 04:12.333
which is the nice part of BinDat.
04:12.333 --> 04:15.774
And then you have an alignment field at the end,
04:15.774 --> 04:18.253
which is basically padding.
04:18.253 --> 04:20.574
It says that it is padded
04:20.575 --> 04:23.295
until the next multiple of four.
04:23.295 --> 04:25.855
Okay. So this works reasonably well.
04:25.855 --> 04:27.455
This is actually very nice.
04:27.455 --> 04:30.335
With this, you can then call
04:30.335 --> 04:32.975
bindat-pack or bindat-unpack,
04:32.975 --> 04:37.774
passing it a string, or passing it an alist,
04:37.774 --> 04:40.415
to do the packing and unpacking.
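
NOTE
Editor's sketch of the old-style usage described above. The field layout follows the talk; the variable names and the sample bytes are made up for illustration, and the sketch assumes a later Emacs still accepts this old quoted syntax for backward compatibility.
```elisp
(require 'bindat)
;; Old-style spec: a quoted list, i.e. pure data.
(defvar my-data-spec
  '((type   u8)             ; unsigned 8-bit
    (opcode u8)
    (length u16r)           ; 16-bit unsigned, little endian
    (id     strz 8)         ; 8 bytes holding a zero-terminated string
    (data   vec (length))   ; as many bytes as the `length' field says
    (align  4)))            ; pad to the next multiple of four
;; Hypothetical packet: type 2, opcode 3, length 5, id "ABCDEF",
;; 5 payload bytes, then 3 bytes of padding.
(defvar my-data-packet
  (unibyte-string 2 3 5 0 ?A ?B ?C ?D ?E ?F 0 0 1 2 3 4 5 0 0 0))
;; Unpacking yields an alist keyed by field name.
(defvar my-data-alist (bindat-unpack my-data-spec my-data-packet))
(bindat-get-field my-data-alist 'data)
```
Individual fields are read back from the alist with `bindat-get-field`.
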
04:40.416 --> 04:43.296
So, for example, if you take this string--
04:43.296 --> 04:45.856
actually, in this case, it's a vector of bytes
04:45.856 --> 04:49.456
but it works the same; it works in both ways--
04:49.456 --> 04:53.536
if you pass this to bindat-unpack,
04:53.536 --> 04:57.456
it will presumably return you this structure
04:57.457 --> 05:00.017
if you've given it the corresponding type.
05:00.017 --> 05:01.776
So it will extract--
05:01.776 --> 05:05.617
you will see that there is an IP address,
05:05.617 --> 05:08.017
which is a destination IP, a source IP,
05:08.017 --> 05:09.857
and some port number,
05:09.857 --> 05:12.977
and some actual data here and there, etc.
05:12.977 --> 05:18.017
So this is quite convenient if you need to do this,
05:18.018 --> 05:20.898
and that's what it was designed for.
05:20.898 --> 05:27.537
So here we are. Let's go back to the actual talk.
05:27.538 --> 05:34.338
I converted BinDat to lexical scoping at some point
05:34.339 --> 05:37.299
and things seemed to work fine,
05:37.299 --> 05:42.819
except, at some point, probably weeks later,
05:42.819 --> 05:47.139
I saw a bug report
05:47.139 --> 05:53.058
about the new version using lexical scoping
05:53.059 --> 05:56.339
not working correctly with WeeChat.
05:56.339 --> 06:00.580
So here's the actual chunk of code
06:00.580 --> 06:02.820
that appears in WeeChat.
06:02.820 --> 06:08.420
Here you see that they also define a BinDat spec.
06:08.421 --> 06:14.741
It's a packet that has a 32-bit unsigned length,
06:14.741 --> 06:18.500
then some compression byte/compression information,
06:18.500 --> 06:23.780
then an id which contains basically another struct
06:23.780 --> 06:26.901
(which is specified elsewhere; doesn't matter here),
06:26.902 --> 06:28.661
and after that, a vector
06:28.661 --> 06:33.382
whose size is not just specified by 'length',
06:33.382 --> 06:35.142
but is computed from 'length'.
06:35.142 --> 06:39.142
So here's how they used to compute it in WeeChat.
06:39.142 --> 06:42.822
So the length here can be specified in BinDat.
06:42.822 --> 06:43.941
Instead of having
06:43.942 --> 06:45.863
just a reference to one of the fields,
06:45.863 --> 06:48.903
or having a constant, you can actually compute it,
06:48.903 --> 06:52.502
where you have to use this '(eval',
06:52.502 --> 06:54.743
and then followed by the actual expression
06:54.743 --> 06:58.103
where you say how you compute it.
06:58.103 --> 07:01.463
And here you see that it actually computes it
07:01.464 --> 07:04.904
based on the 'length' of the structure --
07:04.904 --> 07:07.783
that's supposed to be this 'length' field here --
07:07.783 --> 07:11.223
and it's referred to using the bindat-get-field
07:11.223 --> 07:14.503
to extract the field from the variable 'struct'.
07:14.503 --> 07:17.943
And then it subtracts four, it subtracts one,
07:17.943 --> 07:19.467
and adds some other things
07:19.468 --> 07:22.185
which depend on some field
07:22.185 --> 07:26.905
that's found in this 'id' field here.
07:26.905 --> 07:28.425
And the problem with this code
07:28.425 --> 07:30.425
was that it broke
07:30.425 --> 07:32.745
because of this 'struct' variable here,
07:32.745 --> 07:35.145
because this 'struct' variable is not defined
07:35.145 --> 07:38.105
anywhere in the specification of BinDat.
07:38.106 --> 07:41.866
It was used internally as a local variable,
07:41.866 --> 07:45.306
and because it was using dynamic scoping,
07:45.306 --> 07:47.386
it actually happened to be available here,
07:47.386 --> 07:50.826
but the documentation nowhere specifies it.
07:50.826 --> 07:52.506
So it was not exactly
07:52.506 --> 07:55.546
a bug of the conversion to lexical scoping,
07:55.547 --> 07:58.906
but it ended up breaking this code.
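
NOTE
Editor's sketch of the kind of spec being described, not the actual WeeChat source. The names and the shape of the id sub-structure are assumptions. It only defines the data, to show where the undocumented `struct` variable sneaks in.
```elisp
(require 'bindat)
;; Hypothetical stand-in for the `id' sub-structure.
(defvar my-old-id-spec
  '((len u8)
    (val str (len))))
(defvar my-old-message-spec
  '((length u32)
    (compression u8)
    (id struct my-old-id-spec)
    ;; `struct' is not part of the documented spec language: it is a
    ;; local variable of the old implementation, which this form could
    ;; only see thanks to dynamic scoping.
    (data vec (eval (- (bindat-get-field struct 'length)
                       4                 ; the length field itself
                       1                 ; the compression byte
                       (+ 1 (length (bindat-get-field struct 'id 'val))))))))
```
Because the whole spec sits inside a quote, the `(eval ...)` form is just a list until the unpacker hands it to `eval` at run time.
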
07:58.906 --> 08:01.226
And there was no way to actually
08:01.226 --> 08:05.066
fix the code within the specification of BinDat.
08:05.066 --> 08:08.287
You had to go outside the specification of BinDat
08:08.287 --> 08:10.427
to fix this problem.
08:10.427 --> 08:14.346
This is basically how I started looking at BinDat.
08:14.347 --> 08:17.808
Then I went to actually investigate a bit more
08:17.808 --> 08:19.627
what was going on,
08:19.627 --> 08:22.108
and the thing I noticed along the way
08:22.108 --> 08:25.787
was basically that the specification of BinDat
08:25.787 --> 08:29.528
is fairly complex and has a lot of eval
08:29.528 --> 08:30.748
and things like this.
08:30.749 --> 08:32.288
So let's take a look
08:32.288 --> 08:35.068
at what the BinDat specification looks like.
08:35.068 --> 08:36.589
So here it's actually documented
08:36.589 --> 08:40.269
as a kind of grammar rules.
08:40.269 --> 08:45.308
A specification is basically a sequence of items,
08:45.308 --> 08:47.389
and then each of the items is basically
08:47.389 --> 08:51.248
a FIELD of a struct, so it has a FIELD name,
08:51.249 --> 08:53.249
and then a TYPE.
08:53.249 --> 08:54.510
Instead of a TYPE,
08:54.510 --> 08:56.590
it could have some other FORM for eval,
08:56.590 --> 08:58.989
which was basically never used as far as I know,
08:58.989 --> 09:00.190
or it can be some filler,
09:00.190 --> 09:02.750
or you can have some 'align' specification,
09:02.750 --> 09:05.150
or you can refer to another struct.
09:05.150 --> 09:07.390
It could also be some kind of union,
09:07.391 --> 09:10.430
or it can be some kind of repetition of something.
09:10.430 --> 09:12.430
And then you have the TYPE specified here,
09:12.430 --> 09:18.271
which can be some integers, strings, or a vector,
09:18.271 --> 09:21.631
and there are a few other special cases.
09:21.631 --> 09:25.310
And then the actual field itself
09:25.311 --> 09:28.192
can be either a NAME, or something that's computed,
09:28.192 --> 09:30.752
and then everywhere here, you have LEN,
09:30.752 --> 09:32.480
which specifies the length of vectors,
09:32.480 --> 09:34.672
for example, or length of strings.
09:34.672 --> 09:37.632
This is actually either nil to mean one,
09:37.632 --> 09:39.072
or it can be an ARG,
09:39.072 --> 09:40.952
where ARG is defined to be
09:40.952 --> 09:42.672
either an integer or DEREF,
09:42.673 --> 09:46.673
where DEREF is basically a specification
09:46.673 --> 09:48.833
that can refer, for example, to the 'length' field
09:48.833 --> 09:51.956
-- that's what we saw between parentheses: (length)
09:51.956 --> 09:56.273
was this way to refer to the 'length' field.
09:56.273 --> 09:59.793
Or it can be an expression, which is what we saw
09:59.794 --> 10:02.834
in the computation of the length for WeeChat,
10:02.834 --> 10:04.914
where you just had a '(eval'
10:04.914 --> 10:06.334
and then some computation
10:06.334 --> 10:10.274
of the length of the payload.
10:10.274 --> 10:12.354
And so if you look here, you see that
10:12.354 --> 10:14.674
it is fairly large and complex,
10:14.674 --> 10:18.514
and it uses eval everywhere. And actually,
10:18.515 --> 10:20.675
it's not just that it has eval in its syntax,
10:20.675 --> 10:23.395
but the implementation has to use eval everywhere,
10:23.395 --> 10:25.314
because, if you go back
10:25.314 --> 10:27.475
to see the kind of code we see,
10:27.475 --> 10:29.538
we see here we just define
10:29.538 --> 10:34.195
weechat--relay-message-spec as a constant!
10:34.195 --> 10:37.314
It's nothing more than just data, right?
10:37.315 --> 10:38.836
So within this data
10:38.836 --> 10:41.076
there are things we need to evaluate,
10:41.076 --> 10:42.356
but it's pure data,
10:42.356 --> 10:44.356
so it will have to be evaluated
10:44.356 --> 10:46.596
by passing it to eval. It can't be compiled,
10:46.596 --> 10:50.196
because it's within a quote, right?
10:50.196 --> 10:52.836
And so for that reason, kittens really
10:52.837 --> 10:55.956
suffer terribly with uses of BinDat.
10:55.956 --> 10:59.957
You really have to be very careful with that.
10:59.957 --> 11:02.037
More seriously,
11:02.037 --> 11:05.157
the 'struct' variable was not documented,
11:05.157 --> 11:07.797
and yet it's indispensable
11:07.797 --> 11:08.996
for important applications,
11:08.996 --> 11:11.157
such as its use in WeeChat.
11:11.158 --> 11:13.078
So clearly this needs to be fixed.
11:13.078 --> 11:15.481
Of course, we can just document 'struct'
11:15.481 --> 11:18.038
as some variable that's used there,
11:18.038 --> 11:19.798
but of course we don't want to do that,
11:19.798 --> 11:23.398
because 'struct' is not obviously
11:23.398 --> 11:25.398
a dynamically scoped variable,
11:25.398 --> 11:29.317
so it's not very clean.
11:29.318 --> 11:31.939
Another problem I noticed was that the grammar
11:31.939 --> 11:35.239
is significantly more complex than necessary.
11:35.239 --> 11:38.199
We have nine distinct non-terminals.
11:38.199 --> 11:39.639
There is ambiguity.
11:39.639 --> 11:44.919
If you try to use a field whose name is 'align',
11:44.919 --> 11:48.679
or 'fill', or something like this,
11:48.680 --> 11:50.920
then it's going to be misinterpreted,
11:50.920 --> 11:54.920
or it can be misinterpreted.
11:54.920 --> 11:58.760
The vector length can be either an expression,
11:58.760 --> 12:02.280
or an integer, or a reference to a label,
12:02.280 --> 12:03.720
but the expression
12:03.720 --> 12:06.360
should already be the general case,
12:06.361 --> 12:08.041
and this expression can itself be
12:08.041 --> 12:09.401
just a constant integer,
12:09.401 --> 12:13.961
so this complexity is probably not indispensable,
12:13.961 --> 12:15.641
or it could be replaced with something simpler.
12:15.641 --> 12:17.401
That's what I felt like.
12:17.401 --> 12:19.161
And basically lots of places
12:19.161 --> 12:21.721
allow an (eval EXP) form somewhere
12:21.721 --> 12:25.081
to open up the door for more flexibility,
12:25.082 --> 12:26.922
but not all of them do,
12:26.922 --> 12:29.482
and we don't really want
12:29.482 --> 12:31.001
to have this eval there, right?
12:31.001 --> 12:33.802
It's not very convenient syntactically either.
12:33.802 --> 12:36.042
So it makes the uses of eval
12:36.042 --> 12:38.362
a bit heavier than they need to be,
12:38.362 --> 12:41.722
and so I didn't really like this part.
12:41.723 --> 12:42.603
Another part is that
12:42.603 --> 12:45.183
when I tried to figure out what was going on,
12:45.183 --> 12:46.666
[dog barks and distracts Stefan]
12:46.666 --> 12:50.043
I had trouble... Winnie as well, as you can hear.
12:50.043 --> 12:50.923
She had trouble as well.
12:50.923 --> 12:53.083
But one of the troubles was that
12:53.083 --> 12:55.002
there was no way to debug the code
12:55.002 --> 12:57.562
via Edebug, because it's just data,
12:57.562 --> 13:00.523
so Edebug doesn't know that it has to look at it
13:00.524 --> 13:02.683
and instrument it.
13:02.683 --> 13:05.644
And of course it was not conveniently extensible.
13:05.644 --> 13:07.164
That's also one of the things
13:07.164 --> 13:08.487
I noticed along the way.
13:09.084 --> 13:12.844
Okay, so here's an example of
13:12.844 --> 13:15.484
problems that I didn't just see there,
13:15.485 --> 13:18.684
but that were actually present in code.
13:18.684 --> 13:22.124
I went to look at code that was using BinDat
13:22.124 --> 13:24.285
to see what uses looked like,
13:24.285 --> 13:28.765
and I saw that BinDat was not used very heavily,
13:28.765 --> 13:30.365
but some of the main uses
13:30.365 --> 13:33.884
were just to read and write integers.
13:33.885 --> 13:37.565
And here you can see a very typical case.
13:37.565 --> 13:41.726
This is also coming from WeeChat.
13:41.726 --> 13:43.565
We do a bindat-get-field
13:43.565 --> 13:48.445
of the length of some struct we read.
13:48.445 --> 13:50.685
Actually, the struct we read is here.
13:50.685 --> 13:51.646
It has a single field,
13:51.647 --> 13:53.006
because the only thing we want to do
13:53.006 --> 13:56.287
is actually to unpack a 32-bit integer,
13:56.287 --> 13:58.287
but the only way we can do that
13:58.287 --> 14:01.647
is by specifying a struct with one field.
14:01.647 --> 14:04.847
And so we have to extract this struct of one field,
14:04.847 --> 14:07.246
which constructs an alist
14:07.246 --> 14:09.647
containing the actual integer,
14:09.648 --> 14:11.887
and then we just use get-field to extract it.
14:11.887 --> 14:15.007
So this doesn't seem very elegant
14:15.007 --> 14:16.528
to have to construct an alist
14:16.528 --> 14:20.368
just to then extract the integer from it.
14:20.368 --> 14:21.648
Same thing if you try to pack it:
14:21.648 --> 14:25.007
you first have to construct the alist
14:25.007 --> 14:31.247
to pass it to bindat-pack unnecessarily.
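
NOTE
Editor's sketch of the detour just described, using the old quoted syntax. The field name `len` and the sample bytes are made up, and the sketch assumes the old syntax is still accepted.
```elisp
(require 'bindat)
;; Unpack one 32-bit integer the old way: wrap it in a one-field struct,
;; get an alist back, then pull the integer out of the alist.
(defvar my-len
  (bindat-get-field (bindat-unpack '((len u32)) (unibyte-string 0 0 1 0))
                    'len))
;; Packing needs the same detour through an alist.
(defvar my-len-bytes (bindat-pack '((len u32)) '((len . 256))))
```
In both directions an alist is built only to be taken apart again.
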
14:31.248 --> 14:33.248
Another problem that I saw in this case
14:33.248 --> 14:35.729
(it was in the websocket package)
14:35.729 --> 14:39.568
was here, where they actually have a function
14:39.568 --> 14:41.169
where they need to write
14:41.169 --> 14:43.888
an integer of a size that will vary
14:43.888 --> 14:45.888
depending on the circumstances.
14:45.889 --> 14:49.650
And so they have to test the value of this integer,
14:49.650 --> 14:52.210
and depending on which one it is,
14:52.210 --> 14:54.449
they're going to use different types.
14:54.449 --> 14:56.290
So here it's a case
14:56.290 --> 14:59.490
where we want to have some kind of way to eval --
14:59.490 --> 15:02.530
to compute the length of the integer --
15:02.531 --> 15:08.130
instead of it being predefined or fixed.
15:08.130 --> 15:10.211
So this is one of the cases
15:10.211 --> 15:16.531
where the lack of eval was a problem.
15:16.531 --> 15:20.051
And actually in all of websocket,
15:20.051 --> 15:22.611
BinDat is only used to pack and unpack integers,
15:22.612 --> 15:24.612
even though there are many more opportunities
15:24.612 --> 15:26.772
to use BinDat in there.
15:26.772 --> 15:29.331
But it's not very convenient to use BinDat,
15:29.331 --> 15:35.890
as it stands, for those other cases.
15:35.891 --> 15:39.732
So what does the new design look like?
15:39.733 --> 15:44.132
Well in the new design, here's the problematic code
15:44.132 --> 15:46.373
for WeeChat.
15:46.373 --> 15:49.012
So we basically have the same fields as before,
15:49.012 --> 15:50.853
you just see that instead of u32,
15:50.853 --> 15:53.733
we now have 'uint 32' separately.
15:53.733 --> 15:55.332
The idea is that now this 32
15:55.332 --> 15:59.093
can be an expression you can evaluate,
15:59.094 --> 16:04.054
and so the u8 is also replaced by 'uint 8',
16:04.054 --> 16:07.253
and the id type is basically the same as before,
16:07.253 --> 16:08.854
and here another difference we see,
16:08.854 --> 16:11.654
and the main difference...
16:11.654 --> 16:13.494
Actually, it's the second main difference.
16:13.494 --> 16:15.174
The first main difference is that
16:15.175 --> 16:18.694
we don't actually quote this whole thing.
16:18.694 --> 16:23.095
Instead, we pass it to the bindat-type macro.
16:23.095 --> 16:25.095
So this is a macro
16:25.095 --> 16:27.574
that's going to actually build the type.
16:27.574 --> 16:29.254
This is a big difference
16:29.254 --> 16:30.535
in terms of performance also,
16:30.535 --> 16:32.694
because by making it a macro,
16:32.695 --> 16:34.296
we can pre-compute the code
16:34.296 --> 16:37.255
that's going to pack and unpack this thing,
16:37.255 --> 16:38.936
instead of having to interpret it
16:38.936 --> 16:41.096
every time we pack and unpack.
16:41.096 --> 16:43.815
So this macro will generate more efficient code
16:43.815 --> 16:45.815
along the way.
16:45.815 --> 16:48.695
Also it makes the code that appears in here
16:48.695 --> 16:50.296
visible to the compiler
16:50.297 --> 16:54.617
because we can give an Edebug spec for it.
16:54.617 --> 16:57.497
And so here as an argument to vec,
16:57.497 --> 16:59.016
instead of having to specify
16:59.016 --> 17:00.937
that this is an evaluated expression,
17:00.937 --> 17:02.777
we just write the expression directly,
17:02.777 --> 17:05.096
because all the expressions that appear there
17:05.096 --> 17:07.417
will just be evaluated,
17:07.418 --> 17:11.418
and we don't need to use the 'struct' variable
17:11.418 --> 17:14.137
and then extract the length field from it.
17:14.137 --> 17:16.938
We can just use length as a variable.
17:16.938 --> 17:18.698
So this variable 'length' here
17:18.698 --> 17:20.778
will refer to this field here,
17:20.778 --> 17:23.578
and then this variable 'id' here
17:23.578 --> 17:25.897
will refer to this field here,
17:25.898 --> 17:27.738
and so we can just use the field values
17:27.738 --> 17:30.459
as local variables, which is very natural
17:30.459 --> 17:31.679
and very efficient also,
17:31.679 --> 17:34.618
because the code would actually directly do that,
17:34.618 --> 17:37.899
and the code that unpacks those data
17:37.899 --> 17:40.299
will just extract an integer
17:40.299 --> 17:42.219
and bind it to the length variable,
17:42.219 --> 17:47.579
and so that makes it immediately available there.
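
NOTE
Editor's sketch of the new style, assuming Emacs 28 or later. It is not the real WeeChat definition: the id sub-structure and all names are hypothetical stand-ins, chosen so the example is self-contained.
```elisp
(require 'bindat)
;; Hypothetical `id' sub-structure: a 1-byte length, then that many bytes.
(defconst my-id-spec
  (bindat-type
    (len uint 8)
    (val str len)))
(defconst my-message-spec
  (bindat-type
    (length uint 32)
    (compression uint 8)
    (id type my-id-spec)
    ;; Earlier fields are ordinary local variables here: no `eval',
    ;; no `struct', just an expression using `length' and `id'.
    (data vec (- length 4 1 (+ 1 (length (bindat-get-field id 'val)))))))
;; Sample packet: length 10, compression 0, id "ab", two payload bytes.
(defvar my-message
  (bindat-unpack my-message-spec (unibyte-string 0 0 0 10 0 2 ?a ?b 7 8)))
```
The spec is built by a macro rather than quoted, so the length computation is code the compiler and Edebug can see.
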
17:47.580 --> 17:51.340
Okay, let's see also
17:51.340 --> 17:54.220
what the actual documentation looks like.
17:54.220 --> 17:57.739
And so if we look at the doc of BinDat,
17:57.739 --> 18:01.180
we see the actual specification of the grammar.
18:01.181 --> 18:03.181
And so here we see instead of having
18:03.181 --> 18:06.461
these nine different non-terminals,
18:06.461 --> 18:08.061
we basically have two:
18:08.061 --> 18:10.781
we have the non-terminal for TYPE,
18:10.781 --> 18:15.021
which can be either a uint, a uintr, or a string,
18:15.021 --> 18:17.421
or bits, or fill, or align, or vec,
18:17.421 --> 18:19.901
or those various other forms;
18:19.902 --> 18:22.621
or it can be a struct, in which case,
18:22.621 --> 18:23.981
in the case of struct,
18:23.981 --> 18:27.502
then it will be followed by a sequence --
18:27.502 --> 18:30.142
a list of FIELDs, where each of the FIELDs
18:30.142 --> 18:33.902
is basically a LABEL followed by another TYPE.
18:33.902 --> 18:37.342
And so this makes the whole specification
18:37.343 --> 18:39.823
much simpler. We don't have any distinction now
18:39.823 --> 18:42.862
between struct being a special case,
18:42.862 --> 18:46.383
as opposed to just the normal types.
18:46.383 --> 18:49.263
struct is just now one of the possible types
18:49.263 --> 18:52.543
that can appear here.
18:52.543 --> 18:53.263
The other thing is that
18:53.263 --> 18:55.742
the LABEL is always present in the structure,
18:55.743 --> 18:58.384
so there's no ambiguity.
18:58.384 --> 19:00.304
Also all the above things,
19:00.304 --> 19:03.103
like the BITLEN we have here,
19:03.103 --> 19:04.384
the LEN we have here,
19:04.384 --> 19:07.504
the COUNT for vector we have here,
19:07.504 --> 19:10.224
these are all plain Elisp expressions,
19:10.224 --> 19:13.024
so they are implicitly evaluated if necessary.
19:13.025 --> 19:14.705
If you want them to be constant,
19:14.705 --> 19:16.705
and really constant, you can just use quotes,
19:16.705 --> 19:20.145
for those rare cases where it's necessary.
19:20.145 --> 19:21.905
Another thing is that you can extend it
19:21.905 --> 19:25.505
with bindat-defmacro.
19:25.505 --> 19:30.225
Okay, let's go back here.
19:30.226 --> 19:32.706
So what are the advantages of this approach?
19:32.706 --> 19:34.625
As I said, one of the main advantages
19:34.625 --> 19:39.346
is that we now have support for Edebug.
19:39.346 --> 19:41.426
We don't have 'struct', 'repeat', and 'align'
19:41.426 --> 19:42.946
as special cases anymore.
19:42.946 --> 19:44.625
These are just normal types.
19:44.625 --> 19:48.066
Before, there was uint as type, int as type,
19:48.067 --> 19:49.267
and those kinds of things.
19:49.267 --> 19:51.110
'struct' and 'repeat' and 'align'
19:51.110 --> 19:53.267
were in a different case.
19:53.267 --> 19:54.387
So there were
19:54.387 --> 19:56.787
some subtle differences between those
19:56.787 --> 19:59.027
that completely disappeared.
19:59.027 --> 20:02.626
Also in the special cases, there was 'union',
20:02.626 --> 20:05.027
and union now has completely disappeared.
20:05.027 --> 20:07.827
We don't need it anymore, because instead,
20:07.828 --> 20:09.588
we can actually use code anywhere.
20:09.588 --> 20:11.908
That's one of the things I didn't mention here,
20:11.908 --> 20:17.268
but in this note here,
20:17.268 --> 20:19.747
that's one of the important notes.
20:19.747 --> 20:21.987
Not only are BITLEN, LEN, COUNT etc.
20:21.987 --> 20:23.028
Elisp expressions,
20:23.028 --> 20:26.788
but the type itself -- any type itself --
20:26.789 --> 20:29.029
is basically an expression.
20:29.029 --> 20:32.709
And so you can, instead of having 'uint BITLEN',
20:32.709 --> 20:36.628
you can have '(if blah-blah-blah uint string)',
20:36.628 --> 20:38.149
and so you can have a field
20:38.149 --> 20:40.549
that can be either string or an int,
20:40.549 --> 20:44.789
depending on some condition.
20:44.790 --> 20:46.869
And for that reason we don't need a union.
20:46.869 --> 20:47.910
Instead of having a union,
20:47.910 --> 20:50.710
we can just have a 'cond' or a 'pcase'
20:50.710 --> 20:53.590
that will return the type we want to use,
20:53.590 --> 20:55.109
depending on the context,
20:55.109 --> 21:00.950
which will generally depend on some previous field.
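
NOTE
Editor's sketch of a computed type standing in for a union, modeled on the example in the Emacs Lisp manual and assuming Emacs 28 or later. The spec name and sample bytes are made up.
```elisp
(require 'bindat)
(defconst my-tagged-spec
  (bindat-type
    (len     uint 8)
    ;; The type of `payload' is computed from the earlier `len' field:
    ;; a 24-bit integer when `len' is zero, otherwise a byte vector.
    (payload . (if (zerop len) (uint 24) (vec (1- len))))))
(defvar my-tagged-int
  (bindat-get-field (bindat-unpack my-tagged-spec (unibyte-string 0 1 2 3))
                    'payload))
(defvar my-tagged-vec
  (bindat-get-field (bindat-unpack my-tagged-spec (unibyte-string 3 10 20))
                    'payload))
```
A `cond` or `pcase` in the same position would select among more than two types.
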
21:00.951 --> 21:03.750
Also we don't need to use single-field structs
21:03.750 --> 21:05.351
for simple types anymore,
21:05.351 --> 21:09.271
because there's no distinction between struct
21:09.271 --> 21:11.271
and other types.
21:11.271 --> 21:17.191
So we can pass to bindat-pack and bindat-unpack
21:17.191 --> 21:20.951
a specification which just says "here's an integer"
21:20.952 --> 21:24.392
and we'll just pack and unpack the integer.
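
NOTE
Editor's sketch of packing and unpacking a bare integer with the new API, assuming Emacs 28 or later. The helper `my-int-to-bytes` is hypothetical; it only illustrates that the bit length can be an ordinary expression.
```elisp
(require 'bindat)
;; A bare integer type: no wrapping struct and no alist.
(defvar my-int (bindat-unpack (bindat-type uint 32) (unibyte-string 0 0 1 0)))
(defvar my-int-bytes (bindat-pack (bindat-type uint 32) 256))
;; The bit length is a Lisp expression, so the width can vary at run time.
(defun my-int-to-bytes (value nbytes)
  "Pack VALUE as an unsigned big-endian integer occupying NBYTES bytes."
  (bindat-pack (bindat-type uint (* 8 nbytes)) value))
```
The variable-width helper covers the kind of case mentioned earlier, where the size of the integer depends on the circumstances.
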
21:24.392 --> 21:26.472
And of course now all the code is exposed,
21:26.472 --> 21:29.192
so not only Edebug works, but also Flymake,
21:29.192 --> 21:30.392
and the compiler, etc. --
21:30.392 --> 21:33.111
they can complain about it,
21:33.111 --> 21:38.871
and give you warnings and errors as we like them.
21:38.872 --> 21:44.553
And of course the kittens are much happier.
21:44.553 --> 21:48.153
Okay. This is going a bit over time,
21:48.153 --> 21:51.272
so let's try to go faster.
21:51.273 --> 21:53.752
Here are some of the new features
21:53.753 --> 21:54.794
that are introduced.
21:54.794 --> 21:56.314
I already mentioned briefly
21:56.314 --> 22:00.633
that you can define new types with bindat-defmacro.
22:00.633 --> 22:04.474
That's one of the important novelties,
22:04.474 --> 22:08.794
and you can extend BinDat with new types this way.
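
NOTE
Editor's sketch of defining a new type with bindat-defmacro, loosely following the IPv4 example in the Emacs Lisp manual and assuming Emacs 28 or later. The names `my-ip` and `my-endpoint-spec` are made up.
```elisp
(require 'bindat)
;; A new type built from existing ones: an IPv4 address as 4 bytes.
(bindat-defmacro my-ip () "An IPv4 address." '(vec 4 uint 8))
(defconst my-endpoint-spec
  (bindat-type
    (addr my-ip)
    (port uint 16)))
(defvar my-endpoint
  (bindat-unpack my-endpoint-spec (unibyte-string 127 0 0 1 0 80)))
```
Once defined, the new type is used in a field exactly like the built-in ones.
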
22:08.794 --> 22:10.714
The other thing you can do is
22:10.714 --> 22:16.233
you can control how values or packets
22:16.234 --> 22:20.315
are unpacked, and how they are represented.
22:20.315 --> 22:22.555
In the old BinDat,
22:22.555 --> 22:24.315
the packet is necessarily represented,
22:24.315 --> 22:28.634
when you unpack it, as an alist, basically,
22:28.635 --> 22:30.396
or a struct becomes an alist,
22:30.396 --> 22:31.676
and that's all there is.
22:31.676 --> 22:34.076
You don't have any choice about it.
22:34.076 --> 22:35.596
With the new system,
22:35.596 --> 22:38.076
by default, it also returns just an alist,
22:38.076 --> 22:41.916
but you can actually control what it's unpacked as,
22:41.916 --> 22:46.396
or what it's packed from, using these keywords.
22:46.396 --> 22:49.596
With :unpack-val, you can give an expression
22:49.597 --> 22:53.357
that will construct the unpacked value
22:53.357 --> 22:56.957
from the various fields.
22:56.957 --> 22:59.197
And with :pack-val and :pack-var,
22:59.197 --> 23:02.557
you can specify how to extract the information
23:02.557 --> 23:05.116
from the unpacked value
23:05.117 --> 23:08.077
to generate the pack value.
23:08.078 --> 23:12.637
So here are some examples.
23:12.637 --> 23:15.358
Here's an example taken from osc.
23:15.358 --> 23:17.438
osc actually doesn't use BinDat currently,
23:17.438 --> 23:22.478
but I have played with it
23:22.479 --> 23:23.758
to see what it would look like
23:23.758 --> 23:26.159
if we were to use BinDat.
23:26.159 --> 23:28.638
So here's the definition
23:28.638 --> 23:30.638
of the timetag representation,
23:30.638 --> 23:35.279
which represents timestamps in osc.
23:35.279 --> 23:37.998
So you would use bindat-type
23:37.998 --> 23:40.559
and then you have here :pack-var
23:40.559 --> 23:42.080
basically gives a name
23:42.080 --> 23:48.559
when we try to pack a timestamp.
23:48.559 --> 23:51.520
'time' will be the variable that contains
23:51.520 --> 23:54.159
the actual timestamp we will receive.
23:54.159 --> 23:57.520
So we want to represent the unpacked value
23:57.520 --> 24:00.240
as a normal Emacs timestamp,
24:00.240 --> 24:02.480
and then basically convert from this timestamp
24:02.480 --> 24:06.401
to a string, or from a string to this timestamp.
24:06.401 --> 24:10.080
When we receive it, it will be called time,
24:10.080 --> 24:12.240
so we can refer to it,
24:12.240 --> 24:15.360
and so in order to actually encode it,
24:15.360 --> 24:18.320
we basically turn this timestamp into an integer --
24:18.320 --> 24:20.799
that's what this :pack-val does.
24:20.799 --> 24:23.442
It says when we try to pack it,
24:23.442 --> 24:26.082
here's the value that we should use.
24:26.082 --> 24:27.760
We turn it into an integer,
24:27.760 --> 24:30.320
and then this integer is going to be encoded
24:30.320 --> 24:36.162
as a uint 64-bit. So a 64-bit unsigned integer.
24:36.163 --> 24:38.960
When we try to unpack the value,
24:38.960 --> 24:40.720
this 'ticks' field
24:40.720 --> 24:45.679
will contain an unsigned int of 64 bits.
24:45.679 --> 24:50.559
We want to return instead a timestamp --
24:50.559 --> 24:53.924
a time value -- from Emacs.
24:53.924 --> 24:59.363
Here we use the representation of time
24:59.363 --> 25:02.799
as a pair of number of ticks
25:02.799 --> 25:06.720
and the corresponding frequency of those ticks.
25:06.720 --> 25:09.120
So that's what we do here with :unpack-val,
25:09.120 --> 25:12.004
which is construct the cons corresponding to it.
25:12.004 --> 25:16.400
With this definition, bindat-pack/unpack
25:16.400 --> 25:19.039
are going to convert to and from
25:19.039 --> 25:21.760
proper time values on one side,
25:21.760 --> 25:26.159
and binary strings on the other.
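NOTE The round trip described above can be sketched outside of Emacs. This is a minimal Python illustration, not the code shown on the slide: a time value is modeled as a (ticks, hz) pair, and the names `HZ`, `pack_timetag` and `unpack_timetag` are hypothetical. The tick frequency of 2**32 per second is an assumption, and any epoch offset is ignored.

```python
import struct
from fractions import Fraction

# Assumed tick frequency (ticks per second); the talk does not state it.
HZ = 2 ** 32

def pack_timetag(time):
    """Turn a (ticks, hz) time pair into 8 bytes: a big-endian 64-bit unsigned integer."""
    ticks, hz = time
    # Counterpart of :pack-val: reduce the time value to a single integer,
    # rescaled to HZ ticks per second (rounding down).
    value = (ticks * HZ) // hz
    return struct.pack(">Q", value)

def unpack_timetag(data):
    """Turn 8 bytes back into a (ticks, hz) time pair."""
    (ticks,) = struct.unpack(">Q", data)
    # Counterpart of :unpack-val: rebuild the pair of tick count and tick frequency.
    return (ticks, HZ)

packed = pack_timetag((3, 2))  # 1.5 seconds, expressed as 3 ticks at 2 Hz
ticks, hz = unpack_timetag(packed)
assert Fraction(ticks, hz) == Fraction(3, 2)
```

The caller only ever sees time pairs on one side and byte strings on the other; the intermediate integer stays internal to the two functions.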
25:26.159 --> 25:27.520
Note, of course,
25:27.520 --> 25:30.320
that I complained that the old BinDat
25:30.320 --> 25:36.080
had to use single-field structs for simple types,
25:36.080 --> 25:37.039
and here, basically,
25:37.039 --> 25:39.840
I'm back using single-field structs as well
25:39.840 --> 25:41.120
for this particular case --
25:41.120 --> 25:44.640
actually a reasonably frequent case, to be honest.
25:44.640 --> 25:49.279
But at least this is not so problematic,
25:49.279 --> 25:51.840
because we actually control what is returned,
25:51.840 --> 25:54.159
so even though it's a single-field struct,
25:54.159 --> 25:56.640
it's not going to construct an alist
25:56.640 --> 25:58.320
or force you to construct an alist.
25:58.320 --> 26:02.720
Instead, it really returns and takes a value
26:02.720 --> 26:07.367
in the ideal representation that we chose.
26:07.367 --> 26:10.007
Here we have a more complex example,
26:10.007 --> 26:12.488
where the actual type is recursive,
26:12.488 --> 26:18.640
because it's representing those "LEB"...
26:18.640 --> 26:20.400
I can't remember what "LEB" stands for,
26:20.400 --> 26:22.559
but it's a representation
26:22.559 --> 26:25.600
for arbitrary-length integers,
26:25.600 --> 26:27.520
where basically
26:27.520 --> 26:33.360
every byte is either smaller than 128,
26:33.360 --> 26:36.799
in which case it's the end of the value,
26:36.799 --> 26:39.760
or it's a value of 128 or more,
26:39.760 --> 26:42.159
in which case there's an extra byte on the end
26:42.159 --> 26:44.490
that's going to continue.
26:44.490 --> 26:46.640
Here we see the representation
26:46.640 --> 26:52.240
is basically a structure that starts with a byte,
26:52.240 --> 26:53.679
which contains this value,
26:53.679 --> 26:56.000
which can be either the last value or not,
26:56.000 --> 26:59.770
and the tail, which will either be empty,
26:59.770 --> 27:01.279
or contain something else.
27:01.279 --> 27:04.000
The empty [case] is here;
27:04.000 --> 27:07.039
if the head value is smaller than 128,
27:07.039 --> 27:11.840
then the type of this tail is going to be (unit 0),
27:11.840 --> 27:16.492
so basically 'unit' is the empty type,
27:16.492 --> 27:20.880
and 0 is the value we will receive when we read it.
27:20.880 --> 27:25.520
And if not, then it has as type 'loop',
27:25.520 --> 27:28.240
which is the type we're defining,
27:28.240 --> 27:30.491
so it's the recursive case,
27:30.491 --> 27:35.132
where then the rest of the type is the type itself.
27:35.132 --> 27:37.120
And so this lets us pack and unpack.
27:37.120 --> 27:39.600
We pass it an arbitrary-size integer,
27:39.600 --> 27:42.240
and it's going to turn it into
27:42.240 --> 27:48.492
this LEB128 binary representation, and vice versa.
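NOTE The head/tail recursion described above can be sketched in Python. This is an illustration of unsigned LEB128 under the structure the talk describes, not the BinDat definition from the slide; the function names `leb128_pack` and `leb128_unpack` are hypothetical.

```python
def leb128_pack(n):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    low = n & 0x7F  # the 7 payload bits carried by the head byte
    rest = n >> 7   # what is left for the tail
    if rest == 0:
        # Head smaller than 128: the tail is empty.
        return bytes([low])
    # Head of 128 or more: the tail is the same type again (the recursive case).
    return bytes([low | 0x80]) + leb128_pack(rest)

def leb128_unpack(data, pos=0):
    """Decode unsigned LEB128 at data[pos:]; return (value, next_position)."""
    head = data[pos]
    if head < 128:
        # Empty tail: it contributes the value 0 and consumes no bytes.
        tail, end = 0, pos + 1
    else:
        tail, end = leb128_unpack(data, pos + 1)
    return (head & 0x7F) | (tail << 7), end

encoded = leb128_pack(624485)
assert encoded == bytes([0xE5, 0x8E, 0x26])
assert leb128_unpack(encoded) == (624485, 3)
```

Treating the empty tail as the value 0 keeps the decoding formula identical in both branches, which is the role the talk assigns to the 0 in (unit 0).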
27:48.492 --> 27:52.480
I have other examples if you're interested,
27:52.480 --> 27:56.093
but anyway, here's the conclusion.
27:56.094 --> 27:58.320
We have a simpler, more flexible,
27:58.320 --> 28:01.039
and more powerful BinDat now,
28:01.039 --> 28:03.454
which is also significantly faster.
28:03.454 --> 28:06.799
And I can't remember the exact speed-up,
28:06.799 --> 28:08.720
but it's definitely not just a few percent.
28:08.720 --> 28:12.640
I vaguely remember about 4x faster in my tests,
28:12.640 --> 28:16.815
but it's probably very different in different cases
28:16.815 --> 28:20.159
so it might be just 4x, 2x -- who knows?
28:20.159 --> 28:23.374
Try it for yourself, but I was pretty pleased,
28:23.374 --> 28:28.335
because it wasn't the main motivation, so anyway...
28:28.336 --> 28:31.135
The negatives are here.
28:31.135 --> 28:34.480
In the new system, there's this bindat-defmacro
28:34.480 --> 28:36.720
which lets us define, kind of, new types,
28:36.720 --> 28:40.895
and bindat-type also lets us define new types,
28:40.895 --> 28:45.360
and the distinction between them is a bit subtle;
28:45.360 --> 28:48.080
it kind of depends on...
28:48.080 --> 28:50.880
well it has an impact on efficiency
28:50.880 --> 28:53.520
more than anything, so it's not very satisfactory.
28:53.520 --> 28:56.737
There's a bit of redundancy between the two.
28:56.737 --> 28:59.039
There is no bit-level control, just as before.
28:59.039 --> 29:02.097
We can only manipulate basically bytes.
29:02.098 --> 29:03.360
So this is definitely not usable
29:03.360 --> 29:09.058
for a Huffman encoding kind of thing.
29:09.058 --> 29:10.880
Also, it's not nearly as flexible
29:10.880 --> 29:12.240
as some of the alternatives.
29:12.240 --> 29:13.760
So you know GNU Poke
29:13.760 --> 29:20.017
has been a vague inspiration for this work,
29:20.018 --> 29:22.480
and GNU Poke gives you a lot more power
29:22.480 --> 29:25.059
in how to specify the types, etc.
29:25.059 --> 29:26.579
And of course one of the main downsides
29:26.579 --> 29:28.018
is that it's still not used very much.
29:28.018 --> 29:29.283
Actually, the new BinDat
29:29.283 --> 29:31.039
is not used by any package
29:31.039 --> 29:33.059
as far as I know right now,
29:33.059 --> 29:35.279
but even the old one is not used very often,
29:35.279 --> 29:36.799
so who knows
29:36.799 --> 29:38.799
whether it's actually going to
29:38.799 --> 29:41.520
work very much better or not?
29:41.520 --> 29:44.399
Anyway, this is it for this talk.
29:44.399 --> 29:46.683
Thank you very much. Have a nice day.
29:46.683 --> 29:47.883
[captions by John Cummings]