path: root/2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main.vtt



WEBVTT

00:01.360 --> 00:04.080
Hi. So I'm going to talk today

00:04.180 --> 00:10.000
about a fun rewrite I did of the BinDat package.

00:10.000 --> 00:12.400
I call this Turbo BinDat.

00:12.400 --> 00:14.001
Actually, the package hasn't changed name,

00:14.101 --> 00:16.801
it's just that the result happens to be faster.

00:16.901 --> 00:19.521
The point was not to make it faster though,

00:19.621 --> 00:22.241
and the point was not to make you understand

00:22.341 --> 00:23.440
that data is not code.

00:23.540 --> 00:27.120
It's just one more experience I've had

00:27.120 --> 00:31.280
where I've seen that treating data as code

00:31.381 --> 00:33.522
is not always a good idea.

00:33.622 --> 00:36.162
It's important to keep the difference.

00:36.162 --> 00:38.880
So let's get started.

00:38.881 --> 00:40.642
So what is BinDat anyway?

00:40.742 --> 00:43.602
Here's just the overview of basically

00:43.602 --> 00:44.962
what I'm going to present.

00:45.062 --> 00:47.842
So I'm first going to present BinDat itself

00:47.843 --> 00:49.039
for those who don't know it,

00:49.039 --> 00:51.923
which is probably the majority of you.

00:51.923 --> 00:55.363
Then I'm going to talk about the actual problems

00:55.363 --> 00:58.882
that I encountered with this package

00:58.882 --> 01:01.843
that motivated me to rewrite it.

01:01.843 --> 01:05.043
Most of them were lack of flexibility,

01:05.044 --> 01:09.924
and some of it was just poor behavior

01:09.924 --> 01:13.364
with respect to scoping and variables,

01:13.364 --> 01:16.324
which of course, you know, is bad --

01:16.424 --> 01:20.724
basically uses of eval or, "eval is evil."

01:20.724 --> 01:24.884
Then I'm going to talk about the new design --

01:24.985 --> 01:28.005
how I redesigned it

01:28.105 --> 01:31.365
to make it both simpler and more flexible,

01:31.365 --> 01:32.965
and where the key idea was

01:33.065 --> 01:35.205
to expose code as code

01:35.305 --> 01:37.525
instead of having it as data,

01:37.625 --> 01:39.605
and so here the distinction between the two

01:39.706 --> 01:44.085
is important and made things simpler.

01:44.085 --> 01:46.405
I tried to keep efficiency in mind,

01:46.405 --> 01:52.405
which resulted in some of the aspects of the design

01:52.505 --> 01:54.886
which are not completely satisfactory,

01:54.886 --> 01:57.046
but the result is actually fairly efficient.

01:57.146 --> 01:59.286
Even though it was not the main motivation,

01:59.287 --> 02:02.967
it was one of the nice outcomes.

02:02.967 --> 02:06.006
And then I'm going to present some examples.

02:06.007 --> 02:08.167
So first: what is BinDat?

02:08.267 --> 02:10.567
Oh actually, rather than present THIS,

02:10.667 --> 02:12.407
I'm going to go straight to the code,

02:12.507 --> 02:14.246
because BinDat actually had

02:14.346 --> 02:16.647
an introduction which was fairly legible.

02:16.748 --> 02:21.128
So here we go: this is the old BinDat from Emacs 27

02:21.128 --> 02:23.448
and the commentary starts by explaining

02:23.448 --> 02:25.848
what is BinDat? Basically BinDat is a package

02:25.948 --> 02:30.247
that lets you parse and unparse

02:30.247 --> 02:31.527
basically binary data.

02:31.627 --> 02:34.648
The intent is to have typically network data

02:34.749 --> 02:35.849
or something like this.

02:35.949 --> 02:38.328
So assuming you have network data,

02:38.328 --> 02:41.528
presented or defined

02:41.628 --> 02:44.569
with some kind of C-style structs, typically,

02:44.669 --> 02:46.009
or something along these lines.

02:46.109 --> 02:49.120
So you presumably start with documentation

02:49.120 --> 02:52.809
that presents something like those structs here,

02:52.810 --> 02:57.130
and you want to be able to generate such packets

02:57.230 --> 03:00.249
and read such packets,

03:00.349 --> 03:02.090
so the way you do it is

03:02.190 --> 03:04.570
you rewrite those specifications

03:04.670 --> 03:06.010
into the BinDat syntax.

03:06.110 --> 03:07.529
So here's the BinDat syntax

03:07.529 --> 03:10.490
for the previous specification.

03:10.491 --> 03:11.610
So here, for example,

03:11.610 --> 03:16.970
you see the case for a data packet

03:16.970 --> 03:20.411
which will have a 'type' field which is a byte

03:20.411 --> 03:24.091
(an unsigned 8-bit entity),

03:24.091 --> 03:26.411
then an 'opcode' which is also a byte,

03:26.411 --> 03:30.731
then a 'length' which is a 16-bit unsigned integer

03:30.732 --> 03:34.092
in little endian order,

03:34.092 --> 03:38.732
and then some 'id' for this entry, which is

03:38.732 --> 03:43.531
8 bytes containing a zero-terminated string,

03:43.531 --> 03:47.531
and then the actual data, basically the payload,

03:47.532 --> 03:51.453
which is in this case a vector of bytes,

03:51.453 --> 03:54.812
('bytes' here doesn't need to be specified)

03:54.812 --> 03:58.172
and here we specify the length of this vector.

03:58.172 --> 03:59.773
This 'length' here

03:59.773 --> 04:02.252
happens to be actually the name of THIS field,

04:02.252 --> 04:03.853
so the length of the data

04:03.854 --> 04:06.574
is specified by the 'length' field here,

04:06.574 --> 04:08.574
and BinDat will understand this part,

04:08.574 --> 04:12.333
which is the nice part of BinDat.

04:12.333 --> 04:15.774
And then you have an alignment field at the end,

04:15.774 --> 04:18.253
which is basically padding.

04:18.253 --> 04:20.574
It says that it is padded

04:20.575 --> 04:23.295
until the next multiple of four.

04:23.295 --> 04:25.855
Okay. So this works reasonably well.

04:25.855 --> 04:27.455
This is actually very nice.

04:27.455 --> 04:30.335
With this, you can then call

04:30.335 --> 04:32.975
bindat-pack or bindat-unpack,

04:32.975 --> 04:37.774
passing it a string, or passing it an alist,

04:37.774 --> 04:40.415
to do the packing and unpacking.
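
NOTE
Editor's sketch of the spec just described and of the pack/unpack calls,
assuming the old quoted-list syntax from the Emacs 27 commentary. The names
my-data-spec, my-raw, my-unpacked and my-packed, and the sample bytes,
are made up for illustration.
```elisp
(require 'bindat)
;; Old-style spec: a quoted list, so to Emacs it is plain data.
(defvar my-data-spec
  '((type   u8)
    (opcode u8)
    (length u16r)          ; 16-bit unsigned, little endian
    (id     strz 8)        ; 8 bytes holding a zero-terminated string
    (data   vec (length))  ; as many bytes as the `length' field says
    (align  4)))           ; pad to the next multiple of four
;; Hypothetical 20-byte packet: type 2, opcode 3, length 5, id "ABC",
;; five payload bytes, then three bytes of padding.
(defvar my-raw
  (unibyte-string 2 3 5 0 ?A ?B ?C 0 0 0 0 0 1 2 3 4 5 0 0 0))
;; Unpacking returns an alist keyed by field name.
(defvar my-unpacked (bindat-unpack my-data-spec my-raw))
;; Packing goes the other way, from an alist to a unibyte string.
(defvar my-packed
  (bindat-pack my-data-spec
               '((type . 2) (opcode . 3) (length . 5)
                 (id . "ABC") (data . [1 2 3 4 5]))))
```
Fields are best read back with bindat-get-field rather than by relying
on the order of the returned alist.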

04:40.416 --> 04:43.296
So, for example, if you take this string--

04:43.296 --> 04:45.856
actually, in this case, it's a vector of bytes

04:45.856 --> 04:49.456
but it works the same; it works in both ways--

04:49.456 --> 04:53.536
if you pass this to bindat-unpack,

04:53.536 --> 04:57.456
it will presumably return you this structure

04:57.457 --> 05:00.017
if you've given it the corresponding type.

05:00.017 --> 05:01.776
So it will extract--

05:01.776 --> 05:05.617
you will see that there is an IP address,

05:05.617 --> 05:08.017
which is a destination IP, a source IP,

05:08.017 --> 05:09.857
and some port number,

05:09.857 --> 05:12.977
and some actual data here and there, etc.

05:12.977 --> 05:18.017
So this is quite convenient if you need to do this,

05:18.018 --> 05:20.898
and that's what it was designed for.

05:20.898 --> 05:27.537
So here we are. Let's go back to the actual talk.

05:27.538 --> 05:34.338
I converted BinDat to lexical scoping at some point

05:34.339 --> 05:37.299
and things seemed to work fine,

05:37.299 --> 05:42.819
except, at some point, probably weeks later,

05:42.819 --> 05:47.139
I saw a bug report

05:47.139 --> 05:53.058
about the new version using lexical scoping

05:53.059 --> 05:56.339
not working correctly with WeeChat.

05:56.339 --> 06:00.580
So here's the actual chunk of code

06:00.580 --> 06:02.820
that appears in WeeChat.

06:02.820 --> 06:08.420
Here you see that they also define a BinDat spec.

06:08.421 --> 06:14.741
It's a packet that has a 32-bit unsigned length,

06:14.741 --> 06:18.500
then some compression byte/compression information,

06:18.500 --> 06:23.780
then an id which contains basically another struct

06:23.780 --> 06:26.901
(which is specified elsewhere; doesn't matter here),

06:26.902 --> 06:28.661
and after that, a vector

06:28.661 --> 06:33.382
whose size is not just specified by 'length',

06:33.382 --> 06:35.142
but is computed from 'length'.

06:35.142 --> 06:39.142
So here's how they used to compute it in WeeChat.

06:39.142 --> 06:42.822
So the length here can be specified in BinDat.

06:42.822 --> 06:43.941
Instead of having

06:43.942 --> 06:45.863
just a reference to one of the fields,

06:45.863 --> 06:48.903
or having a constant, you can actually compute it,

06:48.903 --> 06:52.502
where you have to use this '(eval',

06:52.502 --> 06:54.743
and then followed by the actual expression

06:54.743 --> 06:58.103
where you say how you compute it.

06:58.103 --> 07:01.463
And here you see that it actually computes it

07:01.464 --> 07:04.904
based on the 'length' of the structure --

07:04.904 --> 07:07.783
that's supposed to be this 'length' field here --

07:07.783 --> 07:11.223
and it's referred to using the bindat-get-field

07:11.223 --> 07:14.503
to extract the field from the variable 'struct'.

07:14.503 --> 07:17.943
And then it subtracts four, it subtracts one,

07:17.943 --> 07:19.467
and adds some other things

07:19.468 --> 07:22.185
which depend on some field

07:22.185 --> 07:26.905
that's found in this 'id' field here.

07:26.905 --> 07:28.425
And the problem with this code

07:28.425 --> 07:30.425
was that it broke

07:30.425 --> 07:32.745
because of this 'struct' variable here,

07:32.745 --> 07:35.145
because this 'struct' variable is not defined

07:35.145 --> 07:38.105
anywhere in the specification of BinDat.

07:38.106 --> 07:41.866
It was used internally as a local variable,

07:41.866 --> 07:45.306
and because it was using dynamic scoping,

07:45.306 --> 07:47.386
it actually happened to be available here,

07:47.386 --> 07:50.826
but the documentation nowhere specifies it.

07:50.826 --> 07:52.506
So it was not exactly

07:52.506 --> 07:55.546
a bug of the conversion to lexical scoping,

07:55.547 --> 07:58.906
but it ended up breaking this code.

07:58.906 --> 08:01.226
And there was no way to actually

08:01.226 --> 08:05.066
fix the code within the specification of BinDat.

08:05.066 --> 08:08.287
You had to go outside the specification of BinDat

08:08.287 --> 08:10.427
to fix this problem.
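
NOTE
Editor's rough reconstruction, from the spoken description only, of the
kind of old-style spec that broke. It is simplified (the 'id' field is left
out) and the name my-relay-message-spec is hypothetical; the real WeeChat
code differs.
```elisp
(require 'bindat)
(defconst my-relay-message-spec
  '((length      u32)
    (compression u8)
    ;; The payload size is computed: total length minus the 4 bytes of
    ;; `length' and the 1 byte of `compression'.  The expression sits
    ;; inside a quoted list, so it can only be run by passing it to
    ;; `eval', and it reaches into `struct', an undocumented internal
    ;; variable that was only visible thanks to dynamic scoping.
    (data vec (eval (- (bindat-get-field struct 'length) 4 1)))))
```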

08:10.427 --> 08:14.346
This is basically how I started looking at BinDat.

08:14.347 --> 08:17.808
Then I went to actually investigate a bit more

08:17.808 --> 08:19.627
what was going on,

08:19.627 --> 08:22.108
and the thing I noticed along the way

08:22.108 --> 08:25.787
was basically that the specification of BinDat

08:25.787 --> 08:29.528
is fairly complex and has a lot of eval

08:29.528 --> 08:30.748
and things like this.

08:30.749 --> 08:32.288
So let's take a look

08:32.288 --> 08:35.068
at what the BinDat specification looks like.

08:35.068 --> 08:36.589
So here it's actually documented

08:36.589 --> 08:40.269
as a kind of grammar rules.

08:40.269 --> 08:45.308
A specification is basically a sequence of items,

08:45.308 --> 08:47.389
and then each of the items is basically

08:47.389 --> 08:51.248
a FIELD of a struct, so it has a FIELD name,

08:51.249 --> 08:53.249
and then a TYPE.

08:53.249 --> 08:54.510
Instead of a TYPE,

08:54.510 --> 08:56.590
it could have some other FORM for eval,

08:56.590 --> 08:58.989
which was basically never used as far as I know,

08:58.989 --> 09:00.190
or it can be some filler,

09:00.190 --> 09:02.750
or you can have some 'align' specification,

09:02.750 --> 09:05.150
or you can refer to another struct.

09:05.150 --> 09:07.390
It could also be some kind of union,

09:07.391 --> 09:10.430
or it can be some kind of repetition of something.

09:10.430 --> 09:12.430
And then you have the TYPE specified here,

09:12.430 --> 09:18.271
which can be some integers, strings, or a vector,

09:18.271 --> 09:21.631
and there are a few other special cases.

09:21.631 --> 09:25.310
And then the actual field itself

09:25.311 --> 09:28.192
can be either a NAME, or something that's computed,

09:28.192 --> 09:30.752
and then everywhere here, you have LEN,

09:30.752 --> 09:32.480
which specifies the length of vectors,

09:32.480 --> 09:34.672
for example, or length of strings.

09:34.672 --> 09:37.632
This is actually either nil to mean one,

09:37.632 --> 09:39.072
or it can be an ARG,

09:39.072 --> 09:40.952
where ARG is defined to be

09:40.952 --> 09:42.672
either an integer or DEREF,

09:42.673 --> 09:46.673
where DEREF is basically a specification

09:46.673 --> 09:48.833
that can refer, for example, to the 'length' field

09:48.833 --> 09:51.956
-- that's what we saw between parentheses: (length)

09:51.956 --> 09:56.273
was this way to refer to the 'length' field.

09:56.273 --> 09:59.793
Or it can be an expression, which is what we saw

09:59.794 --> 10:02.834
in the computation of the length for WeeChat,

10:02.834 --> 10:04.914
where you just had a '(eval'

10:04.914 --> 10:06.334
and then some computation

10:06.334 --> 10:10.274
of the length of the payload.

10:10.274 --> 10:12.354
And so if you look here, you see that

10:12.354 --> 10:14.674
it is fairly large and complex,

10:14.674 --> 10:18.514
and it uses eval everywhere. And actually,

10:18.515 --> 10:20.675
it's not just that it has eval in its syntax,

10:20.675 --> 10:23.395
but the implementation has to use eval everywhere,

10:23.395 --> 10:25.314
because, if you go back

10:25.314 --> 10:27.475
to see the kind of code we see,

10:27.475 --> 10:29.538
we see here we just define

10:29.538 --> 10:34.195
weechat--relay-message-spec as a constant!

10:34.195 --> 10:37.314
It's nothing more than just data, right?

10:37.315 --> 10:38.836
So within this data

10:38.836 --> 10:41.076
there are things we need to evaluate,

10:41.076 --> 10:42.356
but it's pure data,

10:42.356 --> 10:44.356
so it will have to be evaluated

10:44.356 --> 10:46.596
by passing it to eval. It can't be compiled,

10:46.596 --> 10:50.196
because it's within a quote, right?

10:50.196 --> 10:52.836
And so for that reason, kittens really

10:52.837 --> 10:55.956
suffer terribly with uses of BinDat.

10:55.956 --> 10:59.957
You really have to be very careful with that.

10:59.957 --> 11:02.037
More seriously,

11:02.037 --> 11:05.157
the 'struct' variable was not documented,

11:05.157 --> 11:07.797
and yet it's indispensable

11:07.797 --> 11:08.996
for important applications,

11:08.996 --> 11:11.157
such as its use in WeeChat.

11:11.158 --> 11:13.078
So clearly this needs to be fixed.

11:13.078 --> 11:15.481
Of course, we can just document 'struct'

11:15.481 --> 11:18.038
as some variable that's used there,

11:18.038 --> 11:19.798
but of course we don't want to do that,

11:19.798 --> 11:23.398
because 'struct' is not obviously

11:23.398 --> 11:25.398
a dynamically scoped variable,

11:25.398 --> 11:29.317
so it's not very clean.

11:29.318 --> 11:31.939
Another problem I noticed was that the grammar

11:31.939 --> 11:35.239
is significantly more complex than necessary.

11:35.239 --> 11:38.199
We have nine distinct non-terminals.

11:38.199 --> 11:39.639
There is ambiguity.

11:39.639 --> 11:44.919
If you try to use a field whose name is 'align',

11:44.919 --> 11:48.679
or 'fill', or something like this,

11:48.680 --> 11:50.920
then it's going to be misinterpreted,

11:50.920 --> 11:54.920
or it can be misinterpreted.

11:54.920 --> 11:58.760
The vector length can be either an expression,

11:58.760 --> 12:02.280
or an integer, or a reference to a label,

12:02.280 --> 12:03.720
but the expression

12:03.720 --> 12:06.360
should already be the general case,

12:06.361 --> 12:08.041
and this expression can itself be

12:08.041 --> 12:09.401
just a constant integer,

12:09.401 --> 12:13.961
so this complexity is probably not indispensable,

12:13.961 --> 12:15.641
or it could be replaced with something simpler.

12:15.641 --> 12:17.401
That's what I felt like.

12:17.401 --> 12:19.161
And basically lots of places

12:19.161 --> 12:21.721
allow an (eval EXP) form somewhere

12:21.721 --> 12:25.081
to open up the door for more flexibility,

12:25.082 --> 12:26.922
but not all of them do,

12:26.922 --> 12:29.482
and we don't really want

12:29.482 --> 12:31.001
to have this eval there, right?

12:31.001 --> 12:33.802
It's not very convenient syntactically either.

12:33.802 --> 12:36.042
So it makes the uses of eval

12:36.042 --> 12:38.362
a bit heavier than they need to be,

12:38.362 --> 12:41.722
and so I didn't really like this part.

12:41.723 --> 12:42.603
Another part is that

12:42.603 --> 12:45.183
when I tried to figure out what was going on,

12:45.183 --> 12:46.666
[dog barks and distracts Stefan]

12:46.666 --> 12:50.043
I had trouble... Winnie as well, as you can hear.

12:50.043 --> 12:50.923
She had trouble as well.

12:50.923 --> 12:53.083
But one of the troubles was that

12:53.083 --> 12:55.002
there was no way to debug the code

12:55.002 --> 12:57.562
via Edebug, because it's just data,

12:57.562 --> 13:00.523
so Edebug doesn't know that it has to look at it

13:00.524 --> 13:02.683
and instrument it.

13:02.683 --> 13:05.644
And of course it was not conveniently extensible.

13:05.644 --> 13:07.164
That's also one of the things

13:07.164 --> 13:08.487
I noticed along the way.

13:09.084 --> 13:12.844
Okay, so here's an example of

13:12.844 --> 13:15.484
problems that I didn't just see there,

13:15.485 --> 13:18.684
but that were actually present in code.

13:18.684 --> 13:22.124
I went to look at code that was using BinDat

13:22.124 --> 13:24.285
to see what uses looked like,

13:24.285 --> 13:28.765
and I saw that BinDat was not used very heavily,

13:28.765 --> 13:30.365
but some of the main uses

13:30.365 --> 13:33.884
were just to read and write integers.

13:33.885 --> 13:37.565
And here you can see a very typical case.

13:37.565 --> 13:41.726
This is also coming from WeeChat.

13:41.726 --> 13:43.565
We do a bindat-get-field

13:43.565 --> 13:48.445
of the length of some struct we read.

13:48.445 --> 13:50.685
Actually, the struct we read is here.

13:50.685 --> 13:51.646
It has a single field,

13:51.647 --> 13:53.006
because the only thing we want to do

13:53.006 --> 13:56.287
is actually to unpack a 32-bit integer,

13:56.287 --> 13:58.287
but the only way we can do that

13:58.287 --> 14:01.647
is by specifying a struct with one field.

14:01.647 --> 14:04.847
And so we have to extract this struct of one field,

14:04.847 --> 14:07.246
which constructs an alist

14:07.246 --> 14:09.647
containing the actual integer,

14:09.648 --> 14:11.887
and then we just use get-field to extract it.

14:11.887 --> 14:15.007
So this doesn't seem very elegant

14:15.007 --> 14:16.528
to have to construct an alist

14:16.528 --> 14:20.368
just to then extract the integer from it.

14:20.368 --> 14:21.648
Same thing if you try to pack it:

14:21.648 --> 14:25.007
you first have to construct the alist

14:25.007 --> 14:31.247
to pass it to bindat-pack unnecessarily.
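
NOTE
Editor's sketch of the single-field-struct workaround described here,
using the old quoted-list syntax. The names my-u32-spec, my-unpack-u32
and my-pack-u32 are hypothetical, not taken from WeeChat.
```elisp
(require 'bindat)
;; Old style: even a lone integer needs a one-field struct.
(defvar my-u32-spec '((len u32)))
;; Unpacking builds an alist only so that the value can be pulled back out.
(defun my-unpack-u32 (bytes)
  (bindat-get-field (bindat-unpack my-u32-spec bytes) 'len))
;; Packing likewise needs an alist wrapped around the integer.
(defun my-pack-u32 (n)
  (bindat-pack my-u32-spec (list (cons 'len n))))
```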

14:31.248 --> 14:33.248
Another problem that I saw in this case

14:33.248 --> 14:35.729
(it was in the websocket package)

14:35.729 --> 14:39.568
was here, where they actually have a function

14:39.568 --> 14:41.169
where they need to write

14:41.169 --> 14:43.888
an integer of a size that will vary

14:43.888 --> 14:45.888
depending on the circumstances.

14:45.889 --> 14:49.650
And so they have to test the value of this integer,

14:49.650 --> 14:52.210
and depending on which one it is,

14:52.210 --> 14:54.449
they're going to use different types.

14:54.449 --> 14:56.290
So here it's a case

14:56.290 --> 14:59.490
where we want to have some kind of way to eval --

14:59.490 --> 15:02.530
to compute the length of the integer --

15:02.531 --> 15:08.130
instead of it being predefined or fixed.

15:08.130 --> 15:10.211
So this is one of the cases

15:10.211 --> 15:16.531
where the lack of eval was a problem.

15:16.531 --> 15:20.051
And actually in all of websocket,

15:20.051 --> 15:22.611
BinDat is only used to pack and unpack integers,

15:22.612 --> 15:24.612
even though there are many more opportunities

15:24.612 --> 15:26.772
to use BinDat in there.

15:26.772 --> 15:29.331
But it's not very convenient to use BinDat,

15:29.331 --> 15:35.890
as it stands, for those other cases.

15:35.891 --> 15:39.732
So what does the new design look like?

15:39.733 --> 15:44.132
Well in the new design, here's the problematic code

15:44.132 --> 15:46.373
for WeeChat.

15:46.373 --> 15:49.012
So we basically have the same fields as before,

15:49.012 --> 15:50.853
you just see that instead of u32,

15:50.853 --> 15:53.733
we now have 'uint 32' separately.

15:53.733 --> 15:55.332
The idea is that now this 32

15:55.332 --> 15:59.093
can be an expression you can evaluate,

15:59.094 --> 16:04.054
and so the u8 is also replaced by 'uint 8',

16:04.054 --> 16:07.253
and the id type is basically the same as before,

16:07.253 --> 16:08.854
and here another difference we see,

16:08.854 --> 16:11.654
and the main difference...

16:11.654 --> 16:13.494
Actually, it's the second main difference.

16:13.494 --> 16:15.174
The first main difference is that

16:15.175 --> 16:18.694
we don't actually quote this whole thing.

16:18.694 --> 16:23.095
Instead, we pass it to the bindat-type macro.

16:23.095 --> 16:25.095
So this is a macro

16:25.095 --> 16:27.574
that's going to actually build the type.

16:27.574 --> 16:29.254
This is a big difference

16:29.254 --> 16:30.535
in terms of performance also,

16:30.535 --> 16:32.694
because by making it a macro,

16:32.695 --> 16:34.296
we can pre-compute the code

16:34.296 --> 16:37.255
that's going to pack and unpack this thing,

16:37.255 --> 16:38.936
instead of having to interpret it

16:38.936 --> 16:41.096
every time we pack and unpack.

16:41.096 --> 16:43.815
So this macro will generate more efficient code

16:43.815 --> 16:45.815
along the way.

16:45.815 --> 16:48.695
Also it makes the code that appears in here

16:48.695 --> 16:50.296
visible to the compiler

16:50.297 --> 16:54.617
because we can give an Edebug spec for it.

16:54.617 --> 16:57.497
And so here as an argument to vec,

16:57.497 --> 16:59.016
instead of having to specify

16:59.016 --> 17:00.937
that this is an evaluated expression,

17:00.937 --> 17:02.777
we just write the expression directly,

17:02.777 --> 17:05.096
because all the expressions that appear there

17:05.096 --> 17:07.417
will just be evaluated,

17:07.418 --> 17:11.418
and we don't need to use the 'struct' variable

17:11.418 --> 17:14.137
and then extract the length field from it.

17:14.137 --> 17:16.938
We can just use length as a variable.

17:16.938 --> 17:18.698
So this variable 'length' here

17:18.698 --> 17:20.778
will refer to this field here,

17:20.778 --> 17:23.578
and then this variable 'id' here

17:23.578 --> 17:25.897
will refer to this field here,

17:25.898 --> 17:27.738
and so we can just use the field values

17:27.738 --> 17:30.459
as local variables, which is very natural

17:30.459 --> 17:31.679
and very efficient also,

17:31.679 --> 17:34.618
because the code would actually directly do that,

17:34.618 --> 17:37.899
and the code that unpacks those data

17:37.899 --> 17:40.299
will just extract an integer

17:40.299 --> 17:42.219
and bind it to the length variable,

17:42.219 --> 17:47.579
and so that makes it immediately available there.
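
NOTE
Editor's sketch of the new style, assuming Emacs 28 or later where the
bindat-type macro exists. It is a simplified stand-in for the slide (the
'id' field is omitted) and my-message-type and my-message are hypothetical
names.
```elisp
;;; -*- lexical-binding: t -*-
(require 'bindat)
(defconst my-message-type
  ;; Not quoted: the macro sees the fields and generates the code.
  (bindat-type
    (length      uint 32)
    (compression uint 8)
    ;; A plain expression; `length' is the field above, bound as a variable.
    (data        vec (- length 4 1))))
;; Sample packet: length 8, compression 0, then 8 - 4 - 1 = 3 payload bytes.
(defvar my-message
  (bindat-unpack my-message-type (unibyte-string 0 0 0 8 0 10 20 30)))
```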

17:47.580 --> 17:51.340
Okay, let's see also

17:51.340 --> 17:54.220
what the actual documentation looks like.

17:54.220 --> 17:57.739
And so if we look at the doc of BinDat,

17:57.739 --> 18:01.180
we see the actual specification of the grammar.

18:01.181 --> 18:03.181
And so here we see instead of having

18:03.181 --> 18:06.461
these nine different non-terminals,

18:06.461 --> 18:08.061
we basically have two:

18:08.061 --> 18:10.781
we have the non-terminal for TYPE,

18:10.781 --> 18:15.021
which can be either a uint, a uintr, or a string,

18:15.021 --> 18:17.421
or bits, or fill, or align, or vec,

18:17.421 --> 18:19.901
or those various other forms;

18:19.902 --> 18:22.621
or it can be a struct, in which case,

18:22.621 --> 18:23.981
in the case of struct,

18:23.981 --> 18:27.502
then it will be followed by a sequence --

18:27.502 --> 18:30.142
a list of FIELDs, where each of the FIELDs

18:30.142 --> 18:33.902
is basically a LABEL followed by another TYPE.

18:33.902 --> 18:37.342
And so this makes the whole specification

18:37.343 --> 18:39.823
much simpler. We don't have any distinction now

18:39.823 --> 18:42.862
between struct being a special case,

18:42.862 --> 18:46.383
as opposed to just the normal types.

18:46.383 --> 18:49.263
struct is just now one of the possible types

18:49.263 --> 18:52.543
that can appear here.

18:52.543 --> 18:53.263
The other thing is that

18:53.263 --> 18:55.742
the LABEL is always present in the structure,

18:55.743 --> 18:58.384
so there's no ambiguity.

18:58.384 --> 19:00.304
Also all the above things,

19:00.304 --> 19:03.103
like the BITLEN we have here,

19:03.103 --> 19:04.384
the LEN we have here,

19:04.384 --> 19:07.504
the COUNT for vector we have here,

19:07.504 --> 19:10.224
these are all plain Elisp expressions,

19:10.224 --> 19:13.024
so they are implicitly evaluated if necessary.

19:13.025 --> 19:14.705
If you want them to be constant,

19:14.705 --> 19:16.705
and really constant, you can just use quotes,

19:16.705 --> 19:20.145
for those rare cases where it's necessary.

19:20.145 --> 19:21.905
Another thing is that you can extend it

19:21.905 --> 19:25.505
with bindat-defmacro.

19:25.505 --> 19:30.225
Okay, let's go back here.

19:30.226 --> 19:32.706
So what are the advantages of this approach?

19:32.706 --> 19:34.625
As I said, one of the main advantages

19:34.625 --> 19:39.346
is that we now have support for Edebug.

19:39.346 --> 19:41.426
We don't have 'struct', 'repeat', and 'align'

19:41.426 --> 19:42.946
as special cases anymore.

19:42.946 --> 19:44.625
These are just normal types.

19:44.625 --> 19:48.066
Before, there was uint as type, int as type,

19:48.067 --> 19:49.267
and those kinds of things.

19:49.267 --> 19:51.110
'struct' and 'repeat' and 'align'

19:51.110 --> 19:53.267
were in a different case.

19:53.267 --> 19:54.387
So there were

19:54.387 --> 19:56.787
some subtle differences between those

19:56.787 --> 19:59.027
that completely disappeared.

19:59.027 --> 20:02.626
Also in the special cases, there was 'union',

20:02.626 --> 20:05.027
and union now has completely disappeared.

20:05.027 --> 20:07.827
We don't need it anymore, because instead,

20:07.828 --> 20:09.588
we can actually use code anywhere.

20:09.588 --> 20:11.908
That's one of the things I didn't mention here,

20:11.908 --> 20:17.268
but in this note here,

20:17.268 --> 20:19.747
that's one of the important notes.

20:19.747 --> 20:21.987
Not only are BITLEN, LEN, COUNT etc.

20:21.987 --> 20:23.028
Elisp expressions,

20:23.028 --> 20:26.788
but the type itself -- any type itself --

20:26.789 --> 20:29.029
is basically an expression.

20:29.029 --> 20:32.709
And so you can, instead of having 'uint BITLEN',

20:32.709 --> 20:36.628
you can have '(if blah-blah-blah uint string)',

20:36.628 --> 20:38.149
and so you can have a field

20:38.149 --> 20:40.549
that can be either a string or an int,

20:40.549 --> 20:44.789
depending on some condition.

20:44.790 --> 20:46.869
And for that reason we don't need a union.

20:46.869 --> 20:47.910
Instead of having a union,

20:47.910 --> 20:50.710
we can just have a 'cond' or a 'pcase'

20:50.710 --> 20:53.590
that will return the type we want to use,

20:53.590 --> 20:55.109
depending on the context,

20:55.109 --> 21:00.950
which will generally depend on some previous field.
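
NOTE
Editor's sketch of a type chosen by code instead of a union, assuming
Emacs 28 or later. The field layout follows the example in the Emacs Lisp
manual's Bindat section; the names my-tagged-type, my-error and my-bytes
are hypothetical.
```elisp
;;; -*- lexical-binding: t -*-
(require 'bindat)
(defconst my-tagged-type
  (bindat-type
    (len     u8)
    ;; No union: the type of `payload' is picked by ordinary Lisp code
    ;; that looks at the previous field.
    (payload . (if (zerop len) (uint 24) (vec (1- len))))))
;; len = 0, so the payload is read as a 24-bit unsigned integer.
(defvar my-error (bindat-unpack my-tagged-type (unibyte-string 0 1 2 3)))
;; len = 3, so the payload is read as a vector of 2 bytes.
(defvar my-bytes (bindat-unpack my-tagged-type (unibyte-string 3 7 8)))
```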

21:00.951 --> 21:03.750
Also we don't need to use single-field structs

21:03.750 --> 21:05.351
for simple types anymore,

21:05.351 --> 21:09.271
because there's no distinction between struct

21:09.271 --> 21:11.271
and other types.

21:11.271 --> 21:17.191
So we can pass to bindat-pack and bindat-unpack

21:17.191 --> 21:20.951
a specification which just says "here's an integer"

21:20.952 --> 21:24.392
and we'll just pack and unpack the integer.
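
NOTE
Editor's sketch of packing and unpacking a bare integer with the new
design, assuming Emacs 28 or later. The names my-u32-type, my-n and
my-wire-bytes are hypothetical.
```elisp
;;; -*- lexical-binding: t -*-
(require 'bindat)
;; The type is just "an unsigned 32-bit integer"; no struct, no alist.
(defconst my-u32-type (bindat-type uint 32))
(defvar my-n (bindat-unpack my-u32-type (unibyte-string 0 0 1 2)))
(defvar my-wire-bytes (bindat-pack my-u32-type 258))
```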

21:24.392 --> 21:26.472
And of course now all the code is exposed,

21:26.472 --> 21:29.192
so not only Edebug works, but also Flymake,

21:29.192 --> 21:30.392
and the compiler, etc. --

21:30.392 --> 21:33.111
they can complain about it,

21:33.111 --> 21:38.871
and give you warnings and errors as we like them.

21:38.872 --> 21:44.553
And of course the kittens are much happier.

21:44.553 --> 21:48.153
Okay. This is going a bit over time,

21:48.153 --> 21:51.272
so let's try to go faster.

21:51.273 --> 21:53.752
Here are some of the new features

21:53.753 --> 21:54.794
that are introduced.

21:54.794 --> 21:56.314
I already mentioned briefly

21:56.314 --> 22:00.633
that you can define new types with bindat-defmacro.

22:00.633 --> 22:04.474
That's one of the important novelties,

22:04.474 --> 22:08.794
and you can extend BinDat with new types this way.

22:08.794 --> 22:10.714
The other thing you can do is

22:10.714 --> 22:16.233
you can control how values or packets

22:16.234 --> 22:20.315
are unpacked, and how they are represented.

22:20.315 --> 22:22.555
In the old BinDat,

22:22.555 --> 22:24.315
the packet is necessarily represented,

22:24.315 --> 22:28.634
when you unpack it, as an alist, basically,

22:28.635 --> 22:30.396
or a struct becomes an alist,

22:30.396 --> 22:31.676
and that's all there is.

22:31.676 --> 22:34.076
You don't have any choice about it.

22:34.076 --> 22:35.596
With the new system,

22:35.596 --> 22:38.076
by default, it also returns just an alist,

22:38.076 --> 22:41.916
but you can actually control what it's unpacked as,

22:41.916 --> 22:46.396
or what it's packed from, using these keywords.

22:46.396 --> 22:49.596
With :unpack-val, you can give an expression

22:49.597 --> 22:53.357
that will construct the unpacked value

22:53.357 --> 22:56.957
from the various fields.

22:56.957 --> 22:59.197
And with :pack-val and :pack-var,

22:59.197 --> 23:02.557
you can specify how to extract the information

23:02.557 --> 23:05.116
from the unpacked value

23:05.117 --> 23:08.077
to generate the packed value.

23:08.078 --> 23:12.637
So here are some examples.

23:12.637 --> 23:15.358
Here's an example taken from osc.

23:15.358 --> 23:17.438
osc actually doesn't use BinDat currently,

23:17.438 --> 23:22.478
but I have played with it

23:22.479 --> 23:23.758
to see what it would look like

23:23.758 --> 23:26.159
if we were to use BinDat.

23:26.159 --> 23:28.638
So here's the definition

23:28.638 --> 23:30.638
of the timetag representation,

23:30.638 --> 23:35.279
which represents timestamps in osc.

23:35.279 --> 23:37.998
So you would use bindat-type

23:37.998 --> 23:40.559
and then you have here :pack-var

23:40.559 --> 23:42.080
basically gives a name

23:42.080 --> 23:48.559
when we try to pack a timestamp.

23:48.559 --> 23:51.520
'time' will be the variable that contains

23:51.520 --> 23:54.159
the actual timestamp we will receive.

23:54.159 --> 23:57.520
So we want to represent the unpacked value

23:57.520 --> 24:00.240
as a normal Emacs timestamp,

24:00.240 --> 24:02.480
and then basically convert from this timestamp

24:02.480 --> 24:06.401
to a string, or from a string to this timestamp.

24:06.401 --> 24:10.080
When we receive it, it will be called time,

24:10.080 --> 24:12.240
so we can refer to it,

24:12.240 --> 24:15.360
and so in order to actually encode it,

24:15.360 --> 24:18.320
we basically turn this timestamp into an integer --

24:18.320 --> 24:20.799
that's what this :pack-val does.

24:20.799 --> 24:23.442
It says when we try to pack it,

24:23.442 --> 24:26.082
here's the value that we should use.

24:26.082 --> 24:27.760
We turn it into an integer,

24:27.760 --> 24:30.320
and then this integer is going to be encoded

24:30.320 --> 24:36.162
as a uint 64-bit. So a 64-bit unsigned integer.

24:36.163 --> 24:38.960
When we try to unpack the value,

24:38.960 --> 24:40.720
this 'ticks' field

24:40.720 --> 24:45.679
will contain an unsigned int of 64 bits.

24:45.679 --> 24:50.559
We want to return instead a timestamp --

24:50.559 --> 24:53.924
a time value -- from Emacs.

24:53.924 --> 24:59.363
Here we use the representation of time

24:59.363 --> 25:02.799
as a pair of number of ticks

25:02.799 --> 25:06.720
and the corresponding frequency of those ticks.

25:06.720 --> 25:09.120
So that's what we do here with :unpack-val,

25:09.120 --> 25:12.004
which is construct the cons corresponding to it.

25:12.004 --> 25:16.400
With this definition, bindat-pack/unpack

25:16.400 --> 00:25:19.039
are going to convert to and from

00:25:19.039 --> 00:25:21.760
proper time values on one side,

25:21.760 --> 25:26.159
and binary strings on the other.

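NOTE
Editorial sketch, not part of the talk or of BinDat: a language-neutral
illustration in Python of the round trip described above, where a time value
is packed as an unsigned 64-bit integer and unpacked as a (ticks . frequency)
pair. The tick frequency of 2**32 and the names HZ, pack_timetag and
unpack_timetag are assumptions made for this example; big-endian byte order
is assumed as well.
```python
import struct
from fractions import Fraction
HZ = 1 << 32  # assumed tick frequency (ticks per second), not stated in the talk
def pack_timetag(seconds):
    """Encode a time in seconds as an unsigned 64-bit big-endian tick count."""
    ticks = int(Fraction(seconds) * HZ)  # truncates any sub-tick remainder
    return struct.pack(">Q", ticks)
def unpack_timetag(data):
    """Decode 8 bytes into a (ticks, frequency) pair."""
    (ticks,) = struct.unpack(">Q", data)
    return (ticks, HZ)
```
Under these assumptions, unpack_timetag(pack_timetag(1.5)) gives back the
tick count for 1.5 seconds together with the frequency it is measured in.
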
25:26.159 --> 25:27.520
Note, of course,

25:27.520 --> 25:30.320
that I complained that the old BinDat

25:30.320 --> 25:36.080
had to use single-field structs for simple types,

25:36.080 --> 25:37.039
and here, basically,

25:37.039 --> 25:39.840
I'm back using single-field structs as well

25:39.840 --> 25:41.120
for this particular case --

25:41.120 --> 25:44.640
actually a reasonably frequent case, to be honest.

25:44.640 --> 25:49.279
But at least this is not so problematic,

25:49.279 --> 25:51.840
because we actually control what is returned,

25:51.840 --> 25:54.159
so even though it's a single-field struct,

25:54.159 --> 25:56.640
it's not going to construct an alist

25:56.640 --> 25:58.320
or force you to construct an alist.

25:58.320 --> 26:02.720
Instead, it really receives and takes a value

26:02.720 --> 26:07.367
in the ideal representation that we chose.

26:07.367 --> 26:10.007
Here we have a more complex example,

26:10.007 --> 26:12.488
where the actual type is recursive,

26:12.488 --> 26:18.640
because it's representing those "LEB"...

26:18.640 --> 26:20.400
I can't remember what "LEB" stands for,

26:20.400 --> 26:22.559
but it's a representation

26:22.559 --> 26:25.600
for arbitrary length integers,

26:25.600 --> 26:27.520
where basically

26:27.520 --> 26:33.360
every byte is either smaller than 128,

26:33.360 --> 26:36.799
in which case it's the end of the value,

26:36.799 --> 26:39.760
or it's a value of 128 or more,

26:39.760 --> 26:42.159
in which case there's an extra byte on the end

26:42.159 --> 26:44.490
that's going to continue.

26:44.490 --> 26:46.640
Here we see the representation

26:46.640 --> 26:52.240
is basically a structure that starts with a byte,

26:52.240 --> 26:53.679
which contains this value,

26:53.679 --> 26:56.000
which can be either the last value or not,

26:56.000 --> 26:59.770
and the tail, which will either be empty,

26:59.770 --> 27:01.279
or contain something else.

27:01.279 --> 27:04.000
The empty [case] is here;

27:04.000 --> 27:07.039
if the head value is smaller than 128,

27:07.039 --> 27:11.840
then the type of this tail is going to be (unit 0),

27:11.840 --> 27:16.492
so basically 'unit' is the empty type,

27:16.492 --> 27:20.880
and 0 is the value we will receive when we read it.

27:20.880 --> 27:25.520
And if not, then it has as type 'loop',

27:25.520 --> 27:28.240
which is the type we're defining,

27:28.240 --> 27:30.491
so it's the recursive case,

27:30.491 --> 27:35.132
where then the rest of the type is the type itself.

27:35.132 --> 27:37.120
And so this lets us pack and unpack.

27:37.120 --> 27:39.600
We pass it an arbitrary size integer,

27:39.600 --> 27:42.240
and it's going to turn it into

27:42.240 --> 27:48.492
this LEB128 binary representation, and vice versa.

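NOTE
Editorial sketch, not part of the talk or of BinDat: a small Python
illustration of the unsigned LEB128 byte format described above. Each byte
carries 7 bits of the integer, lowest bits first; a byte below 128 ends the
value and a byte of 128 or more means another byte follows. The function
names leb128_encode and leb128_decode are hypothetical, and the loop is
iterative rather than the recursive type definition shown in the talk.
```python
def leb128_encode(n):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if n < 0:
        raise ValueError("unsigned LEB128 cannot encode negative integers")
    out = bytearray()
    while True:
        head = n & 0x7F          # low 7 bits of what is left
        n >>= 7                  # the "tail" still to be encoded
        if n:
            out.append(head | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(head)         # byte below 128: end of the value
            return bytes(out)
def leb128_decode(data):
    """Decode unsigned LEB128; return (value, number of bytes consumed)."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if byte < 128:
            return value, i + 1
    raise ValueError("truncated LEB128 input")
```
For instance, 128 does not fit in 7 bits, so it takes two bytes, while any
value up to 127 takes one.
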
27:48.492 --> 27:52.480
I have other examples if you're interested,

27:52.480 --> 00:27:56.093
but anyway, here's the conclusion.

27:56.094 --> 27:58.320
We have a simpler, more flexible,

27:58.320 --> 28:01.039
and more powerful BinDat now,

28:01.039 --> 28:03.454
which is also significantly faster.

28:03.454 --> 28:06.799
And I can't remember the exact speed-up,

28:06.799 --> 28:08.720
but it's definitely not a few percent.

28:08.720 --> 28:12.640
I vaguely remember about 4x faster in my tests,

28:12.640 --> 28:16.815
but it's probably very different in different cases

28:16.815 --> 28:20.159
so it might be just 4x, 2x -- who knows?

28:20.159 --> 28:23.374
Try it for yourself, but I was pretty pleased,

28:23.374 --> 00:28:28.335
because it wasn't the main motivation, so anyway...

28:28.336 --> 28:31.135
The negatives are here.

28:31.135 --> 28:34.480
In the new system, there's this bindat-defmacro

28:34.480 --> 28:36.720
which lets us define, kind of, new types,

28:36.720 --> 28:40.895
and bindat-type also lets us define new types,

28:40.895 --> 28:45.360
and the distinction between them is a bit subtle;

28:45.360 --> 28:48.080
it kind of depends on...

28:48.080 --> 28:50.880
well, it has an impact on efficiency

28:50.880 --> 28:53.520
more than anything, so it's not very satisfactory.

28:53.520 --> 28:56.737
There's a bit of redundancy between the two.

28:56.737 --> 28:59.039
There is no bit-level control, just as before.

28:59.039 --> 29:02.097
We can only manipulate basically bytes.

29:02.098 --> 29:03.360
So this is definitely not usable

29:03.360 --> 29:09.058
for a Huffman encoding kind of thing.

29:09.058 --> 29:10.880
Also, it's not nearly as flexible

29:10.880 --> 29:12.240
as some of the alternatives.

29:12.240 --> 29:13.760
So, you know, GNU Poke

29:13.760 --> 29:20.017
has been a vague inspiration for this work,

29:20.018 --> 29:22.480
and GNU Poke gives you a lot more power

29:22.480 --> 29:25.059
in how to specify the types, etc.

29:25.059 --> 29:26.579
And of course one of the main downsides

29:26.579 --> 29:28.018
is that it's still not used very much.

29:28.018 --> 29:29.283
Actually, the new BinDat

29:29.283 --> 29:31.039
is not used by any package

29:31.039 --> 29:33.059
as far as I know right now,

29:33.059 --> 29:35.279
but even the old one is not used very often,

29:35.279 --> 29:36.799
so who knows

29:36.799 --> 29:38.799
whether it's actually going to

29:38.799 --> 29:41.520
work very much better or not?

29:41.520 --> 29:44.399
Anyway, this is it for this talk.

29:44.399 --> 29:46.683
Thank you very much. Have a nice day.

29:46.683 --> 29:47.883
[captions by John Cummings]