Diffstat (limited to '2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main.vtt')
-rw-r--r-- 2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main.vtt 1855
1 files changed, 1855 insertions, 0 deletions
diff --git a/2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main.vtt b/2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main.vtt
new file mode 100644
index 00000000..3eb2b4ed
--- /dev/null
+++ b/2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main.vtt
@@ -0,0 +1,1855 @@
+WEBVTT
+
+00:01.360 --> 00:04.080
+Hi. So I'm going to talk today
+
+00:04.180 --> 00:10.000
+about a fun rewrite I did of the BinDat package.
+
+00:10.000 --> 00:12.400
+I call this Turbo BinDat.
+
+00:12.400 --> 00:14.001
+Actually, the package hasn't changed name,
+
+00:14.101 --> 00:16.801
+it's just that the result happens to be faster.
+
+00:16.901 --> 00:19.521
+The point was not to make it faster though,
+
+00:19.621 --> 00:22.241
+and the point was not to make you understand
+
+00:22.341 --> 00:23.440
+that data is not code.
+
+00:23.540 --> 00:27.120
+It's just one more experience I've had
+
+00:27.120 --> 00:31.280
+where I've seen that treating data as code
+
+00:31.381 --> 00:33.522
+is not always a good idea.
+
+00:33.622 --> 00:36.162
+It's important to keep the difference.
+
+00:36.162 --> 00:38.880
+So let's get started.
+
+00:38.881 --> 00:40.642
+So what is BinDat anyway?
+
+00:40.742 --> 00:43.602
+Here's just the overview of basically
+
+00:43.602 --> 00:44.962
+what I'm going to present.
+
+00:45.062 --> 00:47.842
+So I'm first going to present BinDat itself
+
+00:47.843 --> 00:49.039
+for those who don't know it,
+
+00:49.039 --> 00:51.923
+which is probably the majority of you.
+
+00:51.923 --> 00:55.363
+Then I'm going to talk about the actual problems
+
+00:55.363 --> 00:58.882
+that I encountered with this package
+
+00:58.882 --> 01:01.843
+that motivated me to rewrite it.
+
+01:01.843 --> 01:05.043
+Most of them were lack of flexibility,
+
+01:05.044 --> 01:09.924
+and some of it was just poor behavior
+
+01:09.924 --> 01:13.364
+with respect to scoping and variables,
+
+01:13.364 --> 01:16.324
+which of course, you know, is bad --
+
+01:16.424 --> 01:20.724
+basically uses of eval, or "eval is evil."
+
+01:20.724 --> 01:24.884
+Then I'm going to talk about the new design --
+
+01:24.985 --> 01:28.005
+how I redesigned it
+
+01:28.105 --> 01:31.365
+to make it both simpler and more flexible,
+
+01:31.365 --> 01:32.965
+and where the key idea was
+
+01:33.065 --> 01:35.205
+to expose code as code
+
+01:35.305 --> 01:37.525
+instead of having it as data,
+
+01:37.625 --> 01:39.605
+and so here the distinction between the two
+
+01:39.706 --> 01:44.085
+is important and made things simpler.
+
+01:44.085 --> 01:46.405
+I tried to keep efficiency in mind,
+
+01:46.405 --> 01:52.405
+which resulted in some of the aspects of the design
+
+01:52.505 --> 01:54.886
+which are not completely satisfactory,
+
+01:54.886 --> 01:57.046
+but the result is actually fairly efficient.
+
+01:57.146 --> 01:59.286
+Even though it was not the main motivation,
+
+01:59.287 --> 02:02.967
+it was one of the nice outcomes.
+
+02:02.967 --> 02:06.006
+And then I'm going to present some examples.
+
+02:06.007 --> 02:08.167
+So first: what is BinDat?
+
+02:08.267 --> 02:10.567
+Oh actually, rather than present THIS,
+
+02:10.667 --> 02:12.407
+I'm going to go straight to the code,
+
+02:12.507 --> 02:14.246
+because BinDat actually had
+
+02:14.346 --> 02:16.647
+an introduction which was fairly legible.
+
+02:16.748 --> 02:21.128
+So here we go: this is the old BinDat from Emacs 27
+
+02:21.128 --> 02:23.448
+and the commentary starts by explaining
+
+02:23.448 --> 02:25.848
+what is BinDat? Basically BinDat is a package
+
+02:25.948 --> 02:30.247
+that lets you parse and unparse
+
+02:30.247 --> 02:31.527
+basically binary data.
+
+02:31.627 --> 02:34.648
+The intent is to have typically network data
+
+02:34.749 --> 02:35.849
+or something like this.
+
+02:35.949 --> 02:38.328
+So assuming you have network data,
+
+02:38.328 --> 02:41.528
+presented or defined
+
+02:41.628 --> 02:44.569
+with some kind of C-style structs, typically,
+
+02:44.669 --> 02:46.009
+or something along these lines.
+
+02:46.109 --> 02:49.120
+So you presumably start with documentation
+
+02:49.120 --> 02:52.809
+that presents something like those structs here,
+
+02:52.810 --> 02:57.130
+and you want to be able to generate such packets
+
+02:57.230 --> 03:00.249
+and read such packets,
+
+03:00.349 --> 03:02.090
+so the way you do it is
+
+03:02.190 --> 03:04.570
+you rewrite those specifications
+
+03:04.670 --> 03:06.010
+into the BinDat syntax.
+
+03:06.110 --> 03:07.529
+So here's the BinDat syntax
+
+03:07.529 --> 03:10.490
+for the previous specification.
+
+03:10.491 --> 03:11.610
+So here, for example,
+
+03:11.610 --> 03:16.970
+you see the case for a data packet
+
+03:16.970 --> 03:20.411
+which will have a 'type' field which is a byte
+
+03:20.411 --> 03:24.091
+(an unsigned 8-bit entity),
+
+03:24.091 --> 03:26.411
+then an 'opcode' which is also a byte,
+
+03:26.411 --> 03:30.731
+then a 'length' which is a 16-bit unsigned integer
+
+03:30.732 --> 03:34.092
+in little endian order,
+
+03:34.092 --> 03:38.732
+and then some 'id' for this entry, which is
+
+03:38.732 --> 03:43.531
+8 bytes containing a zero-terminated string,
+
+03:43.531 --> 03:47.531
+and then the actual data, basically the payload,
+
+03:47.532 --> 03:51.453
+which is in this case a vector of bytes,
+
+03:51.453 --> 03:54.812
+('bytes' here doesn't need to be specified)
+
+03:54.812 --> 03:58.172
+and here we specify the length of this vector.
+
+03:58.172 --> 03:59.773
+This 'length' here
+
+03:59.773 --> 04:02.252
+happens to be actually the name of THIS field,
+
+04:02.252 --> 04:03.853
+so the length of the data
+
+04:03.854 --> 04:06.574
+is specified by the 'length' field here,
+
+04:06.574 --> 04:08.574
+and BinDat will understand this part,
+
+04:08.574 --> 04:12.333
+which is the nice part of BinDat.
+
+04:12.333 --> 04:15.774
+And then you have an alignment field at the end,
+
+04:15.774 --> 04:18.253
+which is basically padding.
+
+04:18.253 --> 04:20.574
+It says that it is padded
+
+04:20.575 --> 04:23.295
+until the next multiple of four.
+
+04:23.295 --> 04:25.855
+Okay. So this works reasonably well.
+
+04:25.855 --> 04:27.455
+This is actually very nice.
+
+04:27.455 --> 04:30.335
+With this, you can then call
+
+04:30.335 --> 04:32.975
+bindat-pack or bindat-unpack,
+
+04:32.975 --> 04:37.774
+passing it a string, or passing it an alist,
+
+04:37.774 --> 04:40.415
+to do the packing and unpacking.
+
+04:40.416 --> 04:43.296
+So, for example, if you take this string--
+
+04:43.296 --> 04:45.856
+actually, in this case, it's a vector of bytes
+
+04:45.856 --> 04:49.456
+but it works the same; it works in both ways--
+
+04:49.456 --> 04:53.536
+if you pass this to bindat-unpack,
+
+04:53.536 --> 04:57.456
+it will presumably return you this structure
+
+04:57.457 --> 05:00.017
+if you've given it the corresponding type.
+
+05:00.017 --> 05:01.776
+So it will extract--
+
+05:01.776 --> 05:05.617
+you will see that there is an IP address,
+
+05:05.617 --> 05:08.017
+which is a destination IP, a source IP,
+
+05:08.017 --> 05:09.857
+and some port number,
+
+05:09.857 --> 05:12.977
+and some actual data here and there, etc.
+
+05:12.977 --> 05:18.017
+So this is quite convenient if you need to do this,
+
+05:18.018 --> 05:20.898
+and that's what it was designed for.
+
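NOTE
The packet layout just described has these parts: a u8 type, a u8 opcode, a 16-bit little-endian length, an 8-byte zero-terminated id, then `length` bytes of data, padded to a multiple of four. Below is a minimal Python sketch of it using the standard struct module, with a dict standing in for BinDat's alist. It is a hand-written analogue for illustration only and is not how BinDat is implemented. The function names are made up. Unlike a spec-driven packer, this sketch also computes "length" from the payload instead of reading it from the input.
```python
import struct
# Hypothetical Python analogue (not BinDat's code). Layout assumed from the talk:
# u8 type, u8 opcode, uint16 little-endian length, 8-byte zero-terminated id,
# `length` payload bytes, then padding up to a multiple of 4.
HEADER = struct.Struct("<BBH8s")  # 12 bytes; "<" means no implicit alignment
def pack_data_packet(fields: dict) -> bytes:
    """Pack a dict (standing in for an alist) into bytes."""
    payload = bytes(fields["data"])
    # 'length' is computed from the payload in this sketch.
    out = HEADER.pack(fields["type"], fields["opcode"], len(payload),
                      fields["id"].encode("ascii")) + payload
    return out + b"\0" * (-len(out) % 4)  # align: pad to the next multiple of four
def unpack_data_packet(buf: bytes) -> dict:
    """Unpack bytes into a dict; the 'length' field decides how much data to read."""
    ptype, opcode, length, raw_id = HEADER.unpack_from(buf, 0)
    return {
        "type": ptype,
        "opcode": opcode,
        "length": length,
        "id": raw_id.split(b"\0", 1)[0].decode("ascii"),  # stop at the terminator
        "data": list(buf[HEADER.size:HEADER.size + length]),
    }
```
As an example, packing {"type": 2, "opcode": 3, "id": "abc", "data": [1, 2, 3, 4, 5]} produces 17 bytes of content padded to 20. Unpacking those 20 bytes returns the same fields, with "length" set to 5.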
+05:20.898 --> 05:27.537
+So here we are. Let's go back to the actual talk.
+
+05:27.538 --> 05:34.338
+I converted BinDat to lexical scoping at some point
+
+05:34.339 --> 05:37.299
+and things seemed to work fine,
+
+05:37.299 --> 05:42.819
+except, at some point, probably weeks later,
+
+05:42.819 --> 05:47.139
+I saw a bug report
+
+05:47.139 --> 05:53.058
+about the new version using lexical scoping
+
+05:53.059 --> 05:56.339
+not working correctly with WeeChat.
+
+05:56.339 --> 06:00.580
+So here's the actual chunk of code
+
+06:00.580 --> 06:02.820
+that appears in WeeChat.
+
+06:02.820 --> 06:08.420
+Here you see that they also define a BinDat spec.
+
+06:08.421 --> 06:14.741
+It's a packet that has a 32-bit unsigned length,
+
+06:14.741 --> 06:18.500
+then some compression byte/compression information,
+
+06:18.500 --> 06:23.780
+then an id which contains basically another struct
+
+06:23.780 --> 06:26.901
+(which is specified elsewhere; doesn't matter here),
+
+06:26.902 --> 06:28.661
+and after that, a vector
+
+06:28.661 --> 06:33.382
+whose size is not just specified by 'length',
+
+06:33.382 --> 06:35.142
+but is computed from 'length'.
+
+06:35.142 --> 06:39.142
+So here's how they used to compute it in WeeChat.
+
+06:39.142 --> 06:42.822
+So the length here can be specified in BinDat.
+
+06:42.822 --> 06:43.941
+Instead of having
+
+06:43.942 --> 06:45.863
+just a reference to one of the fields,
+
+06:45.863 --> 06:48.903
+or having a constant, you can actually compute it,
+
+06:48.903 --> 06:52.502
+where you have to use this '(eval',
+
+06:52.502 --> 06:54.743
+and then followed by the actual expression
+
+06:54.743 --> 06:58.103
+where you say how you compute it.
+
+06:58.103 --> 07:01.463
+And here you see that it actually computes it
+
+07:01.464 --> 07:04.904
+based on the 'length of the structure --
+
+07:04.904 --> 07:07.783
+that's supposed to be this 'length' field here --
+
+07:07.783 --> 07:11.223
+and it's referred to using the bindat-get-field
+
+07:11.223 --> 07:14.503
+to extract the field from the variable 'struct'.
+
+07:14.503 --> 07:17.943
+And then it subtracts four, it subtracts one,
+
+07:17.943 --> 07:19.467
+and adds some other things
+
+07:19.468 --> 07:22.185
+which depend on some field
+
+07:22.185 --> 07:26.905
+that's found in this 'id' field here.
+
+07:26.905 --> 07:28.425
+And the problem with this code
+
+07:28.425 --> 07:30.425
+was that it broke
+
+07:30.425 --> 07:32.745
+because of this 'struct' variable here,
+
+07:32.745 --> 07:35.145
+because this 'struct' variable is not defined
+
+07:35.145 --> 07:38.105
+anywhere in the specification of BinDat.
+
+07:38.106 --> 07:41.866
+It was used internally as a local variable,
+
+07:41.866 --> 07:45.306
+and because it was using dynamic scoping,
+
+07:45.306 --> 07:47.386
+it actually happened to be available here,
+
+07:47.386 --> 07:50.826
+but the documentation nowhere specifies it.
+
+07:50.826 --> 07:52.506
+So it was not exactly
+
+07:52.506 --> 07:55.546
+a bug of the conversion to lexical scoping,
+
+07:55.547 --> 07:58.906
+but it ended up breaking this code.
+
+07:58.906 --> 08:01.226
+And there was no way to actually
+
+08:01.226 --> 08:05.066
+fix the code within the specification of BinDat.
+
+08:05.066 --> 08:08.287
+You had to go outside the specification of BinDat
+
+08:08.287 --> 08:10.427
+to fix this problem.
+
+08:10.427 --> 08:14.346
+This is basically how I started looking at BinDat.
+
+08:14.347 --> 08:17.808
+Then I went to actually investigate a bit more
+
+08:17.808 --> 08:19.627
+what was going on,
+
+08:19.627 --> 08:22.108
+and the thing I noticed along the way
+
+08:22.108 --> 08:25.787
+was basically that the specification of BinDat
+
+08:25.787 --> 08:29.528
+is fairly complex and has a lot of eval
+
+08:29.528 --> 08:30.748
+and things like this.
+
+08:30.749 --> 08:32.288
+So let's take a look
+
+08:32.288 --> 08:35.068
+at what the BinDat specification looks like.
+
+08:35.068 --> 08:36.589
+So here it's actually documented
+
+08:36.589 --> 08:40.269
+as a kind of grammar rules.
+
+08:40.269 --> 08:45.308
+A specification is basically a sequence of items,
+
+08:45.308 --> 08:47.389
+and then each of the items is basically
+
+08:47.389 --> 08:51.248
+a FIELD of a struct, so it has a FIELD name,
+
+08:51.249 --> 08:53.249
+and then a TYPE.
+
+08:53.249 --> 08:54.510
+Instead of a TYPE,
+
+08:54.510 --> 08:56.590
+it could have some other FORM for eval,
+
+08:56.590 --> 08:58.989
+which was basically never used as far as I know,
+
+08:58.989 --> 09:00.190
+or it can be some filler,
+
+09:00.190 --> 09:02.750
+or you can have some 'align' specification,
+
+09:02.750 --> 09:05.150
+or you can refer to another struct.
+
+09:05.150 --> 09:07.390
+It could also be some kind of union,
+
+09:07.391 --> 09:10.430
+or it can be some kind of repetition of something.
+
+09:10.430 --> 09:12.430
+And then you have the TYPE specified here,
+
+09:12.430 --> 09:18.271
+which can be some integers, strings, or a vector,
+
+09:18.271 --> 09:21.631
+and there are a few other special cases.
+
+09:21.631 --> 09:25.310
+And then the actual field itself
+
+09:25.311 --> 09:28.192
+can be either a NAME, or something that's computed,
+
+09:28.192 --> 09:30.752
+and then everywhere here, you have LEN,
+
+09:30.752 --> 09:32.480
+which specifies the length of vectors,
+
+09:32.480 --> 09:34.672
+for example, or length of strings.
+
+09:34.672 --> 09:37.632
+This is actually either nil to mean one,
+
+09:37.632 --> 09:39.072
+or it can be an ARG,
+
+09:39.072 --> 09:40.952
+where ARG is defined to be
+
+09:40.952 --> 09:42.672
+either an integer or DEREF,
+
+09:42.673 --> 09:46.673
+where DEREF is basically a specification
+
+09:46.673 --> 09:48.833
+that can refer, for example, to the 'length' field
+
+09:48.833 --> 09:51.956
+-- that's what we saw between parentheses: (length)
+
+09:51.956 --> 09:56.273
+was this way to refer to the 'length' field.
+
+09:56.273 --> 09:59.793
+Or it can be an expression, which is what we saw
+
+09:59.794 --> 10:02.834
+in the computation of the length for WeeChat,
+
+10:02.834 --> 10:04.914
+where you just had a '(eval'
+
+10:04.914 --> 10:06.334
+and then some computation
+
+10:06.334 --> 10:10.274
+of the length of the payload.
+
+10:10.274 --> 10:12.354
+And so if you look here, you see that
+
+10:12.354 --> 10:14.674
+it is fairly large and complex,
+
+10:14.674 --> 10:18.514
+and it uses eval everywhere. And actually,
+
+10:18.515 --> 10:20.675
+it's not just that it has eval in its syntax,
+
+10:20.675 --> 10:23.395
+but the implementation has to use eval everywhere,
+
+10:23.395 --> 10:25.314
+because, if you go back
+
+10:25.314 --> 10:27.475
+to see the kind of code we see,
+
+10:27.475 --> 10:29.538
+we see here we just define
+
+10:29.538 --> 10:34.195
+weechat--relay-message-spec as a constant!
+
+10:34.195 --> 10:37.314
+It's nothing more than just data, right?
+
+10:37.315 --> 10:38.836
+So within this data
+
+10:38.836 --> 10:41.076
+there are things we need to evaluate,
+
+10:41.076 --> 10:42.356
+but it's pure data,
+
+10:42.356 --> 10:44.356
+so it will have to be evaluated
+
+10:44.356 --> 10:46.596
+by passing it to eval. It can't be compiled,
+
+10:46.596 --> 10:50.196
+because it's within a quote, right?
+
+10:50.196 --> 10:52.836
+And so for that reason, kittens really
+
+10:52.837 --> 10:55.956
+suffer terribly with uses of BinDat.
+
+10:55.956 --> 10:59.957
+You really have to be very careful with that.
+
+10:59.957 --> 11:02.037
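NOTE
The failure mode described above can be imitated in Python. This is a hypothetical analogy, not the old BinDat code. The spec is inert data, and a computed vector length is a source string that the interpreter passes to eval() together with its own local variables. That loosely mirrors how dynamic scoping exposed the internal `struct` variable to specs like WeeChat's.
```python
# Hypothetical mimic of the old data-driven style: the spec is inert data and a
# computed vector length is a source string handed to eval() at unpack time.
SPEC = [
    ("length", "u8"),
    ("data", ("vec", "struct['length'] - 1")),  # depends on an undocumented name
]
def unpack(spec, buf):
    struct = {}  # the interpreter's own local variable, never documented
    pos = 0
    for name, typ in spec:
        if typ == "u8":
            struct[name] = buf[pos]
            pos += 1
        else:
            # ("vec", EXPR): EXPR is evaluated against the interpreter's locals,
            # roughly like dynamic scoping exposing `struct` to the old specs.
            count = eval(typ[1], {}, dict(locals()))
            struct[name] = list(buf[pos:pos + count])
            pos += count
    return struct
```
Here unpack(SPEC, bytes([3, 10, 20, 30])) returns {'length': 3, 'data': [10, 20]}. Renaming the local variable struct inside unpack would make the same spec fail with a NameError, even though nothing documented has changed. And because the expression is hidden inside a string, no compiler or debugger can inspect it.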
+More seriously,
+
+11:02.037 --> 11:05.157
+the 'struct' variable was not documented,
+
+11:05.157 --> 11:07.797
+and yet it's indispensable
+
+11:07.797 --> 11:08.996
+for important applications,
+
+11:08.996 --> 11:11.157
+such as using in WeeChat.
+
+11:11.158 --> 11:13.078
+So clearly this needs to be fixed.
+
+11:13.078 --> 11:15.481
+Of course, we can just document 'struct'
+
+11:15.481 --> 11:18.038
+as some variable that's used there,
+
+11:18.038 --> 11:19.798
+but of course we don't want to do that,
+
+11:19.798 --> 11:23.398
+because 'struct' is not obviously
+
+11:23.398 --> 11:25.398
+a dynamically scoped variable,
+
+11:25.398 --> 11:29.317
+so it's not very clean.
+
+11:29.318 --> 11:31.939
+Another problem I noticed was that the grammar
+
+11:31.939 --> 11:35.239
+is significantly more complex than necessary.
+
+11:35.239 --> 11:38.199
+We have nine distinct non-terminals.
+
+11:38.199 --> 11:39.639
+There is ambiguity.
+
+11:39.639 --> 11:44.919
+If you try to use a field whose name is 'align',
+
+11:44.919 --> 11:48.679
+or 'fill', or something like this,
+
+11:48.680 --> 11:50.920
+then it's going to be misinterpreted,
+
+11:50.920 --> 11:54.920
+or it can be misinterpreted.
+
+11:54.920 --> 11:58.760
+The vector length can be either an expression,
+
+11:58.760 --> 12:02.280
+or an integer, or a reference to a label,
+
+12:02.280 --> 12:03.720
+but the expression
+
+12:03.720 --> 12:06.360
+should already be the general case,
+
+12:06.361 --> 12:08.041
+and this expression can itself be
+
+12:08.041 --> 12:09.401
+just a constant integer,
+
+12:09.401 --> 12:13.961
+so this complexity is probably not indispensable,
+
+12:13.961 --> 12:15.641
+or it could be replaced with something simpler.
+
+12:15.641 --> 12:17.401
+That's what I felt like.
+
+12:17.401 --> 12:19.161
+And basically lots of places
+
+12:19.161 --> 12:21.721
+allow an (eval EXP) form somewhere
+
+12:21.721 --> 12:25.081
+to open up the door for more flexibility,
+
+12:25.082 --> 12:26.922
+but not all of them do,
+
+12:26.922 --> 12:29.482
+and we don't really want
+
+12:29.482 --> 12:31.001
+to have this eval there, right?
+
+12:31.001 --> 12:33.802
+It's not very convenient syntactically either.
+
+12:33.802 --> 12:36.042
+So it makes the uses of eval
+
+12:36.042 --> 12:38.362
+a bit heavier than they need to be,
+
+12:38.362 --> 12:41.722
+and so I didn't really like this part.
+
+12:41.723 --> 12:42.603
+Another part is that
+
+12:42.603 --> 12:45.183
+when I tried to figure out what was going on,
+
+12:45.183 --> 12:46.666
+[dog barks and distracts Stefan]
+
+12:46.666 --> 12:50.043
+I had trouble... Winnie as well, as you can hear.
+
+12:50.043 --> 12:50.923
+She had trouble as well.
+
+12:50.923 --> 12:53.083
+But one of the troubles was that
+
+12:53.083 --> 12:55.002
+there was no way to debug the code
+
+12:55.002 --> 12:57.562
+via Edebug, because it's just data,
+
+12:57.562 --> 13:00.523
+so Edebug doesn't know that it has to look at it
+
+13:00.524 --> 13:02.683
+and instrument it.
+
+13:02.683 --> 13:05.644
+And of course it was not conveniently extensible.
+
+13:05.644 --> 13:07.164
+That's also one of the things
+
+13:07.164 --> 13:08.487
+I noticed along the way.
+
+13:09.084 --> 13:12.844
+Okay, so here's an example of
+
+13:12.844 --> 13:15.484
+problems that I didn't just see there,
+
+13:15.485 --> 13:18.684
+but that were actually present in code.
+
+13:18.684 --> 13:22.124
+I went to look at code that was using BinDat
+
+13:22.124 --> 13:24.285
+to see what uses looked like,
+
+13:24.285 --> 13:28.765
+and I saw that BinDat was not used very heavily,
+
+13:28.765 --> 13:30.365
+but some of the main uses
+
+13:30.365 --> 13:33.884
+were just to read and write integers.
+
+13:33.885 --> 13:37.565
+And here you can see a very typical case.
+
+13:37.565 --> 13:41.726
+This is also coming from WeeChat.
+
+13:41.726 --> 13:43.565
+We do a bindat-get-field
+
+13:43.565 --> 13:48.445
+of the length of some struct we read.
+
+13:48.445 --> 13:50.685
+Actually, the struct we read is here.
+
+13:50.685 --> 13:51.646
+It has a single field,
+
+13:51.647 --> 13:53.006
+because the only thing we want to do
+
+13:53.006 --> 13:56.287
+is actually to unpack a 32-bit integer,
+
+13:56.287 --> 13:58.287
+but the only way we can do that
+
+13:58.287 --> 14:01.647
+is by specifying a struct with one field.
+
+14:01.647 --> 14:04.847
+And so we have to extract this struct of one field,
+
+14:04.847 --> 14:07.246
+which constructs an alist
+
+14:07.246 --> 14:09.647
+containing the actual integer,
+
+14:09.648 --> 14:11.887
+and then we just use get-field to extract it.
+
+14:11.887 --> 14:15.007
+So this doesn't seem very elegant
+
+14:15.007 --> 14:16.528
+to have to construct an alist
+
+14:16.528 --> 14:20.368
+just to then extract the integer from it.
+
+14:20.368 --> 14:21.648
+Same thing if you try to pack it:
+
+14:21.648 --> 14:25.007
+you first have to construct the alist
+
+14:25.007 --> 14:31.247
+to pass it to bindat-pack unnecessarily.
+
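NOTE
This sketch contrasts the detour just described with a direct integer type, again in Python as a hypothetical analogue. The talk does not state the byte order of the 32-bit length, so big-endian (network order) is assumed here. The function names are invented.
```python
import struct
# Old-style detour (hypothetical analogue): reading one integer goes through a
# one-field record, which is built and then immediately taken apart again.
# Byte order is an assumption: big-endian 32-bit unsigned.
def unpack_length_via_record(buf: bytes) -> int:
    record = {"length": struct.unpack_from(">I", buf, 0)[0]}
    return record["length"]
def pack_length_via_record(n: int) -> bytes:
    record = {"length": n}  # wrap the integer just to satisfy a record-based API
    return struct.pack(">I", record["length"])
# Direct style: the type is simply "32-bit unsigned integer", no record involved.
def unpack_u32(buf: bytes) -> int:
    return struct.unpack_from(">I", buf, 0)[0]
def pack_u32(n: int) -> bytes:
    return struct.pack(">I", n)
```
Both styles produce the same bytes. The direct one simply skips building a throwaway container on every call.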
+14:31.248 --> 14:33.248
+Another problem that I saw in this case
+
+14:33.248 --> 14:35.729
+(it was in the websocket package)
+
+14:35.729 --> 14:39.568
+was here, where they actually have a function
+
+14:39.568 --> 14:41.169
+where they need to write
+
+14:41.169 --> 14:43.888
+an integer of a size that will vary
+
+14:43.888 --> 14:45.888
+depending on the circumstances.
+
+14:45.889 --> 14:49.650
+And so they have to test the value of this integer,
+
+14:49.650 --> 14:52.210
+and depending on which one it is,
+
+14:52.210 --> 14:54.449
+they're going to use different types.
+
+14:54.449 --> 14:56.290
+So here it's a case
+
+14:56.290 --> 14:59.490
+where we want to have some kind of way to eval --
+
+14:59.490 --> 15:02.530
+to compute the length of the integer --
+
+15:02.531 --> 15:08.130
+instead of it being predefined or fixed.
+
+15:08.130 --> 15:10.211
+So this is one of the cases
+
+15:10.211 --> 15:16.531
+where the lack of eval was a problem.
+
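NOTE
A well-known case of an integer whose width depends on its value is the payload length in a WebSocket frame, as defined in RFC 6455. The talk does not show the websocket code, so it is an assumption that this is the function it refers to. The Python sketch below picks the encoding from the value, which is exactly the kind of computed type a fixed spec could not express. The function name is hypothetical.
```python
import struct
def encode_payload_length(n: int) -> bytes:
    """WebSocket-style payload length (RFC 6455): the integer's width depends on n.
    The mask bit that shares the first byte is left at 0 in this sketch."""
    if not 0 <= n < 1 << 63:
        raise ValueError("payload length out of range")
    if n < 126:
        return struct.pack(">B", n)        # fits in the 7-bit field itself
    if n < 1 << 16:
        return struct.pack(">BH", 126, n)  # marker 126, then a 16-bit length
    return struct.pack(">BQ", 127, n)      # marker 127, then a 64-bit length
```
For example, 5 encodes as the single byte 0x05. The value 126 needs the 16-bit form, b"\x7e\x00\x7e".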
+15:16.531 --> 15:20.051
+And actually in all of websocket,
+
+15:20.051 --> 15:22.611
+BinDat is only used to pack and unpack integers,
+
+15:22.612 --> 15:24.612
+even though there are many more opportunities
+
+15:24.612 --> 15:26.772
+to use BinDat in there.
+
+15:26.772 --> 15:29.331
+But it's not very convenient to use BinDat,
+
+15:29.331 --> 15:35.890
+as it stands, for those other cases.
+
+15:35.891 --> 15:39.732
+So what does the new design look like?
+
+15:39.733 --> 15:44.132
+Well in the new design, here's the problematic code
+
+15:44.132 --> 15:46.373
+for WeeChat.
+
+15:46.373 --> 15:49.012
+So we basically have the same fields as before,
+
+15:49.012 --> 15:50.853
+you just see that instead of u32,
+
+15:50.853 --> 15:53.733
+we now have 'uint 32' separately.
+
+15:53.733 --> 15:55.332
+The idea is that now this 32
+
+15:55.332 --> 15:59.093
+can be an expression you can evaluate,
+
+15:59.094 --> 16:04.054
+and so the u8 is also replaced by 'uint 8',
+
+16:04.054 --> 16:07.253
+and the id type is basically the same as before,
+
+16:07.253 --> 16:08.854
+and here another difference we see,
+
+16:08.854 --> 16:11.654
+and the main difference...
+
+16:11.654 --> 16:13.494
+Actually, it's the second main difference.
+
+16:13.494 --> 16:15.174
+The first main difference is that
+
+16:15.175 --> 16:18.694
+we don't actually quote this whole thing.
+
+16:18.694 --> 16:23.095
+Instead, we pass it to the bindat-type macro.
+
+16:23.095 --> 16:25.095
+So this is a macro
+
+16:25.095 --> 16:27.574
+that's going to actually build the type.
+
+16:27.574 --> 16:29.254
+This is a big difference
+
+16:29.254 --> 16:30.535
+in terms of performance also,
+
+16:30.535 --> 16:32.694
+because by making it a macro,
+
+16:32.695 --> 16:34.296
+we can pre-compute the code
+
+16:34.296 --> 16:37.255
+that's going to pack and unpack this thing,
+
+16:37.255 --> 16:38.936
+instead of having to interpret it
+
+16:38.936 --> 16:41.096
+every time we pack and unpack.
+
+16:41.096 --> 16:43.815
+So this macro will generate more efficient code
+
+16:43.815 --> 16:45.815
+along the way.
+
+16:45.815 --> 16:48.695
+Also it makes the code that appears in here
+
+16:48.695 --> 16:50.296
+visible to the compiler and to Edebug,
+
+16:50.297 --> 16:54.617
+because we can give an Edebug spec for it.
+
+16:54.617 --> 16:57.497
+And so here as an argument to vec,
+
+16:57.497 --> 16:59.016
+instead of having to specify
+
+16:59.016 --> 17:00.937
+that this is an evaluated expression,
+
+17:00.937 --> 17:02.777
+we just write the expression directly,
+
+17:02.777 --> 17:05.096
+because all the expressions that appear there
+
+17:05.096 --> 17:07.417
+will just be evaluated,
+
+17:07.418 --> 17:11.418
+and we don't need to use the 'struct' variable
+
+17:11.418 --> 17:14.137
+and then extract the length field from it.
+
+17:14.137 --> 17:16.938
+We can just use length as a variable.
+
+17:16.938 --> 17:18.698
+So this variable 'length' here
+
+17:18.698 --> 17:20.778
+will refer to this field here,
+
+17:20.778 --> 17:23.578
+and then this variable 'id' here
+
+17:23.578 --> 17:25.897
+will refer to this field here,
+
+17:25.898 --> 17:27.738
+and so we can just use the field values
+
+17:27.738 --> 17:30.459
+as local variables, which is very natural
+
+17:30.459 --> 17:31.679
+and very efficient also,
+
+17:31.679 --> 17:34.618
+because the code would actually directly do that,
+
+17:34.618 --> 17:37.899
+and the code that unpacks those data
+
+17:37.899 --> 17:40.299
+will just extract an integer
+
+17:40.299 --> 17:42.219
+and bind it to the length variable,
+
+17:42.219 --> 17:47.579
+and so that makes it immediately available there.
+
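NOTE
The idea that field values act as local variables for later types can be sketched with small Python combinators. Each field's type is produced by a function that receives the fields decoded so far as keyword arguments. This is only an analogy: bindat-type is a macro that generates Elisp code, whereas this sketch composes closures at run time. The names (uint, vec, struct_of, message) are invented, and the data-length formula drops the id-dependent term mentioned earlier.
```python
# Hypothetical combinator sketch of "types as code": each field's type is built by
# a function that receives the already-decoded fields as keyword arguments, so a
# later field can use an earlier one just like a local variable.
def uint(bits):
    size = bits // 8
    def unpack(buf, pos):
        return int.from_bytes(buf[pos:pos + size], "big"), pos + size
    return unpack
def vec(count, elem):
    def unpack(buf, pos):
        items = []
        for _ in range(count):
            item, pos = elem(buf, pos)
            items.append(item)
        return items, pos
    return unpack
def struct_of(*fields):
    """fields: (label, make_type) pairs, decoded in order."""
    def unpack(buf, pos=0):
        env = {}
        for label, make_type in fields:
            value, pos = make_type(**env)(buf, pos)
            env[label] = value
        return env, pos
    return unpack
# Shaped loosely like the WeeChat message (the id-dependent term is omitted).
message = struct_of(
    ("length", lambda **_: uint(32)),
    ("compression", lambda **_: uint(8)),
    ("data", lambda length, **_: vec(length - 4 - 1, uint(8))),
)
```
Unpacking bytes([0, 0, 0, 8, 0, 1, 2, 3]) with message yields length 8, compression 0 and three data bytes. The data count comes from the `length` parameter, with no hidden `struct` variable involved.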
+17:47.580 --> 17:51.340
+Okay, let's see also
+
+17:51.340 --> 17:54.220
+what the actual documentation looks like.
+
+17:54.220 --> 17:57.739
+And so if we look at the doc of BinDat,
+
+17:57.739 --> 18:01.180
+we see the actual specification of the grammar.
+
+18:01.181 --> 18:03.181
+And so here we see instead of having
+
+18:03.181 --> 18:06.461
+these nine different non-terminals,
+
+18:06.461 --> 18:08.061
+we basically have two:
+
+18:08.061 --> 18:10.781
+we have the non-terminal for TYPE,
+
+18:10.781 --> 18:15.021
+which can be either a uint, a uintr, or a string,
+
+18:15.021 --> 18:17.421
+or bits, or fill, or align, or vec,
+
+18:17.421 --> 18:19.901
+or those various other forms;
+
+18:19.902 --> 18:22.621
+or it can be a struct, in which case,
+
+18:22.621 --> 18:23.981
+in the case of struct,
+
+18:23.981 --> 18:27.502
+then it will be followed by a sequence --
+
+18:27.502 --> 18:30.142
+a list of FIELDs, where each of the FIELDs
+
+18:30.142 --> 18:33.902
+is basically a LABEL followed by another TYPE.
+
+18:33.902 --> 18:37.342
+And so this makes the whole specification
+
+18:37.343 --> 18:39.823
+much simpler. We don't have any distinction now
+
+18:39.823 --> 18:42.862
+between struct being a special case,
+
+18:42.862 --> 18:46.383
+as opposed to just the normal types.
+
+18:46.383 --> 18:49.263
+struct is just now one of the possible types
+
+18:49.263 --> 18:52.543
+that can appear here.
+
+18:52.543 --> 18:53.263
+The other thing is that
+
+18:53.263 --> 18:55.742
+the LABEL is always present in the structure,
+
+18:55.743 --> 18:58.384
+so there's no ambiguity.
+
+18:58.384 --> 19:00.304
+Also all the above things,
+
+19:00.304 --> 19:03.103
+like the BITLEN we have here,
+
+19:03.103 --> 19:04.384
+the LEN we have here,
+
+19:04.384 --> 19:07.504
+the COUNT for vector we have here,
+
+19:07.504 --> 19:10.224
+these are all plain Elisp expressions,
+
+19:10.224 --> 19:13.024
+so they are implicitly evaluated if necessary.
+
+19:13.025 --> 19:14.705
+If you want them to be constant,
+
+19:14.705 --> 19:16.705
+and really constant, you can just use quotes,
+
+19:16.705 --> 19:20.145
+for those rare cases where it's necessary.
+
+19:20.145 --> 19:21.905
+Another thing is that you can extend it
+
+19:21.905 --> 19:25.505
+with bindat-defmacro.
+
+19:25.505 --> 19:30.225
+Okay, let's go back here.
+
+19:30.226 --> 19:32.706
+So what are the advantages of this approach?
+
+19:32.706 --> 19:34.625
+As I said, one of the main advantages
+
+19:34.625 --> 19:39.346
+is that we now have support for Edebug.
+
+19:39.346 --> 19:41.426
+We don't have 'struct', 'repeat', and 'align'
+
+19:41.426 --> 19:42.946
+as special cases anymore.
+
+19:42.946 --> 19:44.625
+These are just normal types.
+
+19:44.625 --> 19:48.066
+Before, there was uint as type, int as type,
+
+19:48.067 --> 19:49.267
+and those kinds of things.
+
+19:49.267 --> 19:51.110
+'struct' and 'repeat' and 'align'
+
+19:51.110 --> 19:53.267
+were in a different case.
+
+19:53.267 --> 19:54.387
+So there were
+
+19:54.387 --> 19:56.787
+some subtle differences between those
+
+19:56.787 --> 19:59.027
+that completely disappeared.
+
+19:59.027 --> 20:02.626
+Also in the special cases, there was 'union',
+
+20:02.626 --> 20:05.027
+and union now has completely disappeared.
+
+20:05.027 --> 20:07.827
+We don't need it anymore, because instead,
+
+20:07.828 --> 20:09.588
+we can actually use code anywhere.
+
+20:09.588 --> 20:11.908
+That's one of the things I didn't mention here,
+
+20:11.908 --> 20:17.268
+but in this note here,
+
+20:17.268 --> 20:19.747
+that's one of the important notes.
+
+20:19.747 --> 20:21.987
+Not only are BITLEN, LEN, COUNT etc.
+
+20:21.987 --> 20:23.028
+Elisp expressions,
+
+20:23.028 --> 20:26.788
+but the type itself -- any type itself --
+
+20:26.789 --> 20:29.029
+is basically an expression.
+
+20:29.029 --> 20:32.709
+And so you can, instead of having 'uint BITLEN',
+
+20:32.709 --> 20:36.628
+you can have '(if blah-blah-blah uint string)',
+
+20:36.628 --> 20:38.149
+and so you can have a field
+
+20:38.149 --> 20:40.549
+that can be either string or an int,
+
+20:40.549 --> 20:44.789
+depending on some condition.
+
+20:44.790 --> 20:46.869
+And for that reason we don't need a union.
+
+20:46.869 --> 20:47.910
+Instead of having a union,
+
+20:47.910 --> 20:50.710
+we can just have a 'cond' or a 'pcase'
+
+20:50.710 --> 20:53.590
+that will return the type we want to use,
+
+20:53.590 --> 20:55.109
+depending on the context,
+
+20:55.109 --> 21:00.950
+which will generally depend on some previous field.
+
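NOTE
Because a type is just an expression, a plain conditional on an earlier field can play the role of a union. Here is a minimal Python sketch of that idea. The tag values and the choice between a 16-bit integer and a zero-terminated string are made-up examples, and the function names are hypothetical.
```python
# Hypothetical sketch: because a field's type is computed by ordinary code,
# a plain conditional on an earlier field replaces a dedicated "union" construct.
def u8(buf, pos):
    return buf[pos], pos + 1
def u16(buf, pos):
    return int.from_bytes(buf[pos:pos + 2], "big"), pos + 2
def strz(buf, pos):
    end = buf.index(0, pos)  # read up to the NUL terminator
    return buf[pos:end].decode("ascii"), end + 1
def unpack_tagged(buf, pos=0):
    tag, pos = u8(buf, pos)
    payload_type = u16 if tag == 1 else strz  # the "union": pick a type from `tag`
    payload, pos = payload_type(buf, pos)
    return {"tag": tag, "payload": payload}, pos
```
With tag 1 the payload is read as an integer. Any other tag reads a string. More cases would call for a dict lookup or a match statement, much as the talk suggests cond or pcase.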
+21:00.951 --> 21:03.750
+Also we don't need to use single-field structs
+
+21:03.750 --> 21:05.351
+for simple types anymore,
+
+21:05.351 --> 21:09.271
+because there's no distinction between struct
+
+21:09.271 --> 21:11.271
+and other types.
+
+21:11.271 --> 21:17.191
+So we can pass to bindat-pack and bindat-unpack
+
+21:17.191 --> 21:20.951
+a specification which just says "here's an integer"
+
+21:20.952 --> 21:24.392
+and we'll just pack and unpack the integer.
+
+21:24.392 --> 21:26.472
+And of course now all the code is exposed,
+
+21:26.472 --> 21:29.192
+so not only Edebug works, but also Flymake,
+
+21:29.192 --> 21:30.392
+and the compiler, etc. --
+
+21:30.392 --> 21:33.111
+they can complain about it,
+
+21:33.111 --> 21:38.871
+and give you warnings and errors as we like them.
+
+21:38.872 --> 21:44.553
+And of course the kittens are much happier.
+
+21:44.553 --> 21:48.153
+Okay. This is going a bit over time,
+
+21:48.153 --> 21:51.272
+so let's try to go faster.
+
+21:51.273 --> 21:53.752
+Here are some of the new features
+
+21:53.753 --> 21:54.794
+that are introduced.
+
+21:54.794 --> 21:56.314
+I already mentioned briefly
+
+21:56.314 --> 22:00.633
+that you can define new types with bindat-defmacro.
+
+22:00.633 --> 22:04.474
+That's one of the important novelties,
+
+22:04.474 --> 22:08.794
+and you can extend BinDat with new types this way.
+
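NOTE
The talk does not show bindat-defmacro's signature, so this Python sketch mirrors only the idea of defining a new type in terms of existing ones. Here a type is a (pack, unpack) pair, and the hypothetical prefixed_str type is a length header followed by that many UTF-8 bytes.
```python
# Hypothetical sketch: a type is a (pack, unpack) pair, and a new type is defined
# purely by composing existing ones.
def uint(bits):
    size = bits // 8
    def pack(n):
        return n.to_bytes(size, "big")
    def unpack(buf, pos):
        return int.from_bytes(buf[pos:pos + size], "big"), pos + size
    return pack, unpack
def prefixed_str(len_bits=8):
    """New type: a uint length header followed by that many UTF-8 bytes."""
    len_pack, len_unpack = uint(len_bits)
    def pack(s):
        data = s.encode("utf-8")
        return len_pack(len(data)) + data
    def unpack(buf, pos):
        n, pos = len_unpack(buf, pos)
        return bytes(buf[pos:pos + n]).decode("utf-8"), pos + n
    return pack, unpack
```
The new type takes its own parameter (the header width), just as uint does. Once defined, it can be used anywhere an existing type can.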
+22:08.794 --> 22:10.714
+The other thing you can do is
+
+22:10.714 --> 22:16.233
+you can control how values or packets
+
+22:16.234 --> 22:20.315
+are unpacked, and how they are represented.
+
+22:20.315 --> 22:22.555
+In the old BinDat,
+
+22:22.555 --> 22:24.315
+the packet is necessarily represented,
+
+22:24.315 --> 22:28.634
+when you unpack it, as an alist, basically,
+
+22:28.635 --> 22:30.396
+or a struct becomes an alist,
+
+22:30.396 --> 22:31.676
+and that's all there is.
+
+22:31.676 --> 22:34.076
+You don't have any choice about it.
+
+22:34.076 --> 22:35.596
+With the new system,
+
+22:35.596 --> 22:38.076
+by default, it also returns just an alist,
+
+22:38.076 --> 22:41.916
+but you can actually control what it's unpacked as,
+
+22:41.916 --> 22:46.396
+or what it's packed from, using these keywords.
+
+22:46.396 --> 22:49.596
+With :unpack-val, you can give an expression
+
+22:49.597 --> 22:53.357
+that will construct the unpacked value
+
+22:53.357 --> 22:56.957
+from the various fields.
+
+22:56.957 --> 22:59.197
+And with :pack-val and :pack-var,
+
+22:59.197 --> 23:02.557
+you can specify how to extract the information
+
+23:02.557 --> 23:05.116
+from the unpacked value
+
+23:05.117 --> 23:08.077
+to generate the value to pack.
+
+23:08.078 --> 23:12.637
+So here are some examples.
+
+23:12.637 --> 23:15.358
+Here's an example taken from osc.
+
+23:15.358 --> 23:17.438
+osc actually doesn't use BinDat currently,
+
+23:17.438 --> 23:22.478
+but I have played with it
+
+23:22.479 --> 23:23.758
+to see what it would look like
+
+23:23.758 --> 23:26.159
+if we were to use BinDat.
+
+23:26.159 --> 23:28.638
+So here's the definition
+
+23:28.638 --> 23:30.638
+of the timetag representation,
+
+23:30.638 --> 23:35.279
+which represents timestamps in osc.
+
+23:35.279 --> 23:37.998
+So you would use bindat-type
+
+23:37.998 --> 23:40.559
+and then you have here :pack-var
+
+23:40.559 --> 23:42.080
+which basically gives a name
+
+23:42.080 --> 23:48.559
+to the value when we try to pack a timestamp.
+
+23:48.559 --> 23:51.520
+'time' will be the name of the variable that contains
+
+23:51.520 --> 23:54.159
+the actual timestamp we will receive.
+
+23:54.159 --> 23:57.520
+So we want to represent the unpacked value
+
+23:57.520 --> 24:00.240
+as a normal Emacs timestamp,
+
+24:00.240 --> 24:02.480
+and then basically convert from this timestamp
+
+24:02.480 --> 24:06.401
+to a string, or from a string to this timestamp.
+
+24:06.401 --> 24:10.080
+When we receive it, it will be called time,
+
+24:10.080 --> 24:12.240
+so we can refer to it,
+
+24:12.240 --> 24:15.360
+and so in order to actually encode it,
+
+24:15.360 --> 24:18.320
+we basically turn this timestamp into an integer --
+
+24:18.320 --> 24:20.799
+that's what this :pack-val does.
+
+24:20.799 --> 24:23.442
+It says when we try to pack it,
+
+24:23.442 --> 24:26.082
+here's the value that we should use.
+
+24:26.082 --> 24:27.760
+We turn it into an integer,
+
+24:27.760 --> 24:30.320
+and then this integer is going to be encoded
+
+24:30.320 --> 24:36.162
+as a uint 64, so a 64-bit unsigned integer.
+
+24:36.163 --> 24:38.960
+When we try to unpack the value,
+
+24:38.960 --> 24:40.720
+this 'ticks' field
+
+24:40.720 --> 24:45.679
+will contain an unsigned int of 64 bits.
+
+24:45.679 --> 24:50.559
+We want to return instead a timestamp --
+
+24:50.559 --> 24:53.924
+a time value -- from Emacs.
+
+24:53.924 --> 24:59.363
+Here we use the representation of time
+
+24:59.363 --> 25:02.799
+as a pair of a number of ticks
+
+25:02.799 --> 25:06.720
+and the corresponding frequency of those ticks.
+
+25:06.720 --> 25:09.120
+So that's what we do here with :unpack-val,
+
+25:09.120 --> 25:12.004
+which constructs the cons corresponding to it.
+
+25:12.004 --> 25:16.400
+With this definition, bindat-pack/unpack
+
+25:16.400 --> 25:19.039
+are going to convert to and from
+
+25:19.039 --> 25:21.760
+proper time values on one side,
+
+25:21.760 --> 25:26.159
+and binary strings on the other.
+
+25:26.159 --> 25:27.520
+Note, of course,
+
+25:27.520 --> 25:30.320
+that I complained that the old BinDat
+
+25:30.320 --> 25:36.080
+had to use single-field structs for simple types,
+
+25:36.080 --> 25:37.039
+and here, basically,
+
+25:37.039 --> 25:39.840
+I'm back using single-field structs as well
+
+25:39.840 --> 25:41.120
+for this particular case --
+
+25:41.120 --> 25:44.640
+actually a reasonably frequent case, to be honest.
+
+25:44.640 --> 25:49.279
+But at least this is not so problematic,
+
+25:49.279 --> 25:51.840
+because we actually control what is returned,
+
+25:51.840 --> 25:54.159
+so even though it's a single-field struct,
+
+25:54.159 --> 25:56.640
+it's not going to construct an alist
+
+25:56.640 --> 25:58.320
+or force you to construct an alist.
+
+25:58.320 --> 26:02.720
+Instead, it really receives and takes a value
+
+26:02.720 --> 26:07.367
+in the ideal representation that we chose.
+
+26:07.367 --> 26:10.007
+Here we have a more complex example,
+
+26:10.007 --> 26:12.488
+where the actual type is recursive,
+
+26:12.488 --> 26:18.640
+because it's representing those "LEB"...
+
+26:18.640 --> 26:20.400
+I can't remember what "LEB" stands for,
+
+26:20.400 --> 26:22.559
+but it's a representation
+
+26:22.559 --> 26:25.600
+for arbitrary length integers,
+
+26:25.600 --> 26:27.520
+where basically
+
+26:27.520 --> 26:33.360
+every byte is either smaller than 128,
+
+26:33.360 --> 26:36.799
+in which case it's the end of the value,
+
+26:36.799 --> 26:39.760
+or it's a value of 128 or more,
+
+26:39.760 --> 26:42.159
+in which case there's an extra byte on the end
+
+26:42.159 --> 26:44.490
+that's going to continue.
+
+26:44.490 --> 26:46.640
+Here we see the representation
+
+26:46.640 --> 26:52.240
+is basically a structure that starts with a byte,
+
+26:52.240 --> 26:53.679
+which contains this value,
+
+26:53.679 --> 26:56.000
+which can be either the last value or not,
+
+26:56.000 --> 26:59.770
+and the tail, which will either be empty,
+
+26:59.770 --> 27:01.279
+or contain something else.
+
+27:01.279 --> 27:04.000
+The empty [case] is here;
+
+27:04.000 --> 27:07.039
+if the head value is smaller than 128,
+
+27:07.039 --> 27:11.840
+then the type of this tail is going to be (unit 0),
+
+27:11.840 --> 27:16.492
+so basically 'unit' is the empty type,
+
+27:16.492 --> 27:20.880
+and 0 is the value we will receive when we read it.
+
+27:20.880 --> 27:25.520
+And if not, then it has as type 'loop',
+
+27:25.520 --> 27:28.240
+which is the type we're defining,
+
+27:28.240 --> 27:30.491
+so it's the recursive case,
+
+27:30.491 --> 27:35.132
+where then the rest of the type is the type itself.
+
+27:35.132 --> 27:37.120
+And so this lets us pack and unpack.
+
+27:37.120 --> 27:39.600
+We pass it an arbitrary size integer,
+
+27:39.600 --> 27:42.240
+and it's going to turn it into
+
+27:42.240 --> 27:48.492
+this LEB128 binary representation, and vice versa.
+
+27:48.492 --> 27:52.480
+I have other examples if you're interested,
+
+27:52.480 --> 27:56.093
+but anyway, here's the conclusion.
+
+27:56.094 --> 27:58.320
+We have a simpler, more flexible,
+
+27:58.320 --> 28:01.039
+and more powerful BinDat now,
+
+28:01.039 --> 28:03.454
+which is also significantly faster.
+
+28:03.454 --> 28:06.799
+And I can't remember the exact speed-up,
+
+28:06.799 --> 28:08.720
+but it's definitely not just a few percent.
+
+28:08.720 --> 28:12.640
+I vaguely remember about 4x faster in my tests,
+
+28:12.640 --> 28:16.815
+but it's probably very different in different cases
+
+28:16.815 --> 28:20.159
+so it might be 4x, or just 2x -- who knows?
+
+28:20.159 --> 28:23.374
+Try it for yourself, but I was pretty pleased,
+
+28:23.374 --> 28:28.335
+because it wasn't the main motivation, so anyway...
+
+28:28.336 --> 28:31.135
+The negatives are here.
+
+28:31.135 --> 28:34.480
+In the new system, there's this bindat-defmacro
+
+28:34.480 --> 28:36.720
+which lets us define, kind of, new types,
+
+28:36.720 --> 28:40.895
+and bindat-type also lets us define new types,
+
+28:40.895 --> 28:45.360
+and the distinction between them is a bit subtle;
+
+28:45.360 --> 28:48.080
+it kind of depends on...
+
+28:48.080 --> 28:50.880
+well, it has an impact on efficiency
+
+28:50.880 --> 28:53.520
+more than anything, so it's not very satisfactory.
+
+28:53.520 --> 28:56.737
+There's a bit of redundancy between the two.
+
+28:56.737 --> 28:59.039
+There is no bit-level control, just as before.
+
+28:59.039 --> 29:02.097
+We can only manipulate basically bytes.
+
+29:02.098 --> 29:03.360
+So this is definitely not usable
+
+29:03.360 --> 29:09.058
+for a Huffman encoding kind of thing.
+
+29:09.058 --> 29:10.880
+Also, it's not nearly as flexible
+
+29:10.880 --> 29:12.240
+as some of the alternatives.
+
+29:12.240 --> 29:13.760
+So, you know, GNU Poke
+
+29:13.760 --> 29:20.017
+has been a vague inspiration for this work,
+
+29:20.018 --> 29:22.480
+and GNU Poke gives you a lot more power
+
+29:22.480 --> 29:25.059
+in how to specify the types, etc.
+
+29:25.059 --> 29:26.579
+And of course one of the main downsides
+
+29:26.579 --> 29:28.018
+is that it's still not used very much.
+
+29:28.018 --> 29:29.283
+Actually, the new BinDat
+
+29:29.283 --> 29:31.039
+is not used by any package
+
+29:31.039 --> 29:33.059
+as far as I know right now,
+
+29:33.059 --> 29:35.279
+but even the old one is not used very often,
+
+29:35.279 --> 29:36.799
+so who knows
+
+29:36.799 --> 29:38.799
+whether it's actually going to
+
+29:38.799 --> 29:41.520
+work very much better or not?
+
+29:41.520 --> 29:44.399
+Anyway, this is it for this talk.
+
+29:44.399 --> 29:46.683
+Thank you very much. Have a nice day.
+
+29:46.683 --> 29:47.883
+[captions by John Cummings]