Diffstat
-rw-r--r-- | 2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main--chapters.vtt | 34
-rw-r--r-- | 2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main.vtt | 1855
2 files changed, 1889 insertions, 0 deletions
diff --git a/2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main--chapters.vtt b/2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main--chapters.vtt new file mode 100644 index 00000000..16c3fb37 --- /dev/null +++ b/2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main--chapters.vtt @@ -0,0 +1,34 @@ +WEBVTT + +00:00:01.360 --> 00:02:06.006 +Introduction + +00:02:06.007 --> 00:05:27.537 +What is BinDat? + +00:05:27.538 --> 00:08:30.748 +Conversion to lexical scoping + +00:08:30.749 --> 00:15:35.890 +The BinDat specification + +00:15:35.891 --> 00:17:47.579 +New design + +00:17:47.580 --> 00:19:30.225 +Documentation + +00:19:30.226 --> 00:21:51.272 +Advantages + +00:21:51.273 --> 00:23:08.077 +New features + +00:23:08.078 --> 00:27:56.093 +Examples + +00:27:56.094 --> 00:28:28.335 +Conclusion + +00:28:28.336 --> 00:28:29.336 +Negatives diff --git a/2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main.vtt b/2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main.vtt new file mode 100644 index 00000000..2fed8d95 --- /dev/null +++ b/2021/captions/emacsconf-2021-bindat--turbo-bindat--stefan-monnier--main.vtt @@ -0,0 +1,1855 @@ +WEBVTT + +00:01.360 --> 00:04.080 +Hi. So I'm going to talk today + +00:04.180 --> 00:10.000 +about a fun rewrite I did of the BinDat package. + +00:10.000 --> 00:00:12.400 +I call this Turbo BinDat. + +00:00:12.400 --> 00:00:14.001 +Actually, the package hasn't changed name, + +00:14.101 --> 00:16.801 +it's just that the result happens to be faster. + +00:16.901 --> 00:19.521 +The point was not to make it faster though, + +00:19.621 --> 00:22.241 +and the point was not to make you understand + +00:22.341 --> 00:23.440 +that data is not code. + +00:23.540 --> 00:27.120 +It's just one more experience I've had + +00:27.120 --> 00:31.280 +where I've seen that treating data as code + +00:31.381 --> 00:33.522 +is not always a good idea. 
+ +00:33.622 --> 00:36.162 +It's important to keep the difference. + +00:36.162 --> 00:38.881 +So let's get started. + +00:38.881 --> 00:40.642 +So what is BinDat anyway? + +00:40.742 --> 00:43.602 +Here's just the overview of basically + +00:43.602 --> 00:44.962 +what I'm going to present. + +00:45.062 --> 00:47.842 +So I'm first going to present BinDat itself + +00:47.843 --> 00:00:49.039 +for those who don't know it, + +00:00:49.039 --> 00:00:51.923 +which is probably the majority of you. + +00:51.923 --> 00:55.363 +Then I'm going to talk about the actual problems + +00:55.363 --> 00:58.882 +that I encountered with this package + +00:58.882 --> 01:01.843 +that motivated me to rewrite it. + +01:01.843 --> 01:05.043 +Most of them were lack of flexibility, + +01:05.044 --> 01:09.924 +and some of it was just poor behavior + +01:09.924 --> 01:13.364 +with respect to scoping and variables, + +01:13.364 --> 01:16.324 +which of course, you know, is bad -- + +01:16.424 --> 01:20.724 +basically uses of eval or, "eval is evil." + +01:20.724 --> 01:24.884 +Then I'm going to talk about the new design -- + +01:24.985 --> 01:28.005 +how I redesigned it + +01:28.105 --> 01:31.365 +to make it both simpler and more flexible, + +01:31.365 --> 01:32.965 +and where the key idea was + +01:33.065 --> 01:35.205 +to expose code as code + +01:35.305 --> 01:37.525 +instead of having it as data, + +01:37.625 --> 01:39.605 +and so here the distinction between the two + +01:39.706 --> 01:44.085 +is important and made things simpler. + +01:44.085 --> 01:46.405 +I tried to keep efficiency in mind, + +01:46.405 --> 01:52.405 +which resulted in some of the aspects of the design + +01:52.505 --> 01:54.886 +which are not completely satisfactory, + +01:54.886 --> 01:57.046 +but the result is actually fairly efficient. + +01:57.146 --> 01:59.286 +Even though it was not the main motivation, + +01:59.287 --> 02:02.967 +it was one of the nice outcomes. 
+ +02:02.967 --> 00:02:06.006 +And then I'm going to present some examples. + +02:06.007 --> 02:08.167 +So first: what is BinDat? + +02:08.267 --> 02:10.567 +Oh actually, rather than present THIS, + +02:10.667 --> 02:12.407 +I'm going to go straight to the code, + +02:12.507 --> 02:14.246 +because BinDat actually had + +02:14.346 --> 02:16.647 +an introduction which was fairly legible. + +02:16.748 --> 02:21.128 +So here we go: this is the old BinDat from Emacs 27 + +02:21.128 --> 02:23.448 +and the commentary starts by explaining + +02:23.448 --> 02:25.848 +what is BinDat? Basically BinDat is a package + +02:25.948 --> 02:30.247 +that lets you parse and unparse + +02:30.247 --> 02:31.527 +basically binary data. + +02:31.627 --> 02:34.648 +The intent is to have typically network data + +02:34.749 --> 02:35.849 +or something like this. + +02:35.949 --> 02:38.328 +So assuming you have network data, + +02:38.328 --> 02:41.528 +presented or defined + +02:41.628 --> 02:44.569 +with some kind of C-style structs, typically, + +02:44.669 --> 02:46.009 +or something along these lines. + +02:46.109 --> 02:49.120 +So you [?] type with documentation + +02:49.120 --> 02:52.809 +that presents something like those structs here, + +02:52.810 --> 02:57.130 +and you want to be able to generate such packets + +02:57.230 --> 03:00.249 +and read such packets, + +03:00.349 --> 03:02.090 +so the way you do it is + +03:02.190 --> 03:04.570 +you rewrite those specifications + +03:04.670 --> 03:06.010 +into the BinDat syntax. + +03:06.110 --> 03:07.529 +So here's the BinDat syntax + +03:07.529 --> 03:10.490 +for the the previous specification. 
+ +03:10.491 --> 03:11.610 +So here, for example, + +03:11.610 --> 03:16.970 +you see the case for a data packet + +03:16.970 --> 03:20.411 +which will have a 'type' field which is a byte + +03:20.411 --> 03:24.091 +(an unsigned 8-bit entity), + +03:24.091 --> 03:26.411 +then an 'opcode' which is also a byte, + +03:26.411 --> 03:30.731 +then a 'length' which is a 16-bit unsigned integer + +03:30.732 --> 03:34.092 +in little endian order, + +03:34.092 --> 03:38.732 +and then some 'id' for this entry, which is + +03:38.732 --> 03:43.531 +8 bytes containing a zero-terminated string, + +03:43.531 --> 03:47.531 +and then the actual data, basically the payload, + +03:47.532 --> 03:51.453 +which is in this case a vector of bytes, + +03:51.453 --> 03:54.812 +('bytes' here doesn't doesn't need to be specified) + +03:54.812 --> 03:58.172 +and here we specify the length of this vector. + +03:58.172 --> 03:59.773 +This 'length' here + +03:59.773 --> 04:02.252 +happens to be actually the name of THIS field, + +04:02.252 --> 04:03.853 +so the length of the data + +04:03.854 --> 04:06.574 +is specified by the 'length' field here, + +04:06.574 --> 04:08.574 +and BinDat will understand this part, + +04:08.574 --> 04:12.333 +which is the the nice part of BinDat. + +04:12.333 --> 04:15.774 +And then you have an alignment field at the end, + +04:15.774 --> 04:18.253 +which is basically padding. + +04:18.253 --> 04:20.574 +It says that it is padded + +04:20.575 --> 04:23.295 +until the next multiple of four. + +04:23.295 --> 04:25.855 +Okay. So this works reasonably well. + +04:25.855 --> 04:27.455 +This is actually very nice. + +04:27.455 --> 04:30.335 +With this, you can then call + +04:30.335 --> 04:32.975 +bindat-pack or bindat-unpack, + +04:32.975 --> 04:37.774 +passing it a string, or passing it an alist, + +04:37.774 --> 04:40.415 +to do the packing and unpacking. 
+ +04:40.416 --> 04:43.296 +So, for example, if you take this string-- + +04:43.296 --> 04:45.856 +actually, in this case, it's a vector of bytes + +04:45.856 --> 04:49.456 +but it works the same; it works in both ways-- + +04:49.456 --> 04:53.536 +if you pass this to bindat-unpack, + +04:53.536 --> 04:57.456 +it will presumably return you this structure + +04:57.457 --> 05:00.017 +if you've given it the corresponding type. + +05:00.017 --> 05:01.776 +So it will extract-- + +05:01.776 --> 05:05.617 +you will see that there is an IP address, + +05:05.617 --> 05:08.017 +which is a destination IP, a source IP, + +05:08.017 --> 05:09.857 +and some port number, + +05:09.857 --> 05:12.977 +and some actual data here and there, etc. + +05:12.977 --> 05:18.017 +So this is quite convenient if you need to do this, + +05:18.018 --> 05:20.898 +and that's what it was designed for. + +05:20.898 --> 00:05:27.537 +So here we are. Let's go back to the actual talk. + +05:27.538 --> 05:34.338 +I converted BinDat to lexical scoping at some point + +05:34.339 --> 05:37.299 +and things seemed to work fine, + +05:37.299 --> 05:42.819 +except, at some point, probably weeks later, + +05:42.819 --> 05:47.139 +I saw a bug report + +05:47.139 --> 05:53.058 +about the new version using lexical scoping + +05:53.059 --> 05:56.339 +not working correctly with WeeChat. + +05:56.339 --> 06:00.580 +So here's the actual chunk of code + +06:00.580 --> 06:02.820 +that appears in WeeChat. + +06:02.820 --> 06:08.420 +Here you see that they also define a BinDat spec. 
+ +06:08.421 --> 06:14.741 +It's a packet that has a 32-bit unsigned length, + +06:14.741 --> 06:18.500 +then some compression byte/compression information, + +06:18.500 --> 06:23.780 +then an id which contains basically another struct + +06:23.780 --> 06:26.901 +(which is specified elsewhere; doesn't matter here), + +06:26.902 --> 06:28.661 +and after that, a vector + +06:28.661 --> 06:33.382 +whose size is not just specified by 'length', + +06:33.382 --> 06:35.142 +but is computed from 'length'. + +06:35.142 --> 06:39.142 +So here's how they used to compute it in WeeChat. + +06:39.142 --> 06:42.822 +So the length here can be specified in BinDat. + +06:42.822 --> 06:43.941 +Instead of having + +06:43.942 --> 06:45.863 +just a reference to one of the fields, + +06:45.863 --> 06:48.903 +or having a constant, you can actually compute it, + +06:48.903 --> 06:52.502 +where you have to use this '(eval', + +06:52.502 --> 06:54.743 +and then followed by the actual expression + +06:54.743 --> 06:58.103 +where you say how you compute it. + +06:58.103 --> 07:01.463 +And here you see that it actually computes it + +07:01.464 --> 07:04.904 +based on the 'length of the structure -- + +07:04.904 --> 07:07.783 +that's supposed to be this 'length' field here -- + +07:07.783 --> 07:11.223 +and it's referred to using the bindat-get-field + +07:11.223 --> 07:14.503 +to extract the field from the variable 'struct'. + +07:14.503 --> 07:17.943 +And then it subtracts four, it subtracts one, + +07:17.943 --> 07:19.467 +and adds some other things + +07:19.468 --> 07:22.185 +which depend on some field + +07:22.185 --> 07:26.905 +that's found in this 'id' field here. + +07:26.905 --> 07:28.425 +And the problem with this code + +07:28.425 --> 07:30.425 +was that it broke + +07:30.425 --> 07:32.745 +because of this 'struct' variable here, + +07:32.745 --> 07:35.145 +because this 'struct' variable is not defined + +07:35.145 --> 07:38.105 +anywhere in the specification of BinDat. 
+ +07:38.106 --> 07:41.866 +It was used internally as a local variable, + +07:41.866 --> 07:45.306 +and because it was using dynamic scoping, + +07:45.306 --> 07:47.386 +it actually happened to be available here, + +07:47.386 --> 07:50.826 +but the documentation nowhere specifies it. + +07:50.826 --> 07:52.506 +So it was not exactly + +07:52.506 --> 07:55.546 +a bug of the conversion to lexical scoping, + +07:55.547 --> 07:58.906 +but it ended up breaking this code. + +07:58.906 --> 08:01.226 +And there was no way to actually + +08:01.226 --> 08:05.066 +fix the code within the specification of BinDat. + +08:05.066 --> 08:08.287 +You had to go outside the specification of BinDat + +08:08.287 --> 08:10.427 +to fix this problem. + +08:10.427 --> 08:14.346 +This is basically how I started looking at BinDat. + +08:14.347 --> 08:17.808 +Then I went to actually investigate a bit more + +08:17.808 --> 08:19.627 +what was going on, + +08:19.627 --> 08:22.108 +and the thing I noticed along the way + +08:22.108 --> 08:25.787 +was basically that the specification of BinDat + +08:25.787 --> 08:29.528 +is fairly complex and has a lot of eval + +08:29.528 --> 08:30.748 +and things like this. + +08:30.749 --> 08:32.288 +So let's take a look + +08:32.288 --> 08:35.068 +at what the BinDat specification looks like. + +08:35.068 --> 08:36.589 +So here it's actually documented + +08:36.589 --> 08:40.269 +as a kind of grammar rules. + +08:40.269 --> 08:45.308 +A specification is basically a sequence of items, + +08:45.308 --> 08:47.389 +and then each of the items is basically + +08:47.389 --> 08:51.248 +a FIELD of a struct, so it has a FIELD name, + +08:51.249 --> 08:53.249 +and then a TYPE. 
+ +08:53.249 --> 08:54.510 +Instead of a TYPE, + +08:54.510 --> 08:56.590 +it could have some other FORM for eval, + +08:56.590 --> 08:58.989 +which was basically never used as far as I know, + +08:58.989 --> 09:00.190 +or it can be some filler, + +09:00.190 --> 09:02.750 +or you can have some 'align' specification, + +09:02.750 --> 09:05.150 +or you can refer to another struct. + +09:05.150 --> 09:07.390 +It could also be some kind of union, + +09:07.391 --> 09:10.430 +or it can be some kind of repetition of something. + +09:10.430 --> 09:12.430 +And then you have the TYPE specified here, + +09:12.430 --> 09:18.271 +which can be some integers, strings, or a vector, + +09:18.271 --> 09:21.631 +and there are a few other special cases. + +09:21.631 --> 09:25.310 +And then the actual field itself + +09:25.311 --> 09:28.192 +can be either a NAME, or something that's computed, + +09:28.192 --> 09:30.752 +and then everywhere here, you have LEN, + +09:30.752 --> 00:09:32.480 +which specifies the length of vectors, + +00:09:32.480 --> 00:09:34.672 +for example, or length of strings. + +09:34.672 --> 09:37.632 +This is actually either nil to mean one, + +09:37.632 --> 09:39.072 +or it can be an ARG, + +09:39.072 --> 09:40.952 +where ARG is defined to be + +09:40.952 --> 09:42.672 +either an integer or DEREF, + +09:42.673 --> 09:46.673 +where DEREF is basically a specification + +09:46.673 --> 09:48.833 +that can refer, for example, to the 'length' field + +09:48.833 --> 09:51.956 +-- that's what we saw between parentheses: (length) + +09:51.956 --> 09:56.273 +was this way to refer to the 'length' field. + +09:56.273 --> 09:59.793 +Or it can be an expression, which is what we saw + +09:59.794 --> 10:02.834 +in the computation of the length for WeeChat, + +10:02.834 --> 10:04.914 +where you just had a '(eval' + +10:04.914 --> 10:06.334 +and then some computation + +10:06.334 --> 10:10.274 +of the length of the payload. 
+ +10:10.274 --> 10:12.354 +And so if you look here, you see that + +10:12.354 --> 10:14.674 +it is fairly large and complex, + +10:14.674 --> 10:18.514 +and it uses eval everywhere. And actually, + +10:18.515 --> 10:20.675 +it's not just that it has eval in its syntax, + +10:20.675 --> 10:23.395 +but the implementation has to use eval everywhere, + +10:23.395 --> 10:25.314 +because, if you go back + +10:25.314 --> 10:27.475 +to see the kind of code we see, + +10:27.475 --> 10:29.538 +we see here we just define + +10:29.538 --> 10:34.195 +weechat--relay-message-spec as a constant! + +10:34.195 --> 10:37.314 +It's nothing than just data, right? + +10:37.315 --> 10:38.836 +So within this data + +10:38.836 --> 10:41.076 +there are things we need to evaluate, + +10:41.076 --> 10:42.356 +but it's pure data, + +10:42.356 --> 10:44.356 +so it will have to be evaluated + +10:44.356 --> 10:46.596 +by passing it to eval. It can't be compiled, + +10:46.596 --> 10:50.196 +because it's within a quote, right? + +10:50.196 --> 10:52.836 +And so for that reason, kittens really + +10:52.837 --> 10:55.956 +suffer terribly with uses of BinDat. + +10:55.956 --> 10:59.957 +You really have to be very careful with that. + +10:59.957 --> 11:02.037 +More seriously, + +11:02.037 --> 11:05.157 +the 'struct' variable was not documented, + +11:05.157 --> 11:07.797 +and yet it's indispensable + +11:07.797 --> 11:08.996 +for important applications, + +11:08.996 --> 11:11.157 +such as using in WeeChat. + +11:11.158 --> 11:13.078 +So clearly this needs to be fixed. + +11:13.078 --> 11:15.481 +Of course, we can just document 'struct' + +11:15.481 --> 11:18.038 +as some variable that's used there, + +11:18.038 --> 11:19.798 +but of course we don't want to do that, + +11:19.798 --> 11:23.398 +because 'struct' is not obviously + +11:23.398 --> 11:25.398 +a dynamically scoped variable, + +11:25.398 --> 11:29.317 +so it's not very clean. 
+ +11:29.318 --> 11:31.939 +Also other problems I noticed was that the grammar + +11:31.939 --> 11:35.239 +is significantly more complex than necessary. + +11:35.239 --> 11:38.199 +We have nine distinct non-terminals. + +11:38.199 --> 11:39.639 +There is ambiguity. + +11:39.639 --> 11:44.919 +If you try to use a field whose name is 'align', + +11:44.919 --> 11:48.679 +or 'fill', or something like this, + +11:48.680 --> 11:50.920 +then it's going to be misinterpreted, + +11:50.920 --> 11:54.920 +or it can be misinterpreted. + +11:54.920 --> 11:58.760 +The vector length can be either an expression, + +11:58.760 --> 12:02.280 +or an integer, or a reference to a label, + +12:02.280 --> 12:03.720 +but the expression + +12:03.720 --> 12:06.360 +should already be the general case, + +12:06.361 --> 12:08.041 +and this expression can itself be + +12:08.041 --> 12:09.401 +just a constant integer, + +12:09.401 --> 12:13.961 +so this complexity is probably not indispensable, + +12:13.961 --> 12:15.641 +or it could be replaced with something simpler. + +12:15.641 --> 12:17.401 +That's what I felt like. + +12:17.401 --> 12:19.161 +And basically lots of places + +12:19.161 --> 12:21.721 +allow an (eval EXP) form somewhere + +12:21.721 --> 12:25.081 +to open up the door for more flexibility, + +12:25.082 --> 12:26.922 +but not all of them do, + +12:26.922 --> 12:29.482 +and we don't really want + +12:29.482 --> 12:31.001 +to have this eval there, right? + +12:31.001 --> 12:33.802 +It's not very convenient syntactically either. + +12:33.802 --> 12:36.042 +So it makes the uses of eval + +12:36.042 --> 12:38.362 +a bit heavier than they need to be, + +12:38.362 --> 12:41.722 +and so I didn't really like this part. + +12:41.723 --> 12:42.603 +Another part is that + +12:42.603 --> 12:45.183 +when I tried to figure out what was going on, + +12:45.183 --> 12:46.666 +[dog barks and distracts Stefan] + +12:46.666 --> 12:50.043 +I had trouble... [Winnie] as well, as you can hear. 
+ +12:50.043 --> 12:50.923 +She had trouble as well. + +12:50.923 --> 12:53.083 +But one of the troubles was that + +12:53.083 --> 12:55.002 +there was no way to debug the code + +12:55.002 --> 12:57.562 +via Edebug, because it's just data, + +12:57.562 --> 13:00.523 +so Edebug doesn't know that it has to look at it + +13:00.524 --> 13:02.683 +and instrument it. + +13:02.683 --> 13:05.644 +And of course it was not conveniently extensible. + +13:05.644 --> 13:07.164 +That's also one of the things + +13:07.164 --> 13:08.487 +I noticed along the way. + +13:09.084 --> 13:12.844 +Okay, so here's an example of + +13:12.844 --> 13:15.484 +problems not that I didn't just see there, + +13:15.485 --> 13:18.684 +but that were actually present in code. + +13:18.684 --> 13:22.124 +I went to look at code that was using BinDat + +13:22.124 --> 13:24.285 +to see what uses looked like, + +13:24.285 --> 13:28.765 +and I saw that BinDat was not used very heavily, + +13:28.765 --> 13:30.365 +but some of the main uses + +13:30.365 --> 13:33.884 +were just to read and write integers. + +13:33.885 --> 13:37.565 +And here you can see a very typical case. + +13:37.565 --> 13:41.726 +This is also coming from WeeChat. + +13:41.726 --> 13:43.565 +We do a bindat-get-field + +13:43.565 --> 13:48.445 +of the length of some struct we read. + +13:48.445 --> 13:50.685 +Actually, the struct we read is here. + +13:50.685 --> 13:51.646 +It has a single field, + +13:51.647 --> 13:53.006 +because the only thing we want to do + +13:53.006 --> 13:56.287 +is actually to unpack a 32-bit integer, + +13:56.287 --> 13:58.287 +but the only way we can do that + +13:58.287 --> 14:01.647 +is by specifying a struct with one field. + +14:01.647 --> 14:04.847 +And so we have to extract this struct of one field, + +14:04.847 --> 14:07.246 +which constructs an alist + +14:07.246 --> 14:09.647 +containing the actual integer, + +14:09.648 --> 14:11.887 +and then we just use get-field to extract it. 
+ +14:11.887 --> 14:15.007 +So this doesn't seem very elegant + +14:15.007 --> 14:16.528 +to have to construct an alist + +14:16.528 --> 14:20.368 +just to then extract the integer from it. + +14:20.368 --> 14:21.648 +Same thing if you try to pack it: + +14:21.648 --> 14:25.007 +you first have to construct the alist + +14:25.007 --> 14:31.247 +to pass it to bindat-pack unnecessarily. + +14:31.248 --> 14:33.248 +Another problem that I saw in this case + +14:33.248 --> 14:35.729 +(it was in the websocket package) + +14:35.729 --> 14:39.568 +was here, where they actually have a function + +14:39.568 --> 14:41.169 +where they need to write + +14:41.169 --> 14:43.888 +an integer of a size that will vary + +14:43.888 --> 14:45.888 +depending on the circumstances. + +14:45.889 --> 14:49.650 +And so they have to test the value of this integer, + +14:49.650 --> 14:52.210 +and depending on which one it is, + +14:52.210 --> 14:54.449 +they're going to use different types. + +14:54.449 --> 14:56.290 +So here it's a case + +14:56.290 --> 14:59.490 +where we want to have some kind of way to eval -- + +14:59.490 --> 15:02.530 +to compute the length of the integer -- + +15:02.531 --> 15:08.130 +instead of it being predefined or fixed. + +15:08.130 --> 15:10.211 +So this is one of the cases + +15:10.211 --> 15:16.531 +where the lack of eval was a problem. + +15:16.531 --> 15:20.051 +And actually in all of websocket, + +15:20.051 --> 15:22.611 +BinDat is only used to pack and unpack integers, + +15:22.612 --> 15:24.612 +even though there are many more opportunities + +15:24.612 --> 15:26.772 +to use BinDat in there. + +15:26.772 --> 15:29.331 +But it's not very convenient to use BinDat, + +15:29.331 --> 00:15:35.890 +as it stands, for those other cases. + +15:35.891 --> 15:39.732 +So what does the new design look like? + +15:39.733 --> 15:44.132 +Well in the new design, here's the problematic code + +15:44.132 --> 15:46.373 +for WeeChat. 
+ +15:46.373 --> 15:49.012 +So we basically have the same fields as before, + +15:49.012 --> 15:50.853 +you just see that instead of u32, + +15:50.853 --> 15:53.733 +we now have 'uint 32' separately. + +15:53.733 --> 15:55.332 +The idea is that now this 32 + +15:55.332 --> 15:59.093 +can be an expression you can evaluate, + +15:59.094 --> 16:04.054 +and so the u8 is also replaced by 'uint 8', + +16:04.054 --> 16:07.253 +and the id type is basically the same as before, + +16:07.253 --> 16:08.854 +and here another difference we see, + +16:08.854 --> 16:11.654 +and the main difference... + +16:11.654 --> 16:13.494 +Actually, it's the second main difference. + +16:13.494 --> 16:15.174 +The first main difference is that + +16:15.175 --> 16:18.694 +we don't actually quote this whole thing. + +16:18.694 --> 16:23.095 +Instead, we pass it to the bindat-type macro. + +16:23.095 --> 16:25.095 +So this is a macro + +16:25.095 --> 16:27.574 +that's going to actually build the type. + +16:27.574 --> 16:29.254 +This is a big difference + +16:29.254 --> 16:30.535 +in terms of performance also, + +16:30.535 --> 16:32.694 +because by making it a macro, + +16:32.695 --> 16:34.296 +we can pre-compute the code + +16:34.296 --> 16:37.255 +that's going to pack and unpack this thing, + +16:37.255 --> 16:38.936 +instead of having to interpret it + +16:38.936 --> 16:41.096 +every time we pack and unpack. + +16:41.096 --> 16:43.815 +So this macro will generate more efficient code + +16:43.815 --> 16:45.815 +along the way. + +16:45.815 --> 16:48.695 +Also it makes the code that appears in here + +16:48.695 --> 16:50.296 +visible to the compiler + +16:50.297 --> 16:54.617 +because we can give an Edebug spec for it. 
+ +16:54.617 --> 16:57.497 +And so here as an argument to vec, + +16:57.497 --> 16:59.016 +instead of having to specify + +16:59.016 --> 17:00.937 +that this is an evaluated expression, + +17:00.937 --> 17:02.777 +we just write the expression directly, + +17:02.777 --> 17:05.096 +because all the expressions that appear there + +17:05.096 --> 17:07.417 +will just be evaluated, + +17:07.418 --> 17:11.418 +and we don't need to use the 'struct' variable + +17:11.418 --> 17:14.137 +and then extract the length field from it. + +17:14.137 --> 17:16.938 +We can just use length as a variable. + +17:16.938 --> 17:18.698 +So this variable 'length' here + +17:18.698 --> 17:20.778 +will refer to this field here, + +17:20.778 --> 17:23.578 +and then this variable 'id' here + +17:23.578 --> 17:25.897 +will refer to this field here, + +17:25.898 --> 17:27.738 +and so we can just use the field values + +17:27.738 --> 17:30.459 +as local variables, which is very natural + +17:30.459 --> 00:17:31.679 +and very efficient also, + +00:17:31.679 --> 00:17:34.618 +because the code would actually directly do that, + +17:34.618 --> 17:37.899 +and the code that unpacks those data + +17:37.899 --> 17:40.299 +will just extract an integer + +17:40.299 --> 17:42.219 +and bind it to the length variable, + +17:42.219 --> 17:47.579 +and so that makes it immediately available there. + +17:47.580 --> 17:51.340 +Okay, let's see also + +17:51.340 --> 17:54.220 +what the actual documentation looks like. + +17:54.220 --> 17:57.739 +And so if we look at the doc of BinDat, + +17:57.739 --> 18:01.180 +we see the actual specification of the grammar. 
+ +18:01.181 --> 18:03.181 +And so here we see instead of having + +18:03.181 --> 18:06.461 +these nine different non-terminals, + +18:06.461 --> 18:08.061 +we basically have two: + +18:08.061 --> 18:10.781 +we have the non-terminal for TYPE, + +18:10.781 --> 18:15.021 +which can be either a uint, a uintr, or a string, + +18:15.021 --> 18:17.421 +or bits, or fill, or align, or vec, + +18:17.421 --> 18:19.901 +or those various other forms; + +18:19.902 --> 18:22.621 +or it can be a struct, in which case, + +18:22.621 --> 18:23.981 +in the case of struct, + +18:23.981 --> 18:27.502 +then it will be followed by a sequence -- + +18:27.502 --> 18:30.142 +a list of FIELDs, where each of the FIELDs + +18:30.142 --> 18:33.902 +is basically a LABEL followed by another TYPE. + +18:33.902 --> 18:37.342 +And so this makes the whole specification + +18:37.343 --> 18:39.823 +much simpler. We don't have any distinction now + +18:39.823 --> 18:42.862 +between struct being a special case, + +18:42.862 --> 18:46.383 +as opposed to just the normal types. + +18:46.383 --> 18:49.263 +struct is just now one of the possible types + +18:49.263 --> 18:52.543 +that can appear here. + +18:52.543 --> 18:53.263 +The other thing is that + +18:53.263 --> 18:55.742 +the LABEL is always present in the structure, + +18:55.743 --> 18:58.384 +so there's no ambiguity. + +18:58.384 --> 19:00.304 +Also all the above things, + +19:00.304 --> 19:03.103 +like the BITLEN we have here, + +19:03.103 --> 19:04.384 +the LEN we have here, + +19:04.384 --> 19:07.504 +the COUNT for vector we have here, + +19:07.504 --> 19:10.224 +these are all plain Elisp expressions, + +19:10.224 --> 19:13.024 +so they are implicitly evaluated if necessary. + +19:13.025 --> 19:14.705 +If you want them to be constant, + +19:14.705 --> 19:16.705 +and really constant, you can just use quotes, + +19:16.705 --> 19:20.145 +for those [rare cases] where it's necessary. 
+ +19:20.145 --> 19:21.905 +Another thing is that you can extend it + +19:21.905 --> 19:25.505 +with with bindat-defmacro. + +19:25.505 --> 19:30.225 +Okay, let's go back here. + +19:30.226 --> 19:32.706 +So what are the advantages of this approach? + +19:32.706 --> 19:34.625 +As I said, one of the main advantages + +19:34.625 --> 19:39.346 +is that we now have support for Edebug. + +19:39.346 --> 19:41.426 +We don't have 'struct', 'repeat', and 'align' + +19:41.426 --> 19:42.946 +as special cases anymore. + +19:42.946 --> 19:44.625 +These are just normal types. + +19:44.625 --> 19:48.066 +Before, there was uint as type, int as type, + +19:48.067 --> 19:49.267 +and those kinds of things. + +19:49.267 --> 19:51.110 +'struct' and 'repeat' and 'align' + +19:51.110 --> 19:53.267 +were in a different case. + +19:53.267 --> 19:54.387 +So there were + +19:54.387 --> 19:56.787 +some subtle differences between those + +19:56.787 --> 19:59.027 +that completely disappeared. + +19:59.027 --> 20:02.626 +Also in the special cases, there was 'union', + +20:02.626 --> 20:05.027 +and union now has completely disappeared. + +20:05.027 --> 20:07.827 +We don't need it anymore, because instead, + +20:07.828 --> 20:09.588 +we can actually use code anywhere. + +20:09.588 --> 20:11.908 +That's one of the things I didn't mention here, + +20:11.908 --> 20:17.268 +but in this note here, + +20:17.268 --> 20:19.747 +that's one of the important notes. + +20:19.747 --> 20:21.987 +Not only are BITLEN, LEN, COUNT etc. + +20:21.987 --> 20:23.028 +Elisp expressions, + +20:23.028 --> 20:26.788 +but the type itself -- any type itself -- + +20:26.789 --> 20:29.029 +is basically an expression. 
+ +20:29.029 --> 20:32.709 +And so you can, instead of having 'uint BITLEN', + +20:32.709 --> 20:36.628 +you can have '(if blah-blah-blah uint string)', + +20:36.628 --> 20:38.149 +and so you can have a field + +20:38.149 --> 20:40.549 +that can be either string or an int, + +20:40.549 --> 20:44.789 +depending on some condition. + +20:44.790 --> 20:46.869 +And for that reason we don't need a union. + +20:46.869 --> 20:47.910 +Instead of having a union, + +20:47.910 --> 20:50.710 +we can just have a 'cond' or a 'pcase' + +20:50.710 --> 20:53.590 +that will return the type we want to use, + +20:53.590 --> 20:55.109 +depending on the context, + +20:55.109 --> 21:00.950 +which will generally depend on some previous field. + +21:00.951 --> 21:03.750 +Also we don't need to use single-field structs + +21:03.750 --> 21:05.351 +for simple types anymore, + +21:05.351 --> 21:09.271 +because there's no distinction between struct + +21:09.271 --> 21:11.271 +and other types. + +21:11.271 --> 21:17.191 +So we can pass to bindat-pack and bindat-unpack + +21:17.191 --> 21:20.951 +a specification which just says "here's an integer" + +21:20.952 --> 21:24.392 +and we'll just pack and unpack the integer. + +21:24.392 --> 21:26.472 +And of course now all the code is exposed, + +21:26.472 --> 21:29.192 +so not only Edebug works, but also Flymake, + +21:29.192 --> 21:30.392 +and the compiler, etc. -- + +21:30.392 --> 21:33.111 +they can complain about it, + +21:33.111 --> 21:38.871 +and give you warnings and errors as we like them. + +21:38.872 --> 21:44.553 +And of course the kittens are much happier. + +21:44.553 --> 21:48.153 +Okay. This is going a bit over time, + +21:48.153 --> 00:21:51.272 +so let's try to go faster. + +21:51.273 --> 21:53.752 +Here are some of the new features + +21:53.753 --> 21:54.794 +that are introduced. + +21:54.794 --> 21:56.314 +I already mentioned briefly + +21:56.314 --> 22:00.633 +that you can define new types with bindat-defmacro. 
+ +22:00.633 --> 22:04.474 +that's one of the important novelties, + +22:04.474 --> 22:08.794 +and you can extend BinDat with new types this way. + +22:08.794 --> 22:10.714 +The other thing you can do is + +22:10.714 --> 22:16.233 +you can control how values, or packets[, rather,] + +22:16.234 --> 22:20.315 +are unpacked, and how they are represented. + +22:20.315 --> 22:22.555 +In the old BinDat, + +22:22.555 --> 22:24.315 +the packet is necessarily represented, + +22:24.315 --> 22:28.634 +when you unpack it, as an alist, basically, + +22:28.635 --> 22:30.396 +or a struct becomes an alist, + +22:30.396 --> 22:31.676 +and that's all there is. + +22:31.676 --> 22:34.076 +You don't have any choice about it. + +22:34.076 --> 22:35.596 +With the new system, + +22:35.596 --> 22:38.076 +by default, it also returns just an alist, + +22:38.076 --> 22:41.916 +but you can actually control what it's unpacked as, + +22:41.916 --> 22:46.396 +or what it's packed from, using these keywords. + +22:46.396 --> 22:49.596 +With :unpack-val, you can give an expression + +22:49.597 --> 22:53.357 +that will construct the unpacked value + +22:53.357 --> 22:56.957 +from the various fields. + +22:56.957 --> 22:59.197 +And with :pack-val and :pack-var, + +22:59.197 --> 23:02.557 +you can specify how to extract the information + +23:02.557 --> 23:05.116 +from the unpacked value + +23:05.117 --> 00:23:08.077 +to generate the pack value. + +23:08.078 --> 23:12.637 +So here are some examples. + +23:12.637 --> 23:15.358 +Here's an example taken from osc. + +23:15.358 --> 23:17.438 +osc actually doesn't use BinDat currently, + +23:17.438 --> 23:22.478 +but I have played with it + +23:22.479 --> 23:23.758 +to see what it would look like + +23:23.758 --> 23:26.159 +if we were to use BinDat. + +23:26.159 --> 23:28.638 +So here's the definition + +23:28.638 --> 23:30.638 +of the timetag representation, + +23:30.638 --> 23:35.279 +which represents timestamps in osc. 
+ +23:35.279 --> 23:37.998 +So you would use bindat-type + +23:37.998 --> 23:40.559 +and then you have here :pack-var + +23:40.559 --> 23:42.080 +basically gives a name + +23:42.080 --> 23:48.559 +when we try to pack a timestamp. + +23:48.559 --> 23:51.520 +'time' will be the variable whose name contains + +23:51.520 --> 23:54.159 +the actual timestamp we will receive. + +23:54.159 --> 23:57.520 +So we want to represent the unpacked value + +23:57.520 --> 24:00.240 +as a normal Emacs timestamp, + +24:00.240 --> 24:02.480 +and then basically convert from this timestamp + +24:02.480 --> 24:06.401 +to a string, or from a string to this timestamp. + +24:06.401 --> 24:10.080 +When we receive it, it will be called time, + +24:10.080 --> 24:12.240 +so we can refer to it, + +24:12.240 --> 24:15.360 +and so in order to actually encode it, + +24:15.360 --> 24:18.320 +we basically turn this timestamp into an integer -- + +24:18.320 --> 24:20.799 +that's what this :pack-val does. + +24:20.799 --> 24:23.442 +It says when we try to pack it, + +24:23.442 --> 24:26.082 +here's the value that we should use. + +24:26.082 --> 24:27.760 +We turn it into an integer, + +24:27.760 --> 24:30.320 +and then this integer is going to be encoded + +24:30.320 --> 24:36.162 +as a uint 64-bit. So a 64-bit unsigned integer. + +24:36.163 --> 24:38.960 +When we try to unpack the value, + +24:38.960 --> 24:40.720 +this 'ticks' field + +24:40.720 --> 24:45.679 +will contain an unsigned int of 64 bits. + +24:45.679 --> 24:50.559 +We want to return instead a timestamp -- + +24:50.559 --> 24:53.924 +a time value -- from Emacs. + +24:53.924 --> 24:59.363 +Here we use the representation of time + +24:59.363 --> 25:02.799 +as a pair of number of ticks + +25:02.799 --> 25:06.720 +and the corresponding frequency of those ticks. + +25:06.720 --> 25:09.120 +So that's what we do here with :unpack-val, + +25:09.120 --> 25:12.004 +which is construct the cons corresponding to it.
+ +25:12.004 --> 25:16.400 +With this definition, bindat-pack/unpack + +25:16.400 --> 00:25:19.039 +are going to convert to and from + +00:25:19.039 --> 00:25:21.760 +proper time values on one side, + +25:21.760 --> 25:26.159 +and binary strings on the other. + +25:26.159 --> 25:27.520 +Note, of course, + +25:27.520 --> 25:30.320 +that I complained that the old BinDat + +25:30.320 --> 25:36.080 +had to use single-field structs for simple types, + +25:36.080 --> 25:37.039 +and here, basically, + +25:37.039 --> 25:39.840 +I'm back using single-field structs as well + +25:39.840 --> 25:41.120 +for this particular case -- + +25:41.120 --> 25:44.640 +actually a reasonably frequent case, to be honest. + +25:44.640 --> 25:49.279 +But at least this is not so problematic, + +25:49.279 --> 25:51.840 +because we actually control what is returned, + +25:51.840 --> 25:54.159 +so even though it's a single-field struct, + +25:54.159 --> 25:56.640 +it's not going to construct an alist + +25:56.640 --> 25:58.320 +or force you to construct an alist. + +25:58.320 --> 26:02.720 +Instead, it really receives and takes a value + +26:02.720 --> 26:07.367 +in the ideal representation that we chose. + +26:07.367 --> 26:10.007 +Here we have a more complex example, + +26:10.007 --> 26:12.488 +where the actual type is recursive, + +26:12.488 --> 26:18.640 +because it's representing those "LEB"... + +26:18.640 --> 26:20.400 +I can't remember what "LEB" stands for, + +26:20.400 --> 26:22.559 +but it's a representation + +26:22.559 --> 26:25.600 +for arbitrary length integers, + +26:25.600 --> 26:27.520 +where basically + +26:27.520 --> 26:33.360 +every byte is either smaller than 128, + +26:33.360 --> 26:36.799 +in which case it's the end of the value, + +26:36.799 --> 26:39.760 +or it's a value bigger than 128, + +26:39.760 --> 26:42.159 +in which case there's an extra byte on the end + +26:42.159 --> 26:44.490 +that's going to continue.
+ +26:44.490 --> 26:46.640 +Here we see the representation + +26:46.640 --> 26:52.240 +is basically a structure that starts with a byte, + +26:52.240 --> 26:53.679 +which contains this value, + +26:53.679 --> 26:56.000 +which can be either the last value or not, + +26:56.000 --> 26:59.770 +and the tail, which will either be empty, + +26:59.770 --> 27:01.279 +or contain something else. + +27:01.279 --> 27:04.000 +The empty [case] is here; + +27:04.000 --> 27:07.039 +if the head value is smaller than 128, + +27:07.039 --> 27:11.840 +then the type of this tail is going to be (unit 0), + +27:11.840 --> 27:16.492 +so basically 'unit' is the empty type, + +27:16.492 --> 27:20.880 +and 0 is the value we will receive when we read it. + +27:20.880 --> 27:25.520 +And if not, then it has as type 'loop', + +27:25.520 --> 27:28.240 +which is the type we're defining, + +27:28.240 --> 27:30.491 +so it's the recursive case, + +27:30.491 --> 27:35.132 +where then the rest of the type is the type itself. + +27:35.132 --> 27:37.120 +And so this lets us pack and unpack. + +27:37.120 --> 27:39.600 +We pass it an arbitrary size integer, + +27:39.600 --> 27:42.240 +and it's going to turn it into + +27:42.240 --> 27:48.492 +this LEB128 binary representation, and vice versa. + +27:48.492 --> 27:52.480 +I have other examples if you're interested, + +27:52.480 --> 00:27:56.093 +but anyway, here's the conclusion. + +27:56.094 --> 27:58.320 +We have a simpler, more flexible, + +27:58.320 --> 28:01.039 +and more powerful BinDat now, + +28:01.039 --> 28:03.454 +which is also significantly faster. + +28:03.454 --> 28:06.799 +And I can't remember the exact speed-up, + +28:06.799 --> 28:08.720 +but it's definitely not a few percents. + +28:08.720 --> 28:12.640 +I vaguely remember about 4x faster in my tests, + +28:12.640 --> 28:16.815 +but it's probably very different in different cases + +28:16.815 --> 28:20.159 +so it might be just 4x, 2x -- who knows? 
+ +28:20.159 --> 28:23.374 +Try it for yourself, but I was pretty pleased, + +28:23.374 --> 00:28:28.335 +because it wasn't the main motivation, so anyway... + +28:28.336 --> 28:31.135 +The negatives are here. + +28:31.135 --> 28:34.480 +In the new system, there's this bindat-defmacro + +28:34.480 --> 28:36.720 +which lets us define, kind of, new types, + +28:36.720 --> 28:40.895 +and bindat-type also lets us define new types, + +28:40.895 --> 28:45.360 +and the distinction between them is a bit subtle; + +28:45.360 --> 28:48.080 +it kind of depends on... + +28:48.080 --> 28:50.880 +well it has an impact on efficiency + +28:50.880 --> 28:53.520 +more than anything, so it's not very satisfactory. + +28:53.520 --> 28:56.737 +There's a bit of redundancy between the two. + +28:56.737 --> 28:59.039 +There is no bit-level control, just as before. + +28:59.039 --> 29:02.097 +We can only manipulate basically bytes. + +29:02.098 --> 29:03.360 +So this is definitely not usable + +29:03.360 --> 29:09.058 +for a Huffman encoding kind of thing. + +29:09.058 --> 29:10.880 +Also, it's not nearly as flexible + +29:10.880 --> 29:12.240 +as some of the alternatives. + +29:12.240 --> 29:13.760 +So you know GNU Poke + +29:13.760 --> 29:20.017 +has been a [vague] inspiration for this work, + +29:20.018 --> 29:22.480 +and GNU Poke gives you a lot more power + +29:22.480 --> 29:25.059 +in how to specify the types, etc. + +29:25.059 --> 29:26.579 +And of course one of the main downsides + +29:26.579 --> 29:28.018 +is that it's still not used very much. + +29:28.018 --> 29:29.283 +[Actually] the new BinDat + +29:29.283 --> 29:31.039 +is not used by any package + +29:31.039 --> 29:33.059 +as far as I know right now, + +29:33.059 --> 29:35.279 +but even the old one is not used very often, + +29:35.279 --> 29:36.799 +so who knows + +29:36.799 --> 29:38.799 +whether it's actually going to + +29:38.799 --> 29:41.520 +work very much better or not? 
+ +29:41.520 --> 29:44.399 +Anyway, this is it for this talk. + +29:44.399 --> 29:46.683 +Thank you very much. Have a nice day. + +29:46.683 --> 29:47.883 +[captions by John Cummings] |