WEBVTT

00:01.360 --> 00:04.080
Hi. So I'm going to talk today

00:04.180 --> 00:10.000
about a fun rewrite I did of the BinDat package.

00:10.000 --> 00:12.400
I call this Turbo BinDat.

00:12.400 --> 00:14.001
Actually, the package hasn't changed name,

00:14.101 --> 00:16.801
it's just that the result happens to be faster.

00:16.901 --> 00:19.521
The point was not to make it faster though,

00:19.621 --> 00:22.241
and the point was not to make you understand

00:22.341 --> 00:23.440
that data is not code.

00:23.540 --> 00:27.120
It's just one more experience I've had

00:27.120 --> 00:31.280
where I've seen that treating data as code

00:31.381 --> 00:33.522
is not always a good idea.

00:33.622 --> 00:36.162
It's important to keep the difference.

00:36.162 --> 00:38.880
So let's get started.

00:38.881 --> 00:40.642
So what is BinDat anyway?

00:40.742 --> 00:43.602
Here's just the overview of basically

00:43.602 --> 00:44.962
what I'm going to present.

00:45.062 --> 00:47.842
So I'm first going to present BinDat itself

00:47.843 --> 00:49.039
for those who don't know it,

00:49.039 --> 00:51.923
which is probably the majority of you.

00:51.923 --> 00:55.363
Then I'm going to talk about the actual problems

00:55.363 --> 00:58.882
that I encountered with this package

00:58.882 --> 01:01.843
that motivated me to rewrite it.

01:01.843 --> 01:05.043
Most of them were lack of flexibility,

01:05.044 --> 01:09.924
and some of it was just poor behavior

01:09.924 --> 01:13.364
with respect to scoping and variables,

01:13.364 --> 01:16.324
which of course, you know, is bad --

01:16.424 --> 01:20.724
basically uses of eval or, "eval is evil."
01:20.724 --> 01:24.884
Then I'm going to talk about the new design --

01:24.985 --> 01:28.005
how I redesigned it

01:28.105 --> 01:31.365
to make it both simpler and more flexible,

01:31.365 --> 01:32.965
and where the key idea was

01:33.065 --> 01:35.205
to expose code as code

01:35.305 --> 01:37.525
instead of having it as data,

01:37.625 --> 01:39.605
and so here the distinction between the two

01:39.706 --> 01:44.085
is important and made things simpler.

01:44.085 --> 01:46.405
I tried to keep efficiency in mind,

01:46.405 --> 01:52.405
which resulted in some of the aspects of the design

01:52.505 --> 01:54.886
which are not completely satisfactory,

01:54.886 --> 01:57.046
but the result is actually fairly efficient.

01:57.146 --> 01:59.286
Even though it was not the main motivation,

01:59.287 --> 02:02.967
it was one of the nice outcomes.

02:02.967 --> 02:06.006
And then I'm going to present some examples.

02:06.007 --> 02:08.167
So first: what is BinDat?

02:08.267 --> 02:10.567
Oh actually, rather than present THIS,

02:10.667 --> 02:12.407
I'm going to go straight to the code,

02:12.507 --> 02:14.246
because BinDat actually had

02:14.346 --> 02:16.647
an introduction which was fairly legible.

02:16.748 --> 02:21.128
So here we go: this is the old BinDat from Emacs 27

02:21.128 --> 02:23.448
and the commentary starts by explaining

02:23.448 --> 02:25.848
what is BinDat? Basically BinDat is a package

02:25.948 --> 02:30.247
that lets you parse and unparse

02:30.247 --> 02:31.527
basically binary data.

02:31.627 --> 02:34.648
The intent is to have typically network data

02:34.749 --> 02:35.849
or something like this.

02:35.949 --> 02:38.328
So assuming you have network data,

02:38.328 --> 02:41.528
presented or defined

02:41.628 --> 02:44.569
with some kind of C-style structs, typically,

02:44.669 --> 02:46.009
or something along these lines.
02:46.109 --> 02:49.120
So you presumably start with documentation

02:49.120 --> 02:52.809
that presents something like those structs here,

02:52.810 --> 02:57.130
and you want to be able to generate such packets

02:57.230 --> 03:00.249
and read such packets,

03:00.349 --> 03:02.090
so the way you do it is

03:02.190 --> 03:04.570
you rewrite those specifications

03:04.670 --> 03:06.010
into the BinDat syntax.

03:06.110 --> 03:07.529
So here's the BinDat syntax

03:07.529 --> 03:10.490
for the previous specification.

03:10.491 --> 03:11.610
So here, for example,

03:11.610 --> 03:16.970
you see the case for a data packet

03:16.970 --> 03:20.411
which will have a 'type' field which is a byte

03:20.411 --> 03:24.091
(an unsigned 8-bit entity),

03:24.091 --> 03:26.411
then an 'opcode' which is also a byte,

03:26.411 --> 03:30.731
then a 'length' which is a 16-bit unsigned integer

03:30.732 --> 03:34.092
in little-endian order,

03:34.092 --> 03:38.732
and then some 'id' for this entry, which is

03:38.732 --> 03:43.531
8 bytes containing a zero-terminated string,

03:43.531 --> 03:47.531
and then the actual data, basically the payload,

03:47.532 --> 03:51.453
which is in this case a vector of bytes,

03:51.453 --> 03:54.812
('bytes' here doesn't need to be specified)

03:54.812 --> 03:58.172
and here we specify the length of this vector.

03:58.172 --> 03:59.773
This 'length' here

03:59.773 --> 04:02.252
happens to be actually the name of THIS field,

04:02.252 --> 04:03.853
so the length of the data

04:03.854 --> 04:06.574
is specified by the 'length' field here,

04:06.574 --> 04:08.574
and BinDat will understand this part,

04:08.574 --> 04:12.333
which is the nice part of BinDat.

04:12.333 --> 04:15.774
And then you have an alignment field at the end,

04:15.774 --> 04:18.253
which is basically padding.

04:18.253 --> 04:20.574
It says that it is padded

04:20.575 --> 04:23.295
until the next multiple of four.

04:23.295 --> 04:25.855
Okay.
So this works reasonably well.

04:25.855 --> 04:27.455
This is actually very nice.

04:27.455 --> 04:30.335
With this, you can then call

04:30.335 --> 04:32.975
bindat-pack or bindat-unpack,

04:32.975 --> 04:37.774
passing it a string, or passing it an alist,

04:37.774 --> 04:40.415
to do the packing and unpacking.

04:40.416 --> 04:43.296
So, for example, if you take this string--

04:43.296 --> 04:45.856
actually, in this case, it's a vector of bytes

04:45.856 --> 04:49.456
but it works the same; it works in both ways--

04:49.456 --> 04:53.536
if you pass this to bindat-unpack,

04:53.536 --> 04:57.456
it will presumably return you this structure

04:57.457 --> 05:00.017
if you've given it the corresponding type.

05:00.017 --> 05:01.776
So it will extract--

05:01.776 --> 05:05.617
you will see that there is an IP address,

05:05.617 --> 05:08.017
which is a destination IP, a source IP,

05:08.017 --> 05:09.857
and some port number,

05:09.857 --> 05:12.977
and some actual data here and there, etc.

05:12.977 --> 05:18.017
So this is quite convenient if you need to do this,

05:18.018 --> 05:20.898
and that's what it was designed for.

05:20.898 --> 05:27.537
So here we are. Let's go back to the actual talk.

05:27.538 --> 05:34.338
I converted BinDat to lexical scoping at some point

05:34.339 --> 05:37.299
and things seemed to work fine,

05:37.299 --> 05:42.819
except, at some point, probably weeks later,

05:42.819 --> 05:47.139
I saw a bug report

05:47.139 --> 05:53.058
about the new version using lexical scoping

05:53.059 --> 05:56.339
not working correctly with WeeChat.

05:56.339 --> 06:00.580
So here's the actual chunk of code

06:00.580 --> 06:02.820
that appears in WeeChat.

06:02.820 --> 06:08.420
Here you see that they also define a BinDat spec.
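For readers outside Emacs, the data-packet layout walked through earlier (a 'type' byte, an 'opcode' byte, a little-endian 16-bit 'length', an 8-byte zero-terminated 'id', 'length' bytes of payload, then padding to a multiple of four) can be mirrored in Python with the standard `struct` module. This is only an analogue, not BinDat and not how BinDat works internally; the function names and the dict used in place of BinDat's alist are illustrative assumptions.

```python
import struct
from typing import Any, Dict, Tuple

# Fixed-size header: type (u8), opcode (u8), length (u16 little-endian),
# id (8 bytes, zero-padded). "<" means little-endian with no implicit padding.
HEADER = struct.Struct("<BBH8s")


def pack_data_packet(ptype: int, opcode: int, ident: str, data: bytes) -> bytes:
    """Hypothetical packer: the 'length' field is derived from the payload."""
    out = HEADER.pack(ptype, opcode, len(data), ident.encode("ascii")) + data
    # Alignment: pad with zero bytes up to the next multiple of four.
    return out + b"\0" * (-len(out) % 4)


def unpack_data_packet(buf: bytes) -> Tuple[Dict[str, Any], int]:
    """Hypothetical unpacker: returns the fields and the offset after padding."""
    ptype, opcode, length, raw_id = HEADER.unpack_from(buf, 0)
    start = HEADER.size
    # The payload size comes from the 'length' field decoded just before it.
    data = buf[start:start + length]
    end = start + length
    end += -end % 4  # skip the alignment padding
    fields = {
        "type": ptype,
        "opcode": opcode,
        "length": length,
        "id": raw_id.split(b"\0", 1)[0].decode("ascii"),
        "data": data,
    }
    return fields, end
```

Packing a 5-byte payload gives 12 header bytes plus 5 payload bytes, padded to 20 bytes in total.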
06:08.421 --> 06:14.741
It's a packet that has a 32-bit unsigned length,

06:14.741 --> 06:18.500
then some compression byte/compression information,

06:18.500 --> 06:23.780
then an id which contains basically another struct

06:23.780 --> 06:26.901
(which is specified elsewhere; doesn't matter here),

06:26.902 --> 06:28.661
and after that, a vector

06:28.661 --> 06:33.382
whose size is not just specified by 'length',

06:33.382 --> 06:35.142
but is computed from 'length'.

06:35.142 --> 06:39.142
So here's how they used to compute it in WeeChat.

06:39.142 --> 06:42.822
So the length here can be specified in BinDat.

06:42.822 --> 06:43.941
Instead of having

06:43.942 --> 06:45.863
just a reference to one of the fields,

06:45.863 --> 06:48.903
or having a constant, you can actually compute it,

06:48.903 --> 06:52.502
where you have to use this '(eval',

06:52.502 --> 06:54.743
followed by the actual expression

06:54.743 --> 06:58.103
where you say how you compute it.

06:58.103 --> 07:01.463
And here you see that it actually computes it

07:01.464 --> 07:04.904
based on the 'length' of the structure --

07:04.904 --> 07:07.783
that's supposed to be this 'length' field here --

07:07.783 --> 07:11.223
and it's referred to using bindat-get-field

07:11.223 --> 07:14.503
to extract the field from the variable 'struct'.

07:14.503 --> 07:17.943
And then it subtracts four, it subtracts one,

07:17.943 --> 07:19.467
and adds some other things

07:19.468 --> 07:22.185
which depend on some field

07:22.185 --> 07:26.905
that's found in this 'id' field here.

07:26.905 --> 07:28.425
And the problem with this code

07:28.425 --> 07:30.425
was that it broke

07:30.425 --> 07:32.745
because of this 'struct' variable here,

07:32.745 --> 07:35.145
because this 'struct' variable is not defined

07:35.145 --> 07:38.105
anywhere in the specification of BinDat.
07:38.106 --> 07:41.866
It was used internally as a local variable,

07:41.866 --> 07:45.306
and because it was using dynamic scoping,

07:45.306 --> 07:47.386
it actually happened to be available here,

07:47.386 --> 07:50.826
but the documentation nowhere specifies it.

07:50.826 --> 07:52.506
So it was not exactly

07:52.506 --> 07:55.546
a bug in the conversion to lexical scoping,

07:55.547 --> 07:58.906
but it ended up breaking this code.

07:58.906 --> 08:01.226
And there was no way to actually

08:01.226 --> 08:05.066
fix the code within the specification of BinDat.

08:05.066 --> 08:08.287
You had to go outside the specification of BinDat

08:08.287 --> 08:10.427
to fix this problem.

08:10.427 --> 08:14.346
This is basically how I started looking at BinDat.

08:14.347 --> 08:17.808
Then I went to actually investigate a bit more

08:17.808 --> 08:19.627
what was going on,

08:19.627 --> 08:22.108
and the thing I noticed along the way

08:22.108 --> 08:25.787
was basically that the specification of BinDat

08:25.787 --> 08:29.528
is fairly complex and has a lot of eval

08:29.528 --> 08:30.748
and things like this.

08:30.749 --> 08:32.288
So let's take a look

08:32.288 --> 08:35.068
at what the BinDat specification looks like.

08:35.068 --> 08:36.589
So here it's actually documented

08:36.589 --> 08:40.269
as a kind of grammar.

08:40.269 --> 08:45.308
A specification is basically a sequence of items,

08:45.308 --> 08:47.389
and then each of the items is basically

08:47.389 --> 08:51.248
a FIELD of a struct, so it has a FIELD name,

08:51.249 --> 08:53.249
and then a TYPE.

08:53.249 --> 08:54.510
Instead of a TYPE,

08:54.510 --> 08:56.590
it could have some other FORM for eval,

08:56.590 --> 08:58.989
which was basically never used as far as I know,

08:58.989 --> 09:00.190
or it can be some filler,

09:00.190 --> 09:02.750
or you can have some 'align' specification,

09:02.750 --> 09:05.150
or you can refer to another struct.
09:05.150 --> 09:07.390
It could also be some kind of union,

09:07.391 --> 09:10.430
or it can be some kind of repetition of something.

09:10.430 --> 09:12.430
And then you have the TYPE specified here,

09:12.430 --> 09:18.271
which can be some integers, strings, or a vector,

09:18.271 --> 09:21.631
and there are a few other special cases.

09:21.631 --> 09:25.310
And then the actual field itself

09:25.311 --> 09:28.192
can be either a NAME, or something that's computed,

09:28.192 --> 09:30.752
and then everywhere here, you have LEN,

09:30.752 --> 09:32.480
which specifies the length of vectors,

09:32.480 --> 09:34.672
for example, or length of strings.

09:34.672 --> 09:37.632
This is actually either nil to mean one,

09:37.632 --> 09:39.072
or it can be an ARG,

09:39.072 --> 09:40.952
where ARG is defined to be

09:40.952 --> 09:42.672
either an integer or DEREF,

09:42.673 --> 09:46.673
where DEREF is basically a specification

09:46.673 --> 09:48.833
that can refer, for example, to the 'length' field

09:48.833 --> 09:51.956
-- that's what we saw between parentheses: (length)

09:51.956 --> 09:56.273
was this way to refer to the 'length' field.

09:56.273 --> 09:59.793
Or it can be an expression, which is what we saw

09:59.794 --> 10:02.834
in the computation of the length for WeeChat,

10:02.834 --> 10:04.914
where you just had a '(eval'

10:04.914 --> 10:06.334
and then some computation

10:06.334 --> 10:10.274
of the length of the payload.

10:10.274 --> 10:12.354
And so if you look here, you see that

10:12.354 --> 10:14.674
it is fairly large and complex,

10:14.674 --> 10:18.514
and it uses eval everywhere.
And actually,

10:18.515 --> 10:20.675
it's not just that it has eval in its syntax,

10:20.675 --> 10:23.395
but the implementation has to use eval everywhere,

10:23.395 --> 10:25.314
because, if you go back

10:25.314 --> 10:27.475
to the kind of code we saw,

10:27.475 --> 10:29.538
we see here that we just define

10:29.538 --> 10:34.195
weechat--relay-message-spec as a constant!

10:34.195 --> 10:37.314
It's nothing more than just data, right?

10:37.315 --> 10:38.836
So within this data

10:38.836 --> 10:41.076
there are things we need to evaluate,

10:41.076 --> 10:42.356
but it's pure data,

10:42.356 --> 10:44.356
so it will have to be evaluated

10:44.356 --> 10:46.596
by passing it to eval. It can't be compiled,

10:46.596 --> 10:50.196
because it's within a quote, right?

10:50.196 --> 10:52.836
And so for that reason, kittens really

10:52.837 --> 10:55.956
suffer terribly with uses of BinDat.

10:55.956 --> 10:59.957
You really have to be very careful with that.

10:59.957 --> 11:02.037
More seriously,

11:02.037 --> 11:05.157
the 'struct' variable was not documented,

11:05.157 --> 11:07.797
and yet it's indispensable

11:07.797 --> 11:08.996
for important applications,

11:08.996 --> 11:11.157
such as its use in WeeChat.

11:11.158 --> 11:13.078
So clearly this needs to be fixed.

11:13.078 --> 11:15.481
Of course, we could just document 'struct'

11:15.481 --> 11:18.038
as some variable that's used there,

11:18.038 --> 11:19.798
but of course we don't want to do that,

11:19.798 --> 11:23.398
because 'struct' is not obviously

11:23.398 --> 11:25.398
a dynamically scoped variable,

11:25.398 --> 11:29.317
so it's not very clean.

11:29.318 --> 11:31.939
Other problems I noticed were that the grammar

11:31.939 --> 11:35.239
is significantly more complex than necessary.

11:35.239 --> 11:38.199
We have nine distinct non-terminals.

11:38.199 --> 11:39.639
There is ambiguity.
11:39.639 --> 11:44.919
If you try to use a field whose name is 'align',

11:44.919 --> 11:48.679
or 'fill', or something like this,

11:48.680 --> 11:50.920
then it's going to be misinterpreted,

11:50.920 --> 11:54.920
or it can be misinterpreted.

11:54.920 --> 11:58.760
The vector length can be either an expression,

11:58.760 --> 12:02.280
or an integer, or a reference to a label,

12:02.280 --> 12:03.720
but the expression

12:03.720 --> 12:06.360
should already be the general case,

12:06.361 --> 12:08.041
and this expression can itself be

12:08.041 --> 12:09.401
just a constant integer,

12:09.401 --> 12:13.961
so this complexity is probably not indispensable,

12:13.961 --> 12:15.641
or it could be replaced with something simpler.

12:15.641 --> 12:17.401
That's how I felt.

12:17.401 --> 12:19.161
And basically lots of places

12:19.161 --> 12:21.721
allow an (eval EXP) form somewhere

12:21.721 --> 12:25.081
to open up the door for more flexibility,

12:25.082 --> 12:26.922
but not all of them do,

12:26.922 --> 12:29.482
and we don't really want

12:29.482 --> 12:31.001
to have this eval there, right?

12:31.001 --> 12:33.802
It's not very convenient syntactically either.

12:33.802 --> 12:36.042
So it makes the uses of eval

12:36.042 --> 12:38.362
a bit heavier than they need to be,

12:38.362 --> 12:41.722
and so I didn't really like this part.

12:41.723 --> 12:42.603
Another part is that

12:42.603 --> 12:45.183
when I tried to figure out what was going on,

12:45.183 --> 12:46.666
[dog barks and distracts Stefan]

12:46.666 --> 12:50.043
I had trouble... Winnie as well, as you can hear.

12:50.043 --> 12:50.923
She had trouble as well.

12:50.923 --> 12:53.083
But one of the troubles was that

12:53.083 --> 12:55.002
there was no way to debug the code

12:55.002 --> 12:57.562
via Edebug, because it's just data,

12:57.562 --> 13:00.523
so Edebug doesn't know that it has to look at it

13:00.524 --> 13:02.683
and instrument it.
13:02.683 --> 13:05.644
And of course it was not conveniently extensible.

13:05.644 --> 13:07.164
That's also one of the things

13:07.164 --> 13:08.487
I noticed along the way.

13:09.084 --> 13:12.844
Okay, so here's an example of

13:12.844 --> 13:15.484
problems that I didn't just see there,

13:15.485 --> 13:18.684
but that were actually present in code.

13:18.684 --> 13:22.124
I went to look at code that was using BinDat

13:22.124 --> 13:24.285
to see what uses looked like,

13:24.285 --> 13:28.765
and I saw that BinDat was not used very heavily,

13:28.765 --> 13:30.365
but some of the main uses

13:30.365 --> 13:33.884
were just to read and write integers.

13:33.885 --> 13:37.565
And here you can see a very typical case.

13:37.565 --> 13:41.726
This is also coming from WeeChat.

13:41.726 --> 13:43.565
We do a bindat-get-field

13:43.565 --> 13:48.445
of the length of some struct we read.

13:48.445 --> 13:50.685
Actually, the struct we read is here.

13:50.685 --> 13:51.646
It has a single field,

13:51.647 --> 13:53.006
because the only thing we want to do

13:53.006 --> 13:56.287
is actually to unpack a 32-bit integer,

13:56.287 --> 13:58.287
but the only way we can do that

13:58.287 --> 14:01.647
is by specifying a struct with one field.

14:01.647 --> 14:04.847
And so we have to extract this struct of one field,

14:04.847 --> 14:07.246
which constructs an alist

14:07.246 --> 14:09.647
containing the actual integer,

14:09.648 --> 14:11.887
and then we just use get-field to extract it.

14:11.887 --> 14:15.007
So it doesn't seem very elegant

14:15.007 --> 14:16.528
to have to construct an alist

14:16.528 --> 14:20.368
just to then extract the integer from it.

14:20.368 --> 14:21.648
Same thing if you try to pack it:

14:21.648 --> 14:25.007
you first have to construct the alist

14:25.007 --> 14:31.247
to pass it to bindat-pack unnecessarily.
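The single-field-struct detour for reading one integer can be shown with a small Python analogue (not the WeeChat code; the big-endian byte order is an assumption). The dict plays the role of the alist that gets built only to be taken apart again.

```python
import struct

U32 = struct.Struct(">I")  # assumption: big-endian ("network order") u32


def read_length_via_record(buf: bytes) -> int:
    # Old-BinDat style: decode a one-field record, then extract the field.
    record = {"length": U32.unpack_from(buf, 0)[0]}
    return record["length"]


def read_length_direct(buf: bytes) -> int:
    # What a plain "here's an integer" type allows: decode the value itself.
    return U32.unpack_from(buf, 0)[0]


def write_length_via_record(n: int) -> bytes:
    # Packing has the same detour: build the record only to read it back.
    record = {"length": n}
    return U32.pack(record["length"])
```

Both readers return the same integer; the record only adds an allocation and a lookup.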
14:31.248 --> 14:33.248
Another problem that I saw in this case

14:33.248 --> 14:35.729
(it was in the websocket package)

14:35.729 --> 14:39.568
was here, where they actually have a function

14:39.568 --> 14:41.169
where they need to write

14:41.169 --> 14:43.888
an integer of a size that will vary

14:43.888 --> 14:45.888
depending on the circumstances.

14:45.889 --> 14:49.650
And so they have to test the value of this integer,

14:49.650 --> 14:52.210
and depending on which one it is,

14:52.210 --> 14:54.449
they're going to use different types.

14:54.449 --> 14:56.290
So here it's a case

14:56.290 --> 14:59.490
where we want to have some kind of way to eval --

14:59.490 --> 15:02.530
to compute the length of the integer --

15:02.531 --> 15:08.130
instead of it being predefined or fixed.

15:08.130 --> 15:10.211
So this is one of the cases

15:10.211 --> 15:16.531
where the lack of eval was a problem.

15:16.531 --> 15:20.051
And actually in all of websocket,

15:20.051 --> 15:22.611
BinDat is only used to pack and unpack integers,

15:22.612 --> 15:24.612
even though there are many more opportunities

15:24.612 --> 15:26.772
to use BinDat in there.

15:26.772 --> 15:29.331
But it's not very convenient to use BinDat,

15:29.331 --> 15:35.890
as it stands, for those other cases.

15:35.891 --> 15:39.732
So what does the new design look like?

15:39.733 --> 15:44.132
Well in the new design, here's the problematic code

15:44.132 --> 15:46.373
for WeeChat.

15:46.373 --> 15:49.012
So we basically have the same fields as before,

15:49.012 --> 15:50.853
you just see that instead of u32,

15:50.853 --> 15:53.733
we now have 'uint 32' separately.
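The websocket function that writes an integer of varying size is presumably encoding the frame's payload length, whose width depends on the value (RFC 6455, section 5.2): 0 to 125 fit in the 7-bit field, values up to 65535 use the marker 126 followed by a 16-bit integer, and larger values use 127 followed by a 64-bit integer, all big-endian. A Python sketch of that value-dependent choice follows; the function names are hypothetical, and the mask bit is left clear (as in server-to-client frames).

```python
import struct
from typing import Tuple


def encode_payload_length(n: int) -> bytes:
    """Encode a WebSocket payload length, picking the field width from n."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n <= 125:
        return struct.pack("!B", n)        # fits in the 7-bit field
    if n <= 0xFFFF:
        return struct.pack("!BH", 126, n)  # marker 126 + 16-bit length
    if n < 1 << 63:
        return struct.pack("!BQ", 127, n)  # marker 127 + 64-bit length (MSB 0)
    raise ValueError("length too large")


def decode_payload_length(buf: bytes) -> Tuple[int, int]:
    """Return (length, bytes consumed); the mask bit is ignored."""
    first = buf[0] & 0x7F
    if first <= 125:
        return first, 1
    if first == 126:
        return struct.unpack_from("!H", buf, 1)[0], 3
    return struct.unpack_from("!Q", buf, 1)[0], 9
```

The branch on the value is exactly the kind of choice that a type computed by an expression lets a spec state directly, instead of hand-written code selecting between several fixed specs.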
15:53.733 --> 15:55.332
The idea is that now this 32

15:55.332 --> 15:59.093
can be an expression you can evaluate,

15:59.094 --> 16:04.054
and so the u8 is also replaced by 'uint 8',

16:04.054 --> 16:07.253
and the id type is basically the same as before,

16:07.253 --> 16:08.854
and here another difference we see,

16:08.854 --> 16:11.654
and the main difference...

16:11.654 --> 16:13.494
Actually, it's the second main difference.

16:13.494 --> 16:15.174
The first main difference is that

16:15.175 --> 16:18.694
we don't actually quote this whole thing.

16:18.694 --> 16:23.095
Instead, we pass it to the bindat-type macro.

16:23.095 --> 16:25.095
So this is a macro

16:25.095 --> 16:27.574
that's going to actually build the type.

16:27.574 --> 16:29.254
This is a big difference

16:29.254 --> 16:30.535
in terms of performance also,

16:30.535 --> 16:32.694
because by making it a macro,

16:32.695 --> 16:34.296
we can pre-compute the code

16:34.296 --> 16:37.255
that's going to pack and unpack this thing,

16:37.255 --> 16:38.936
instead of having to interpret it

16:38.936 --> 16:41.096
every time we pack and unpack.

16:41.096 --> 16:43.815
So this macro will generate more efficient code

16:43.815 --> 16:45.815
along the way.

16:45.815 --> 16:48.695
Also it makes the code that appears in here

16:48.695 --> 16:50.296
visible to the compiler,

16:50.297 --> 16:54.617
and we can give an Edebug spec for it.

16:54.617 --> 16:57.497
And so here as an argument to vec,

16:57.497 --> 16:59.016
instead of having to specify

16:59.016 --> 17:00.937
that this is an evaluated expression,

17:00.937 --> 17:02.777
we just write the expression directly,

17:02.777 --> 17:05.096
because all the expressions that appear there

17:05.096 --> 17:07.417
will just be evaluated,

17:07.418 --> 17:11.418
and we don't need to use the 'struct' variable

17:11.418 --> 17:14.137
and then extract the length field from it.

17:14.137 --> 17:16.938
We can just use 'length' as a variable.
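Here is roughly what "using the field values as local variables" amounts to, written out by hand in Python. The layout is a hypothetical simplification loosely modeled on the WeeChat message: a 32-bit big-endian length that is assumed to count the whole message, a one-byte compression flag, then the payload. The real spec also has an 'id' field that contributes to the formula; it is dropped here, so the `- 4 - 1` only mirrors the part of the computation described in the talk. This is not BinDat's generated code.

```python
import struct


def unpack_message(buf: bytes) -> dict:
    # Each field is bound to a plain local variable as soon as it is decoded...
    (length,) = struct.unpack_from(">I", buf, 0)
    (compression,) = struct.unpack_from(">B", buf, 4)
    # ...so the payload size is an ordinary expression over those locals,
    # with no undocumented 'struct' variable to fetch the field from.
    payload_size = length - 4 - 1
    payload = buf[5:5 + payload_size]
    return {"length": length, "compression": compression, "payload": payload}


def pack_message(compression: int, payload: bytes) -> bytes:
    length = 4 + 1 + len(payload)  # assumed: the length counts the header too
    return struct.pack(">IB", length, compression) + payload
```

A message with the 5-byte payload `b"hello"` has a length field of 10, and unpacking recovers the same payload.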
17:16.938 --> 17:18.698
So this variable 'length' here

17:18.698 --> 17:20.778
will refer to this field here,

17:20.778 --> 17:23.578
and then this variable 'id' here

17:23.578 --> 17:25.897
will refer to this field here,

17:25.898 --> 17:27.738
and so we can just use the field values

17:27.738 --> 17:30.459
as local variables, which is very natural

17:30.459 --> 17:31.679
and very efficient also,

17:31.679 --> 17:34.618
because the code would actually directly do that,

17:34.618 --> 17:37.899
and the code that unpacks those data

17:37.899 --> 17:40.299
will just extract an integer

17:40.299 --> 17:42.219
and bind it to the length variable,

17:42.219 --> 17:47.579
and so that makes it immediately available there.

17:47.580 --> 17:51.340
Okay, let's also see

17:51.340 --> 17:54.220
what the actual documentation looks like.

17:54.220 --> 17:57.739
And so if we look at the doc of BinDat,

17:57.739 --> 18:01.180
we see the actual specification of the grammar.

18:01.181 --> 18:03.181
And so here we see instead of having

18:03.181 --> 18:06.461
these nine different non-terminals,

18:06.461 --> 18:08.061
we basically have two:

18:08.061 --> 18:10.781
we have the non-terminal for TYPE,

18:10.781 --> 18:15.021
which can be either a uint, a uintr, or a string,

18:15.021 --> 18:17.421
or bits, or fill, or align, or vec,

18:17.421 --> 18:19.901
or those various other forms;

18:19.902 --> 18:22.621
or it can be a struct, in which case,

18:22.621 --> 18:23.981
in the case of struct,

18:23.981 --> 18:27.502
then it will be followed by a sequence --

18:27.502 --> 18:30.142
a list of FIELDs, where each of the FIELDs

18:30.142 --> 18:33.902
is basically a LABEL followed by another TYPE.

18:33.902 --> 18:37.342
And so this makes the whole specification

18:37.343 --> 18:39.823
much simpler. We don't have any distinction now

18:39.823 --> 18:42.862
between struct being a special case,

18:42.862 --> 18:46.383
as opposed to just the normal types.
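The two-non-terminal grammar (a TYPE is either a primitive such as uint or vec, or a struct whose FIELDs pair a LABEL with another TYPE) can be mimicked with a few Python combinators. This is a hypothetical illustration of the idea, not BinDat's implementation or API: every type is a function that reads from a buffer, `struct_` is just one more such function, and a vector's count is a function of the fields decoded so far.

```python
from typing import Any, Callable, Dict, Tuple

# A "type" reads a value at an offset, given the fields decoded so far,
# and returns the value together with the new offset.
Reader = Callable[[bytes, int, Dict[str, Any]], Tuple[Any, int]]


def uint(bits: int) -> Reader:
    nbytes = bits // 8

    def read(buf: bytes, off: int, env: Dict[str, Any]) -> Tuple[Any, int]:
        return int.from_bytes(buf[off:off + nbytes], "big"), off + nbytes

    return read


def vec(count: Callable[[Dict[str, Any]], int], elem: Reader) -> Reader:
    def read(buf: bytes, off: int, env: Dict[str, Any]) -> Tuple[Any, int]:
        items = []
        for _ in range(count(env)):  # the count is computed from earlier fields
            item, off = elem(buf, off, env)
            items.append(item)
        return items, off

    return read


def struct_(*fields: Tuple[str, Reader]) -> Reader:
    # A struct is just another Reader: each field is a label plus a type.
    def read(buf: bytes, off: int, env: Dict[str, Any]) -> Tuple[Any, int]:
        local: Dict[str, Any] = {}
        for label, typ in fields:
            local[label], off = typ(buf, off, local)
        return local, off

    return read


def unpack(typ: Reader, buf: bytes) -> Any:
    value, _ = typ(buf, 0, {})
    return value
```

For example, `struct_(("n", uint(8)), ("items", vec(lambda env: env["n"], uint(16))))` reads a count followed by that many 16-bit items, while `uint(32)` can be passed to `unpack` on its own, with no one-field struct around it.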
18:46.383 --> 18:49.263
struct is now just one of the possible types

18:49.263 --> 18:52.543
that can appear here.

18:52.543 --> 18:53.263
The other thing is that

18:53.263 --> 18:55.742
the LABEL is always present in the structure,

18:55.743 --> 18:58.384
so there's no ambiguity.

18:58.384 --> 19:00.304
Also all the above things,

19:00.304 --> 19:03.103
like the BITLEN we have here,

19:03.103 --> 19:04.384
the LEN we have here,

19:04.384 --> 19:07.504
the COUNT for vector we have here,

19:07.504 --> 19:10.224
these are all plain Elisp expressions,

19:10.224 --> 19:13.024
so they are implicitly evaluated if necessary.

19:13.025 --> 19:14.705
If you want them to be constant,

19:14.705 --> 19:16.705
and really constant, you can just use quotes,

19:16.705 --> 19:20.145
for those rare cases where it's necessary.

19:20.145 --> 19:21.905
Another thing is that you can extend it

19:21.905 --> 19:25.505
with bindat-defmacro.

19:25.505 --> 19:30.225
Okay, let's go back here.

19:30.226 --> 19:32.706
So what are the advantages of this approach?

19:32.706 --> 19:34.625
As I said, one of the main advantages

19:34.625 --> 19:39.346
is that we now have support for Edebug.

19:39.346 --> 19:41.426
We don't have 'struct', 'repeat', and 'align'

19:41.426 --> 19:42.946
as special cases anymore.

19:42.946 --> 19:44.625
These are just normal types.

19:44.625 --> 19:48.066
Before, there was uint as type, int as type,

19:48.067 --> 19:49.267
and those kinds of things.

19:49.267 --> 19:51.110
'struct' and 'repeat' and 'align'

19:51.110 --> 19:53.267
were in a different category.

19:53.267 --> 19:54.387
So there were

19:54.387 --> 19:56.787
some subtle differences between those

19:56.787 --> 19:59.027
that completely disappeared.

19:59.027 --> 20:02.626
Also in the special cases, there was 'union',

20:02.626 --> 20:05.027
and union now has completely disappeared.

20:05.027 --> 20:07.827
We don't need it anymore, because instead,

20:07.828 --> 20:09.588
we can actually use code anywhere.
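Because code can appear anywhere, including where a type is expected, a union turns into an ordinary conditional over a previous field. A Python sketch of that idea with a made-up format (a tag byte, then a big-endian u32 when the tag is 0, or a zero-terminated string otherwise); the format and all names are hypothetical.

```python
import struct


def read_u32(buf: bytes, off: int):
    return struct.unpack_from(">I", buf, off)[0], off + 4


def read_strz(buf: bytes, off: int):
    end = buf.index(b"\0", off)  # raises ValueError if the terminator is missing
    return buf[off:end].decode("utf-8"), end + 1


def unpack_tagged(buf: bytes):
    tag = buf[0]
    # The "type" of the next field is simply the result of an expression
    # over a previous field, so no separate union construct is needed.
    reader = read_u32 if tag == 0 else read_strz
    value, end = reader(buf, 1)
    return {"tag": tag, "value": value}, end
```

With tag 0 the following four bytes are read as an integer; with any other tag the bytes up to the next zero byte are read as a string.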
20:09.588 --> 20:11.908
That's one of the things I didn't mention here,

20:11.908 --> 20:17.268
but in this note here,

20:17.268 --> 20:19.747
that's one of the important notes.

20:19.747 --> 20:21.987
Not only are BITLEN, LEN, COUNT etc.

20:21.987 --> 20:23.028
Elisp expressions,

20:23.028 --> 20:26.788
but the type itself -- any type itself --

20:26.789 --> 20:29.029
is basically an expression.

20:29.029 --> 20:32.709
And so you can, instead of having 'uint BITLEN',

20:32.709 --> 20:36.628
you can have '(if blah-blah-blah uint string)',

20:36.628 --> 20:38.149
and so you can have a field

20:38.149 --> 20:40.549
that can be either a string or an int,

20:40.549 --> 20:44.789
depending on some condition.

20:44.790 --> 20:46.869
And for that reason we don't need a union.

20:46.869 --> 20:47.910
Instead of having a union,

20:47.910 --> 20:50.710
we can just have a 'cond' or a 'pcase'

20:50.710 --> 20:53.590
that will return the type we want to use,

20:53.590 --> 20:55.109
depending on the context,

20:55.109 --> 21:00.950
which will generally depend on some previous field.

21:00.951 --> 21:03.750
Also we don't need to use single-field structs

21:03.750 --> 21:05.351
for simple types anymore,

21:05.351 --> 21:09.271
because there's no distinction between struct

21:09.271 --> 21:11.271
and other types.

21:11.271 --> 21:17.191
So we can pass to bindat-pack and bindat-unpack

21:17.191 --> 21:20.951
a specification which just says "here's an integer"

21:20.952 --> 21:24.392
and we'll just pack and unpack the integer.

21:24.392 --> 21:26.472
And of course now all the code is exposed,

21:26.472 --> 21:29.192
so not only does Edebug work, but also Flymake,

21:29.192 --> 21:30.392
and the compiler, etc. --

21:30.392 --> 21:33.111
they can complain about it,

21:33.111 --> 21:38.871
and give you warnings and errors as we like them.

21:38.872 --> 21:44.553
And of course the kittens are much happier.

21:44.553 --> 21:48.153
Okay.
This is going a bit over time,

21:48.153 --> 21:51.272
so let's try to go faster.

21:51.273 --> 21:53.752
Here are some of the new features

21:53.753 --> 21:54.794
that are introduced.

21:54.794 --> 21:56.314
I already mentioned briefly

21:56.314 --> 22:00.633
that you can define new types with bindat-defmacro.

22:00.633 --> 22:04.474
That's one of the important novelties,

22:04.474 --> 22:08.794
and you can extend BinDat with new types this way.

22:08.794 --> 22:10.714
The other thing you can do is

22:10.714 --> 22:16.233
control how values or packets

22:16.234 --> 22:20.315
are unpacked, and how they are represented.

22:20.315 --> 22:22.555
In the old BinDat,

22:22.555 --> 22:24.315
the packet is necessarily represented,

22:24.315 --> 22:28.634
when you unpack it, as an alist, basically,

22:28.635 --> 22:30.396
or a struct becomes an alist,

22:30.396 --> 22:31.676
and that's all there is.

22:31.676 --> 22:34.076
You don't have any choice about it.

22:34.076 --> 22:35.596
With the new system,

22:35.596 --> 22:38.076
by default, it also returns just an alist,

22:38.076 --> 22:41.916
but you can actually control what it's unpacked as,

22:41.916 --> 22:46.396
or what it's packed from, using these keywords.

22:46.396 --> 22:49.596
With :unpack-val, you can give an expression

22:49.597 --> 22:53.357
that will construct the unpacked value

22:53.357 --> 22:56.957
from the various fields.

22:56.957 --> 22:59.197
And with :pack-val and :pack-var,

22:59.197 --> 23:02.557
you can specify how to extract the information

23:02.557 --> 23:05.116
from the unpacked value

23:05.117 --> 23:08.077
to generate the packed value.

23:08.078 --> 23:12.637
So here are some examples.

23:12.637 --> 23:15.358
Here's an example taken from osc.

23:15.358 --> 23:17.438
osc actually doesn't use BinDat currently,

23:17.438 --> 23:22.478
but I have played with it

23:22.479 --> 23:23.758
to see what it would look like

23:23.758 --> 23:26.159
if we were to use BinDat.
23:26.159 --> 23:28.638
So here's the definition

23:28.638 --> 23:30.638
of the timetag representation,

23:30.638 --> 23:35.279
which represents timestamps in osc.

23:35.279 --> 23:37.998
So you would use bindat-type

23:37.998 --> 23:40.559
and then you have here :pack-var,

23:40.559 --> 23:42.080
which basically gives a name

23:42.080 --> 23:48.559
when we try to pack a timestamp.

23:48.559 --> 23:51.520
'time' will be the name of the variable that contains

23:51.520 --> 23:54.159
the actual timestamp we will receive.

23:54.159 --> 23:57.520
So we want to represent the unpacked value

23:57.520 --> 24:00.240
as a normal Emacs timestamp,

24:00.240 --> 24:02.480
and then basically convert from this timestamp

24:02.480 --> 24:06.401
to a string, or from a string to this timestamp.

24:06.401 --> 24:10.080
When we receive it, it will be called time,

24:10.080 --> 24:12.240
so we can refer to it,

24:12.240 --> 24:15.360
and so in order to actually encode it,

24:15.360 --> 24:18.320
we basically turn this timestamp into an integer --

24:18.320 --> 24:20.799
that's what this :pack-val does.

24:20.799 --> 24:23.442
It says when we try to pack it,

24:23.442 --> 24:26.082
here's the value that we should use.

24:26.082 --> 24:27.760
We turn it into an integer,

24:27.760 --> 24:30.320
and then this integer is going to be encoded

24:30.320 --> 24:36.162
as a 'uint 64'. So a 64-bit unsigned integer.

24:36.163 --> 24:38.960
When we try to unpack the value,

24:38.960 --> 24:40.720
this 'ticks' field

24:40.720 --> 24:45.679
will contain an unsigned int of 64 bits.

24:45.679 --> 24:50.559
We want to return instead a timestamp --

24:50.559 --> 24:53.924
a time value -- from Emacs.

24:53.924 --> 24:59.363
Here we use the representation of time

24:59.363 --> 25:02.799
as a pair of a number of ticks

25:02.799 --> 25:06.720
and the corresponding frequency of those ticks.
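A Python analogue of the timetag conversion described here, under stated assumptions: a time value is a (ticks, hz) pair as in the talk, the wire format is a 64-bit big-endian unsigned integer, and that integer is taken to count units of 1/2**32 second (the OSC/NTP timetag convention). The NTP epoch offset (1900 versus 1970) is deliberately ignored, and the function names are hypothetical stand-ins for what :pack-val and :unpack-val compute.

```python
import struct

TICKS_PER_SECOND = 1 << 32  # assumption: wire units are 1/2**32 second
U64 = struct.Struct(">Q")   # 64-bit unsigned, big-endian


def pack_timetag(time: tuple) -> bytes:
    """Like :pack-val: turn a (ticks, hz) time value into one integer."""
    ticks, hz = time
    return U64.pack(ticks * TICKS_PER_SECOND // hz)


def unpack_timetag(buf: bytes) -> tuple:
    """Like :unpack-val: rebuild a (ticks, hz) pair from the 'ticks' field."""
    (ticks,) = U64.unpack(buf)
    return (ticks, TICKS_PER_SECOND)
```

Packing 1.5 seconds, written as `(3, 2)`, stores 1.5 * 2**32 = 6442450944, and unpacking returns that tick count paired with the frequency 2**32. The caller only ever sees time values, never a one-field record.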
25:06.720 --> 25:09.120
So that's what we do here with :unpack-val,

25:09.120 --> 25:12.004
which constructs the cons corresponding to it.

25:12.004 --> 25:16.400
With this definition, bindat-pack/unpack

25:16.400 --> 25:19.039
are going to convert to and from

25:19.039 --> 25:21.760
proper time values on one side,

25:21.760 --> 25:26.159
and binary strings on the other.

25:26.159 --> 25:27.520
Note, of course,

25:27.520 --> 25:30.320
that I complained that the old BinDat

25:30.320 --> 25:36.080
had to use single-field structs for simple types,

25:36.080 --> 25:37.039
and here, basically,

25:37.039 --> 25:39.840
I'm back using single-field structs as well

25:39.840 --> 25:41.120
for this particular case --

25:41.120 --> 25:44.640
actually a reasonably frequent case, to be honest.

25:44.640 --> 25:49.279
But at least this is not so problematic,

25:49.279 --> 25:51.840
because we actually control what is returned,

25:51.840 --> 25:54.159
so even though it's a single-field struct,

25:54.159 --> 25:56.640
it's not going to construct an alist

25:56.640 --> 25:58.320
or force you to construct an alist.

25:58.320 --> 26:02.720
Instead, it really receives and takes a value

26:02.720 --> 26:07.367
in the ideal representation that we chose.

26:07.367 --> 26:10.007
Here we have a more complex example,

26:10.007 --> 26:12.488
where the actual type is recursive,

26:12.488 --> 26:18.640
because it's representing those "LEB"...

26:18.640 --> 26:20.400
I can't remember what "LEB" stands for,

26:20.400 --> 26:22.559
but it's a representation

26:22.559 --> 26:25.600
for arbitrary-length integers,

26:25.600 --> 26:27.520
where basically

26:27.520 --> 26:33.360
every byte is either smaller than 128,

26:33.360 --> 26:36.799
in which case it's the end of the value,

26:36.799 --> 26:39.760
or it's a value of 128 or more,

26:39.760 --> 26:42.159
in which case another byte follows

26:42.159 --> 26:44.490
that's going to continue the value.
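LEB128 stands for "Little Endian Base 128". The recursive shape described here (a head byte, then a tail that is empty and reads as 0 when the head is below 128, or is itself another LEB128 number otherwise) can be sketched in Python for unsigned values. The recursion deliberately mirrors the structure of the type in the talk; it is an illustration, not the BinDat definition.

```python
def uleb128_encode(n: int) -> bytes:
    if n < 0:
        raise ValueError("unsigned LEB128 needs n >= 0")
    if n < 128:
        return bytes([n])  # last byte: the tail is empty
    # The head byte carries the low 7 bits with the high bit set;
    # the tail is the encoding of the remaining bits.
    return bytes([(n & 0x7F) | 0x80]) + uleb128_encode(n >> 7)


def uleb128_decode(buf: bytes, off: int = 0):
    """Return (value, offset just past the encoded number)."""
    head = buf[off]
    if head < 128:
        tail, end = 0, off + 1  # like (unit 0): empty tail that reads as 0
    else:
        tail, end = uleb128_decode(buf, off + 1)  # the recursive case
    return (head & 0x7F) + (tail << 7), end
```

For instance, 624485 encodes to the three bytes `e5 8e 26`, and 128 encodes to `80 01`.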
26:44.490 --> 26:46.640 Here we see the representation 26:46.640 --> 26:52.240 is basically a structure that starts with a byte, 26:52.240 --> 26:53.679 which contains this value, 26:53.679 --> 26:56.000 which can be either the last value or not, 26:56.000 --> 26:59.770 and the tail, which will either be empty, 26:59.770 --> 27:01.279 or contain something else. 27:01.279 --> 27:04.000 The empty [case] is here; 27:04.000 --> 27:07.039 if the head value is smaller than 128, 27:07.039 --> 27:11.840 then the type of this tail is going to be (unit 0), 27:11.840 --> 27:16.492 so basically 'unit' is the empty type, 27:16.492 --> 27:20.880 and 0 is the value we will receive when we read it. 27:20.880 --> 27:25.520 And if not, then it has type 'loop', 27:25.520 --> 27:28.240 which is the type we're defining, 27:28.240 --> 27:30.491 so it's the recursive case, 27:30.491 --> 27:35.132 where then the rest of the type is the type itself. 27:35.132 --> 27:37.120 And so this lets us pack and unpack. 27:37.120 --> 27:39.600 We pass it an arbitrary-size integer, 27:39.600 --> 27:42.240 and it's going to turn it into 27:42.240 --> 27:48.492 this LEB128 binary representation, and vice versa. 27:48.492 --> 27:52.480 I have other examples if you're interested, 27:52.480 --> 27:56.093 but anyway, here's the conclusion. 27:56.094 --> 27:58.320 We have a simpler, more flexible, 27:58.320 --> 28:01.039 and more powerful BinDat now, 28:01.039 --> 28:03.454 which is also significantly faster. 28:03.454 --> 28:06.799 And I can't remember the exact speed-up, 28:06.799 --> 28:08.720 but it's definitely not a few percent. 28:08.720 --> 28:12.640 I vaguely remember about 4x faster in my tests, 28:12.640 --> 28:16.815 but it's probably very different in different cases 28:16.815 --> 28:20.159 so it might be just 4x, 2x -- who knows? 28:20.159 --> 28:23.374 Try it for yourself, but I was pretty pleased, 28:23.374 --> 28:28.335 because it wasn't the main motivation, so anyway...
28:28.336 --> 28:31.135 The negatives are here. 28:31.135 --> 28:34.480 In the new system, there's this bindat-defmacro 28:34.480 --> 28:36.720 which lets us define, kind of, new types, 28:36.720 --> 28:40.895 and bindat-type also lets us define new types, 28:40.895 --> 28:45.360 and the distinction between them is a bit subtle; 28:45.360 --> 28:48.080 it kind of depends on... 28:48.080 --> 28:50.880 well it has an impact on efficiency 28:50.880 --> 28:53.520 more than anything, so it's not very satisfactory. 28:53.520 --> 28:56.737 There's a bit of redundancy between the two. 28:56.737 --> 28:59.039 There is no bit-level control, just as before. 28:59.039 --> 29:02.097 We can only manipulate basically bytes. 29:02.098 --> 29:03.360 So this is definitely not usable 29:03.360 --> 29:09.058 for a Huffman encoding kind of thing. 29:09.058 --> 29:10.880 Also, it's not nearly as flexible 29:10.880 --> 29:12.240 as some of the alternatives. 29:12.240 --> 29:13.760 So you know GNU Poke 29:13.760 --> 29:20.017 has been a vague inspiration for this work, 29:20.018 --> 29:22.480 and GNU Poke gives you a lot more power 29:22.480 --> 29:25.059 in how to specify the types, etc. 29:25.059 --> 29:26.579 And of course one of the main downsides 29:26.579 --> 29:28.018 is that it's still not used very much. 29:28.018 --> 29:29.283 Actually, the new BinDat 29:29.283 --> 29:31.039 is not used by any package 29:31.039 --> 29:33.059 as far as I know right now, 29:33.059 --> 29:35.279 but even the old one is not used very often, 29:35.279 --> 29:36.799 so who knows 29:36.799 --> 29:38.799 whether it's actually going to 29:38.799 --> 29:41.520 work very much better or not? 29:41.520 --> 29:44.399 Anyway, this is it for this talk. 29:44.399 --> 29:46.683 Thank you very much. Have a nice day. 29:46.683 --> 29:47.883 [captions by John Cummings]