WEBVTT captioned by sachac NOTE Intro 00:00:00.000 --> 00:00:10.799 Hi everyone, this is EmacsConf 2024. I'm Colin, and today 00:00:10.800 --> 00:00:17.319 I'll be talking about transducers. 00:00:17.320 --> 00:00:21.879 After introducing them, I'll share a bit of history about 00:00:21.880 --> 00:00:25.359 transducers and the problems that they solve, some basics 00:00:25.360 --> 00:00:28.879 about how we can use them, how they work, like how they're 00:00:28.880 --> 00:00:32.399 implemented, some demonstrations of how we can actually 00:00:32.400 --> 00:00:36.959 use them in the wild, and then some other discussions about 00:00:36.960 --> 00:00:41.519 issues that they have. NOTE What are transducers? 00:00:41.520 --> 00:00:46.399 Okay, let's get right in. What are transducers? 00:00:46.400 --> 00:00:49.679 Transducers are a way to do streaming iteration with a 00:00:49.680 --> 00:00:55.679 modern API. 00:00:55.680 --> 00:01:00.359 Who are transducers for, and thereby, who is 00:01:00.360 --> 00:01:05.599 this talk for? Well, it's for people who want to do streamed 00:01:05.600 --> 00:01:10.519 data processing in Emacs. It's for people who perhaps 00:01:10.520 --> 00:01:14.199 aren't satisfied with the existing APIs, for example, the 00:01:14.200 --> 00:01:19.359 seq API, or some other common libraries that provide 00:01:19.360 --> 00:01:23.719 similar functionality. Maybe you're not a fan of the loop 00:01:23.720 --> 00:01:29.079 macro. Some people find it difficult to understand. Or 00:01:29.080 --> 00:01:32.719 maybe you've done a bunch of Clojure before, and you'd like 00:01:32.720 --> 00:01:36.879 more aspects of Clojure in your Emacs Lisp. Or maybe you're 00:01:36.880 --> 00:01:40.239 just interested in transducers in general, because the 00:01:40.240 --> 00:01:48.839 pattern has now been ported to multiple different Lisps. 00:01:48.840 --> 00:01:55.039 So I'm Colin. I'm fosskers on everything online, and I do 00:01:55.040 --> 00:01:58.519 mainly back-end programming work and a lot of open source 00:01:58.520 --> 00:02:05.159 software. I wrote Haskell for a long time, both as a hobbyist 00:02:05.160 --> 00:02:09.079 and professionally. Since the COVID years, I've been 00:02:09.080 --> 00:02:13.439 writing Rust, both open source and professionally. But now 00:02:13.440 --> 00:02:19.719 I find that in my spare time, I'm mostly writing Common Lisp. 00:02:19.720 --> 00:02:22.719 Some things I learned from my years of Haskell was that a lot 00:02:22.720 --> 00:02:27.519 of programming is just altering the shape of data. You know, 00:02:27.520 --> 00:02:31.359 sometimes we work through our algorithm line by line. We're 00:02:31.360 --> 00:02:36.239 trying to just tell the computer exactly what to do. But if we 00:02:36.240 --> 00:02:39.639 step back, a lot of the time we're just getting in data of some 00:02:39.640 --> 00:02:44.119 shape, changing it, and then passing it along. A lot of 00:02:44.120 --> 00:02:49.279 these patterns are common, identified 00:02:49.280 --> 00:02:53.639 decades ago. For instance, we have some collection, and we 00:02:53.640 --> 00:02:56.999 want to transform every element of that collection and then 00:02:57.000 --> 00:03:01.199 pass it on. Or maybe we're trying to filter out bad elements 00:03:01.200 --> 00:03:04.799 in that collection. Or maybe we're looking for a specific 00:03:04.800 --> 00:03:07.759 element in that collection. Yes, you could write all that 00:03:07.760 --> 00:03:11.839 with for loops, but these kind of common patterns were 00:03:11.840 --> 00:03:18.559 identified and given names decades ago. So why not use them? 00:03:18.560 --> 00:03:21.879 They say that there are two major problems in computer 00:03:21.880 --> 00:03:25.759 science, one being cache validation and the other being 00:03:25.760 --> 00:03:27.589 naming things. NOTE Common issues 00:03:27.590 --> 00:03:29.799 I've identified five other problems that 00:03:29.800 --> 00:03:33.199 come up when we're trying to deal with collections of data, 00:03:33.200 --> 00:03:40.599 or big streams of data. One is that if we were trying to 00:03:40.600 --> 00:03:45.279 load a file all into memory all at once and process the whole 00:03:45.280 --> 00:03:48.279 thing, sometimes we can have memory problems. You've 00:03:48.280 --> 00:03:54.999 probably seen out-of-memory errors or such things. 00:03:55.000 --> 00:03:58.199 A second issue that comes up is that if we were looking at a 00:03:58.200 --> 00:04:01.799 giant for loop, in particular a nested for loop or such 00:04:01.800 --> 00:04:06.079 things, it can be hard to tell just by looking at the code what 00:04:06.080 --> 00:04:11.039 it's trying to do, what it intends. If we don't go character 00:04:11.040 --> 00:04:16.439 by character or line by line, it can be hard to understand it. 00:04:16.440 --> 00:04:20.039 Furthermore, and this is particularly an issue with Emacs 00:04:20.040 --> 00:04:26.399 Lisp, is that if one call, for instance, to seq-map, then 00:04:26.400 --> 00:04:29.319 piped into seq-filter, for instance, will have an 00:04:29.320 --> 00:04:33.599 intermediate allocation, the map will take the source 00:04:33.600 --> 00:04:37.639 container, allocate a new one, and then the filter will 00:04:37.640 --> 00:04:40.319 operate over the second one. This is wasteful. 00:04:40.320 --> 00:04:48.879 Furthermore, it can often be difficult to abort a stream. 00:04:48.880 --> 00:04:53.199 For instance, if we were filtering through our collection, 00:04:53.200 --> 00:04:57.319 but we knew we only wanted to go halfway, for instance, for 00:04:57.320 --> 00:05:01.759 some reason, we have no way to stop it halfway through. We 00:05:01.760 --> 00:05:05.479 just have to process the whole thing, even if we know we don't 00:05:05.480 --> 00:05:11.919 need to. Another issue is that for languages that have 00:05:11.920 --> 00:05:18.039 traits, or in Haskell they're called type classes, if you 00:05:18.040 --> 00:05:22.399 are defining what it means to map over something, you often 00:05:22.400 --> 00:05:27.039 have to redefine that for every kind of container or thing 00:05:27.040 --> 00:05:31.239 that you're iterating over. Wouldn't it be nice if we could 00:05:31.240 --> 00:05:34.719 define things like map just once and then reuse them 00:05:34.720 --> 00:05:39.839 everywhere? Now, transducers solve all five of these, 00:05:39.840 --> 00:05:44.039 without the addition of new language features, and with 00:05:44.040 --> 00:05:47.279 little more than plain old function composition. NOTE Transducers 00:05:47.280 --> 00:05:53.119 If this is your first time hearing of transducers, yeah, 00:05:53.120 --> 00:05:57.439 no problem. They were originally invented in Clojure by 00:05:57.440 --> 00:06:01.039 Rich Hickey, and this is a quote from him. He thinks 00:06:01.040 --> 00:06:05.439 transducers are a fundamental primitive that decouple 00:06:05.440 --> 00:06:10.079 critical logic from list or sequence processing, and if he 00:06:10.080 --> 00:06:13.999 had to do Clojure all over, he'd put them at the bottom, at the 00:06:14.000 --> 00:06:19.279 very bottom of all the fundamental primitives. Now, that's 00:06:19.280 --> 00:06:24.599 Rich speaking quite highly of them. And I think he has a point 00:06:24.600 --> 00:06:25.159 here. 00:06:25.160 --> 00:06:32.399 They were invented originally in Clojure. In more 00:06:32.400 --> 00:06:34.772 recent years, they were brought over to Scheme 00:06:34.773 --> 00:06:38.774 via SRFI 171. That's where I found them 00:06:38.775 --> 00:06:41.521 when I was learning the Guile language. 00:06:41.522 --> 00:06:43.919 In the process of submitting a patch, I realized 00:06:43.920 --> 00:06:48.199 that there were other things to be improved. So I ported the 00:06:48.200 --> 00:06:51.399 pattern to Common Lisp, then Fennel, and then more 00:06:51.400 --> 00:06:56.639 recently, Emacs Lisp. The Common Lisp and Emacs Lisp APIs 00:06:56.640 --> 00:07:01.199 are identical. And the Fennel one is not identical, but 00:07:01.200 --> 00:07:05.799 fairly similar. Overall, everywhere you find 00:07:05.800 --> 00:07:10.279 transducers, they should basically be fairly uniform. 00:07:10.280 --> 00:07:15.759 When I originally made the Common Lisp variant first, I 00:07:15.760 --> 00:07:18.799 sampled the APIs from a number of different languages and 00:07:18.800 --> 00:07:23.439 came up with what I believed to be a representative sample of 00:07:23.440 --> 00:07:27.959 what most people would want out of such a library. I gave 00:07:27.960 --> 00:07:32.439 functions their common modern names. For instance, map 00:07:32.440 --> 00:07:35.279 is map and filter is filter and so on. NOTE Using transducers 00:07:35.280 --> 00:07:42.599 What does the usage of transducers look like? Well, 00:07:42.600 --> 00:07:48.959 these examples will all be the Emacs Lisp variant, but the 00:07:48.960 --> 00:07:52.359 Common Lisp will look basically exactly the same, minus 00:07:52.360 --> 00:07:54.079 this little t- prefix. 00:07:54.080 --> 00:08:00.919 Running transducers requires three things. It requires a 00:08:00.920 --> 00:08:06.439 source. This could be an obvious thing like a list or a 00:08:06.440 --> 00:08:11.479 vector, but it could be other things like a file, or in Emacs 00:08:11.480 --> 00:08:16.348 list in particular, a buffer. 00:08:16.349 --> 00:08:20.112 A reducer is a function. It's something like 00:08:20.113 --> 00:08:22.639 the + operator or the * operator, 00:08:22.640 --> 00:08:26.785 or certain constructors of various containers. 00:08:26.786 --> 00:08:32.125 It takes values and collates them into some final version. 00:08:32.126 --> 00:08:33.946 Now, finally, we have what we're calling here 00:08:33.947 --> 00:08:37.567 a transducer chain. This could be one transducer function 00:08:37.568 --> 00:08:43.479 or it could be multiple composed together. These are the 00:08:43.480 --> 00:08:47.079 functions that actually take data and transform them 00:08:47.080 --> 00:08:55.279 somehow. For instance, this. We have a list of three 00:08:55.280 --> 00:09:04.199 elements. We want to reduce it into a vector. How we are 00:09:04.200 --> 00:09:07.519 going to transform the elements along the way: we are doing 00:09:07.520 --> 00:09:13.359 plus one to each of them. If this syntax is new to you, just 00:09:13.360 --> 00:09:18.039 know that this #' just means that this thing that 00:09:18.040 --> 00:09:22.079 comes after it is the name of the function. In Common Lisp and 00:09:22.080 --> 00:09:26.079 Emacs Lisp, this is necessary, but for Clojure and Scheme, 00:09:26.080 --> 00:09:32.719 it is not. So we can see here that just this example is not much 00:09:32.720 --> 00:09:36.119 different than any other normal map call you might see made, 00:09:36.120 --> 00:09:40.239 but if nothing else, it's a handy way to convert a list to a 00:09:40.240 --> 00:09:44.999 vector or anything else. There are many, many reducers 00:09:45.000 --> 00:09:48.239 available and many different forms that we can 00:09:48.240 --> 00:09:52.624 collate the final value into. NOTE A more involved example with comp 00:09:52.625 --> 00:09:55.086 Let's see a more involved example. 00:09:55.087 --> 00:09:58.049 Okay, now we've got some more meat here. 00:09:58.050 --> 00:10:01.772 Here we can see usage of the comp function 00:10:01.773 --> 00:10:05.255 and a custom source, ints. 00:10:05.256 --> 00:10:11.079 Ints is an infinite generator of integer values. That's not 00:10:11.080 --> 00:10:14.783 like a list or a file. It will generate infinitely. 00:10:14.784 --> 00:10:19.439 Comp is letting us compose multiple transducer functions 00:10:19.440 --> 00:10:23.759 together. Notice that this is the opposite order of what 00:10:23.760 --> 00:10:28.079 we'd usually be used to from a function like comp. The order 00:10:28.080 --> 00:10:32.679 here is top to bottom, basically, so that the map goes first, 00:10:32.680 --> 00:10:37.839 then the filter, and then the take. So effectively is what 00:10:37.840 --> 00:10:40.919 we're doing is taking all the integers that exist, 00:10:40.920 --> 00:10:45.399 positive, adding one to them, filtering out only the even 00:10:45.400 --> 00:10:50.039 ones, but then just taking 10. Cons here is a function that 00:10:50.040 --> 00:10:57.039 just produces the ending result as a list. So what happens 00:10:57.040 --> 00:11:00.479 here specifically is how we are avoiding intermediate 00:11:00.480 --> 00:11:04.238 allocations. First, the number 0 will come through. 00:11:04.239 --> 00:11:07.879 It will be pulled out of this source internally by transduce. 00:11:07.880 --> 00:11:10.919 It will make its way into the map. The map will add it. Then it 00:11:10.920 --> 00:11:15.799 will immediately go into this filter step. So it's not like 00:11:15.800 --> 00:11:19.119 all the maps occur, and then all the filters occur. We do 00:11:19.120 --> 00:11:24.039 everything for each element. So the 0 comes in, now it's 1. 00:11:24.040 --> 00:11:27.559 The filter would occur. Well, it's going to fail that 00:11:27.560 --> 00:11:31.119 because it's not even, so it will just bail there. Now we'll 00:11:31.120 --> 00:11:35.239 go to the next one. Now 1 will come, it will become 2, then 00:11:35.240 --> 00:11:39.119 it will be saved by this evenp call, and then the take will 00:11:39.120 --> 00:11:42.599 capture it, because we only want 10 values here. You can 00:11:42.600 --> 00:11:45.239 see 2, 4, 6, 8, and so on is the result that we 00:11:45.240 --> 00:11:49.332 expect. So let's play around a little bit. NOTE In Emacs 00:11:49.333 --> 00:11:53.336 Let's jump into Emacs and see what we can do. 00:11:53.337 --> 00:11:58.500 Alright, you should see my Emacs screen here. 00:11:58.501 --> 00:12:04.359 These are the actual notes for the actual 00:12:04.360 --> 00:12:08.959 presentation done in Org Mode. I'll boost that up in size for 00:12:08.960 --> 00:12:12.639 a little bit. That should be more than big enough for you. 00:12:12.640 --> 00:12:17.719 Just by changing the reducer, we can change the result. 00:12:17.720 --> 00:12:21.079 Okay, now it's a vector. Well, what else can we do to it? Well, 00:12:21.080 --> 00:12:25.959 let's just add up the results. Maybe we just want to count the 00:12:25.960 --> 00:12:30.919 results. Oh, indeed, there were 10. What if we want to find 00:12:30.920 --> 00:12:36.959 the average of the results? What if we want to find the median 00:12:36.960 --> 00:12:40.959 of the results? And so on. Here's some more interesting 00:12:40.960 --> 00:12:45.839 things that we could do. We could add different steps. So 00:12:45.840 --> 00:12:51.239 here we have all the integers. Let's add, hmm, okay, we'll 00:12:51.240 --> 00:12:57.399 keep that. We're going to add t-enumerate. What enumerate does 00:12:57.400 --> 00:13:00.879 is for each item that comes through, it is 00:13:00.880 --> 00:13:06.039 going to add a sort of index to it and make it a pair. In this 00:13:06.040 --> 00:13:08.719 case, it's going to be equal to what came in here. Well, we can 00:13:08.720 --> 00:13:12.399 change it. If we start this at 1, now it will be different. 00:13:12.400 --> 00:13:15.519 1 will be paired with 0, and then 2 would be paired 00:13:15.520 --> 00:13:19.559 with 1, and so on. We'll accept that the even call will change 00:13:19.560 --> 00:13:24.039 that a little bit. Why we're doing this is because we want 00:13:24.040 --> 00:13:27.279 to form a hash table. Let's move that down to 3, maybe 00:13:27.280 --> 00:13:31.439 we'll get a better result. What do we see? Okay, here now the 00:13:31.440 --> 00:13:37.359 result is a hash table. What are its values? Well, 0 seems 00:13:37.360 --> 00:13:40.479 to have... The key of 0 seems to be paired with 2, the key of 00:13:40.480 --> 00:13:42.909 1 seems to be paired with 4, 00:13:42.910 --> 00:13:47.411 and 2 seems to be paired with 6. 00:13:47.412 --> 00:13:51.293 Maybe let's jazz that up even a little bit more. 00:13:51.294 --> 00:13:52.973 We're going to start from a string 00:13:52.974 --> 00:13:57.943 and we'll call it hello. 00:13:57.944 --> 00:13:59.564 That's not going to work anymore 00:13:59.565 --> 00:14:02.585 and neither is that, but what we could do is 00:14:02.586 --> 00:14:05.498 we could say t-map #'string. 00:14:05.499 --> 00:14:08.627 I believe we'll do that. 00:14:08.628 --> 00:14:08.959 Let's see if that works. It did. So that's 00:14:08.960 --> 00:14:13.589 going to convert a character into a string. 00:14:13.590 --> 00:14:14.679 Let's just go two 00:14:14.680 --> 00:14:18.399 just to make it a little easier. Now you can see that we've 00:14:18.400 --> 00:14:21.919 constructed a hash table here. The key of 0 is mapped to the 00:14:21.920 --> 00:14:27.079 string of h and 1 is mapped to e. Now, I really like having 00:14:27.080 --> 00:14:29.468 this reducer in particular. NOTE Hash tables 00:14:29.469 --> 00:14:30.639 Know that hash tables are 00:14:30.640 --> 00:14:34.199 also legal sources. I find that both in Emacs Lisp and in 00:14:34.200 --> 00:14:37.119 Common Lisp, dealing with hash tables--like creating them 00:14:37.120 --> 00:14:41.599 and altering them--can be a bit of a pain. Having them 00:14:41.600 --> 00:14:45.679 immediately available like this with transducers is very 00:14:45.680 --> 00:14:49.079 handy, I find. We can work with something that wasn't a hash 00:14:49.080 --> 00:14:53.279 table. We can construct it in a way that makes it amenable to 00:14:53.280 --> 00:14:56.199 that, and then reduce it down into a hash table, and here you 00:14:56.200 --> 00:14:58.039 go. Very handy. NOTE Clarity 00:14:58.040 --> 00:15:06.399 One last point is that you can see very clearly what 00:15:06.400 --> 00:15:10.479 this is attempting to do, as opposed to, say, a for loop. It's 00:15:10.480 --> 00:15:12.719 very clear what that step is doing, and then you can see what 00:15:12.720 --> 00:15:15.119 that is doing, and you know that the result is going to be two. 00:15:15.120 --> 00:15:18.559 Each line is kind of its own declarative step, and it should 00:15:18.560 --> 00:15:22.159 be clear, just by staring at this, basically what you're 00:15:22.160 --> 00:15:25.399 going to get out. This is one main difference from other 00:15:25.400 --> 00:15:29.599 languages that have things--say, for instance, Rust's 00:15:29.600 --> 00:15:35.439 iterator API--is the difference between the transducers 00:15:35.440 --> 00:15:41.639 and the reducers. If we go up here, for example, the 00:15:41.640 --> 00:15:44.679 difference between the transducers and the reducers and 00:15:44.680 --> 00:15:48.119 the sources is not explicitly laid out, whereas with 00:15:48.120 --> 00:15:53.119 transducers, it is. You have to be aware of how these things 00:15:53.120 --> 00:15:55.799 are different. I think that that helps clarity. NOTE How do transducers work? 00:15:55.800 --> 00:16:01.999 Moving on. How do transducers work? Well, 00:16:02.000 --> 00:16:09.857 we want to go see the README. 00:16:09.858 --> 00:16:11.399 So, what we're going to do is 00:16:11.400 --> 00:16:19.102 we're going to go to here. 00:16:19.103 --> 00:16:21.959 You should still be able to see this. 00:16:21.960 --> 00:16:28.583 This is the CL example, actually. 00:16:28.584 --> 00:16:32.279 Let's go to transducers.el. 00:16:32.280 --> 00:16:37.744 Their APIs and READMEs are the same, 00:16:37.745 --> 00:16:39.919 but just for the sake of it, we will go see 00:16:39.920 --> 00:16:45.726 how this looks on the Emacs side, 00:16:45.727 --> 00:16:48.046 just so that nothing is a surprise. 00:16:48.047 --> 00:16:50.239 But recall that the APIs are essentially the same 00:16:50.240 --> 00:16:53.679 between the two. If you go to this section, writing your 00:16:53.680 --> 00:16:56.839 own primitives, you can read about how transducers are 00:16:56.840 --> 00:17:00.999 actually formed, whether or not you want to write them 00:17:01.000 --> 00:17:06.799 yourself or not. We can see here t-map. We accept the 00:17:06.800 --> 00:17:10.239 function that you want to operate with. Then you've got 00:17:10.240 --> 00:17:13.319 this extra little lambda here that's coming in, and it's 00:17:13.320 --> 00:17:17.079 receiving a thing that is named reducer. Now, while here 00:17:17.080 --> 00:17:20.439 we're calling it reducer, it's actually the chain of all the 00:17:20.440 --> 00:17:25.159 composed functions together. It's all those main 00:17:25.160 --> 00:17:28.479 transducer steps. Finally, it's the reducer all 00:17:28.480 --> 00:17:31.879 composed together with normal function composition. 00:17:31.880 --> 00:17:35.877 That will matter very soon. Now here's the actual meat. 00:17:35.878 --> 00:17:40.519 We can see the accumulative result that's coming in with the 00:17:40.520 --> 00:17:45.739 current element. Now we need to operate on this. 00:17:45.740 --> 00:17:47.840 Were it normally mapped, we would see us 00:17:47.841 --> 00:17:49.919 applying the F to the input. 00:17:49.920 --> 00:17:53.519 But here, you can see us applying the F to the input and then 00:17:53.520 --> 00:17:58.679 continuing on. So us calling the rest of the composed chain 00:17:58.680 --> 00:18:03.159 here is the effect of, in the previous slide, moving to the 00:18:03.160 --> 00:18:07.156 next step. We could ignore this line for now. 00:18:07.157 --> 00:18:13.819 If you're curious, please read the README in detail. 00:18:13.820 --> 00:18:15.579 Now, what about reducers? 00:18:15.580 --> 00:18:18.879 What do those look like? Well, let's just scroll 00:18:18.880 --> 00:18:22.439 down here. Recall that a reducer is a function that's 00:18:22.440 --> 00:18:26.959 consuming a stream, right? Zoom that up for you a little bit. 00:18:26.960 --> 00:18:33.919 Now, in the case of count, recall that this is how it's 00:18:33.920 --> 00:18:37.679 working, how we saw a moment ago. So clearly this list of five 00:18:37.680 --> 00:18:42.199 elements only has five things in it. Well, a reducer by 00:18:42.200 --> 00:18:47.599 structure is a function of two, one, or zero arguments. So we 00:18:47.600 --> 00:18:50.639 can see here in the case of two, this is the normal iterative 00:18:50.640 --> 00:18:54.519 case. We don't care about the input for count, we just care 00:18:54.520 --> 00:18:58.559 about the current accumulated count that we're doing, and 00:18:58.560 --> 00:19:02.879 we add one to it, and that's it. This then goes back to 00:19:02.880 --> 00:19:06.359 the loop and the whole process starts again with the next 00:19:06.360 --> 00:19:10.879 element. In this kind of done case, this is used internal to 00:19:10.880 --> 00:19:16.879 that sort of the supervising function transduce. It's just 00:19:16.880 --> 00:19:19.639 confirming the final result. Sometimes some 00:19:19.640 --> 00:19:21.839 post-processing is necessary here, but in the case of 00:19:21.840 --> 00:19:26.039 count, as it is so simple, that is not necessary. And now 00:19:26.040 --> 00:19:29.359 here's the base case. This is also used within that 00:19:29.360 --> 00:19:34.319 supervising transduce function at the very top. Well, if 00:19:34.320 --> 00:19:36.679 you're counting, you have to start from somewhere, right? 00:19:36.680 --> 00:19:37.349 In this case, well, what you're starting with is zero. 00:19:37.350 --> 00:19:40.251 In the case of cons, you'd be starting with an empty list. 00:19:40.252 --> 00:19:44.434 In the case of vector, you'd be starting 00:19:44.435 --> 00:19:53.999 with an empty vector and so on. 00:19:54.000 --> 00:19:56.799 Once again, if you are more curious, please take a look at 00:19:56.800 --> 00:19:57.679 the README. NOTE Transducers in the wild - CSV 00:20:00.520 --> 00:20:06.039 Okay, transducers in the wild. Well, let's go take a look at 00:20:06.040 --> 00:20:07.639 processing some CSV data. 00:20:07.640 --> 00:20:21.319 We're going to open up a new Emacs Lisp bracket here. So I have 00:20:21.320 --> 00:20:28.839 a file. And in this file, let's just go look at C-x b right 00:20:28.840 --> 00:20:34.839 there, you will see that we've got some bank transaction 00:20:34.840 --> 00:20:37.879 information. It's got these transactions from a whole 00:20:37.880 --> 00:20:40.199 bunch of different people into different accounts, 00:20:40.200 --> 00:20:43.879 whether it's money coming in, money going out, and then a 00:20:43.880 --> 00:20:47.839 basic description. How's your Latin? But for this little 00:20:47.840 --> 00:20:53.679 test, what we want to do is we want to find Bob's final bank 00:20:53.680 --> 00:20:59.679 balance. Let's get on to it. First of all, let's 00:20:59.680 --> 00:21:04.444 just confirm, let's do some basic stuff. 00:21:04.445 --> 00:21:10.844 with-current-buffer, find-file-noselect. 00:21:10.845 --> 00:21:15.542 What's the name of that file? 00:21:15.543 --> 00:21:17.439 This is pre-organized, so you 00:21:17.440 --> 00:21:20.879 will just see it right here. 00:21:20.880 --> 00:21:26.999 t-transduce and t-comp. We don't know what we're going to comp 00:21:27.000 --> 00:21:33.039 yet. Actually, I'll just pass to show you. And then we will 00:21:33.040 --> 00:21:36.999 see, let's just do a little t-count just to confirm. What's 00:21:37.000 --> 00:21:45.112 our source? Well, our source is a buffer, t-buffer-read. 00:21:45.113 --> 00:21:50.153 And note that because we're using with-current-buffer, 00:21:50.154 --> 00:21:55.079 if we go like this, if we go current-buffer, this will just work. So 00:21:55.080 --> 00:21:59.919 now let's... Well, that was odd. I should have done it like 00:21:59.920 --> 00:22:02.159 that. There we go. So now we should make that a little smaller 00:22:02.160 --> 00:22:04.799 so you can see what it is. Now if we hit RET, we should get the 00:22:04.800 --> 00:22:09.559 right result. Okay, so there are 50,001 lines in this file, 00:22:09.560 --> 00:22:13.516 but the one extra one is the name of the headers, right? 00:22:13.517 --> 00:22:18.079 We want to process this file in more detail. So how can we do 00:22:18.080 --> 00:22:22.079 that? Well, let's start by just automatically 00:22:22.080 --> 00:22:28.799 interpreting the results as CSV. If we do that, okay, well 00:22:28.800 --> 00:22:31.559 now we only have 50,000 entries as we expected, right? 00:22:31.560 --> 00:22:36.759 Because it's going to pull out the header line. If we now say 00:22:36.760 --> 00:22:42.679 we want to just filter out, you know, We only want Bob, right? 00:22:42.680 --> 00:22:53.679 So if... gethash, it was in the row of name. Each line here is 00:22:53.680 --> 00:22:57.079 made into, at least by default, is made into a hash map. So if 00:22:57.080 --> 00:23:02.759 we go like this, we should see that. Okay, so 12,000 of these 00:23:02.760 --> 00:23:05.639 lines or thereabout belong to Bob. 00:23:05.640 --> 00:23:13.839 Let's just move that over a little bit. Actually, I suppose we don't even 00:23:13.840 --> 00:23:17.799 need that anymore. I'll just keep that full size for you. 00:23:17.800 --> 00:23:24.399 Okay, so all right, there's about 12,000 results for Bob of 00:23:24.400 --> 00:23:32.479 the 50,000. What's next? Well, we want to confirm, 00:23:32.480 --> 00:23:40.039 we want to pull out everything, 00:23:40.040 --> 00:23:43.079 all of the in and the out entries. 00:23:43.080 --> 00:23:56.279 Thank you. So, string to number, because we know that 00:23:56.280 --> 00:24:01.239 everything came in as strings. Unfortunately, the from-csv 00:24:01.240 --> 00:24:03.799 doesn't try to be smart at all, it's just pulling everything 00:24:03.800 --> 00:24:09.479 in as string values. If you want actual things to be 00:24:09.480 --> 00:24:13.399 numbers or whatever, that is up to you to do the parsing 00:24:13.400 --> 00:24:20.679 yourself. Okay, so we have those two values now. We know 00:24:20.680 --> 00:24:23.879 that we saw from the data just a moment ago that you're only 00:24:23.880 --> 00:24:26.999 going to have a value in one column or the other. It's either 00:24:27.000 --> 00:24:29.119 going to be 0 in the empty one, or you're going to have some 00:24:29.120 --> 00:24:32.159 number in the other. So we know that we can just naively add 00:24:32.160 --> 00:24:35.479 them. If it was in, it would always be positive. So we'll just 00:24:35.480 --> 00:24:41.519 add that. But in the negative case, we want to just make it 00:24:41.520 --> 00:24:45.279 negative really briefly before we add them all together. 00:24:45.280 --> 00:24:50.519 let's now just prove to ourselves that we are sane here. What 00:24:50.520 --> 00:24:52.479 we're going to do is we're going to quickly go say take 00:24:52.480 --> 00:24:57.039 5 just to convince ourselves, and we'll go cons, and let's 00:24:57.040 --> 00:24:59.839 see if we get kind of results that make sense. Okay, these 00:24:59.840 --> 00:25:02.799 sort of make sense. It looks like you know Bob's got some big 00:25:02.800 --> 00:25:07.679 expenses here. If we take say 15, does it look any better? 00:25:07.680 --> 00:25:10.319 Okay, looks like he had a payday. All right, good job Bob. 00:25:10.320 --> 00:25:15.439 Let's get back in there. Now we only really care about 00:25:15.440 --> 00:25:20.119 adding the final result, right? So there we go. Add that all 00:25:20.120 --> 00:25:24.559 together and we'll see what we get in a moment. Okay, wow, 00:25:24.560 --> 00:25:27.519 Bob's rich. Okay, so it looks like in his 12,000 00:25:27.520 --> 00:25:32.279 transaction, Bob has an overall net worth of $8.5 million. 00:25:32.280 --> 00:25:34.439 Looking pretty good. 00:25:34.440 --> 00:25:38.999 So here's an example of how you can, particularly in Emacs 00:25:39.000 --> 00:25:42.959 Lisp, how you can very easily just get a file, consider it the 00:25:42.960 --> 00:25:45.879 current buffer, and then just do whatever you want to it. 00:25:45.880 --> 00:25:50.359 Note that there is sort of first-class support for both CSV 00:25:50.360 --> 00:25:54.359 and JSON, and then you have, and both of those bring in their 00:25:54.360 --> 00:25:57.719 values as hash maps, and then you're just free to do whatever 00:25:57.720 --> 00:26:00.439 you want and process them, potentially both writing them 00:26:00.440 --> 00:26:03.239 back out as CSV or JSON once again. NOTE Issues and next steps 00:26:03.240 --> 00:26:10.719 Some issues with transducers that can come up is 00:26:10.720 --> 00:26:14.919 that one, a zip operator is missing, but I'm working on it. 00:26:14.920 --> 00:26:19.399 Two is that performance, particularly in Emacs Lisp, isn't 00:26:19.400 --> 00:26:24.119 that great. It could be due to the sort of nested lambda calls 00:26:24.120 --> 00:26:27.759 that have to occur internally, but the common Lisp 00:26:27.760 --> 00:26:32.239 implementation is quite good. and there's yet no support 00:26:32.240 --> 00:26:35.399 for parallelism. You can imagine that a lot of those steps 00:26:35.400 --> 00:26:38.559 you could potentially perform in parallel depending on the 00:26:38.560 --> 00:26:44.399 platform, but research has not yet gotten that far. Okay, 00:26:44.400 --> 00:26:47.639 that's all. Thank you very much. If you have any questions, 00:26:47.640 --> 00:26:51.240 please contact me.