[[!meta title="Transducers: finally, ergonomic data processing for Emacs!"]]
[[!meta copyright="Copyright © 2024 Colin Woodbury"]]
[[!inline pages="internal(2024/info/transducers-nav)" raw="yes"]]

<!-- Initially generated with emacsconf-publish-talk-page and then left alone for manual editing -->
<!-- You can manually edit this file to update the abstract, add links, etc. -->


# Transducers: finally, ergonomic data processing for Emacs!
Colin Woodbury (he) - <https://x.com/@fosskers>, @fosskers@m.fosskers.ca on Mastodon, <https://www.fosskers.ca>

[[!inline pages="internal(2024/info/transducers-before)" raw="yes"]]

Transducers are an ergonomic and extremely memory-efficient way to process a
data source. Here "data source" means simple collections like Lists or
Vectors, but also potentially large files or generators of infinite data.

Transducers&#x2026;

-   allow the chaining of operations like map and filter without allocating memory between each step.
-   aren't tied to any specific data type; they need only be implemented once.
-   vastly simplify "data transformation code".
-   have nothing to do with "lazy evaluation".
-   are a joy to use!
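
The "chaining without allocating" point can be sketched with a minimal
transducer implementation. This is illustrative Python with hypothetical helper
names (`mapping`, `filtering`, `comp`, `transduce`), not the API of the Emacs
Lisp library the talk presents:

```python
# A transducer wraps a reducing function. Composing transducers fuses the
# steps, so no intermediate collection is allocated between map and filter.

def mapping(f):
    """Transducer: apply f to each element before the next step."""
    def xform(rf):
        def step(acc, x):
            return rf(acc, f(x))
        return step
    return xform

def filtering(pred):
    """Transducer: pass only elements satisfying pred to the next step."""
    def xform(rf):
        def step(acc, x):
            return rf(acc, x) if pred(x) else acc
        return step
    return xform

def comp(*xforms):
    """Compose transducers; elements flow through them left to right."""
    def composed(rf):
        for xf in reversed(xforms):
            rf = xf(rf)
        return rf
    return composed

def transduce(xform, rf, init, source):
    """Run the fused pipeline over source in a single pass."""
    step = xform(rf)
    acc = init
    for x in source:
        acc = step(acc, x)
    return acc

# Square the odd numbers below 10 and sum them, in one fused pass:
result = transduce(
    comp(filtering(lambda n: n % 2 == 1),
         mapping(lambda n: n * n)),
    lambda acc, x: acc + x,
    0,
    range(10))
print(result)  # 1 + 9 + 25 + 49 + 81 = 165
```

Note that `comp` reads in processing order, not function-application order:
elements hit `filtering` first, then `mapping`, matching the Clojure convention.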

In this talk, Colin will introduce Transducers, show how to use them, and
demonstrate some Emacs-specific workflows that make live processing of large
data sets in JSON and CSV a breeze.

About the speaker:

Colin has been active in the FOSS world since 2011, publishing libraries and
applications primarily in Haskell and Rust. Since 2023 he has been using Lisps
more and more, and after falling in love with Transducers from Clojure he has
ported the pattern to three other Lisps.

Colin is originally from Canada and lives in Japan.


# Discussion

## Questions and answers

-   Q: When I tried comparing transducers.el to cl-lib and dash
    (benchmark-compiled), I got the following results:

    ```
    cl-lib: 0.5 sec, 2 GCs
    dash: 0.3 sec, 1 GC
    transducers: 0.75 sec, 2 GCs
    cl-loop: 0.025 sec, 0 GCs (note: 0.025, one order of magnitude faster)
    ```

    I expected transducers to be slower than cl-loop but faster than
    cl-lib or dash. However, this isn't the case. Any idea why? (The
    benchmark is here:
    [https://old.reddit.com/r/emacs/comments/1h5c778/which_emacsconf_2024_talks_have_your_attention/m061dge/](https://old.reddit.com/r/emacs/comments/1h5c778/which_emacsconf_2024_talks_have_your_attention/m061dge/){rel="noreferrer noopener"})

    ```
    (benchmark-run-compiled 1000
      (cl-loop for n from 1 below 2000 by 2
               sum (* n n) into total
               finally return total))
    ```
    -   A: Loop is always going to win in cases like this, where we
        aren't chaining multiple steps together (what I called comp
        steps). Transducers shine when we need to chain things
        together; we can sometimes express ourselves more clearly than
        with loop. This may sound like moving the goal-posts: it's
        really about internal function calls. Loop avoids the
        per-element function calls that transducers have to make, so
        loop might "win" here.
    -   Note: I'm comparing against cl-lib and dash -- the cl-loop is
        only for reference. I'm curious about the performance gap
        between transducers and cl-lib/dash. The number of function
        calls is the same for cl-lib, dash, and transducers.
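
The call-overhead point from the answer can be sketched outside Emacs Lisp.
This is an illustrative Python comparison (not the elisp benchmark itself): a
hand-written loop inlines all per-element work, while even a fused,
allocation-free pipeline still pays a function call per step per element:

```python
# Same computation as the cl-loop benchmark: sum of n*n for odd n below a limit.

def loop_sum_of_odd_squares(limit):
    # Everything inlined in the loop body: no per-element function calls.
    total = 0
    for n in range(1, limit, 2):
        total += n * n
    return total

def fused_sum_of_odd_squares(limit):
    # One pass, no intermediate collections -- but `square` is invoked once
    # per element, analogous to a transducer's step functions.
    square = lambda n: n * n
    return sum(square(n) for n in range(1, limit, 2))

# Both produce the same result; the loop version simply does less dispatch.
assert loop_sum_of_odd_squares(2000) == fused_sum_of_odd_squares(2000)
```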

-   Q: Did you think about generators as a source, compared to lists,
    vectors, etc.? Maybe I got a word wrong. I'm thinking of the later
    development that generators and Series operations needed each
    other, rather than being redundant as had been thought. I forget
    who the generators person was in Lisp.
    -   A:

-   Q:  Do you know of any theoretical texts on transducers?
    -   A: My README and Rich Hickey (inventor of Clojure) may be the
        key texts on transducers. 
        -   and his talks/videos (on the topic)
    -   [https://andreyorst.gitlab.io/posts/2022-08-13-understanding-transducers/](https://andreyorst.gitlab.io/posts/2022-08-13-understanding-transducers/){rel="noreferrer noopener"}
    -   (not fosskers): I think AIM-1082 is interesting to read. 
        ([https://dspace.mit.edu/handle/1721.1/6035](https://dspace.mit.edu/handle/1721.1/6035){rel="noreferrer noopener"}?)
        Yes

-   Q: Waters (lazy Series in Lisp, late 70s) said that this \*should
    have\* been done as an additional compiler feature, but failing
    that it must be a macro package. Did you think about that with
    respect to your CL, Fennel, and Elisp ports of transducers?
    -   A: I think my work could help provide the basis for this;
        there's much more to be done.

-   Q: Does t-buffer-read provide a lazy stream that's linewise, or
    charwise, or do something else entirely?
    -   A: t-file-read

-   Q: Can the Elisp library be combined with the stream.el API
    ([https://elpa.gnu.org/packages/stream.html](https://elpa.gnu.org/packages/stream.html){rel="noreferrer noopener"})? 
    Or seq in general?
    -   A: I'd say these libraries are completely orthogonal. (Re: what
        made me ask this question was seeing \`t-buffer-read' and
        hoping that it doesn't convert the whole buffer into a
        string.) With seq/generics it should just work: the magic of
        defgeneric.

-   Q: How does one debug a t-comp expression? Can you single step and
    see intermediate results of the different statements you declare?
    -   A: In Common Lisp it is wonderful. In Emacs Lisp it is more
        complicated. I don't know if step debugging would work, but
        there is a "log" (?) function to print out values.

-   Q: Is there a path for transducers to enable elisp processing of
    otherwise overly large datasets as if just normal Emacs "buffers"
    (i.e. just pulling one thing at a time so essentially stream-like
    under the hood but buffer-like in interface), with none of the usual
    perf issues with a traditional buffer structure?
    -   A: Possibly so yes
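
The stream-like shape the question describes can be sketched with a generator
source: the pipeline pulls one item at a time, so the "collection" never has
to exist in memory, even when the source is infinite. This is an illustrative
Python sketch of the idea, not the Emacs Lisp library's mechanism:

```python
import itertools

def naturals():
    """An infinite source; items are produced only on demand."""
    n = 0
    while True:
        yield n
        n += 1

# Take the first five odd squares from the infinite source. Only one
# element is in flight at any time -- nothing is ever fully realized.
odd_squares = (n * n for n in naturals() if n % 2 == 1)
first_five = list(itertools.islice(odd_squares, 5))
print(first_five)  # [1, 9, 25, 49, 81]
```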

-   Q: Re the performance issues mentioned (which popped up recently in
    the lists/forums), to what extent could tail-call optimization and
    other mechanisms (incl. inlining, GC-friendliness, etc.)
    eventually alleviate those issues and enable their use at little
    to no extra cost?
    -   A: Over time, with further work and optimizations. Some are
        already there (tail calls, notably).

-   Q: Is there an option to read a csv/json and produce an alist or
    plist instead of a hash table for an entry?
    -   A:  Absolutely.

-   Q: Is the Common Lisp version ready for 'production' use? Is it
    complete enough and the API stable enough?
    -   A: I use it all the time, extensively. I'm programming a game
        with it and haven't noticed a dent in the frame rate.

-   Q: Do we need a pre-written "t-" version for every already
    existing reducing function like + or is there a function to
    construct them from already defined reducer 2-arg functions?
    -   A:
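
One common way to answer this (sketched here in Python with hypothetical
names, not the library's actual API): a full transducer-style reducer has
three arities -- init (0 args), completion (1 arg), and step (2 args) -- so
any plain 2-argument function plus an initial value can be wrapped into that
shape mechanically:

```python
from functools import reduce
import operator

def plain_to_reducer(f, init):
    """Wrap a 2-arg function as a reducer with init, completion, and
    step arities, in the style of Clojure's reducing functions."""
    def reducer(*args):
        if len(args) == 0:
            return init        # init arity: the starting accumulator
        if len(args) == 1:
            return args[0]     # completion arity: pass the result through
        return f(*args)        # step arity: combine accumulator and element
    return reducer

add = plain_to_reducer(operator.add, 0)
total = reduce(add, [1, 2, 3, 4], add())  # start from the init arity
print(add(total))  # completion arity; prints 10
```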

-   Q: Is the compelling argument for transducers that it's a better
    abstraction? It seems like a lot of the concerns/objections (while
    pragmatically valid) are focused on implementation. Can this
    abstraction allow for advances in implementation?
    -   A:

-   Time for the closing remarks, but people can stay on the pad or join
    the BBB and keep going

## Notes

-   \<lounge-081\> What made transducers click for me way back when was
    realizing that the usual operations (think map/reduce/filter/etc and
    their generalizations) implicitly or explicitly demand realized
    collections, plus intermediate ones between those operations (and
    GC), instead of composing such transformations into one operation
    and then making this applicable to streams of unknown-sizes and
    such.
    -   \<robin\> lounge-081, ah, like\...\*thinks very hard\*\...stream
        fusion, iirc?
        [http://lambda-the-ultimate.org/node/2192](http://lambda-the-ultimate.org/node/2192){rel="noreferrer noopener"}
        that makes a lot of sense
    -   \<lounge-081\> "Rich Hickey has a point" need never be said :)
-   \<Ez3\> Sorry but map is collect and filter is select for me :)
-   \<lounge-081\> \@robin: there are many ways to get to them (some may
    already think of those functions as folds, etc.), but for the bulk
    of people familiar with map/reduce/filter/etc, it's useful to enter
    the thinking realizing that you may have infinite streams (of any
    type even) on input, and may not want to have realized collections
    at every step of the usual function applications, thus the need to
    compose the things that make up a map/reduce/filter/etc before
    applying them one by one on a continued stream of things one at a
    time
-   \<NullNix\> ellis: I wrote about half of one in binutils libctf
    (generators, anyway). See the \*\_next() functions. Having seen this
    I'll try to mutate the internals to something closer, right now
    every iterator is independently implemented (mostly).
    -   \<NullNix\> (inspired by generators, not transducers, since I
        didn't know about them)
    -   \<NullNix\> still \*far\* less ugly than "call a function with
        every item" \*\_iter() stuff
-   \<lounge-081\> Thanks for the answers fosskers, I'm quite hopeful
    with transducers working their way into many things, so thinking
    ahead to where that may happen and to solving perf hurdles
-   \<ElephantErgo\> I'm totally sold. I'm working on a CL codebase
    right now, and these are going in there immediately
-   \<robin\> (also CL does not require TCO but good compilers support
    it to the extent that extensive use of FP is practical)
-   \<ellis\> it's a tricky protocol to get perfect the first time for
    sure, is something that transcends language barriers so always fun
    to see more impls :)
-   \<robin\> CLTL2 docs on SERIES for those who are curious
    [http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/html/cltl/clm/node347.html#SECTION003400000000000000000](http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/html/cltl/clm/node347.html#SECTION003400000000000000000){rel="noreferrer noopener"}
    -   \<robin\> (lisp.se mirror in case the ai repository disappears
        someday:
        [http://cltl2.lisp.se/cltl/clm/node347.html](http://cltl2.lisp.se/cltl/clm/node347.html){rel="noreferrer noopener"}
        )
-   \<robin\> definitely watching this one more carefully. if it's
    CLOS-oriented i'm going to like it
-   \<robin\> note that full TCO is strictly more expressive than
    special-case optimizations as with emacs's cl-labels


[[!inline pages="internal(2024/info/transducers-after)" raw="yes"]]

[[!inline pages="internal(2024/info/transducers-nav)" raw="yes"]]