Diffstat (limited to '2023/talks/collab.md')
-rw-r--r-- 2023/talks/collab.md 195
1 files changed, 195 insertions, 0 deletions
diff --git a/2023/talks/collab.md b/2023/talks/collab.md
new file mode 100644
index 00000000..1a88809b
--- /dev/null
+++ b/2023/talks/collab.md
@@ -0,0 +1,195 @@
+[[!meta title="Collaborative data processing and documenting using org-babel"]]
+[[!meta copyright="Copyright © 2023 Jonathan Hartman, Lukas C. Bossert"]]
+[[!inline pages="internal(2023/info/collab-nav)" raw="yes"]]
+
+<!-- Initially generated with emacsconf-publish-talk-page and then left alone for manual editing -->
+<!-- You can manually edit this file to update the abstract, add links, etc. --->
+
+
+# Collaborative data processing and documenting using org-babel
+Jonathan Hartman (he/him), Lukas C. Bossert (he/him) - <https://mastodon.social/@lukascbossert>, <mailto:hartman@itc.rwth-aachen.de>, <mailto:bossert@itc.rwth-aachen.de>
+
+[[!inline pages="internal(2023/info/collab-before)" raw="yes"]]
+
+In our presentation we will show an efficient way of combining
+information and enriching it by retrieving data, processing it, and
+finally exporting it, all with org-mode. We will demonstrate not only
+org-mode itself, but also a few companion libraries that
+add functionality such as knowledge graph visualizations, literate
+programming, and collaborative editing to quickly create a deeply
+informative reference page.
+
+The starting point of our best practice is the National Research Data
+Infrastructure Germany (NFDI), about which we intend to retrieve and
+process certain data gathered from Wikidata. For this, we
+are additionally leveraging the "org-roam" Emacs package, which
+provides functionality for quickly and simply linking together notes
+and ideas into a custom knowledge graph. Initially, we will write a
+short abstract about the NFDI and embed it into our existing knowledge
+graph by linking it to other existing nodes. In the visualized graph
+(using the “org-roam-ui” package), links and secondary connections to
+other existing nodes can now be revealed.
+
+Next, we would like to enrich the text about the NFDI with data
+retrieved from the Wikidata API. A convenient way of creating
+self-documenting code is the approach called “literate programming”,
+which presents program logic embedded within human language text. In
+Emacs we achieve this by using the “org-babel” package. Perhaps now we
+find it is helpful to collaborate with a colleague in the document:
+while one is writing the code, the other can explain its use and
+interpret the results. We will do this simultaneously in the same
+document using a method called “crdt” (conflict-free replicated data
+type) and – of course – there is also an implementation of this in
+Emacs. The results of the code blocks can be used for further analysis
+and shared throughout the same document.
+
+Finally, for the sake of proper and barrier-free documentation, we
+show how to export the document to various formats such as PDF, HTML,
+and TXT, using either the built-in export feature of org-mode or
+pandoc.
+
+About the speakers:
+
+**Jonathan Hartman** is a trained data scientist and works at the IT
+Center of the RWTH Aachen University, Germany.
+
+**Lukas C. Bossert** is a trained classical archaeologist and is deputy
+head of the department "research process and data management" at the
+IT Center of the RWTH.
+
+Lukas, an intermediate Emacs user, is currently exploring how to
+optimize his daily workflow by leveraging various Emacs packages. On
+the other hand, Jonathan is a relative newcomer to this environment,
+encountering common pitfalls faced by beginners. Together, they
+explore the capabilities and functionalities of org-mode, discovering
+how it can enhance data management and presentation in their research
+processes.
+
+[[!img /i/emacsconf-2023-collab-sponsorship.png alt="Lukas and Jonathan are financed by the DKZ.2R Datenkompetenzkolleg Rhein-Ruhr (16DKZ2030E), www.dks2r.de"]]
+
+# Discussion
+
+## Questions and answers
+
+- Q: How reliably does it resolve conflicts? In my personal use
+ case, for example, Syncthing sometimes does not work
+ perfectly and I have to edit manually. How robust is it compared
+ to Syncthing?
+ - A: (Lukas) We also sometimes ran into issues where letters got
+ mixed up. We couldn't figure out what caused it, and it was not
+ reproducible. I cannot compare it to Syncthing; I have never used
+ it with Emacs/org-mode.
+- Q: How is the security of this kind of thing? I mean, if we adopt
+ it in our PAD, can it execute
+ arbitrary (elisp) code on other people's computers? (Think like
+ an adversary!)
+ - A: (Lukas) As far as we saw, the code is executed on the local
+ computer; see the part with the R code in our video.
+ - (zaeph) We had plans with qhong (maintainer of crdt.el) to
+ tunnel the connection via SSL, but we were blocked by the SSL
+ library that shipped with Emacs, sadly. However, we did create
+ a security policy that allowed restrictions on the execution of
+ Elisp code. (great!)
+- Q: Really nice talk and demo! You guys clearly rehearsed :). I
+ always wonder with serial data processing sequencing like this, to
+ what degree do the intermediate outputs need to appear inline in the
+ text? Suppose you had 50,000 or one million rows from your initial
+ wikidata (or similar) call. How would you handle that size of data
+ using a collaborative, literate approach like this?
+ - A: (Lukas) Good question. In your local buffer there is no
+ difference, and for the collaborative partner I cannot tell. We
+ tested it with 50 items because that was enough for
+ our demonstration.
+ - noweb allows getting results of evaluation without having to put
+ the actual data into the Org buffer - just arrange the original
+ block generating the data to have :results silent. Basically,
+ :var foo=block-name does not require "block-name" to be
+ evaluated in advance - it will be evaluated as necessary. AFAIU,
+ in the talk, it is re-evaluated every time (to avoid that, one
+ would need :cache yes).
+ - This has tremendous utility
+ - So it would be stored on disk and referenced by name in a
+ subsequent block? Sounds useful.
+ - Not on disk - just cached within a single session. To store
+ it on disk, you need to save it to an actual file.
+- Q: How do you handle the viewing of larger or really any tabular
+ data in Emacs/Org when you want to inspect it, like the nice way
+ tabular data is displayed inline in Rmarkdown/RStudio?
+ - A: (Lukas) I have no particular way of doing this.
+ - What about pandas data summary functionality? Can be a simple
+ Python block.
+ - Lukas: Jonathan is our Python expert, he might answer this
+ question.
+ - A: (Jonathan) If I follow, you can certainly just use
+ DataFrame.describe() or Series.describe() to get summary
+ statistics for a dataset - the return value would be a Series or
+ a DataFrame, which would be displayed similarly to how we show
+ things here. Alternatively, DataFrame.head(n) or
+ DataFrame.sample(n) would return a dataframe of the first n or n
+ random rows of a dataset, and might be a way of providing the
+ gist of a very large dataset without printing the entire table
+ in the document.
+ - Would be nice to have a "summarized table" functionality in
+ Org, that includes an abridged copy of a long table inline, but
+ you can open it in another buffer to browse/edit the full table
+ (à la block edit).
+ - Feel free to post a feature request - see
+ <https://orgmode.org/manual/Feedback.html#Feedback>
+- Q: I'm thinking about an application for a single user, but on
+ different platforms. A simple case: you have a
+ buffer on your local computer, and you also want to have some files
+ on your pad or on your phone, and you can use this CRDT concept to
+ make sure that there's not too much conflict between different
+ editing sessions. Do you think this is a good idea? I mean, compared
+ to purely relying on Syncthing, which sometimes I feel is unreliable
+ for resolving those conflicts.
+ - A: (Lukas) This sounds very interesting and could be beneficial for
+ working on things continuously.
+
+## Notes
+
+- I like the way you highlight the point you are talking about in real
+ time.
+- Conflict-free Replicated Data Types (CRDT) ::
+ <https://github.com/emacs-straight/crdt>
+- !This is the future of PAD for our conference.
+- Just came here to say watching two users editing the same buffer
+ simultaneously is BLOWING MY MIND 
+ - BLOWING MY MIND  +2
+ - blowing my mind, too ...
+ - WOW
+- Gitlab custom-export.setup
+ - What about it?
+ - I am looking for that setup file and want to try it :) 
+ -->
+ <https://git.rwth-aachen.de/dl/workshops/collaborative-coding-with-emacs/-/blob/main/emacs/custom-export.setup>
+ - Thank you!
+- Truly one of the most impressive talks of the day. Congrats! Very
+ inspiring
+ - Yes, indeed. 
+ - (Lukas) Wow! Thank you. We weren't sure if this was worth showing
+ at EmacsConf because there have already been plenty of talks
+ about literate programming and org-babel...
+ - Great collaborative conversation and step-wise example
+ creates a different (and impactful) framing.  Thank you!
+- crdt is fantastic; pity that most (all but one) of my collaborators use Word & VS Code. 🙁
+- that's really cool. One of the parts that's a bit hidden from the user is seeing the format that the data is in inside the shell script
+- it is whatever constitutes the closest equivalent of table in sh (array)
+ - yeah, you have to keep the representation in mind when filtering it as text through sed
+- this demo is so cool :D
+- Really, really impressive I have to admit
+- HA. you cannot evaluate in place so seamlessly in that way with Rmarkdown :). And you cannot combine named blocks in this way either. Wish more folks used emacs.
+- wow, so `#+CALL` can be embedded in text via `call_()`? TIL
+- such a slick presentation, I like the CRDT collaboration angle, looks like an end-game UX
+- Impressive workflow!
+- great presentation!
+- For those of you who remember the bad old days before "reproducible research," that talk is even more impressive. Great job!
+ - i was prolly not there in the bad old days, but imho reproducible research is a pressing, current problem.
+- I feel like that talk video should be shared on Hacker News
+
+
+[[!inline pages="internal(2023/info/collab-after)" raw="yes"]]
+
+[[!inline pages="internal(2023/info/collab-nav)" raw="yes"]]
+
+