[[!sidebar content=""]]
[[!meta title="Using SQLite as a data source: a framework and an example"]]
[[!meta copyright="Copyright © 2022 Andrew Hyatt"]]
[[!inline pages="internal(2022/info/sqlite-nav)" raw="yes"]]
<!-- Initially generated with emacsconf-generate-talk-page and then left alone for manual editing -->
<!-- You can manually edit this file to update the abstract, add links, etc. --->
# Using SQLite as a data source: a framework and an example
Andrew Hyatt (he/him)
[[!inline pages="internal(2022/info/sqlite-before)" raw="yes"]]
[[!template id="help"
volunteer=""
summary="Q&A could be indexed with chapter markers"
tags="help_with_chapter_markers"
message="""The Q&A session for this talk does not have chapter markers yet.
Would you like to help? See [[help_with_chapter_markers]] for more details. You can use the vidid="sqlite-qanda" if adding the markers to this wiki page, or e-mail your chapter notes to <emacsconf-submit@gnu.org>."""]]
Emacs can now be built with SQLite, giving native support for reading
and writing to a database. With this, we can start seriously
considering a SQLite-first approach: instead of storing data on the
filesystem, and using various ad-hoc solutions for metadata, we can
use SQLite to store and search our data. This is essentially a
tradeoff between the power and speed of SQLite and the universality of
the filesystem. If we accept that this approach is useful, then a
standard way to store information in database, may be useful and
promote package interoperability, just as our single filesystem does.
The triples packages is a RDF-like database for supplying such a
flexible system for storing and retrieving data from SQLite. A sample
application, ekg, a replacement for org-roam, is shown using this, and
the advantages of the triple design are explained.
For more information and the packages discussed here, see the
[triples](https://github.com/ahyatt/triples) and
[ekg](https://github.com/ahyatt/ekg) pages.
# Discussion
- <https://www.reddit.com/r/emacs/comments/zexv6b/using_sqlite_as_a_data_source_a_framework_and_an/>
- <https://news.ycombinator.com/item?id=33853509>
## Notes
- <https://github.com/ahyatt/triples>
- <https://github.com/ahyatt/ekg>
- <https://www.gnu.org/software/hyperbole/man/hyperbole.html#Implicit-Buttons> -
discussed in the next talk could really help simplify access to your
triples and ekg primitives; have a look when you have time.
- You had some delicious recipes in there. Made ME hungry!
- Thanks, that's insightful
## Questions and answers
- Q: To what extent did Datomic influence the design of triples?
<https://www.datomic.com/>
- A: I wasn't aware of Datomics, but I think both triples and
Datomics are influenced by RDF & trends in Knowledge Graph
construction in the industry. I took a look at that page, and
one interesting thing they do is (AFAICT) store edits to the
database instead of actual values, allowing you to reconstruct
the database or go forward or backward in time. This is very
interesting, and I'm thinking of ways to make the database
safer, but not sure if I can get to such sophisticated
properties in this implementation.
- Q:built into Emacs? nice. multiple schemas (so to say
differentiation of "databases" (if you will so) are possible with
that built-in instance?
- A: Yes, with emacs 29. Full-featured with multiple databases,
transactions, etc.
- Q:What about collaborative editing whith this? Multiple computers
with multiple emacs like crdt.el with org mode?
- A: Database are great for more async collaboration, multiple
people / processes can add to the repository at the same time.
But I think it's not going to be a great solution for multiple
users modifying the same buffer.
- Q:What about using this on multiple computers? How would you
syncronize the data? (This is a minor problem in org-roam where if
you share files between multiple computers, their SQLite databases
get out of sync and require M-x org-roam-db-sync to rebuild the
SQLite database.)
- A: This is an unsolved problem, one that I'm interested in
looking into. There possibly are standardized db solutions to
this, but I don't yet know what they are.
- Q: With EKG what about views like org roam node mind map view? Or
org mode virtual view for integration with other org packages?
- A: This is possible, it just needs to interface with the
database in a different way. It's all graphs, so really any
triple library might be a good fit for this.
- Q: Are you planning to further develop EKG? It is highly
interesting to me, I do prefer SQLite over text.
- A: Andrew is using it; not ready for general use but quite soon
-- towards the end of the month! Still in exploratory mode
though. Still thinking about (some of) the fundamental
concepts.
- Q: Is it then possible to combine the triples DB with some custom
tables in the same SQLite file? (e.g. to build a log table next to
the triples tables for quick query of event data)
- A: You could do that. AT the moment it's just one table
(triples). It's designed to be one table in one DB -- beware
of consistency issues if you add further tables, which you can
definitely do.
- Q: What are your thoughts on adding a timestamp attribute to triples
so that the DB becomes append-only and by default you return the
latest fact for a subject/object pair?
- Q+ -> Use is to keep a record - you don´t delete? e.g. you get
all past addresses of a person or all past versions of a given
fact. Even version control for notes.
- A: I haven't thought of that / not seen in other triple stores
- A: Be ware that these DBs already take quite a bit of space
- A: may make synchronization easier
- Q: can ordinary lisp data types (lists, symbols, etc) be stored in
the data base
- A: Yes, if you don't specify, it defaults to a list. Not sure
that was the right design choice; lists map to rows with a
hidden index column. emacs-sql and this also represent most
things as strings.
- Q: beyond note taking what kind of packages do you think would
benefit from triples library?
- A: Anything where you have lisp forms stored in a file would
probably be better implemented via a database. And the triples
library makes this easy and standardized. So, for example BBDB,
which is a "database" should actually be in a database. And
then you might want to annotate, tag, etc, so having be in one
big database makes a lot of sense to me, because I think all
this kind of info wants to live together.
- Q: Are you trying to create a PIM with EKG? What information do you
primarily want to manage?
- A: Yes, I think many uses of emacs is in line with PIM, and
those use-cases are a good fit for triples / ekg. Notes,
people, projects, maybe even being able to integrate with org
and manage TODOs there as well.
- Q: What about using other databases programs Postgres mongoDB etc..
[see last q too, but I guess you refer to other relational, e.g.
Postgresql]
- A: Those could work, maybe the triples library would work via
emacsql with those, but I haven't tested it. I'm not sure
what the benefit would be, typically these database tend to be
simple and small, so a more full-featured DB is probably
overkill. MongoDB and those kinds of row-oriented databases
probably wouldn't be a good fit, but I haven't tried it.
- Q: What is your preferred reference to understand triples/graph dbs?
(e.g. think better about schema design)
- A: I know from using them / talking about them. I will come back
with some references!
- Q: Will it slow down with a growth of database?
- A: there is a tradeoff -- triples gives you a standard schema,
but you lose the power of getting things in one SQL expression -
which makes it slow. But I have a bunch of data (2yrs of
org-roam usage) and it is still very fast. These limits exist,
but the usual rate of content creation is not large enough to
hit the limits.
- Q: What are your thoughts on allowing for a "true" graph-db
backend? (whatever the current best free software alternative to
neo4j is, I guess).
- A: In my usage, the graph DBs tended to be slow and somewhat
clumsy in usage, we returned every time to SQL [please reword
if appropriate]. At the moment not a glaring need (for the
quantities of data people manage in Emacs).
- Q: How hungry did you get while writing and recording this?
- A: I forgot that I used recipes as an example in my demo!
org-roam / ekg and other things are a great way to cook better,
BTW. When you make a recipe, write a org-roam daily (or in ekg,
an entry tagged with the date and the recipe) with notes about
how it went, what could be better, etc. Then you can later see
the recipe and notes on it at once, which helps you further
refine the recipe.
- Q: beyond note taking what kind of packages do you think would benefit from triples library
Other discussions from IRC:
- I like the font he's using in his org doc.
- In some way, triples (turtle, RDF and similar things) describe a directed graph, where echa edge is like: Subject--Predicate-->Object. A datalog can describe this too (for example: with predicates like triple(S,P,O)).
[[!inline pages="internal(2022/info/sqlite-after)" raw="yes"]]
[[!inline pages="internal(2022/info/sqlite-nav)" raw="yes"]]
[[!taglink CategoryEmacsLisp]]