[[!meta title="Emacs and private AI: a great match"]] [[!meta copyright="Copyright © 2025 Aaron Grothe"]] [[!inline pages="internal(2025/info/private-ai-nav)" raw="yes"]] # Emacs and private AI: a great match Aaron Grothe (he/him) - Pronunciation: Air-un Grow-the, LinkedIn: , [[!inline pages="internal(2025/info/private-ai-before)" raw="yes"]] When experimenting with using AI with Emacs, many users have concerns. A few of the concerns that people have are the possibility of their information being shared with the AI provider (either to train newer models, or as a potential revenue source), the possibility of running up unpredictable costs with their cloud provider, and the potential environmental impact of using cloud AI. Using Private/Local AI models provide an AI environment that the user can fully control. User can add to it incrementally over time as their skills and experience grows. This talk will be a quick intro to using Ollama Buddy, Ellama, and gptel to add the ability to have a private AI integrated into your Emacs session. We’ll start with the basics and show people how they can add AI to their workflow safely and securely. Hopefully, people will come away from the talk feeling better about our AI futures. The talk will start with a simple implementation: Ollama and Ollama Buddy and a couple of models. After that it will build on that for the rest of the 20 minutes. The goal is show the users multiple ways of using AI with Emacs and let them make their own choices. About the speaker: AI is everywhere and everyone is trying to figure out how to use it better.  This talk will be a quick introduction to showing some of the tools and techniques that a user can do to integrate AI privately and securely into their Emacs workflow.  The goal is to help people take the first steps on what will hopefully be a productive journey. ## Discussion / notes - Q: Why is the David Bowie question a good one for testing a model? e.g. does it fail in interesting ways? - A:  Big fan, firstly; also Deepseek will tend to have errors and I'm familiar with the data so easy to spot halucinations - A: First off, huge fan of David Bowie. But I came down to it really taught me a few things about how the models work in terms of things like how many kids he had, because Deepseek, which is a very popular Chinese model that a lot of people are using now, misidentifies him having three daughters, and he has like one son and one, one, I think, two sons and a daughter or something like that. so there's differences on that, and it just goes over... there's a whole lot of stuff because his story spans like 60 years, so it gives good feedback. That's the real main reason I asked that question because I just needed one... That sea monkeys, I just picked because it was obscure, and just always have, write, I used to have it write hello world in forth because I thought was an interesting one as well. It's just picking random ones like that. One question I ask a lot of models is, what is the closest star to the Earth? Because most of them will say Alpha Centauri or Proxima Centauri and not the sun. And I have a whole 'nother talk where I just argue with the LLM trying to say, hey, the sun is a star. And he just wouldn't accept it, so. - Q: What specific tasks do you use local AI for? - A: refactoring for example converting python 2 to python 3, cybersecurity researching - A: I like to load a lot of my code into and actually have it do analysis of it. 
## Discussion / notes

- Q: Why is the David Bowie question a good one for testing a model? e.g. does it fail in interesting ways?
- A: Big fan, firstly; also Deepseek will tend to have errors, and I'm familiar with the data, so it's easy to spot hallucinations.
- A: First off, huge fan of David Bowie. But it really taught me a few things about how the models work, in terms of things like how many kids he had: Deepseek, which is a very popular Chinese model that a lot of people are using now, misidentifies him as having three daughters, when I think he has two sons and a daughter, or something like that. His story spans about 60 years, so it gives good feedback. That's the real main reason I ask that question. Sea monkeys I picked just because it was obscure. I also used to have models write hello world in Forth, because I thought that was an interesting one as well. It's just picking random ones like that. One question I ask a lot of models is: what is the closest star to the Earth? Most of them will say Alpha Centauri or Proxima Centauri, and not the Sun. I have a whole other talk where I just argue with the LLM, trying to say, hey, the Sun is a star, and it just wouldn't accept it.
- Q: What specific tasks do you use local AI for?
- A: Refactoring, for example converting Python 2 to Python 3; cybersecurity research.
- A: I like to load a lot of my code in and have it do analysis. I was going through some code I have for some pen testing, and I was having it modified for the newer version because, I hate to say this, but it was written for Python 2 and I needed to update it for Python 3. The 2to3 tool did not do all of it, but the model was able to do the refactoring. It's part of my laziness. I use local models for anything I don't want to hit the web, and that's a lot when you start thinking about cybersecurity research, with white papers and material like that. I've got a lot of that loaded into RAG in one model on my Open WebUI system.
- Q: Have you used any small domain-specific LLMs? What are the kinds of tasks they specialize in, and how do I find and use them?
- A: On the todo list, but not something I have used very much yet.
- Q: Are the various models updated regularly? Can you add your own data to pre-built models? +1
- A:
- Q (piggy-back): Will the models reach out to the web if they need to for more info?
- A: Haven't had them do that.
- Q: What is your experience with RAG? Are you using it, and how has it helped?
- A:
- Q: Thoughts on running things on AWS/Digital Ocean instances, etc.?
- A: Prefer not to have the data leave home; AWS and DO work okay, and Oracle has some free offerings, but I tend to work locally most often.
- Q: What has your experience been using AI for cybersecurity applications? What do you usually use it for?
- A: For cybersecurity, what I've done is dump logs into it and have it do correlation. Keep in mind, the llamafile we were using for figuring out David Bowie, writing the hello world, all that stuff, is like six gig. How does it fit the entire world in six gig? I still haven't figured that out, in terms of quantization. So I'm really interested in the ability to take all my logs, dump them in there, and actually be able to do intelligent queries against that. Microsoft has a project called Security Copilot, which is trying to do that in the cloud, but I want to work on something that does it more locally. That's also one of the long-term goals.
- Q: Is there a disparity where you go to paid models because they are better, and what problems would those be?
- A: Paid models, I don't mind them. I think they're good, but I don't think they're actually economically sustainable under their current system. Right now, if you're paying 20 bucks a month for Copilot and that goes up to 200 bucks, I'm not going to be as likely to use it. But they do some things in ways I did not expect. For example, Grok was refactoring some of my code and dropped an F-bomb in the comments, which I did not see coming; then again, the code before that, which I had gotten off GitHub, had F-bombs in it, so it was just emulating the style. Would that be something I'd want to turn in as a pull request? I don't know. There's a lot of money going into these AIs, but in terms of the ability to get a decent one, like Llama 3.2, and load your data into it, you can be pretty competitive. You're not going to get all the benefits, but you have more control over it. So it's a balancing act.
- Q: What's the largest (in parameter size) local model you've been able to successfully run locally, and do you run into issues with limited context window size? The top-tier paid models are up to 200k now.
- A: By default, the context size is, I think, 1024, but I've upped it to 8192 on this box, the Pangolin, because for some reason it's just working quite well (one way to raise it from Emacs is sketched below). The largest models I've loaded have not been that huge. That's why I'm planning on breaking down and buying a Ryzen; actually, I'm going to buy an Intel Core Ultra 285H with 96 gig of RAM. Then I should be able to load a 70-billion-parameter model on it. How fast will it run? Slow as a dog, but it's going to be cool to be able to do it. It's an AI bragging-rights thing, but I mostly stick with the smaller models and the ones that are more heavily quantized, because that just tends to work better for me.
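Raising the context window can also be done from the Emacs side. A sketch, assuming a recent gptel whose backend constructors accept `:request-params` (check your installed version): extra parameters are passed through to Ollama, whose `num_ctx` option controls the context size, and 8192 mirrors the speaker's setting.

```elisp
;; Sketch: an Ollama backend with a larger context window.
;; Assumes a gptel version that supports :request-params; Ollama's
;; num_ctx option sets the context size (8192 matches the speaker's
;; setup, up from the small default).
(gptel-make-ollama "Ollama-8k"          ; hypothetical backend name
  :host "localhost:11434"
  :stream t
  :models '(llama3.2)
  :request-params '(:options (:num_ctx 8192)))
```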
- Q: Are there "free" as in FSF/open-source issues with the data?
- A: Yes. Where the data is coming from is a huge issue with AI and will be an issue long term.
- A: Yes, where the data is coming from is a huge question with AI. It's astonishing that you can ask questions of models without knowing where their data came from. That is going to be one of the big issues long-term. There are people working on trying to figure that out, but, I mean, somebody was actually out torrenting books just to be able to build them into their AI system; I think it might have been Meta. So there's a lot of that going on. The open-source side of this stuff is going to be tough. Some model makers have got their own licenses, but where they're getting their data from, I'm not sure, so that's a huge question. That's a talk in itself. But if you train on your RAG and your own data, you know where it came from and what its license is; the rest is more of a supplement if you're using a smaller model.
- Q: Have you used local models capable of tool-calling?
- A: I'm scared of agentic. I'm going to be a slow adopter of that. I want to do it, but I just don't have the intestinal fortitude right now. I've had models give me the commands, but I still run the commands by hand. I'm looking into it, and once again, it's on that list, but it's a big step for me.
- Q: What scares you most about agentic tools? How would you think about putting a sandbox around it if you adopt an agentic workflow?
- A: Air-gap; based on experience in the defense industry.
- A: In terms of that, I would just control what it's able to talk to, what machines; I would actually have it be air-gapped. I work for a defense contractor, and we spend a lot of time dealing with air-gapped systems, because that's just kind of the way it works out for us. So agentic is just going to take a while to earn trust. I want to see more happening first. Humans screw up enough; the last thing we need is to multiply that by 1000. So I would be restricting what it can do. If I created a user and gave it permissions, I would have it locked down through sudo: what it's able to do, what the account is able to do. It's happening; I'm just going to be one of the laggards on this one. So: air gap, jail, extremely locked-down environments, as in separate physical machines, not Docker.
- Q: Tool calling can be read-only, such as giving models the ability to search the web before answering your question (no write access or execute access). I'm interested to know if local models are any good at calling tools, though.
- A: Yes, local models can do a lot of that; it's within their capabilities. If you load LM Studio, you can do a lot of wonderful stuff with it, or with Open WebUI on top of Ollama. There are a lot of capabilities; it's amazing. Open WebUI is actually what a lot of companies are using now to put their curated data behind. So it works well. I can confirm that from my own professional experience. (A read-only tool-calling sketch follows below.)
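As a concrete illustration of the read-only idea, here is a sketch of registering a fetch-only tool with gptel. It assumes a gptel version with tool-call support (`gptel-make-tool`) and a local model that can emit tool calls; the tool name `read_url` and its body are hypothetical, not something from the talk.

```elisp
;; Sketch: a read-only tool for gptel's tool-calling support.
;; Assumes gptel with `gptel-make-tool' and a tool-capable local model.
;; The tool can fetch a page but has no write or execute access.
(require 'url)
(gptel-make-tool
 :name "read_url"                       ; hypothetical tool name
 :description "Fetch the raw text of a web page (read-only)."
 :args (list '(:name "url"
               :type string
               :description "The URL to fetch"))
 :category "web"
 :function (lambda (url)
             ;; Crude fetch: the response buffer still includes HTTP headers.
             (with-current-buffer (url-retrieve-synchronously url)
               (prog1 (buffer-string)
                 (kill-buffer)))))
```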
- Q: Really interesting stuff, thank you for your talk :) Given that large AI companies are openly stealing IP and copyright, thereby eroding the authority of such law (and eroding truth itself as well), can you see a future where IP & copyright law become untenable, and what sort of onward effect might that have? Apologies if this is outside the scope of your talk.
- A: I'm not a lawyer, but it is really getting complicated. I played with Sora a little bit, and it generated someone where you go, oh, that's Jon Hamm, that's Christopher Walken; you start figuring out who they're modeling people after. Something is going to happen there. But this is, once again, my personal opinion; I'm not a lawyer, and I do not have money, so don't sue me. The current administration is very pro-AI, and there's a great deal of lobbying by those groups, on both sides. It's going to be interesting to see what happens to copyright over the next 5-10 years. I just don't know how it keeps up without some adjustments.
- [https://grothe.us/](https://grothe.us/) <-- speaker's online presence
- Thanks for your demo and for the encouragement. I'll actually give it a try.
- I remember seeing the adverts for sea monkeys in old comic books as a kid -- that was a blast from the past!
- Super inspired! And very well done as a live prezi! :)
- respect his commitment to privacy
- [https://aws.amazon.com/what-is/retrieval-augmented-generation/](https://aws.amazon.com/what-is/retrieval-augmented-generation/) <- What is RAG? (an explanation)
- File size is not going to be the bottleneck, your RAM is. You're going to need 16 GB of RAM to run the smallest local models and ~512 GB of RAM to run the largest ones. You'll need a GPU with this much memory (VRAM) if you want it to run fast. (A rough sizing formula is sketched below.)
- A: It also depends on how your memory is laid out. Take the Core Ultra 285H I plan to buy: it has 96 gig of memory, unified, so the GPU and CPU share it, but they go over the same bus. The overall bandwidth tends to be a bit less, but you're able to load more of the model into memory, so it can do additional work without coming off disk. It's all a balancing act. If you hit Ziskind's website, that guy's done some great work on figuring out how big a model you can run and what you can do with it. Some of it is not obvious: for example, that MacBook Air, for the five minutes I can run the model, runs it faster than a lot of other machines that should be faster, just because of the way the ARM cores and the unified memory work on it. So it's a learning process. If you want, Network Chuck had a great video about building his own system with a couple of really powerful Nvidia cards in it, setting it up as a node, and using a web UI on it. So there's a lot of material out there, but it is a process of learning how big your data is, which models you want to use, and how much information you need. And you can run models even on Raspberry Pi 5s if you want to; they'll run slow, don't get me wrong, but it's possible.
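To make the sizing discussion concrete, here is a back-of-the-envelope sketch (mine, not from the talk) of the weight-only memory a model needs at a given quantization; the KV cache and runtime overhead add more on top. The function name is made up for illustration.

```elisp
;; Back-of-the-envelope sketch: approximate RAM needed just to hold a
;; model's weights at a given quantization level.  KV cache and
;; runtime overhead are NOT included and add several GB more.
(defun my/model-ram-gb (params-billions bits-per-weight)
  "Rough weight-only memory footprint in GiB."
  (/ (* params-billions 1e9 (/ bits-per-weight 8.0))
     (expt 1024.0 3)))

;; (my/model-ram-gb 7 4)   ;=> ~3.3  (a 7B model at 4-bit quantization)
;; (my/model-ram-gb 70 4)  ;=> ~32.6 (the 70B model mentioned above)
```

By this rough measure, an 8B-class model quantized to around six bits lands near the six-gig llamafile mentioned earlier, which is where quantization does its magic.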
- Great talk/info. Thanks.
- it went very well! (from the audience perspective)
- Very interesting talk! Thanks!
- AI, you are on notice: we want SBOMs, not f-bombs!
- thanks for the presentation

[[!inline pages="internal(2025/info/private-ai-after)" raw="yes"]]
[[!inline pages="internal(2025/info/private-ai-nav)" raw="yes"]]