WEBVTT NOTE Introduction 00:00:00.000 --> 00:00:04.859 Hey, everybody. Welcome from frigid Omaha, Nebraska. 00:00:04.860 --> 00:00:06.619 I'm just going to kick off my talk here, 00:00:06.620 --> 00:00:23.899 and we'll see how it all goes. Thanks for attending. 00:00:23.900 --> 00:00:26.939 So the slides will be available on my site, https://grothe.us, 00:00:26.940 --> 00:00:29.899 in the presentation section tonight or tomorrow. 00:00:29.900 --> 00:00:33.099 This is a quick intro to one way to do private AI in Emacs. 00:00:33.100 --> 00:00:35.299 There are a lot of other ways to do it. 00:00:35.300 --> 00:00:38.899 This one is really just more or less the easiest way to do it. 00:00:38.900 --> 00:00:40.379 It's a minimal viable product 00:00:40.380 --> 00:00:42.379 to get you an idea of how to get started with it 00:00:42.380 --> 00:00:43.859 and how to give it a spin. 00:00:43.860 --> 00:00:45.819 Really hope some of you give it a shot 00:00:45.820 --> 00:00:48.179 and learn something along the way. NOTE Overview of talk 00:00:48.180 --> 00:00:50.379 So the overview of the talk 00:00:50.380 --> 00:00:54.939 broke down these basic bullet points of why private AI, 00:00:54.940 --> 00:00:58.939 what do I need to do private AI, Emacs and private AI, 00:00:58.940 --> 00:01:02.739 pieces for an AI Emacs solution, 00:01:02.740 --> 00:01:08.059 a demo of a minimal viable product, and the summary. NOTE Why private AI? 00:01:08.060 --> 00:01:10.779 Why private AI? This is pretty simple. 00:01:10.780 --> 00:01:12.099 Just read the terms and conditions 00:01:12.100 --> 00:01:14.819 for any AI system you're currently using. 00:01:14.820 --> 00:01:17.019 If you're using the free tiers, your queries, 00:01:17.020 --> 00:01:18.619 code, uploaded information 00:01:18.620 --> 00:01:20.699 is being used to train the models. 00:01:20.700 --> 00:01:22.939 In some cases, you are giving the company 00:01:22.940 --> 00:01:25.419 a perpetual license to your data. 00:01:25.420 --> 00:01:27.059 You have no control over this, 00:01:27.060 --> 00:01:29.219 except for not using the engine. 00:01:29.220 --> 00:01:30.699 And keep in mind, the terms 00:01:30.700 --> 00:01:32.179 are changing all the time on that, 00:01:32.180 --> 00:01:34.139 and they're not normally changing for our benefit. 00:01:34.140 --> 00:01:38.259 So that's not necessarily a good thing. 00:01:38.260 --> 00:01:40.339 If you're using the paid tiers, 00:01:40.340 --> 00:01:43.459 you may be able to opt out of the data collection. 00:01:43.460 --> 00:01:45.539 But keep in mind, this can change, 00:01:45.540 --> 00:01:48.619 or they may start charging for that option. 00:01:48.620 --> 00:01:51.419 Every AI company wants more and more data. 00:01:51.420 --> 00:01:53.779 They need more and more data to train their models. 00:01:53.780 --> 00:01:56.019 It is just the way it is. 00:01:56.020 --> 00:01:57.899 They need more and more information 00:01:57.900 --> 00:02:00.459 to get it more and more accurate to keep it up to date. 00:02:00.460 --> 00:02:03.219 There's been a story about Stack Overflow. 00:02:03.220 --> 00:02:05.819 It has like half the number of queries they had a year ago 00:02:05.820 --> 00:02:07.379 because people are using AI. 00:02:07.380 --> 00:02:08.579 The problem with that is now 00:02:08.580 --> 00:02:10.379 there's less data going to Stack Overflow 00:02:10.380 --> 00:02:12.979 for the AI to get. 
Vicious cycle, 00:02:12.980 --> 00:02:14.619 especially when you start looking at 00:02:14.620 --> 00:02:16.579 newer languages like Ruby and stuff like that. 00:02:16.580 --> 00:02:21.419 So it comes down to being an interesting time. 00:02:21.420 --> 00:02:24.739 Another reason to go private AI is your costs are going to vary. 00:02:24.740 --> 00:02:27.019 Right now, these services are being heavily subsidized. 00:02:27.020 --> 00:02:29.419 If you're paying Claude $20 a month, 00:02:29.420 --> 00:02:32.579 it is not costing Claude, those guys, $20 a month 00:02:32.580 --> 00:02:34.099 to host all the infrastructure, 00:02:34.100 --> 00:02:35.619 to build all these data centers. 00:02:35.620 --> 00:02:38.779 They are severely subsidizing that, 00:02:38.780 --> 00:02:41.259 very much at a loss right now. 00:02:41.260 --> 00:02:43.659 When they start charging the real costs plus a profit, 00:02:43.660 --> 00:02:45.499 it's going to change. 00:02:45.500 --> 00:02:48.019 Right now, I use a bunch of different services. 00:02:48.020 --> 00:02:50.019 I've played with Grok and a bunch of other ones. 00:02:50.020 --> 00:02:52.459 But Grok right now is like $30 a month 00:02:52.460 --> 00:02:54.139 for regular SuperGrok. 00:02:54.140 --> 00:02:56.419 When they start charging the real cost of that, 00:02:56.420 --> 00:02:59.819 it's going to go from $30 to something a great deal more, 00:02:59.820 --> 00:03:02.379 perhaps, I think, $100 or $200, 00:03:02.380 --> 00:03:04.459 or whatever really turns out to be the cost 00:03:04.460 --> 00:03:06.059 when you figure everything into it. 00:03:06.060 --> 00:03:07.539 When you start adding that cost in, 00:03:07.540 --> 00:03:10.179 a lot of people who are using public AI right now 00:03:10.180 --> 00:03:11.899 are going to have no option but to move to private AI 00:03:11.900 --> 00:03:16.019 or give up on AI overall. NOTE What do I need for private AI? 00:03:16.020 --> 00:03:18.659 What do you need to be able to do private AI? 00:03:18.660 --> 00:03:21.179 If you're going to run your own AI, 00:03:21.180 --> 00:03:23.579 you're going to need a system with either some cores, 00:03:23.580 --> 00:03:25.699 a graphics processing unit, 00:03:25.700 --> 00:03:28.339 or a neural processing unit, a GPU or an NPU. 00:03:28.340 --> 00:03:29.819 I currently have four systems 00:03:29.820 --> 00:03:32.979 I'm experimenting with and playing around with on a daily basis. 00:03:32.980 --> 00:03:37.979 I have a System76 Pangolin with an AMD Ryzen 7 7840U 00:03:37.980 --> 00:03:41.099 with a Radeon 780M integrated graphics card. 00:03:41.100 --> 00:03:42.539 It's got 32 gigs of RAM. 00:03:42.540 --> 00:03:45.259 It's a beautiful piece of hardware. I really do like it. 00:03:45.260 --> 00:03:46.499 I have my main workstation, 00:03:46.500 --> 00:03:50.579 it's an HP Z620 with dual Intel Xeons 00:03:50.580 --> 00:03:53.179 with four NVIDIA K2200 graphics cards in it. 00:03:53.180 --> 00:03:56.699 Why the four NVIDIA K2200 graphics cards in it? 00:03:56.700 --> 00:03:59.739 Because I could buy four of them on eBay for $100 00:03:59.740 --> 00:04:02.379 and it was still supported by the NVIDIA drivers for Debian. 00:04:02.380 --> 00:04:08.179 So that's why that is. A MacBook Air with an M1 processor, 00:04:08.180 --> 00:04:10.939 a very nice piece of kit I picked up a couple years ago, 00:04:10.940 --> 00:04:14.139 very cheap, but it runs AI surprisingly well, 00:04:14.140 --> 00:04:18.099 and an Acer Aspire 1 with an AMD Ryzen 5700H in it.
00:04:18.100 --> 00:04:22.099 This was my old laptop. It was a sturdy beast. 00:04:22.100 --> 00:04:24.379 It was able to do enough AI to do demos and stuff, 00:04:24.380 --> 00:04:25.859 and I liked it quite a bit for that. 00:04:25.860 --> 00:04:28.339 I'm using the Pangolin for this demonstration 00:04:28.340 --> 00:04:30.979 because it's just better. 00:04:30.980 --> 00:04:37.219 Apple's M4 chip has 38 teraflops of NPU performance. 00:04:37.220 --> 00:04:40.099 The Microsoft Copilot PCs are now requiring 00:04:40.100 --> 00:04:41.459 45 teraflops of NPU 00:04:41.460 --> 00:04:43.939 to be able to have the Copilot badge on them. 00:04:43.940 --> 00:04:48.299 And Raspberry Pi's new AI HAT is about 18 teraflops 00:04:48.300 --> 00:04:51.219 and is $70 on top of the cost of a Raspberry Pi 5. 00:04:51.220 --> 00:04:56.059 Keep in mind, Raspberry Pi recently 00:04:56.060 --> 00:04:59.499 raised the cost of their Pi 5s because of RAM pricing, 00:04:59.500 --> 00:05:00.379 which is going to be affecting 00:05:00.380 --> 00:05:02.459 a lot of these types of solutions in the near future. 00:05:02.460 --> 00:05:05.299 But there's going to be a lot of 00:05:05.300 --> 00:05:06.699 local power available in the future. 00:05:06.700 --> 00:05:08.219 That's what it really comes down to. 00:05:08.220 --> 00:05:11.179 A lot of people are going to have PCs on their desks. 00:05:11.180 --> 00:05:13.459 They're going to run a decent private AI 00:05:13.460 --> 00:05:16.347 without much issue. NOTE Emacs and private AI 00:05:16.348 --> 00:05:18.059 So for Emacs and private AI, 00:05:18.060 --> 00:05:20.139 there's a couple popular solutions. 00:05:20.140 --> 00:05:22.099 gptel, which is the one we're going to talk about. 00:05:22.100 --> 00:05:24.739 It's a simple interface. It's a minimal interface. 00:05:24.740 --> 00:05:26.579 It integrates easily into your workflow. 00:05:26.580 --> 00:05:29.019 It's just, quite honestly, chef's kiss, 00:05:29.020 --> 00:05:31.059 just a beautifully well-done piece of software. 00:05:31.060 --> 00:05:33.859 Ollama Buddy has more features, 00:05:33.860 --> 00:05:36.259 a menu interface, has quick access 00:05:36.260 --> 00:05:37.499 for things like code refactoring, 00:05:37.500 --> 00:05:38.979 text reformatting, et cetera. 00:05:38.980 --> 00:05:41.979 This is the one where you spend a little more time with it, 00:05:41.980 --> 00:05:43.939 but you also get a little bit more back from it. 00:05:43.940 --> 00:05:49.419 Ellama is another one. It has some really good features, 00:05:49.420 --> 00:05:51.059 more different capabilities, 00:05:51.060 --> 00:05:54.979 but it's a different set of rules and capabilities. 00:05:54.980 --> 00:05:59.179 Aidermacs, which is programming with your AI in Emacs. 00:05:59.180 --> 00:06:01.219 The closest thing I can come up with 00:06:01.220 --> 00:06:04.139 to compare this to is Cursor, except it's in Emacs. 00:06:04.140 --> 00:06:05.659 It's really quite well done. 00:06:05.660 --> 00:06:07.299 These are all really quite well done. 00:06:07.300 --> 00:06:08.499 There's a bunch of other projects out there. 00:06:08.500 --> 00:06:10.819 If you go out to GitHub and type Emacs AI, 00:06:10.820 --> 00:06:13.219 you'll find a lot of different options. NOTE Pieces for an AI Emacs solution 00:06:13.220 --> 00:06:18.459 So what is a minimal viable product that can be done? 00:06:18.460 --> 00:06:23.379 A minimal viable product to show what an AI Emacs solution is 00:06:23.380 --> 00:06:27.179 can be done with only two pieces of software.
00:06:27.180 --> 00:06:31.179 Llamafile, this is an amazing piece of software. 00:06:31.180 --> 00:06:32.899 This is a whole LLM contained in one file. 00:06:32.900 --> 00:06:36.059 And the same file runs on Mac OS X, 00:06:36.060 --> 00:06:39.379 Linux, Windows, and the BSDs. 00:06:39.380 --> 00:06:42.179 It's a wonderful piece of kit 00:06:42.180 --> 00:06:44.179 based on this thing the same people created 00:06:44.180 --> 00:06:45.899 called Cosmopolitan, 00:06:45.900 --> 00:06:46.779 which lets you create an executable 00:06:46.780 --> 00:06:48.699 that runs on a bunch of different systems. 00:06:48.700 --> 00:06:51.299 And gptel, which is an easy plug-in for Emacs, 00:06:51.300 --> 00:06:56.339 which we talked about in the last slide a bit. 00:06:56.340 --> 00:07:00.179 So setting up the LLM, you just have to go out 00:07:00.180 --> 00:07:03.542 and hit a page for it 00:07:03.543 --> 00:07:05.099 and do a wget of it. 00:07:05.100 --> 00:07:07.099 That's all it takes there. 00:07:07.100 --> 00:07:10.259 Chmodding it so you can actually execute the executable. 00:07:10.260 --> 00:07:12.939 And then just go ahead and actually run it. 00:07:12.940 --> 00:07:16.939 And let's go ahead and do that. 00:07:16.940 --> 00:07:18.899 I've already downloaded it because I don't want to wait. 00:07:18.900 --> 00:07:21.259 And let's just take a look at it. 00:07:21.260 --> 00:07:22.899 I've actually downloaded several of them, 00:07:22.900 --> 00:07:25.699 but let's go ahead and just run Llama 3.2 00:07:25.700 --> 00:07:31.179 with the 3 billion parameters, instruct. And that's it firing up. 00:07:31.180 --> 00:07:33.899 And it is nice enough to actually be listening on port 8080, 00:07:33.900 --> 00:07:35.339 which we'll need in a minute. 00:07:35.340 --> 00:07:43.139 So once you do that, you have to install gptel in Emacs. 00:07:43.140 --> 00:07:45.659 That's as simple as firing up Emacs, 00:07:45.660 --> 00:07:48.339 doing M-x package-install, 00:07:48.340 --> 00:07:49.779 and then just typing gptel, 00:07:49.780 --> 00:07:51.499 if you have your repository set up right, 00:07:51.500 --> 00:07:52.299 which hopefully you do. 00:07:52.300 --> 00:07:56.339 And then you just go ahead and have it. NOTE Config file 00:07:56.340 --> 00:07:58.139 You also have to set up a config file. 00:07:58.140 --> 00:08:01.739 Here's my example config file as it's currently set up: 00:08:01.740 --> 00:08:04.019 requiring, ensuring gptel is loaded, 00:08:04.020 --> 00:08:05.899 defining the Llamafile backend. 00:08:05.900 --> 00:08:07.779 You can put multiple backends into it, 00:08:07.780 --> 00:08:09.859 but I just have the one defined in this example. 00:08:09.860 --> 00:08:12.059 But it's pretty straightforward. 00:08:12.060 --> 00:08:16.739 The Llamafile local backend: a name for it, stream, protocol HTTP. 00:08:16.740 --> 00:08:20.859 If you have HTTPS set up, that's obviously preferable, 00:08:20.860 --> 00:08:22.779 but a lot of people don't for their home labs. 00:08:22.780 --> 00:08:26.379 Host is just 127.0.0.1 port 8080. 00:08:26.380 --> 00:08:30.099 Keep in mind, some of the AIs run on a different port, 00:08:30.100 --> 00:08:31.499 so you may be on 8081 00:08:31.500 --> 00:08:34.619 if you're running Open WebUI at the same time. The key: 00:08:34.620 --> 00:08:37.019 we don't need an API key because it's a local server. 00:08:37.020 --> 00:08:40.259 And the models, we can put multiple models 00:08:40.260 --> 00:08:41.339 on there if we want to.
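NOTE Example config
For reference, here is a minimal sketch of the kind of config file just described, using gptel's documented gptel-make-openai backend. The llamafile name and the model label are illustrative assumptions, not the exact ones from the demo.

;; Shell side first (the wget/chmod/run steps described above):
;;   wget <URL of a .llamafile from its download page>
;;   chmod +x Llama-3.2-3B-Instruct.llamafile    ; hypothetical filename
;;   ./Llama-3.2-3B-Instruct.llamafile           ; serves on 127.0.0.1:8080 by default

(require 'gptel)

;; Register the local Llamafile server as an OpenAI-compatible backend.
;; No API key is needed because it's a local server.
(setq gptel-backend
      (gptel-make-openai "Llamafile-local"
        :protocol "http"           ; HTTPS is preferable if you have it set up
        :host "127.0.0.1:8080"     ; use 8081 if something else owns 8080
        :stream t
        :models '(llama-3.2-3b))   ; label(s) shown in gptel's model menu
      gptel-model 'llama-3.2-3b)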
00:08:41.340 --> 00:08:43.699 So if we create one with additional stuff, 00:08:43.700 --> 00:08:45.379 like RAG and stuff like that, 00:08:45.380 --> 00:08:47.459 we can actually name those models by their domain, 00:08:47.460 --> 00:08:48.699 which is really kind of cool. 00:08:48.700 --> 00:08:52.099 But, uh, that's all it takes. NOTE Demo: Who was David Bowie? 00:08:52.100 --> 00:09:03.779 So let's go ahead and go to a quick test of it. 00:09:03.780 --> 00:09:11.019 Oops. M-x gptel. And we're going to just choose 00:09:11.020 --> 00:09:12.499 the default buffer to make things easier. 00:09:12.500 --> 00:09:15.339 Going to resize it up a bit. 00:09:15.340 --> 00:09:19.859 And usually the go-to question I go to is: who was David Bowie? 00:09:19.860 --> 00:09:24.499 This one is actually a question 00:09:24.500 --> 00:09:26.219 that's turned out to be really good 00:09:26.220 --> 00:09:28.019 for figuring out whether or not the AI is complete. 00:09:28.020 --> 00:09:31.139 This is one that some engines do well on, other ones don't. 00:09:31.140 --> 00:09:33.739 And we can just do, we can either do 00:09:33.740 --> 00:09:36.059 M-x gptel-send, 00:09:36.060 --> 00:09:37.979 or we can just do C-c and hit Enter. 00:09:37.980 --> 00:09:39.139 We'll just do C-c and Enter. 00:09:39.140 --> 00:09:43.659 And now it's going ahead and hitting our local AI system 00:09:43.660 --> 00:09:46.659 running on port 8080. And that looks pretty good, 00:09:46.660 --> 00:09:50.739 but let's go ahead and say, hey, it's set to terse mode right now. 00:09:50.740 --> 00:10:03.859 Please expand upon this. And there we go. 00:10:03.860 --> 00:10:05.379 We're getting a full description 00:10:05.380 --> 00:10:08.739 of the majority of David Bowie's life 00:10:08.740 --> 00:10:10.139 and other information about him. 00:10:10.140 --> 00:10:21.699 So very, very happy with that. NOTE Hallucinations 00:10:21.700 --> 00:10:23.539 One thing to keep in mind, when you look at things, 00:10:23.540 --> 00:10:24.699 when you're looking for hallucinations, 00:10:24.700 --> 00:10:26.899 for how accurate the AI is given how it's compressed, 00:10:26.900 --> 00:10:29.259 is it will tend to screw up on things like 00:10:29.260 --> 00:10:30.859 how many children he had and stuff like that. 00:10:30.860 --> 00:10:32.459 Let me see if it gets to that real quick. 00:10:32.460 --> 00:10:39.739 Is it not actually on this one? 00:10:39.740 --> 00:10:42.179 Alright, so that's the first question I always ask. NOTE Next question: What are sea monkeys? 00:10:42.180 --> 00:10:44.659 The next one is: what are sea monkeys? 00:10:44.660 --> 00:10:48.979 It gives you an idea of the breadth of the system. 00:10:48.980 --> 00:11:10.619 It's querying right now. Pulls it back correctly. Yes. 00:11:10.620 --> 00:11:12.339 And it's smart enough to actually detect that David Bowie 00:11:12.340 --> 00:11:15.019 even referenced sea monkeys in the song "Sea of Love," 00:11:15.020 --> 00:11:16.179 which became a hit single. 00:11:16.180 --> 00:11:18.859 So it's actually keeping the context alive, 00:11:18.860 --> 00:11:20.419 which is a very cool feature. 00:11:20.420 --> 00:11:21.459 I did not see that coming. 00:11:21.460 --> 00:11:24.139 Here's one that some people say is a really good one 00:11:24.140 --> 00:11:42.779 to ask: how many Rs in "strawberry"? 00:11:42.780 --> 00:11:46.179 All right, now she's going off the reservation. 00:11:46.180 --> 00:11:48.139 She's going in a different direction.
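NOTE Checking the strawberry answer
For the record, ground truth for that test is a one-liner in Emacs itself, using the built-in cl-lib:

(require 'cl-lib)
(cl-count ?r "strawberry")  ; => 3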
00:11:48.140 --> 00:11:49.979 Let me go ahead and reopen that again, 00:11:49.980 --> 00:11:57.179 because it went down a bad hole there for a second. NOTE Writing Hello World in Emacs Lisp 00:11:57.180 --> 00:11:58.419 Let me ask it to write hello world in Emacs Lisp. 00:11:58.420 --> 00:12:10.419 Yep, that works. So the point being here, 00:12:10.420 --> 00:12:14.939 that was like two minutes of setup. 00:12:14.940 --> 00:12:18.019 And now we have a small AI embedded inside the system. 00:12:18.020 --> 00:12:20.539 So that gives you an idea just how easy it can be. 00:12:20.540 --> 00:12:22.299 And it's just running locally on the system. 00:12:22.300 --> 00:12:25.259 We also have the default system here as well. 00:12:25.260 --> 00:12:32.579 So not that bad. NOTE Pieces for a better solution 00:12:32.580 --> 00:12:35.379 That's a basic solution, that's a basic setup 00:12:35.380 --> 00:12:37.059 that will get you to the point where you can go, like... 00:12:37.060 --> 00:12:39.859 it's a party trick, but it's a very cool party trick. 00:12:39.860 --> 00:12:42.859 The way that gptel works is it puts it into buffers, 00:12:42.860 --> 00:12:45.099 it doesn't interfere with your flow that much, 00:12:45.100 --> 00:12:47.179 it's just an additional window you can pop open 00:12:47.180 --> 00:12:49.019 to ask questions and get information from, 00:12:49.020 --> 00:12:51.459 dump code into it and have it refactored. 00:12:51.460 --> 00:12:53.339 gptel has a lot of additional options 00:12:53.340 --> 00:12:55.699 for things that are really cool for that. 00:12:55.700 --> 00:12:57.099 But if you want a better solution, 00:12:57.100 --> 00:12:59.939 I recommend Ollama or LM Studio. 00:12:59.940 --> 00:13:01.899 They're both more capable than Llamafile. 00:13:01.900 --> 00:13:03.859 They can accept a lot of different models. 00:13:03.860 --> 00:13:05.739 You can do things like RAG. 00:13:05.740 --> 00:13:09.219 You can do loading of things onto the GPU more explicitly. 00:13:09.220 --> 00:13:10.379 It can speed stuff up. 00:13:10.380 --> 00:13:13.059 One of the things about the retrieval augmentation is 00:13:13.060 --> 00:13:15.539 it will let you put your data into the system, 00:13:15.540 --> 00:13:17.779 so you can start uploading your code, your information, 00:13:17.780 --> 00:13:20.139 and actually being able to do analysis of it. 00:13:20.140 --> 00:13:23.539 Open WebUI provides more capabilities. 00:13:23.540 --> 00:13:24.859 It provides an interface that's similar 00:13:24.860 --> 00:13:25.899 to what you're used to seeing 00:13:25.900 --> 00:13:28.179 for ChatGPT and the other systems. 00:13:28.180 --> 00:13:29.419 It's really quite well done. 00:13:29.420 --> 00:13:32.539 And once again, gptel, I have to mention that 00:13:32.540 --> 00:13:34.779 because that's the one I really kind of like. 00:13:34.780 --> 00:13:36.899 And Ollama Buddy is also another really nice one. NOTE What about the license? 00:13:36.900 --> 00:13:41.019 So what about the licensing of these models, 00:13:41.020 --> 00:13:42.299 since I'm going out pulling down 00:13:42.300 --> 00:13:43.579 a model and doing this stuff? 00:13:43.580 --> 00:13:46.579 Let's take a look at a couple of highlights 00:13:46.580 --> 00:13:49.379 from the Meta Llama 3 community license. 00:13:49.380 --> 00:13:52.579 If your service exceeds 700 million monthly users, 00:13:52.580 --> 00:13:54.099 you need additional licensing. 00:13:54.100 --> 00:13:56.099 Probably not going to be a problem for most of us.
00:13:56.100 --> 00:13:58.379 There's a competition restriction. 00:13:58.380 --> 00:14:00.899 You can't use this model to enhance competing models. 00:14:00.900 --> 00:14:04.219 And there are some limitations on using the Meta trademarks. 00:14:04.220 --> 00:14:05.939 Not that big a deal. 00:14:05.940 --> 00:14:09.139 And otherwise it's a permissive one 00:14:09.140 --> 00:14:10.939 designed to encourage innovation 00:14:10.940 --> 00:14:13.779 and open development. Commercial use is allowed, 00:14:13.780 --> 00:14:15.219 but there are some restrictions on it. 00:14:15.220 --> 00:14:17.259 Yeah, you can modify the model, 00:14:17.260 --> 00:14:20.419 but you have to abide by the license terms. 00:14:20.420 --> 00:14:22.339 And you can distribute the model with derivatives. 00:14:22.340 --> 00:14:24.059 And there are some very cool ones out there. 00:14:24.060 --> 00:14:25.259 There's people who've done things 00:14:25.260 --> 00:14:29.579 to try and make the Llama be less, what's the phrase, 00:14:29.580 --> 00:14:31.939 ethical, if you're doing penetration testing research 00:14:31.940 --> 00:14:32.619 and stuff like that. 00:14:32.620 --> 00:14:34.459 It has some very nice value there. 00:14:34.460 --> 00:14:37.739 Keep in mind licenses also vary 00:14:37.740 --> 00:14:39.619 depending on the model you're using. 00:14:39.620 --> 00:14:42.419 Mistral AI has the non-production license. 00:14:42.420 --> 00:14:45.219 It's designed to keep it to research and development. 00:14:45.220 --> 00:14:46.739 You can't use it commercially. 00:14:46.740 --> 00:14:51.792 So it's designed to clearly delineate 00:14:51.793 --> 00:14:52.939 between research and development 00:14:52.940 --> 00:14:54.259 and somebody trying to actually build 00:14:54.260 --> 00:14:56.579 something on top of it. NOTE Are there open source data model options? 00:14:56.580 --> 00:14:57.979 And another question I get asked is, 00:14:57.980 --> 00:14:59.899 are there open source data model options? 00:14:59.900 --> 00:15:02.819 Yeah, but most of them are small or specialized currently. 00:15:02.820 --> 00:15:05.499 MoMo is a whole family of them, 00:15:05.500 --> 00:15:07.339 but they tend to be more specialized. 00:15:07.340 --> 00:15:09.019 It's very cool to see where it's going, 00:15:09.020 --> 00:15:11.339 and it's another thing that's just going forward. 00:15:11.340 --> 00:15:14.519 It's under the MIT license. NOTE Things to know 00:15:14.520 --> 00:15:15.819 Some things to know to help you 00:15:15.820 --> 00:15:17.499 have a better experience with this. 00:15:17.500 --> 00:15:21.059 Get Ollama and Open WebUI working by themselves, 00:15:21.060 --> 00:15:22.659 then set up your config file. 00:15:22.660 --> 00:15:24.819 I was fighting both at the same time, 00:15:24.820 --> 00:15:26.699 and it turned out I had a problem with my Ollama. 00:15:26.700 --> 00:15:28.899 I had a conflict, so that was what my problem was. 00:15:28.900 --> 00:15:32.819 Llamafile plus gptel is a great way to start experimenting, 00:15:32.820 --> 00:15:34.299 just to get you an idea of how it works 00:15:34.300 --> 00:15:36.939 and figure out how the interfaces work. Tremendous. 00:15:36.940 --> 00:15:40.739 RAG, loading documents into it, is really easy with Open WebUI. 00:15:40.740 --> 00:15:43.019 You can create models, you can put things like 00:15:43.020 --> 00:15:46.419 help desk, developers, and stuff like that, breaking it out. 00:15:46.420 --> 00:15:51.019 Hacker Noon has a "how to build a $300 AI computer" article.
00:15:51.020 --> 00:15:52.859 That's from March 2024, 00:15:52.860 --> 00:15:55.099 but it still has a lot of great information 00:15:55.100 --> 00:15:56.819 on how to benchmark the environments, 00:15:56.820 --> 00:16:01.339 what some values are, like for the Ryzen 5700U 00:16:01.340 --> 00:16:02.579 inside my Acer Aspire. 00:16:02.580 --> 00:16:04.419 That's where I got the idea of doing that. 00:16:04.420 --> 00:16:06.739 Make sure you do the ROCm stuff correctly 00:16:06.740 --> 00:16:09.899 to get the GPU extensions. But it's just really good stuff. 00:16:09.900 --> 00:16:13.059 You don't need a great GPU or CPU to get started. 00:16:13.060 --> 00:16:14.819 Smaller models like TinyLlama 00:16:14.820 --> 00:16:16.819 can run on very small systems. 00:16:16.820 --> 00:16:19.042 It gets you the ability to start playing with it 00:16:19.043 --> 00:16:21.619 and start experimenting and figure out if that's for you 00:16:21.620 --> 00:16:23.379 and to move forward with it. 00:16:23.380 --> 00:16:29.219 The AMD Ryzen AI Max+ 395 in a mini PC 00:16:29.220 --> 00:16:31.179 makes a really nice dedicated host. 00:16:31.180 --> 00:16:34.078 You used to be able to buy these for about $1200. 00:16:34.079 --> 00:16:35.579 Now with the RAM price increase, 00:16:35.580 --> 00:16:38.458 if you want to get 128 gig you're pushing two grand, 00:16:38.459 --> 00:16:40.739 so it gets a little tighter. 00:16:40.740 --> 00:16:44.099 Macs work remarkably well with AI. 00:16:44.100 --> 00:16:47.659 My MacBook Air was one of my go-tos for a while, 00:16:47.660 --> 00:16:49.779 but once I started doing anything AI, 00:16:49.780 --> 00:16:50.779 I had a five-minute window 00:16:50.780 --> 00:16:52.619 before the thermal throttling became an issue. 00:16:52.620 --> 00:16:54.619 Keep in mind that's a MacBook Air, 00:16:54.620 --> 00:16:56.659 so it doesn't have the greatest ventilation. 00:16:56.660 --> 00:16:58.339 If you get the MacBook Pros and stuff, 00:16:58.340 --> 00:17:00.139 they tend to have more ventilation, 00:17:00.140 --> 00:17:02.499 but still you're going to be pushing against that. 00:17:02.500 --> 00:17:04.939 So Mac Minis and the Mac Ultras and stuff like that 00:17:04.940 --> 00:17:06.099 tend to work really well for that. 00:17:06.100 --> 00:17:09.779 Alex Ziskind on YouTube has a channel. 00:17:09.780 --> 00:17:11.899 He does a lot of AI performance benchmarking, 00:17:11.900 --> 00:17:14.819 like "I load a 70 billion parameter model 00:17:14.820 --> 00:17:16.699 on this mini PC" and stuff like that. 00:17:16.700 --> 00:17:19.019 It's a lot of fun and interesting stuff there. 00:17:19.020 --> 00:17:21.219 And it's influencing my decision 00:17:21.220 --> 00:17:22.979 to buy my next AI-style PC. 00:17:22.980 --> 00:17:27.619 Small domain-specific LLMs are happening. 00:17:27.620 --> 00:17:29.939 An LLM that has all your code and information 00:17:29.940 --> 00:17:31.659 sounds like a really cool idea. 00:17:31.660 --> 00:17:34.299 It gives you capabilities to start training stuff 00:17:34.300 --> 00:17:35.899 that you couldn't do with, like, the big ones. 00:17:35.900 --> 00:17:38.059 Even in terms of fine-tuning and stuff, 00:17:38.060 --> 00:17:40.539 it's remarkable to see where that space is coming along 00:17:40.540 --> 00:17:41.739 in the next year or so. 00:17:41.740 --> 00:17:46.219 HuggingFace.co has pointers to tons of AI models. 00:17:46.220 --> 00:17:48.417 You'll find the one that works for you, hopefully, there.
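NOTE Pointing gptel at Ollama
If you graduate from Llamafile to Ollama, as recommended above, pointing gptel at it is a similar one-liner. A minimal sketch using gptel's documented gptel-make-ollama backend; the model tag is an assumption, so match whatever you have pulled with Ollama:

(gptel-make-ollama "Ollama-local"
  :host "localhost:11434"     ; Ollama's default port
  :stream t
  :models '(llama3.2:latest)) ; assumed tag; match your `ollama pull`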
00:17:48.418 --> 00:17:50.539 If you're doing cybersecurity, 00:17:50.540 --> 00:17:52.059 there's a whole bunch out there for that 00:17:52.060 --> 00:17:54.619 that have certain training and information in them. 00:17:54.620 --> 00:17:56.139 It's really good. 00:17:56.140 --> 00:18:00.099 One last thing to keep in mind is hallucinations are real. 00:18:00.100 --> 00:18:02.779 You will get BS back from the AI occasionally, 00:18:02.780 --> 00:18:05.179 so do validate everything you get from it. 00:18:05.180 --> 00:18:08.459 Don't be using it for court cases like some people have 00:18:08.460 --> 00:18:14.539 and run into those problems. So, that is my talk. 00:18:14.540 --> 00:18:17.219 What I would like you to get out of that is, 00:18:17.220 --> 00:18:21.859 if you haven't tried it, give gptel and Llamafile a shot. 00:18:21.860 --> 00:18:23.979 Fire up a little small AI instance, 00:18:23.980 --> 00:18:27.339 play around with it a little bit inside your Emacs, 00:18:27.340 --> 00:18:30.139 and see if it makes your life better. Hopefully it will. 00:18:30.140 --> 00:18:32.139 And I really hope you guys 00:18:32.140 --> 00:18:34.659 learned something from this talk. And thanks for listening. 00:18:34.660 --> 00:18:38.979 And the links are at the end of the talk, if you have any questions. 00:18:38.980 --> 00:18:42.739 Let me see if we got anything you want, Pat. You do. 00:18:42.740 --> 00:18:43.899 You've got a few questions. 00:18:43.900 --> 00:18:48.059 [Corwin]: Hey, this is Corwin. Thank you so much. Thank you, Aaron. 00:18:48.060 --> 00:18:50.339 What an awesome talk this was, actually. 00:18:50.340 --> 00:18:52.179 If you don't have a camera, 00:18:52.180 --> 00:18:54.339 I can get away with not having one too. 00:18:54.340 --> 00:18:56.299 [Aaron]: I've got... I'll turn the camera on. 00:18:56.300 --> 00:18:59.833 [Corwin]: Okay. All right. I'll turn mine back on. Here I come. 00:18:59.834 --> 00:19:03.139 Yeah, so there are a few questions, 00:19:03.140 --> 00:19:04.579 but first let me say thank you 00:19:04.580 --> 00:19:06.339 for a really captivating talk. 00:19:06.340 --> 00:19:10.939 I think a lot of people will be empowered from this 00:19:10.940 --> 00:19:15.259 to try to do more with less, especially locally. 00:19:15.260 --> 00:19:20.179 People are concerned about the data center footprint, 00:19:20.180 --> 00:19:23.659 environmentally concerned 00:19:23.660 --> 00:19:26.979 about the footprint of LLMs inside data centers. 00:19:26.980 --> 00:19:28.219 So just thinking about how we can 00:19:28.220 --> 00:19:32.419 put infrastructure we have at home to use 00:19:32.420 --> 00:19:34.019 and get more done with less. 00:19:34.020 --> 00:19:37.499 [Aaron]: Yeah, the data center impact's interesting, 00:19:37.500 --> 00:19:39.979 because there was a study a while ago. 00:19:39.980 --> 00:19:42.099 Someone said every time you do a Gemini query, 00:19:42.100 --> 00:19:45.019 it's like boiling a cup of water. 00:19:45.020 --> 00:19:48.619 [Corwin]: Yeah, I've heard that one too. So do you want to, you know, 00:19:48.620 --> 00:19:51.699 I don't know how much direction you want. 00:19:51.700 --> 00:19:53.859 I'd be very happy to read out the questions for you. 00:19:53.860 --> 00:19:55.219 [Aaron]: Yeah, that would be great. 00:19:55.220 --> 00:19:57.619 I'm having trouble getting to that tab. 00:19:57.620 --> 00:20:02.779 [Corwin]: Okay, I'm there, so I'll put it into our chat too, 00:20:02.780 --> 00:20:07.419 so you can follow along if you'd like.
NOTE Q: Why is the David Bowie question a good one for testing a model? e.g. does it fail in interesting ways? 00:20:07.420 --> 00:20:11.219 [Corwin]: The first question was, why is the David Bowie question 00:20:11.220 --> 00:20:12.219 a good one to start with? 00:20:12.220 --> 00:20:14.419 Does it have interesting failure conditions, 00:20:14.420 --> 00:20:16.639 or what made you choose that? 00:20:16.640 --> 00:20:21.979 [Aaron]: First off, huge fan of David Bowie. 00:20:21.980 --> 00:20:24.499 But it came down to, it really taught me a few things 00:20:24.500 --> 00:20:26.299 about how the models work, 00:20:26.300 --> 00:20:28.819 in terms of things like how many kids he had, 00:20:28.820 --> 00:20:31.779 because DeepSeek, which is a very popular Chinese model 00:20:31.780 --> 00:20:33.179 that a lot of people are using now, 00:20:33.180 --> 00:20:35.619 misidentifies him as having three daughters, 00:20:35.620 --> 00:20:38.459 and he has, like, one son and, I think, 00:20:38.460 --> 00:20:40.899 a daughter, or something like that. 00:20:40.900 --> 00:20:43.659 So there's differences on that, and it just goes over... 00:20:43.660 --> 00:20:45.299 there's a whole lot of stuff, 00:20:45.300 --> 00:20:47.779 because his story spans like 60 years, 00:20:47.780 --> 00:20:49.659 so it gives good feedback. 00:20:49.660 --> 00:20:51.539 That's the real main reason I asked that question, 00:20:51.540 --> 00:20:53.699 because I just needed one... The sea monkeys one I just picked 00:20:53.700 --> 00:20:56.579 because it was obscure. And I just always have it write... 00:20:56.580 --> 00:20:58.939 I used to have it write hello world in Forth, 00:20:58.940 --> 00:21:01.019 because I thought that was an interesting one as well. 00:21:01.020 --> 00:21:03.899 It's just picking random ones like that. 00:21:03.900 --> 00:21:06.499 One question I ask a lot of models is, 00:21:06.500 --> 00:21:09.419 what is the closest star to the Earth? 00:21:09.420 --> 00:21:12.019 Because most of them will say Alpha Centauri 00:21:12.020 --> 00:21:13.739 or Proxima Centauri and not the sun. 00:21:13.740 --> 00:21:15.899 And I have a whole 'nother talk 00:21:15.900 --> 00:21:17.899 where I just argue with the LLM, 00:21:17.900 --> 00:21:20.019 trying to say, hey, the sun is a star. 00:21:20.020 --> 00:21:26.579 And it just wouldn't accept it, so. What? 00:21:26.580 --> 00:21:30.739 Oh, I can... You're there. NOTE Q: What specific tasks do you use local AI for? 00:21:30.740 --> 00:21:34.379 [Corwin]: So what specific tasks do you like to use your local AI for? 00:21:34.380 --> 00:21:37.459 [Aaron]: I like to load a lot of my code into it 00:21:37.460 --> 00:21:39.099 and actually have it do analysis of it. 00:21:39.100 --> 00:21:42.339 I was actually going through some code 00:21:42.340 --> 00:21:45.619 I have for some pen testing, and I was having it modified 00:21:45.620 --> 00:21:47.259 to update it for the newer version, 00:21:47.260 --> 00:21:48.459 because I hate to say this, 00:21:48.460 --> 00:21:49.859 but it was written for Python 2, 00:21:49.860 --> 00:21:51.459 and I needed to update it for Python 3. 00:21:51.460 --> 00:21:53.859 And the 2to3 tool did not do all of it, 00:21:53.860 --> 00:21:56.659 but the AI actually was able to do the refactoring. 00:21:56.660 --> 00:21:58.499 It's part of my laziness. 00:21:58.500 --> 00:22:01.459 But I use that for anything I don't want to hit the web.
00:22:01.460 --> 00:22:03.259 And that's a lot of stuff when you start thinking about it, 00:22:03.260 --> 00:22:04.979 if you're doing cybersecurity research 00:22:04.980 --> 00:22:06.819 and you have your white papers 00:22:06.820 --> 00:22:08.417 and stuff like that in there. 00:22:08.418 --> 00:22:10.625 I've got a lot of that loaded into RAG 00:22:10.626 --> 00:22:16.879 in one model on my Open WebUI system. NOTE Q: Have you used any small domain-specific LLMs? What are the kinds of tasks they specialize in, and how do I find and use them? 00:22:16.880 --> 00:22:21.059 [Corwin]: Neat. Have you used, have you used 00:22:21.060 --> 00:22:25.739 any small domain-specific LLMs? What kind of tasks? 00:22:25.740 --> 00:22:30.419 If so, what kind of tasks do they specialize in? 00:22:30.420 --> 00:22:32.139 And, you know, how? 00:22:32.140 --> 00:22:34.979 [Aaron]: No, to be honest, but there are some out there, like, once again, 00:22:34.980 --> 00:22:36.779 for cybersecurity and stuff like that, 00:22:36.780 --> 00:22:39.739 that I really need to dig into. That's on my to-do list. 00:22:39.740 --> 00:22:41.699 I've got a couple weeks off at the end of the year, 00:22:41.700 --> 00:22:46.539 and that's a big part of my plan for that. NOTE Q: Are the various models updated regularly? Can you add your own data to pre-built models? 00:22:46.540 --> 00:22:49.379 [Corwin]: Are the various models updated pretty regularly? 00:22:49.380 --> 00:22:52.059 Can you add your own data to the pre-built models? 00:22:52.060 --> 00:22:56.699 [Aaron]: Yes. The models are updated pretty reasonably. 00:22:56.700 --> 00:22:59.699 You can add data to a model in a couple of different ways. 00:22:59.700 --> 00:23:01.099 You can do something called fine-tuning, 00:23:01.100 --> 00:23:03.819 which requires a really nice GPU and a lot of CPU time. 00:23:03.820 --> 00:23:05.499 You're probably not going to do that. 00:23:05.500 --> 00:23:07.419 You can do retrieval-augmented generation, 00:23:07.420 --> 00:23:09.499 which is where you load your data on top of the system 00:23:09.500 --> 00:23:11.299 and put it inside a database, 00:23:11.300 --> 00:23:12.859 and you can actually scan that and stuff. 00:23:12.860 --> 00:23:14.619 I have another talk where I go through 00:23:14.620 --> 00:23:16.219 and I start asking questions about... 00:23:16.220 --> 00:23:18.579 I load the talk into the engine 00:23:18.580 --> 00:23:20.099 and I ask questions against that. 00:23:20.100 --> 00:23:22.179 If I would have had time, I would have done that, 00:23:22.180 --> 00:23:25.796 but it comes down to how many... That's RAG. 00:23:25.797 --> 00:23:29.419 RAG is pretty easy to do through Open WebUI or LM Studio. 00:23:29.420 --> 00:23:31.419 It's a great way, you just, like, 00:23:31.420 --> 00:23:34.099 point it to a folder and it just sucks all that data in... 00:23:34.100 --> 00:23:35.499 and it'll hit that data first. 00:23:35.500 --> 00:23:36.859 You have like helpdesk and stuff and... 00:23:36.860 --> 00:23:39.619 The other options: there's vector databases, 00:23:39.620 --> 00:23:41.819 which is, like, if you use PostgreSQL, 00:23:41.820 --> 00:23:43.699 it has pgvector, which can do a lot of that stuff. 00:23:43.700 --> 00:23:44.739 I've not dug into that yet, 00:23:44.740 --> 00:23:46.099 but that is also on that to-do list. 00:23:46.100 --> 00:23:48.055 I've got a lot of stuff planned for... NOTE Q: What is your experience with RAG? Are you using them and how have they helped? 00:23:48.056 --> 00:23:51.819 [Corwin]: Cool.
So what is your experience with RAGs? 00:23:51.820 --> 00:23:54.339 I don't even know what that means. 00:23:54.340 --> 00:23:57.419 Do you know what that means? 00:23:57.420 --> 00:23:59.619 Do you remember this question again? 00:23:59.620 --> 00:24:03.979 What is your experience with RAGs? 00:24:03.980 --> 00:24:07.459 [Aaron]: RAG is great. That's retrieval-augmented generation. 00:24:07.460 --> 00:24:09.739 That loads your data first, and it hits yours, 00:24:09.740 --> 00:24:11.499 and it'll actually cite it and stuff. 00:24:11.500 --> 00:24:14.659 There's a guy who wrote a RAG in 100 lines of Python, 00:24:14.660 --> 00:24:16.899 and it's an impressive piece of software. 00:24:16.900 --> 00:24:18.779 I think if you hit one of my sites, 00:24:18.780 --> 00:24:22.099 I've got a private AI talk where I actually refer to that. 00:24:22.100 --> 00:24:25.219 But retrieval augmentation, it's easy, it's fast, 00:24:25.220 --> 00:24:26.699 it puts your data into the system. 00:24:26.700 --> 00:24:31.339 Yeah, start with that and then iterate on top of that. 00:24:31.340 --> 00:24:32.659 That's one of the great things about AI, 00:24:32.660 --> 00:24:33.619 especially private AI, 00:24:33.620 --> 00:24:35.625 is you can do whatever you want to with it 00:24:35.626 --> 00:24:38.833 and build up with it as you get more experience. NOTE Q: Thoughts on running things on AWS/digital ocean instances, etc? 00:24:38.834 --> 00:24:44.219 [Corwin]: Any thoughts on running things 00:24:44.220 --> 00:24:49.179 on AWS, DigitalOcean, and so on? 00:24:49.180 --> 00:24:50.619 [Aaron]: AWS is not bad. 00:24:50.620 --> 00:24:52.659 DigitalOcean, they have some of their GPUs. 00:24:52.660 --> 00:24:54.379 I still don't like having the data 00:24:54.380 --> 00:24:57.419 leave my house, to be honest, or at work, 00:24:57.420 --> 00:24:59.019 because I tend to do some stuff 00:24:59.020 --> 00:25:01.259 that I don't want it even hitting that situation. 00:25:01.260 --> 00:25:03.699 But they have pretty good stuff. 00:25:03.700 --> 00:25:05.579 Another one to consider is Oracle Cloud. 00:25:05.580 --> 00:25:09.059 Oracle has their AI infrastructure that's really well done. 00:25:09.060 --> 00:25:12.379 But I mean, once again, then you start looking at the potential: 00:25:12.380 --> 00:25:13.779 they say your data is private, 00:25:13.780 --> 00:25:14.819 but I don't necessarily trust it. 00:25:14.820 --> 00:25:17.859 But they do have good stuff, both DigitalOcean and AWS, 00:25:17.860 --> 00:25:20.339 and Oracle Cloud has the free service, which isn't too bad, 00:25:20.340 --> 00:25:21.339 usually a certain amount of stuff. 00:25:21.340 --> 00:25:23.179 And Google also has it, 00:25:23.180 --> 00:25:26.739 but I still tend to keep more stuff on local PCs, 00:25:26.740 --> 00:25:31.077 because I'm just paranoid that way. NOTE Q: What has your experience been using AI for cyber security applications? What do you usually use it for? 00:25:31.078 --> 00:25:35.579 [Corwin]: Gotcha. What has your experience been using AI? 00:25:35.580 --> 00:25:40.139 Do you want to get into that, using AI for cybersecurity? 00:25:40.140 --> 00:25:42.019 You might have already touched on this. 00:25:42.020 --> 00:25:44.379 [Aaron]: Yeah, really, for cybersecurity, 00:25:44.380 --> 00:25:46.259 what I've had to do is I've dumped logs 00:25:46.260 --> 00:25:47.299 to have it do correlation.
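NOTE Sketch: log correlation through gptel
Dumping a region of logs into the local model, as described here, can be scripted with gptel's documented gptel-request function. This is a hypothetical sketch, not his actual tooling; the command name my-log-correlate and the prompt wording are made up for illustration:

(require 'gptel)

(defun my-log-correlate (beg end)
  "Ask the local model to correlate the log lines in the region BEG..END."
  (interactive "r")
  (gptel-request
      (concat "Correlate these log entries and flag anything anomalous:\n\n"
              (buffer-substring-no-properties beg end))
    :callback (lambda (response info)
                (if (stringp response)
                    ;; Show the model's analysis in its own buffer.
                    (with-current-buffer (get-buffer-create "*log-analysis*")
                      (erase-buffer)
                      (insert response)
                      (display-buffer (current-buffer)))
                  (message "gptel request failed: %s"
                           (plist-get info :status))))))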
00:25:47.300 --> 00:25:49.859 Keep in mind, the size of that Llamafile we were using 00:25:49.860 --> 00:25:52.059 for figuring out David Bowie, writing the hello world, 00:25:52.060 --> 00:25:54.179 all that stuff, is like six gig. 00:25:54.180 --> 00:25:56.859 How does it get the entire world in six gig? 00:25:56.860 --> 00:25:59.739 I still haven't figured that out, in terms of quantization. 00:25:59.740 --> 00:26:02.499 So I'm really interested in seeing the ability 00:26:02.500 --> 00:26:05.139 to take all this stuff out of all my logs, 00:26:05.140 --> 00:26:06.339 dump it all in there, 00:26:06.340 --> 00:26:08.459 and actually be able to do intelligent queries against that. 00:26:08.460 --> 00:26:10.899 Microsoft has a project called Security Copilot, 00:26:10.900 --> 00:26:12.819 which is trying to do that in the cloud, 00:26:12.820 --> 00:26:15.299 but I want to work on something to do that more locally 00:26:15.300 --> 00:26:19.019 and be able to actually drive this stuff over that. 00:26:19.020 --> 00:26:24.659 That's also one of the long-term goals. 00:26:24.660 --> 00:26:26.059 [Corwin]: So we got any other questions, or? 00:26:26.060 --> 00:26:29.099 Those are the questions that I see. 00:26:29.100 --> 00:26:31.179 I want to just read out a couple of comments 00:26:31.180 --> 00:26:33.419 that I saw in IRC though. 00:26:33.420 --> 00:26:36.699 jrootabaga says, it went very well 00:26:36.700 --> 00:26:39.259 from an audience perspective. 00:26:39.260 --> 00:26:43.619 And GGundam says, respect your commitment to privacy. 00:26:43.620 --> 00:26:45.619 And then somebody is telling us 00:26:45.620 --> 00:26:46.779 we might have skipped a question, 00:26:46.780 --> 00:26:50.019 so I'm just going to run back to my list. 00:26:50.020 --> 00:26:52.819 Updated regularly, experience... 00:26:52.820 --> 00:26:57.659 I just didn't type in the answers here, 00:26:57.660 --> 00:26:59.659 and there's a couple more questions coming in, so... NOTE Q: Is there a disparity where you go to paid models because they are better and what problems would those be? 00:26:59.660 --> 00:27:04.699 Is there a disparity where you go to paid models 00:27:04.700 --> 00:27:08.619 because they are better, and what problems? 00:27:08.620 --> 00:27:14.019 You know, what would drive you to? [Aaron]: That's a good question. 00:27:14.020 --> 00:27:17.819 Paid models, I don't mind them. I think they're good, 00:27:17.820 --> 00:27:21.299 but I don't think they're actually economically sustainable 00:27:21.300 --> 00:27:22.659 under their current system. 00:27:22.660 --> 00:27:24.299 Because right now, if you're paying 00:27:24.300 --> 00:27:26.899 20 bucks a month for Copilot and that goes up to 200 bucks, 00:27:26.900 --> 00:27:28.499 I'm not going to be as likely to use it. 00:27:28.500 --> 00:27:29.579 You know what I mean? 00:27:29.580 --> 00:27:33.059 But it does do some things in a way that I did not expect. 00:27:33.060 --> 00:27:35.459 For example, Grok was refactoring 00:27:35.460 --> 00:27:38.019 some of my code and in the comments dropped an F-bomb, 00:27:38.020 --> 00:27:39.979 which I did not see coming. 00:27:39.980 --> 00:27:41.619 But the other code before, 00:27:41.620 --> 00:27:43.219 that I had gotten off GitHub, 00:27:43.220 --> 00:27:44.059 had F-bombs in it. 00:27:44.060 --> 00:27:45.899 So it was just emulating the style, 00:27:45.900 --> 00:27:47.779 but would that be something 00:27:47.780 --> 00:27:49.979 I'd want to turn in as a pull request? I don't know.
00:27:49.980 --> 00:27:52.139 But, uh, there's a lot of money 00:27:52.140 --> 00:27:53.899 going into these AIs and stuff. 00:27:53.900 --> 00:27:56.219 But in terms of the ability to get a decent one, 00:27:56.220 --> 00:27:57.979 like Llama 3.2, 00:27:57.980 --> 00:28:01.239 and load your data into it, you can be pretty competitive. 00:28:01.240 --> 00:28:02.792 You're not going to get all the benefits, 00:28:02.793 --> 00:28:04.333 but you have more control over it. 00:28:04.334 --> 00:28:11.000 So it's a balancing act. 00:28:11.001 --> 00:28:14.125 [Corwin]: Okay, and I think I see a couple more questions coming in. NOTE Q: What's the largest (in parameter size) local model you've been able to successfully run locally, and do you run into issues with limited context window size? 00:28:14.126 --> 00:28:19.619 What is the largest parameter size for local models 00:28:19.620 --> 00:28:22.459 that you've been able to successfully run locally, 00:28:22.460 --> 00:28:26.059 and do you run into issues with limited context window size? 00:28:26.060 --> 00:28:29.659 The top paid models will tend to have a larger ceiling. 00:28:29.660 --> 00:28:32.859 [Aaron]: Yes, yes, yes, yes, yes. 00:28:32.860 --> 00:28:37.019 By default, the context size is, I think, 1024. 00:28:37.020 --> 00:28:41.160 But I've upped it to 8192 on this box, the Pangolin, 00:28:41.161 --> 00:28:43.542 because, for some reason, 00:28:43.543 --> 00:28:45.208 it's just working quite well. 00:28:45.209 --> 00:28:49.750 But the largest ones I've loaded have been in the... 00:28:49.751 --> 00:28:51.333 have not been that huge. 00:28:51.334 --> 00:28:55.699 I've loaded this... the last biggest one I've done... 00:28:55.700 --> 00:28:57.459 That's the reason why I'm planning 00:28:57.460 --> 00:29:01.339 on breaking down and buying a Ryzen. 00:29:01.340 --> 00:29:03.619 Actually, I'm going to buy 00:29:03.620 --> 00:29:06.979 an Intel Core Ultra 285H with 96 gig of RAM. 00:29:06.980 --> 00:29:08.379 Then I should be able to load 00:29:08.380 --> 00:29:12.059 a 70 billion parameter model in that. How fast will it run? 00:29:12.060 --> 00:29:13.819 It's going to run slow as a dog, 00:29:13.820 --> 00:29:15.819 but it's going to be cool to be able to do it. 00:29:15.820 --> 00:29:17.379 It's an AI bragging rights thing, 00:29:17.380 --> 00:29:20.019 but I mostly stick with the smaller-size models, 00:29:20.020 --> 00:29:22.819 and the ones that are more quantized, 00:29:22.820 --> 00:29:26.619 because it just tends to work better for me. 00:29:26.620 --> 00:29:29.179 [Corwin]: We've still got over 10 minutes before we're cutting away, 00:29:29.180 --> 00:29:30.179 but I'm just anticipating 00:29:30.180 --> 00:29:32.859 that we're going to be going strong at the 10-minute mark. 00:29:32.860 --> 00:29:34.899 So I'm just letting you know 00:29:34.900 --> 00:29:37.379 we can go as long as we like here. At a certain point, 00:29:37.380 --> 00:29:41.059 I may have to jump away and check in with the next speaker, 00:29:41.060 --> 00:29:44.419 but we'll post the entirety of this, 00:29:44.420 --> 00:29:47.979 even if we aren't able to stay with it all. 00:29:47.980 --> 00:29:49.739 Okay. And we've got 10 minutes 00:29:49.740 --> 00:29:52.379 where we're still going to stay live. NOTE Q: Are there "Free" as in FSF/open source issues with the data? 00:29:52.380 --> 00:30:00.139 So next question coming in, I see: are there free as in freedom, 00:30:00.140 --> 00:30:05.739 free as in FSF, issues with the data?
00:30:05.740 --> 00:30:11.699 [Aaron]: Yes, where's the data coming from is a huge question with AI. 00:30:11.700 --> 00:30:13.739 It's astonishing you can ask questions 00:30:13.740 --> 00:30:16.899 of models that you don't know where the data is coming from. 00:30:16.900 --> 00:30:19.979 That is gonna be one of the big issues long-term. 00:30:19.980 --> 00:30:21.499 There are people who are working 00:30:21.500 --> 00:30:22.979 on trying to figure out that stuff, 00:30:22.980 --> 00:30:25.259 but it's, I mean, if you look at, God, 00:30:25.260 --> 00:30:27.059 I can't remember who it was. 00:30:27.060 --> 00:30:28.659 Somebody was actually out torrenting books 00:30:28.660 --> 00:30:30.939 just to be able to build them into their AI system. 00:30:30.940 --> 00:30:32.339 I think it might've been Meta. 00:30:32.340 --> 00:30:34.819 So there's a lot of that going on. 00:30:34.820 --> 00:30:38.139 The open sourcing of this stuff is going to be tough. 00:30:38.140 --> 00:30:39.459 There are some models, 00:30:39.460 --> 00:30:41.419 like the mobile guys have got their own license, 00:30:41.420 --> 00:30:42.739 but where they're getting their data from, 00:30:42.740 --> 00:30:45.499 I'm not sure, so that's a huge question. 00:30:45.500 --> 00:30:47.979 That's a talk in itself. 00:30:47.980 --> 00:30:51.979 But yeah, if you train on your RAG and your data, 00:30:51.980 --> 00:30:53.499 you know where it's coming from, 00:30:53.500 --> 00:30:54.379 you know you have a license to it, 00:30:54.380 --> 00:30:55.139 but the other stuff is just 00:30:55.140 --> 00:30:56.739 more of an open question 00:30:56.740 --> 00:31:01.379 if you're using a smaller model. 00:31:01.380 --> 00:31:05.419 [Corwin]: The comments online, I see a couple of them. 00:31:05.420 --> 00:31:08.339 I'll read them out in order here. Really interesting stuff. 00:31:08.340 --> 00:31:09.556 Thank you for your talk. NOTE Q: Given that large AI companies are openly stealing IP and copyright, thereby eroding the authority of such law (and eroding truth itself as well), can you see a future where IP & copyright law become untenable and what sort of onward effect might that have? 00:31:09.557 --> 00:31:11.659 Given that large AI companies 00:31:11.660 --> 00:31:14.899 are openly stealing intellectual property and copyright, 00:31:14.900 --> 00:31:18.939 and therefore eroding the authority of such laws 00:31:18.940 --> 00:31:21.579 and maybe obscuring the truth itself, 00:31:21.580 --> 00:31:26.579 can you see a future where IP and copyright law become untenable? 00:31:26.580 --> 00:31:29.619 [Aaron]: I think that's a great question. 00:31:29.620 --> 00:31:34.979 I'm not a lawyer, but it is really getting complicated. 00:31:34.980 --> 00:31:37.859 It is getting to the point... I asked a question from... 00:31:37.860 --> 00:31:41.179 I played with Sora a little bit, and it generated someone 00:31:41.180 --> 00:31:42.819 where you can go like, oh, that's Jon Hamm, 00:31:42.820 --> 00:31:44.099 that's Christopher Walken. 00:31:44.100 --> 00:31:45.379 You start figuring out who the people are 00:31:45.380 --> 00:31:47.019 they're modeling stuff after. 00:31:47.020 --> 00:31:48.979 There is an apocalypse of some sort 00:31:48.980 --> 00:31:52.459 going to happen right now. 00:31:52.460 --> 00:31:53.579 There is, but this is, once again, 00:31:53.580 --> 00:31:56.059 my personal opinion, and I'm not a lawyer, 00:31:56.060 --> 00:31:57.459 and I do not have money.
00:31:57.460 --> 00:31:58.859 So don't sue me. It's that 00:31:58.860 --> 00:32:02.899 the current administration is very pro-AI, 00:32:02.900 --> 00:32:05.499 and there's a great deal of lobbying by those groups. 00:32:05.500 --> 00:32:07.139 And it's on both sides. 00:32:07.140 --> 00:32:09.699 And it's going to be, it's gonna be interesting to see 00:32:09.700 --> 00:32:11.699 what happens to copyright in the next 5-10 years. 00:32:11.700 --> 00:32:13.339 I just don't know how it keeps up 00:32:13.340 --> 00:32:18.059 without there being some adjustments and stuff. NOTE Comment: File size is not going to be the bottleneck, your RAM is. 00:32:18.060 --> 00:32:20.419 [Corwin]: Okay, and then another comment I saw: 00:32:20.420 --> 00:32:23.219 file size is not going to be a bottleneck, 00:32:23.220 --> 00:32:25.819 RAM is. You'll need 16 gigabytes of RAM 00:32:25.820 --> 00:32:28.259 to run the smallest local models 00:32:28.260 --> 00:32:31.979 and 512 gigabytes of RAM to run the larger ones. 00:32:31.980 --> 00:32:35.059 You'll need a GPU with that much memory 00:32:35.060 --> 00:32:38.318 if you want it to run quickly. 00:32:38.319 --> 00:32:41.259 [Aaron]: Yeah. Oh no. It also depends upon how your memory is laid out. 00:32:41.260 --> 00:32:45.699 Like, example being the Core Ultra 285H 00:32:45.700 --> 00:32:47.899 I plan to buy, that has 96 gig of memory. 00:32:47.900 --> 00:32:50.499 It's unified, the GPU and the CPU share it, 00:32:50.500 --> 00:32:52.739 but they go over the same bus. 00:32:52.740 --> 00:32:55.779 So the overall bandwidth of it tends to be a bit less, 00:32:55.780 --> 00:32:57.579 but you're able to load more of it into memory, 00:32:57.580 --> 00:32:59.419 so it's able to do some additional stuff with it 00:32:59.420 --> 00:33:00.819 as opposed to coming off disk. 00:33:00.820 --> 00:33:03.699 It's all a balancing act. If you hit Ziskind's website, 00:33:03.700 --> 00:33:05.819 that guy's done some great work on it, 00:33:05.820 --> 00:33:07.499 trying to figure out how big a model you can do, 00:33:07.500 --> 00:33:08.619 what you can do with it. 00:33:08.620 --> 00:33:12.699 And some of the stuff seems to be not obvious, 00:33:12.700 --> 00:33:15.299 because, example being that MacBook Air: 00:33:15.300 --> 00:33:17.619 for the five minutes I can run the model, 00:33:17.620 --> 00:33:19.379 it runs it faster than a lot of other things 00:33:19.380 --> 00:33:21.339 that should be able to run it faster, 00:33:21.340 --> 00:33:24.619 just because of the way the ARM cores and the unified memory work on it. 00:33:24.620 --> 00:33:26.019 So it's a learning process. 00:33:26.020 --> 00:33:29.579 But if you want to, NetworkChuck had a great video 00:33:29.580 --> 00:33:30.939 talking about building his own system 00:33:30.940 --> 00:33:34.379 with a couple really powerful NVIDIA cards 00:33:34.380 --> 00:33:35.379 and stuff like that in it, 00:33:35.380 --> 00:33:38.859 and actually setting it up on his system as a node 00:33:38.860 --> 00:33:41.459 and using a web UI on it. So there's a lot of stuff there, 00:33:41.460 --> 00:33:43.899 but it is a process of learning how big your data is, 00:33:43.900 --> 00:33:44.899 which models you want to use, 00:33:44.900 --> 00:33:46.219 how much information you need. 00:33:46.220 --> 00:33:49.579 But it's part of the learning. 00:33:49.580 --> 00:33:52.899 And you can run models even on Raspberry Pi 5s 00:33:52.900 --> 00:33:54.499 if you want to; they'll run slow.
00:33:54.500 --> 00:33:59.339 Don't get me wrong, but they're possible. 00:33:59.340 --> 00:34:02.179 [Corwin]: Okay, and I think there's other questions coming in too, 00:34:02.180 --> 00:34:04.019 so I'll just vamp for another second. 00:34:04.020 --> 00:34:06.299 We've got about five minutes before we'll, 00:34:06.300 --> 00:34:09.739 before we'll be cutting over, 00:34:09.740 --> 00:34:13.179 but I just want to say, in case we get close for time here, 00:34:13.180 --> 00:34:14.859 how much I appreciate your talk. 00:34:14.860 --> 00:34:15.979 This is another one that I'm going to 00:34:15.980 --> 00:34:18.339 have to study after the conference. 00:34:18.340 --> 00:34:21.099 [Aaron]: We greatly appreciate, all of us appreciate 00:34:21.100 --> 00:34:22.459 you guys putting on the conference. 00:34:22.460 --> 00:34:26.299 It's a great conference. It's well done. 00:34:26.300 --> 00:34:28.019 [Corwin]: It's an honor to be on the stage 00:34:28.020 --> 00:34:33.124 with the brains of the project, which is you. 00:34:33.125 --> 00:34:34.699 [Aaron]: So what else we got? Question-wise. 00:34:34.700 --> 00:34:46.899 [Corwin]: Okay, so just scanning here. NOTE Q: Have you used local models capable of tool-calling? 00:34:46.900 --> 00:34:50.699 Have you used local models capable of tool calling? 00:34:50.700 --> 00:34:54.779 [Aaron]: I'm scared of agentic. 00:34:54.780 --> 00:34:58.739 I'm going to be a slow adopter of that. 00:34:58.740 --> 00:35:02.459 I want to do it, but I just don't have the, uh, 00:35:02.460 --> 00:35:04.339 intestinal fortitude right now to do it. 00:35:04.340 --> 00:35:07.179 I've had it give me the commands, 00:35:07.180 --> 00:35:08.739 but I still run the commands by hand. 00:35:08.740 --> 00:35:10.539 I'm looking into it, and once again, 00:35:10.540 --> 00:35:20.899 it's on that list, but that's a big step for me. 00:35:20.900 --> 00:35:23.139 [Corwin]: So. Awesome. All right. 00:35:23.140 --> 00:35:27.179 Well, maybe it's, let me just scroll through, 00:35:27.180 --> 00:35:31.539 because we might have missed one question. Oh, I see. 00:35:31.540 --> 00:35:36.899 Here was the piggyback question. 00:35:36.900 --> 00:35:38.419 Now I see the question that I missed. 00:35:38.420 --> 00:35:41.139 So this was piggybacking on the question 00:35:41.140 --> 00:35:44.859 about model updates and adding data. NOTE Q: Will the models reach out to the web if they need to for more info? 00:35:44.860 --> 00:35:46.579 And will models reach out to the web 00:35:46.580 --> 00:35:47.819 if they need more info? 00:35:47.820 --> 00:35:52.479 Or have you worked with any models that work that way? 00:35:52.480 --> 00:35:55.259 [Aaron]: No, I've not seen any models do that. 00:35:55.260 --> 00:35:57.739 There was, like, a group 00:35:57.740 --> 00:35:59.899 working on something like a package updater 00:35:59.900 --> 00:36:02.499 that would do different diffs on it, 00:36:02.500 --> 00:36:03.939 but models change so much, 00:36:03.940 --> 00:36:05.739 even when you make minor changes and fine-tuning, 00:36:05.740 --> 00:36:07.659 it's hard just to update them in place. 00:36:07.660 --> 00:36:10.099 So I haven't seen one, but that doesn't mean 00:36:10.100 --> 00:36:15.713 they're not out there. Curious topic though. 00:36:15.714 --> 00:36:16.259 [Corwin]: Awesome. 00:36:16.260 --> 00:36:19.539 Well, it's probably pretty good timing. 00:36:19.540 --> 00:36:21.299 Let me just scroll and make sure.
00:36:21.300 --> 00:36:23.499 And of course, before I can say that, 00:36:23.500 --> 00:36:25.899 there's one more question. So let's go ahead and have that. 00:36:25.900 --> 00:36:28.299 I want to make sure while we're still live, though, 00:36:28.300 --> 00:36:31.299 that I give you a chance to offer any closing thoughts. NOTE Q: What scares you most about agentic tools? How would you think about putting a sandbox around it if you adopt an agentic workflow? 00:36:31.300 --> 00:36:35.779 So what scares you most about the agentic tools? 00:36:35.780 --> 00:36:38.419 How would you think about putting a sandbox around that 00:36:38.420 --> 00:36:41.619 if you did adopt an agentic workflow? 00:36:41.620 --> 00:36:42.899 [Aaron]: That is a great question. 00:36:42.900 --> 00:36:45.939 In terms of that, I would just control 00:36:45.940 --> 00:36:48.099 what it's able to talk to, which machines. 00:36:48.100 --> 00:36:50.059 I would actually have it be air-gapped. 00:36:50.060 --> 00:36:52.099 I work for a defense contractor, 00:36:52.100 --> 00:36:53.819 and we spend a lot of time dealing with air-gapped systems, 00:36:53.820 --> 00:36:55.979 because that's just kind of the way it works out for us. 00:36:55.980 --> 00:36:58.499 So agentic stuff is just going to take a while to earn trust. 00:36:58.500 --> 00:37:01.059 I want to see more stuff happening. 00:37:01.060 --> 00:37:02.819 Humans screw stuff up enough. 00:37:02.820 --> 00:37:04.819 The last thing we need is to multiply that by 1,000. 00:37:04.820 --> 00:37:09.419 So in terms of that, I would be restricting what it can do. 00:37:09.420 --> 00:37:10.859 If you look at the capabilities: 00:37:10.860 --> 00:37:13.579 if I created a user for it and gave it permissions, 00:37:13.580 --> 00:37:15.299 I would lock down through sudo 00:37:15.300 --> 00:37:17.379 what it's able to do, what the account's able to do. 00:37:17.380 --> 00:37:18.899 I would do those kinds of things, 00:37:18.900 --> 00:37:20.859 but it's happening. 00:37:20.860 --> 00:37:25.819 I'm just going to be one of the laggards on that one. 00:37:25.820 --> 00:37:29.259 So: air gap, jail, extremely locked-down environments, 00:37:29.260 --> 00:37:34.899 like we're talking about separate physical machines, not Docker. 00:37:34.900 --> 00:37:36.577 Yeah, hopefully. NOTE Q: Tool calling can be read-only, such as giving models the ability to search the web before answering your question. (No write access or execute access) I'm interested to know if local models are any good at calling tools, though. 00:37:36.578 --> 00:37:39.899 [Corwin]: Right, fair. So tool calling can be read-only, 00:37:39.900 --> 00:37:42.539 such as giving models the ability to search the web 00:37:42.540 --> 00:37:43.979 before answering your question, 00:37:43.980 --> 00:37:46.219 with no write access or execute access. 00:37:46.220 --> 00:37:49.219 I'm interested to know if local models 00:37:49.220 --> 00:37:51.419 are any good at that. 00:37:51.420 --> 00:37:55.579 [Aaron]: Yes, local models can do a lot of that stuff. 00:37:55.580 --> 00:37:56.819 It's within their capabilities. 00:37:56.820 --> 00:37:59.019 If you load LM Studio, you can do a lot of wonderful stuff 00:37:59.020 --> 00:38:02.419 with that, or with Open WebUI with Ollama. 00:38:02.420 --> 00:38:05.739 There are a lot of capabilities. It's amazing. 00:38:05.740 --> 00:38:08.139 Open WebUI is actually what a lot of companies are using now 00:38:08.140 --> 00:38:10.259 to put their data behind.
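NOTE Sketch: a read-only tool with gptel

Here is a minimal sketch of such a read-only tool in Emacs Lisp, assuming a recent gptel with tool-calling support and a local backend and model that can call tools. The tool name is illustrative, and the exact argument spec may differ between gptel versions, so check gptel's documentation.

(require 'gptel)
(require 'url)

;; Register a tool that can only fetch a page: no write access,
;; no execute access.
(gptel-make-tool
 :name "read_url"
 :description "Fetch the raw contents of a URL (read-only)."
 :args (list '(:name "url"
               :type string
               :description "The URL to fetch"))
 :function (lambda (url)
             ;; Synchronous fetch with a 10-second timeout; returns
             ;; the response text and does nothing else.
             (with-current-buffer (url-retrieve-synchronously url t t 10)
               (prog1 (buffer-substring-no-properties (point-min) (point-max))
                 (kill-buffer)))))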
00:38:10.260 --> 00:38:12.139 Their curated data and stuff like that. It works well. 00:38:12.140 --> 00:38:15.819 I can confirm that from my own professional experience. 00:38:15.820 --> 00:38:16.915 Excellent. 00:38:16.916 --> 00:38:19.659 [Corwin]: Okay, well, our timing should be just perfect 00:38:19.660 --> 00:38:22.659 if you want to give us like a 30-second, 45-second wrap-up, 00:38:22.660 --> 00:38:24.419 Aaron. Let me squeeze in mine: 00:38:24.420 --> 00:38:26.779 thank you again so much for preparing this talk 00:38:26.780 --> 00:38:30.499 and for entertaining all of our questions. 00:38:30.500 --> 00:38:33.299 [Aaron]: Yeah, let me just thank you guys for the conference again. 00:38:33.300 --> 00:38:35.179 This is a great one. I've enjoyed a lot of it. 00:38:35.180 --> 00:38:37.339 I've only caught a couple of talks so far, 00:38:37.340 --> 00:38:41.659 but I'm looking forward to hitting the ones after this and tomorrow. NOTE Wrapping up 00:38:41.660 --> 00:38:44.739 But the AI stuff is coming. Get on board. 00:38:44.740 --> 00:38:46.939 Definitely recommend it. If you want to just try it out 00:38:46.940 --> 00:38:48.419 and get a little taste of it, 00:38:48.420 --> 00:38:49.779 my minimal viable product 00:38:49.780 --> 00:38:51.619 with just Llamafile and gptel 00:38:51.620 --> 00:38:53.139 will get you to the point where you can start figuring things out. 00:38:53.140 --> 00:38:55.579 gptel is an amazing thing. It just gets out of your way, 00:38:55.580 --> 00:39:00.459 and it works so well with Emacs's design because 00:39:00.460 --> 00:39:01.699 it doesn't take your hands off the keyboard. 00:39:01.700 --> 00:39:02.499 It's just another buffer, 00:39:02.500 --> 00:39:04.059 and you just put information in there. 00:39:04.060 --> 00:39:06.979 It's quite a wonderful time, 00:39:06.980 --> 00:39:10.501 let's put it that way. That's all I got. 00:39:10.502 --> 00:39:14.339 [Corwin]: Thank you so much once again, and we've just cut away. 00:39:14.340 --> 00:39:15.779 So I'll stop the recording, 00:39:15.780 --> 00:39:18.259 and you're on your own recognizance. 00:39:18.260 --> 00:39:19.699 [Aaron]: Well, I'm gonna punch out. 00:39:19.700 --> 00:39:21.059 If anybody has any questions or anything, 00:39:21.060 --> 00:39:24.699 my email address is ajgrothe@yahoo.com, or the same at gmail. 00:39:24.700 --> 00:39:26.779 Thank you all for attending, 00:39:26.780 --> 00:39:29.939 and thanks again for the conference. 00:39:29.940 --> 00:39:32.579 Okay, I'm gonna go ahead and end the room there. Thank you. 00:39:32.580 --> 00:39:34.100 Excellent, thanks, bye.
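NOTE Sketch: pointing gptel at a local Llamafile

For reference, a minimal sketch of the Llamafile-plus-gptel setup mentioned in the wrap-up, assuming a Llamafile is already running and serving its OpenAI-compatible API on the default localhost:8080. Adjust host and port to your setup; the model name is a placeholder, since the server only has the one model loaded.

(require 'gptel)

;; Llamafile speaks the OpenAI-compatible API, so gptel's generic
;; OpenAI backend works against it over plain HTTP on localhost.
(setq gptel-model 'test
      gptel-backend (gptel-make-openai "llamafile"
                      :host "localhost:8080"
                      :protocol "http"
                      :stream t
                      :models '(test)))

With that in place, M-x gptel opens a chat buffer against the local model, and nothing leaves your machine.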