WEBVTT 00:00:00.000 --> 00:00:04.859 Hey, everybody. Welcome from frigid Omaha, Nebraska. 00:00:04.860 --> 00:00:06.619 I'm just going to kick off my talk here, 00:00:06.620 --> 00:00:23.899 and we'll see how it all goes. Thanks for attending. 00:00:23.900 --> 00:00:26.939 So the slides will be available on my site, grothe.us, 00:00:26.940 --> 00:00:29.899 in the presentation section tonight or tomorrow. 00:00:29.900 --> 00:00:33.099 This is a quick intro to one way to do private AI in Emacs. 00:00:33.100 --> 00:00:35.299 There are a lot of other ways to do it. 00:00:35.300 --> 00:00:38.899 This one is really just more or less the easiest way to do it. 00:00:38.900 --> 00:00:40.379 It's a minimal viable product 00:00:40.380 --> 00:00:42.379 to give you an idea of how to get started with it 00:00:42.380 --> 00:00:43.859 and how to give it a spin. 00:00:43.860 --> 00:00:45.819 I really hope some of you give it a shot 00:00:45.820 --> 00:00:48.179 and learn something along the way. 00:00:48.180 --> 00:00:50.379 So, the overview of the talk: 00:00:50.380 --> 00:00:54.939 I broke it down into these basic bullet points of why private AI, 00:00:54.940 --> 00:00:58.939 what do I need to do private AI, Emacs and private AI, 00:00:58.940 --> 00:01:02.739 pieces for an AI Emacs solution, 00:01:02.740 --> 00:01:08.059 a demo of a minimal viable product, and the summary. 00:01:08.060 --> 00:01:10.779 Why private AI? This is pretty simple. 00:01:10.780 --> 00:01:12.099 Just read the terms and conditions 00:01:12.100 --> 00:01:14.819 for any AI system you're currently using. 00:01:14.820 --> 00:01:17.019 If you're using the free tiers, your queries, 00:01:17.020 --> 00:01:18.619 code, and uploaded information 00:01:18.620 --> 00:01:20.699 are being used to train the models. 00:01:20.700 --> 00:01:22.939 In some cases, you are giving the company 00:01:22.940 --> 00:01:25.419 a perpetual license to your data. 00:01:25.420 --> 00:01:27.059 You have no control over this, 00:01:27.060 --> 00:01:29.219 except for not using the engine. 00:01:29.220 --> 00:01:30.699 And keep in mind, the terms 00:01:30.700 --> 00:01:32.179 are changing all the time on that, 00:01:32.180 --> 00:01:34.139 and they're not normally changing for our benefit. 00:01:34.140 --> 00:01:38.259 So that's not necessarily a good thing. 00:01:38.260 --> 00:01:40.339 If you're using the paid tiers, 00:01:40.340 --> 00:01:43.459 you may be able to opt out of the data collection. 00:01:43.460 --> 00:01:45.539 But keep in mind, this can change, 00:01:45.540 --> 00:01:48.619 or they may start charging for that option. 00:01:48.620 --> 00:01:51.419 Every AI company wants more and more data. 00:01:51.420 --> 00:01:53.779 They need more and more data to train their models. 00:01:53.780 --> 00:01:56.019 It is just the way it is. 00:01:56.020 --> 00:01:57.899 They need more and more information 00:01:57.900 --> 00:02:00.459 to get it more and more accurate and to keep it up to date. 00:02:00.460 --> 00:02:03.219 There's been a story about Stack Overflow. 00:02:03.220 --> 00:02:05.819 It has like half the number of queries it had a year ago 00:02:05.820 --> 00:02:07.379 because people are using AI. 00:02:07.380 --> 00:02:08.579 The problem with that is now 00:02:08.580 --> 00:02:10.379 there's less data going to Stack Overflow 00:02:10.380 --> 00:02:12.979 for the AI to get. It's a vicious cycle, 00:02:12.980 --> 00:02:14.619 especially when you start looking at 00:02:14.620 --> 00:02:16.579 newer languages like Ruby and stuff like that.
00:02:16.580 --> 00:02:21.419 So it comes down to being an interesting time. 00:02:21.420 --> 00:02:24.739 Another reason to go private AI is your costs are going to vary. 00:02:24.740 --> 00:02:27.019 Right now, these services are being heavily subsidized. 00:02:27.020 --> 00:02:29.419 If you're paying Claude $20 a month, 00:02:29.420 --> 00:02:32.579 it is not costing them $20 a month 00:02:32.580 --> 00:02:34.099 to host all the infrastructure 00:02:34.100 --> 00:02:35.619 and to build all these data centers. 00:02:35.620 --> 00:02:38.779 They are severely subsidizing that, 00:02:38.780 --> 00:02:41.259 at very much a loss right now. 00:02:41.260 --> 00:02:43.659 When they start charging the real costs plus a profit, 00:02:43.660 --> 00:02:45.499 it's going to change. 00:02:45.500 --> 00:02:48.019 Right now, I use a bunch of different services. 00:02:48.020 --> 00:02:50.019 I've played with Grok and a bunch of other ones. 00:02:50.020 --> 00:02:52.459 But Grok right now is like $30 a month 00:02:52.460 --> 00:02:54.139 for regular SuperGrok. 00:02:54.140 --> 00:02:56.419 When they start charging the real cost of that, 00:02:56.420 --> 00:02:59.819 it's going to go from $30 to something a great deal more, 00:02:59.820 --> 00:03:02.379 perhaps, I think, $100 or $200 00:03:02.380 --> 00:03:04.459 or whatever really turns out to be the cost 00:03:04.460 --> 00:03:06.059 when you figure everything into it. 00:03:06.060 --> 00:03:07.539 When you start adding that cost in, 00:03:07.540 --> 00:03:10.179 a lot of people who are using public AI right now 00:03:10.180 --> 00:03:11.899 are going to have no option but to move to private AI 00:03:11.900 --> 00:03:16.019 or give up on AI overall. 00:03:16.020 --> 00:03:18.659 What do you need to be able to do private AI? 00:03:18.660 --> 00:03:21.179 If you're going to run your own AI, 00:03:21.180 --> 00:03:23.579 you're going to need a system with either some CPU cores, 00:03:23.580 --> 00:03:25.699 a graphics processing unit, 00:03:25.700 --> 00:03:28.339 or a neural processing unit: a GPU or an NPU. 00:03:28.340 --> 00:03:29.819 I currently have four systems 00:03:29.820 --> 00:03:32.979 I'm experimenting with and playing around with on a daily basis. 00:03:32.980 --> 00:03:37.979 I have a System76 Pangolin with an AMD Ryzen 7 7840U 00:03:37.980 --> 00:03:41.099 with Radeon 780M integrated graphics. 00:03:41.100 --> 00:03:42.539 It's got 32 gigs of RAM. 00:03:42.540 --> 00:03:45.259 It's a beautiful piece of hardware. I really do like it. 00:03:45.260 --> 00:03:46.499 I have my main workstation, 00:03:46.500 --> 00:03:50.579 it's an HP Z620 with dual Intel Xeons 00:03:50.580 --> 00:03:53.179 with four NVIDIA K2200 graphics cards in it. 00:03:53.180 --> 00:03:56.699 Why the four NVIDIA K2200 graphics cards? 00:03:56.700 --> 00:03:59.739 Because I could buy four of them on eBay for $100, 00:03:59.740 --> 00:04:02.379 and they were still supported by the NVIDIA drivers for Debian. 00:04:02.380 --> 00:04:08.179 So that's why that is. A MacBook Air with an M1 processor, 00:04:08.180 --> 00:04:10.939 a very nice piece of kit I picked up a couple years ago, 00:04:10.940 --> 00:04:14.139 very cheap, but it runs AI surprisingly well, 00:04:14.140 --> 00:04:18.099 and an Acer Aspire with an AMD Ryzen 5700U in it. 00:04:18.100 --> 00:04:22.099 This was my old laptop. It was a sturdy beast. 00:04:22.100 --> 00:04:24.379 It was able to do enough AI to do demos and stuff, 00:04:24.380 --> 00:04:25.859 and I liked it quite a bit for that.
00:04:25.860 --> 00:04:28.339 I'm using the Pangolin for this demonstration 00:04:28.340 --> 00:04:30.979 because it's just better. 00:04:30.980 --> 00:04:37.219 Apple's M4 chip has 38 TOPS of NPU performance. 00:04:37.220 --> 00:04:40.099 Microsoft is now requiring 00:04:40.100 --> 00:04:41.459 45 TOPS of NPU performance 00:04:41.460 --> 00:04:43.939 to be able to have the Copilot+ badge on it. 00:04:43.940 --> 00:04:48.299 And Raspberry Pi's new AI HAT is about 18 TOPS 00:04:48.300 --> 00:04:51.219 and is $70 on top of the cost of a Raspberry Pi 5. 00:04:51.220 --> 00:04:56.059 Keep in mind, Raspberry Pi recently 00:04:56.060 --> 00:04:59.499 raised the cost of their Pi 5s because of RAM pricing, 00:04:59.500 --> 00:05:00.379 which is going to be affecting 00:05:00.380 --> 00:05:02.459 a lot of these types of solutions in the near future. 00:05:02.460 --> 00:05:05.299 But there's going to be a lot of 00:05:05.300 --> 00:05:06.699 local power available in the future. 00:05:06.700 --> 00:05:08.219 That's what it really comes down to. 00:05:08.220 --> 00:05:11.179 A lot of people are going to have PCs on their desks 00:05:11.180 --> 00:05:13.459 that are going to run a decent private AI 00:05:13.460 --> 00:05:18.059 without much issue. So for Emacs and private AI, 00:05:18.060 --> 00:05:20.139 there are a couple popular solutions. 00:05:20.140 --> 00:05:22.099 Gptel, which is the one we're going to talk about. 00:05:22.100 --> 00:05:24.739 It's a simple interface. It's a minimal interface. 00:05:24.740 --> 00:05:26.579 It integrates easily into your workflow. 00:05:26.580 --> 00:05:29.019 It's just, quite honestly, chef's kiss, 00:05:29.020 --> 00:05:31.059 just a beautifully well-done piece of software. 00:05:31.060 --> 00:05:33.859 Ollama Buddy has more features: 00:05:33.860 --> 00:05:36.259 a menu interface, quick access 00:05:36.260 --> 00:05:37.499 for things like code refactoring, 00:05:37.500 --> 00:05:38.979 text reformatting, et cetera. 00:05:38.980 --> 00:05:41.979 This is the one that you spend a little more time with, 00:05:41.980 --> 00:05:43.939 but you also get a little bit more back from it. 00:05:43.940 --> 00:05:49.419 Ellama is another one. It has some really good features 00:05:49.420 --> 00:05:51.059 and some different capabilities, 00:05:51.060 --> 00:05:54.979 but it's a different set of rules and capabilities. 00:05:54.980 --> 00:05:59.179 Aidermacs, which is programming with your AI in Emacs. 00:05:59.180 --> 00:06:01.219 The closest thing I can come up with 00:06:01.220 --> 00:06:04.139 to compare this to is Cursor, except it's in Emacs. 00:06:04.140 --> 00:06:05.659 It's really quite well done. 00:06:05.660 --> 00:06:07.299 These are all really quite well done. 00:06:07.300 --> 00:06:08.499 There are a bunch of other projects out there. 00:06:08.500 --> 00:06:10.819 If you go out to GitHub and type Emacs AI, 00:06:10.820 --> 00:06:13.219 you'll find a lot of different options. 00:06:13.220 --> 00:06:18.459 So what is a minimal viable product that can be done? 00:06:18.460 --> 00:06:23.379 A minimal viable product showing an AI Emacs solution 00:06:23.380 --> 00:06:27.179 can be done with only two pieces of software. 00:06:27.180 --> 00:06:31.179 Llamafile. This is an amazing piece of software. 00:06:31.180 --> 00:06:32.899 This is a whole LLM contained in one file. 00:06:32.900 --> 00:06:36.059 And the same file runs on macOS, 00:06:36.060 --> 00:06:39.379 Linux, Windows, and the BSDs.
00:06:39.380 --> 00:06:42.179 It's a wonderful piece of kit 00:06:42.180 --> 00:06:44.179 based on this thing these people created 00:06:44.180 --> 00:06:45.899 called Cosmopolitan 00:06:45.900 --> 00:06:46.779 that lets you create an executable 00:06:46.780 --> 00:06:48.699 that runs on a bunch of different systems. 00:06:48.700 --> 00:06:51.299 And gptel, which is an easy plug-in for Emacs, 00:06:51.300 --> 00:06:54.979 which we talked about in the last slide a bit. 00:06:54.980 --> 00:07:00.179 So setting up the LLM, you just have to go out 00:07:00.180 --> 00:07:01.699 and hit the page for it 00:07:01.700 --> 00:07:05.099 and do a wget of it. 00:07:05.100 --> 00:07:07.099 That's all it takes there. 00:07:07.100 --> 00:07:10.259 Then chmod it so you can actually execute the executable, 00:07:10.260 --> 00:07:12.939 and then just go ahead and actually run it. 00:07:12.940 --> 00:07:16.939 And let's go ahead and do that. 00:07:16.940 --> 00:07:18.899 I've already downloaded it because I don't want to wait. 00:07:18.900 --> 00:07:21.259 And let's just take a look at it. 00:07:21.260 --> 00:07:22.899 I've actually downloaded several of them, 00:07:22.900 --> 00:07:25.699 but let's go ahead and just run Llama 3.2 00:07:25.700 --> 00:07:31.179 with the 3 billion parameters. And that's it firing up. 00:07:31.180 --> 00:07:33.899 And it is nice enough to actually be listening on port 8080, 00:07:33.900 --> 00:07:35.339 which we'll need in a minute. 00:07:35.340 --> 00:07:43.139 So once you do that, you have to install gptel in Emacs. 00:07:43.140 --> 00:07:45.659 That's as simple as firing up Emacs, 00:07:45.660 --> 00:07:48.339 doing M-x package-install, 00:07:48.340 --> 00:07:49.779 and then just typing gptel, 00:07:49.780 --> 00:07:51.499 if you have your repository set up right, 00:07:51.500 --> 00:07:52.299 which hopefully you do. 00:07:52.300 --> 00:07:54.499 And then there you have it. 00:07:54.500 --> 00:07:58.139 You also have to set up a config file. 00:07:58.140 --> 00:08:01.739 Here's my example config file as it's currently set up, 00:08:01.740 --> 00:08:04.019 requiring gptel to ensure it's loaded, 00:08:04.020 --> 00:08:05.899 and defining the Llamafile backend. 00:08:05.900 --> 00:08:07.779 You can put multiple backends into it, 00:08:07.780 --> 00:08:09.859 but I just have the one defined in this example. 00:08:09.860 --> 00:08:12.059 But it's pretty straightforward. 00:08:12.060 --> 00:08:16.739 "Llama local file" as the name for it, stream, protocol HTTP. 00:08:16.740 --> 00:08:20.859 If you have HTTPS set up, that's obviously preferable, 00:08:20.860 --> 00:08:22.779 but a lot of people don't for their home labs. 00:08:22.780 --> 00:08:26.379 Host is just 127.0.0.1, port 8080. 00:08:26.380 --> 00:08:30.099 Keep in mind, some of the AIs run on a different port, 00:08:30.100 --> 00:08:31.499 so you may need 8081 00:08:31.500 --> 00:08:34.619 if you're running OpenWebUI at the same time. The key: 00:08:34.620 --> 00:08:37.019 we don't need an API key because it's a local server. 00:08:37.020 --> 00:08:40.259 And for the models, we can put multiple models 00:08:40.260 --> 00:08:41.339 in there if we want to. 00:08:41.340 --> 00:08:43.699 So if we create one with additional stuff, 00:08:43.700 --> 00:08:45.379 like RAG and stuff like that, 00:08:45.380 --> 00:08:47.459 we can actually name those models by their domain, 00:08:47.460 --> 00:08:48.699 which is really kind of cool. 00:08:48.700 --> 00:08:52.099 But that's all it takes.
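Here is a sketch of those two setup steps in code. The llamafile name and URL are placeholders for whichever model you actually download, and the Emacs Lisp mirrors the config just described, using gptel's OpenAI-compatible backend constructor (Llamafile serves an OpenAI-style API); adjust host, port, and model name to your setup.

```sh
# Download a llamafile (URL/filename are example placeholders),
# make it executable, and run it; it serves an API on port 8080.
wget https://example.com/Llama-3.2-3B-Instruct.llamafile
chmod +x Llama-3.2-3B-Instruct.llamafile
./Llama-3.2-3B-Instruct.llamafile
```

```elisp
;; Ensure gptel is loaded.
(require 'gptel)

;; Define the Llamafile backend. No API key is needed for a local server.
(setq gptel-backend
      (gptel-make-openai "Llama local file" ; name shown in gptel's menus
        :stream t
        :protocol "http"         ; plain HTTP; fine for a local host
        :host "127.0.0.1:8080"   ; use 8081 if something else holds 8080
        :models '(llama-3.2-3b)) ; illustrative model name
      gptel-model 'llama-3.2-3b)
```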
00:08:52.100 --> 00:09:03.779 So let's go ahead and do a quick test of it. 00:09:03.780 --> 00:09:11.019 Oops. M-x gptel. And we're going to just choose 00:09:11.020 --> 00:09:12.499 the default buffer to make things easier. 00:09:12.500 --> 00:09:15.339 Going to resize it up a bit. 00:09:15.340 --> 00:09:19.859 And usually my go-to question is, who was David Bowie? 00:09:19.860 --> 00:09:24.499 This one is actually a question 00:09:24.500 --> 00:09:26.219 that's turned out to be really good 00:09:26.220 --> 00:09:28.019 for figuring out how complete an AI is. 00:09:28.020 --> 00:09:31.139 This is one that some engines do well on, other ones don't. 00:09:31.140 --> 00:09:33.739 And we can either do 00:09:33.740 --> 00:09:36.059 M-x gptel-send, 00:09:36.060 --> 00:09:37.979 or we can just do C-c and hit Enter. 00:09:37.980 --> 00:09:39.139 We'll just do C-c RET. 00:09:39.140 --> 00:09:43.659 And now it's going ahead and hitting our local AI system 00:09:43.660 --> 00:09:46.659 running on port 8080. And that looks pretty good, 00:09:46.660 --> 00:09:50.739 but let's go ahead and say, hey, it's set to terse mode right now, 00:09:50.740 --> 00:10:03.859 please expand upon this. And there we go. 00:10:03.860 --> 00:10:05.379 We're getting a full description 00:10:05.380 --> 00:10:08.739 of the majority of David Bowie's life 00:10:08.740 --> 00:10:10.139 and other information about him. 00:10:10.140 --> 00:10:21.699 So very, very happy with that. 00:10:21.700 --> 00:10:23.539 One thing to keep in mind, 00:10:23.540 --> 00:10:24.699 when you're looking for hallucinations 00:10:24.700 --> 00:10:26.899 and how accurate the AI is, given how it's compressed, 00:10:26.900 --> 00:10:29.259 is it will tend to screw up on things like 00:10:29.260 --> 00:10:30.859 how many children he had and stuff like that. 00:10:30.860 --> 00:10:32.459 Let me see if it gets to that real quick. 00:10:32.460 --> 00:10:39.739 It didn't, actually, on this one. 00:10:39.740 --> 00:10:42.179 Alright, so that's the first question I always ask. 00:10:42.180 --> 00:10:44.659 The next one is, what are sea monkeys? 00:10:44.660 --> 00:10:48.979 It gives you an idea of the breadth of the system. 00:10:48.980 --> 00:11:10.619 It's querying right now. Pulls it back correctly. Yes. 00:11:10.620 --> 00:11:12.339 And it's smart enough to actually detect that David Bowie 00:11:12.340 --> 00:11:15.019 even referenced sea monkeys in the song Sea of Love, 00:11:15.020 --> 00:11:16.179 which became a hit single. 00:11:16.180 --> 00:11:18.859 So it's actually keeping the context alive, 00:11:18.860 --> 00:11:20.419 which is a very cool feature. 00:11:20.420 --> 00:11:21.459 I did not see that coming. 00:11:21.460 --> 00:11:24.139 Here's one that some people say is a really good one 00:11:24.140 --> 00:11:25.739 to ask: how many R's are in strawberry. 00:11:25.740 --> 00:11:46.179 All right, now it's going off the rails. 00:11:46.180 --> 00:11:48.139 It's going in a different direction. 00:11:48.140 --> 00:11:49.979 Let me go ahead and reopen that again, 00:11:49.980 --> 00:11:52.979 because it went down a bad hole there for a second. 00:11:52.980 --> 00:11:58.419 Let me ask it to write hello world in Emacs Lisp. 00:11:58.420 --> 00:12:10.419 Yep, that works. So the point being here, 00:12:10.420 --> 00:12:14.939 that was like two minutes of setup. 00:12:14.940 --> 00:12:18.019 And now we have a small AI embedded inside the system.
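As a small usage sketch of what the demo just did: in a gptel chat buffer, C-c RET runs gptel-send, and the same command works from any ordinary buffer too. The global keybinding below is an illustrative choice, not a gptel default.

```elisp
;; M-x gptel opens (or switches to) a dedicated chat buffer;
;; there, C-c RET (gptel-send) sends everything up to point.
;; Optionally make sending available from any buffer:
(global-set-key (kbd "C-c g") #'gptel-send)
```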
00:12:18.020 --> 00:12:20.539 So that gives you an idea of just how easy it can be. 00:12:20.540 --> 00:12:22.299 And it's just running locally on the system. 00:12:22.300 --> 00:12:25.259 We also have the default system here as well. 00:12:25.260 --> 00:12:32.579 So not that bad. 00:12:32.580 --> 00:12:35.379 That's a basic solution, that's a basic setup 00:12:35.380 --> 00:12:37.059 that will get you to the point where you can show it off. 00:12:37.060 --> 00:12:39.859 It's a party trick, but it's a very cool party trick. 00:12:39.860 --> 00:12:42.859 The way that gptel works is it puts things into buffers, 00:12:42.860 --> 00:12:45.099 it doesn't interfere with your flow that much, 00:12:45.100 --> 00:12:47.179 it's just an additional window you can pop open 00:12:47.180 --> 00:12:49.019 to ask questions and get information from, 00:12:49.020 --> 00:12:51.459 dump code into it and have it refactored. 00:12:51.460 --> 00:12:53.339 Gptel has a lot of additional options 00:12:53.340 --> 00:12:55.699 for things that are really cool for that. 00:12:55.700 --> 00:12:57.099 But if you want a better solution, 00:12:57.100 --> 00:12:59.939 I recommend Ollama or LM Studio. 00:12:59.940 --> 00:13:01.899 They're both more capable than Llamafile. 00:13:01.900 --> 00:13:03.859 They can accept a lot of different models. 00:13:03.860 --> 00:13:05.739 You can do things like RAG. 00:13:05.740 --> 00:13:09.219 You can do loading of things onto the GPU more explicitly. 00:13:09.220 --> 00:13:10.379 It can speed stuff up. 00:13:10.380 --> 00:13:13.059 One of the things about retrieval augmentation is 00:13:13.060 --> 00:13:15.539 it will let you put your data into the system, 00:13:15.540 --> 00:13:17.779 so you can start uploading your code, your information, 00:13:17.780 --> 00:13:20.139 and actually be able to do analysis of it. 00:13:20.140 --> 00:13:23.539 OpenWebUI provides more capabilities. 00:13:23.540 --> 00:13:24.859 It provides an interface that's similar 00:13:24.860 --> 00:13:25.899 to what you're used to seeing 00:13:25.900 --> 00:13:28.179 for ChatGPT and the other systems. 00:13:28.180 --> 00:13:29.419 It's really quite well done. 00:13:29.420 --> 00:13:32.539 And once again, gptel, I have to mention that, 00:13:32.540 --> 00:13:34.779 because that's the one I really kind of like. 00:13:34.780 --> 00:13:36.899 And Ollama Buddy is also another really nice one. 00:13:36.900 --> 00:13:41.019 So what about the licensing of these models, 00:13:41.020 --> 00:13:42.299 since I'm going out pulling down 00:13:42.300 --> 00:13:43.579 a model and doing this stuff? 00:13:43.580 --> 00:13:46.579 Let's take a look at a couple of highlights 00:13:46.580 --> 00:13:49.379 from the Meta Llama 3 community license. 00:13:49.380 --> 00:13:52.579 If your service exceeds 700 million monthly users, 00:13:52.580 --> 00:13:54.099 you need additional licensing. 00:13:54.100 --> 00:13:56.099 Probably not going to be a problem for most of us. 00:13:56.100 --> 00:13:58.379 There's a competition restriction. 00:13:58.380 --> 00:14:00.899 You can't use this model to enhance competing models. 00:14:00.900 --> 00:14:04.219 And there are some limitations on using the Meta trademarks. 00:14:04.220 --> 00:14:05.939 Not that big a deal. 00:14:05.940 --> 00:14:09.139 Other than that, it's a permissive license 00:14:09.140 --> 00:14:10.939 designed to encourage innovation 00:14:10.940 --> 00:14:13.779 and open development. Commercial use is allowed, 00:14:13.780 --> 00:14:15.219 but there are some restrictions on it.
00:14:15.220 --> 00:14:17.259 Yeah, you can modify the model, 00:14:17.260 --> 00:14:20.419 but you have to abide by the license terms. 00:14:20.420 --> 00:14:22.339 And you can distribute the model with derivatives. 00:14:22.340 --> 00:14:24.059 And there are some very cool ones out there. 00:14:24.060 --> 00:14:25.259 There are people who've done things 00:14:25.260 --> 00:14:29.579 to try and make the Llama be less, what's the phrase, 00:14:29.580 --> 00:14:31.939 ethical, if you're doing penetration testing research 00:14:31.940 --> 00:14:32.619 and stuff like that. 00:14:32.620 --> 00:14:34.459 It has some very nice value there. 00:14:34.460 --> 00:14:37.739 Keep in mind, licenses also vary 00:14:37.740 --> 00:14:39.619 depending on the model you're using. 00:14:39.620 --> 00:14:42.419 Mistral AI has the non-production license. 00:14:42.420 --> 00:14:45.219 It's designed to keep it to research and development. 00:14:45.220 --> 00:14:46.739 You can't use it commercially. 00:14:46.740 --> 00:14:50.419 So it's designed to clearly delineate 00:14:50.420 --> 00:14:52.939 between research and development 00:14:52.940 --> 00:14:54.259 and somebody trying to actually build 00:14:54.260 --> 00:14:55.379 something on top of it. 00:14:55.380 --> 00:14:57.979 And another question I get asked is, 00:14:57.980 --> 00:14:59.899 are there open-source model options? 00:14:59.900 --> 00:15:02.819 Yeah, but most of them are small or specialized currently. 00:15:02.820 --> 00:15:05.499 MoMo is a whole family of them, 00:15:05.500 --> 00:15:07.339 but they tend to be more specialized, 00:15:07.340 --> 00:15:09.019 but it's very cool to see where it's going. 00:15:09.020 --> 00:15:11.339 And it's another thing that's just going forward. 00:15:11.340 --> 00:15:13.379 It's under the MIT license. 00:15:13.380 --> 00:15:15.819 Some things to know to help you 00:15:15.820 --> 00:15:17.499 have a better experience with this. 00:15:17.500 --> 00:15:21.059 Get Ollama and OpenWebUI working by themselves, 00:15:21.060 --> 00:15:22.659 then set up your config file. 00:15:22.660 --> 00:15:24.819 I was fighting both at the same time, 00:15:24.820 --> 00:15:26.699 and it turned out I had a problem with my Ollama. 00:15:26.700 --> 00:15:28.899 I had a conflict, so that's what my problem was. 00:15:28.900 --> 00:15:32.819 Llamafile plus gptel is a great way to start experimenting, 00:15:32.820 --> 00:15:34.299 just to give you an idea of how it works 00:15:34.300 --> 00:15:36.939 and figure out how the interfaces work. Tremendous. 00:15:36.940 --> 00:15:40.739 RAG, loading documents into it, is really easy with OpenWebUI. 00:15:40.740 --> 00:15:43.019 You can create models; you can put in things like 00:15:43.020 --> 00:15:46.419 help desk, developers, and stuff like that, breaking it out. 00:15:46.420 --> 00:15:51.019 Hacker News has a 'how to build a $300 AI computer' article. 00:15:51.020 --> 00:15:52.859 This is from March 2024, 00:15:52.860 --> 00:15:55.099 but it still has a lot of great information 00:15:55.100 --> 00:15:56.819 on how to benchmark the environments, 00:15:56.820 --> 00:16:01.339 what some values are, like for the Ryzen 5700U 00:16:01.340 --> 00:16:02.579 inside my Acer Aspire; 00:16:02.580 --> 00:16:04.419 that's where I got the idea of doing that. 00:16:04.420 --> 00:16:06.739 Make sure you do the ROCm stuff correctly 00:16:06.740 --> 00:16:09.899 to get the GPU extensions. But it's just really good stuff. 00:16:09.900 --> 00:16:13.059 You don't need a great GPU or CPU to get started.
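Since several of those tips mention Ollama: if you do step up from Llamafile to it, gptel ships a dedicated backend constructor for that as well. A minimal sketch, assuming a model such as llama3.2 has already been pulled with `ollama pull` (the model name is illustrative):

```elisp
;; Ollama serves its API on port 11434 by default.
(gptel-make-ollama "Ollama"
  :host "localhost:11434"
  :stream t
  :models '(llama3.2))
```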
00:16:13.060 --> 00:16:14.819 Smaller models like TinyLlama 00:16:14.820 --> 00:16:16.179 can run on very small systems. 00:16:16.180 --> 00:16:18.499 It gets you the ability to start playing with it 00:16:18.500 --> 00:16:21.619 and start experimenting and figure out if it's for you 00:16:21.620 --> 00:16:23.379 and how to move forward with it. 00:16:23.380 --> 00:16:29.219 The AMD Ryzen AI Max+ 395 in a mini PC 00:16:29.220 --> 00:16:31.179 makes a really nice dedicated host. 00:16:31.180 --> 00:16:34.619 You used to be able to buy these for about $1200; now, 00:16:34.620 --> 00:16:35.579 with the RAM price increase, 00:16:35.580 --> 00:16:38.779 if you want to get 128 gig, you're pushing two grand. 00:16:38.780 --> 00:16:40.739 It gets a little tighter. 00:16:40.740 --> 00:16:44.099 Macs work remarkably well with AI. 00:16:44.100 --> 00:16:47.659 My MacBook Air was one of my go-tos for a while, 00:16:47.660 --> 00:16:49.779 but once I started doing anything AI, 00:16:49.780 --> 00:16:50.779 I had a five-minute window 00:16:50.780 --> 00:16:52.619 before the thermal throttling became an issue. 00:16:52.620 --> 00:16:54.619 Keep in mind that's a MacBook Air, 00:16:54.620 --> 00:16:56.659 so it doesn't have the greatest ventilation. 00:16:56.660 --> 00:16:58.339 If you get the MacBook Pros and stuff, 00:16:58.340 --> 00:17:00.139 they tend to have more ventilation, 00:17:00.140 --> 00:17:02.499 but still you're going to be pushing against that. 00:17:02.500 --> 00:17:04.939 So Mac Minis and the Mac Ultras and stuff like that 00:17:04.940 --> 00:17:06.099 tend to work really well for that. 00:17:06.100 --> 00:17:09.779 Alex Ziskind on YouTube has a channel. 00:17:09.780 --> 00:17:11.899 He does a lot of AI performance benchmarking, 00:17:11.900 --> 00:17:14.819 like "I load a 70-billion-parameter model 00:17:14.820 --> 00:17:16.699 on this mini PC" and stuff like that. 00:17:16.700 --> 00:17:19.019 It's a lot of fun and interesting stuff there. 00:17:19.020 --> 00:17:21.219 And it's influencing my decision 00:17:21.220 --> 00:17:22.979 to buy my next AI-style PC. 00:17:22.980 --> 00:17:27.619 Small domain-specific LLMs are happening. 00:17:27.620 --> 00:17:29.939 An LLM that has all your code and information 00:17:29.940 --> 00:17:31.659 sounds like a really cool idea. 00:17:31.660 --> 00:17:34.299 It gives you capabilities to start training stuff 00:17:34.300 --> 00:17:35.899 that you couldn't do with the big ones. 00:17:35.900 --> 00:17:38.059 Even in terms of fine-tuning and stuff, 00:17:38.060 --> 00:17:40.539 it's remarkable to see where that space is coming along 00:17:40.540 --> 00:17:41.739 in the next year or so. 00:17:41.740 --> 00:17:46.219 huggingface.co has pointers to tons of AI models. 00:17:46.220 --> 00:17:49.259 You'll find the one that works for you there, hopefully. 00:17:49.260 --> 00:17:50.539 If you're doing cybersecurity, 00:17:50.540 --> 00:17:52.059 there's a whole bunch out there for that 00:17:52.060 --> 00:17:54.619 that have specific training and information in them. 00:17:54.620 --> 00:17:56.139 It's really good. 00:17:56.140 --> 00:18:00.099 One last thing to keep in mind is hallucinations are real. 00:18:00.100 --> 00:18:02.779 You will get BS back from the AI occasionally, 00:18:02.780 --> 00:18:05.179 so do validate everything you get from it. 00:18:05.180 --> 00:18:08.459 Don't be using it for court cases like some people have 00:18:08.460 --> 00:18:14.539 and run into those problems. So, that is my talk.
00:18:14.540 --> 00:18:17.219 What I would like you to get out of that is, 00:18:17.220 --> 00:18:21.859 if you haven't tried it, give gptel and Llamafile a shot. 00:18:21.860 --> 00:18:23.979 Fire up a little small AI instance, 00:18:23.980 --> 00:18:27.339 play around with it a little bit inside your Emacs, 00:18:27.340 --> 00:18:30.139 and see if it makes your life better. Hopefully it will. 00:18:30.140 --> 00:18:32.139 And I really hope you guys 00:18:32.140 --> 00:18:34.659 learned something from this talk. And thanks for listening. 00:18:34.660 --> 00:18:38.979 And the links are at the end of the talk, if you have any questions. 00:18:38.980 --> 00:18:42.739 Let me see if we've got anything you want, Pat. You do. 00:18:42.740 --> 00:18:43.899 You've got a few questions. 00:18:43.900 --> 00:18:48.059 Hey, this is Corwin. Thank you so much. Thank you, Aaron. 00:18:48.060 --> 00:18:50.339 What an awesome talk this was, actually. 00:18:50.340 --> 00:18:52.179 If you don't have a camera, 00:18:52.180 --> 00:18:54.339 I can get away with not having one too. 00:18:54.340 --> 00:18:56.299 I've got... I'll turn the camera on. 00:18:56.300 --> 00:19:01.499 Okay. All right. I'll turn mine back on. Here I come. 00:19:01.500 --> 00:19:03.139 Yeah, so there are a few questions, 00:19:03.140 --> 00:19:04.579 but first let me say thank you 00:19:04.580 --> 00:19:06.339 for a really captivating talk. 00:19:06.340 --> 00:19:10.939 I think a lot of people will be empowered by this 00:19:10.940 --> 00:19:15.259 to try to do more with less, especially locally, 00:19:15.260 --> 00:19:20.179 concerned about the data center footprint, 00:19:20.180 --> 00:19:23.659 environmentally concerned 00:19:23.660 --> 00:19:26.979 about the footprint of LLMs inside data centers. 00:19:26.980 --> 00:19:28.219 So just thinking about how we can 00:19:28.220 --> 00:19:32.419 put infrastructure we have at home to use 00:19:32.420 --> 00:19:34.019 and get more done with less. 00:19:34.020 --> 00:19:37.499 Yeah, the data center impact's interesting, 00:19:37.500 --> 00:19:39.979 because there was a study a while ago. 00:19:39.980 --> 00:19:42.099 Someone said every time you do a Gemini query, 00:19:42.100 --> 00:19:45.019 it's like boiling a cup of water. 00:19:45.020 --> 00:19:48.619 Yeah, I've heard that one too. So do you want to, you know... 00:19:48.620 --> 00:19:51.699 I don't know how much direction you want. 00:19:51.700 --> 00:19:53.859 I'd be very happy to read out the questions for you. 00:19:53.860 --> 00:19:55.219 Yeah, that would be great. 00:19:55.220 --> 00:19:57.619 I'm having trouble getting to that tab. 00:19:57.620 --> 00:20:02.779 Okay, I'm there, so I'll put it into our chat too, 00:20:02.780 --> 00:20:07.419 so you can follow along if you'd like. 00:20:07.420 --> 00:20:11.219 The first question was, why is the David Bowie question 00:20:11.220 --> 00:20:12.219 a good one to start with? 00:20:12.220 --> 00:20:14.419 Does it have interesting failure conditions, 00:20:14.420 --> 00:20:17.299 or what made you choose that? 00:20:17.300 --> 00:20:21.979 First off, huge fan of David Bowie.
00:20:21.980 --> 00:20:24.499 But it came down to, it really taught me a few things 00:20:24.500 --> 00:20:26.299 about how the models work 00:20:26.300 --> 00:20:28.819 in terms of things like how many kids he had, 00:20:28.820 --> 00:20:31.779 because DeepSeek, which is a very popular Chinese model 00:20:31.780 --> 00:20:33.179 that a lot of people are using now, 00:20:33.180 --> 00:20:35.619 misidentifies him as having three daughters, 00:20:35.620 --> 00:20:38.459 and he has like one son and, I think, 00:20:38.460 --> 00:20:40.899 two sons and a daughter or something like that. 00:20:40.900 --> 00:20:43.659 So there are differences on that, and it just goes on. 00:20:43.660 --> 00:20:45.299 There's a whole lot of stuff, 00:20:45.300 --> 00:20:47.779 because his story spans like 60 years, 00:20:47.780 --> 00:20:49.659 so it gives good feedback. 00:20:49.660 --> 00:20:51.539 That's the real main reason I ask that question, 00:20:51.540 --> 00:20:53.699 because I just needed one. Sea monkeys I just picked 00:20:53.700 --> 00:20:56.579 because it was obscure, and I just always have, right? 00:20:56.580 --> 00:20:58.939 I used to have it write hello world in Forth, 00:20:58.940 --> 00:21:01.019 because I thought that was an interesting one as well. 00:21:01.020 --> 00:21:03.899 It's just picking random ones like that. 00:21:03.900 --> 00:21:06.499 One question I ask, sorry, a lot of models is, 00:21:06.500 --> 00:21:09.419 what is the closest star to the Earth? 00:21:09.420 --> 00:21:12.019 Because most of them will say Alpha Centauri 00:21:12.020 --> 00:21:13.739 or Proxima Centauri and not the sun. 00:21:13.740 --> 00:21:15.899 And I have a whole other talk 00:21:15.900 --> 00:21:17.899 where I just argue with the LLM, 00:21:17.900 --> 00:21:20.019 trying to say, hey, the sun is a star. 00:21:20.020 --> 00:21:26.579 And it just wouldn't accept it, so. What? 00:21:26.580 --> 00:21:28.419 Oh, I can hear that. 00:21:28.420 --> 00:21:34.379 So what specific tasks do you like to use your local AI for? 00:21:34.380 --> 00:21:37.459 I like to load a lot of my code into it 00:21:37.460 --> 00:21:39.739 and actually have it do analysis of it. 00:21:39.740 --> 00:21:42.339 I was actually going through some code 00:21:42.340 --> 00:21:45.619 I have for some pen testing, and I was having it modify it 00:21:45.620 --> 00:21:47.259 to update it for the newer version, 00:21:47.260 --> 00:21:48.459 because I hate to say this, 00:21:48.460 --> 00:21:49.859 but it was written for Python 2, 00:21:49.860 --> 00:21:51.459 and I needed to update it for Python 3. 00:21:51.460 --> 00:21:53.859 And the 2to3 tool did not do all of it, 00:21:53.860 --> 00:21:56.659 but this actually was able to do the refactoring. 00:21:56.660 --> 00:21:58.499 It's part of my laziness. 00:21:58.500 --> 00:22:01.459 But I use that for anything I don't want to hit the web. 00:22:01.460 --> 00:22:03.259 And that's a lot of stuff when you start thinking about 00:22:03.260 --> 00:22:04.979 if you're doing cybersecurity research 00:22:04.980 --> 00:22:06.819 and you have your white papers 00:22:06.820 --> 00:22:10.779 and stuff like that in there. 00:22:10.780 --> 00:22:13.979 I've got a lot of that loaded into RAG 00:22:13.980 --> 00:22:15.659 in one model on my OpenWebUI system. 00:22:15.660 --> 00:22:21.059 Neat. Have you used 00:22:21.060 --> 00:22:25.739 any small domain-specific LLMs? 00:22:25.740 --> 00:22:30.419 If so, what kind of tasks do they specialize in?
00:22:30.420 --> 00:22:32.139 And, you know, how? 00:22:32.140 --> 00:22:34.979 Not yet, to be honest, but there are some out there, once again, 00:22:34.980 --> 00:22:36.779 for cybersecurity and stuff like that, 00:22:36.780 --> 00:22:39.739 that I really need to dig into. That's on my to-do list. 00:22:39.740 --> 00:22:41.699 I've got a couple weeks off at the end of the year, 00:22:41.700 --> 00:22:43.779 and that's a big part of my plan for that. 00:22:43.780 --> 00:22:49.379 Are the various models updated pretty regularly? 00:22:49.380 --> 00:22:52.059 Can you add your own data to the pre-built models? 00:22:52.060 --> 00:22:56.699 Yes. The models are updated pretty reasonably. 00:22:56.700 --> 00:22:59.699 You can add data to a model in a couple of different ways. 00:22:59.700 --> 00:23:01.099 You can do something called fine-tuning, 00:23:01.100 --> 00:23:03.819 which requires a really nice GPU and a lot of CPU time. 00:23:03.820 --> 00:23:05.499 You're probably not going to do that. 00:23:05.500 --> 00:23:07.419 You can do retrieval-augmented generation, 00:23:07.420 --> 00:23:09.499 where you load your data on top of the system, 00:23:09.500 --> 00:23:11.299 and it puts it inside a database, 00:23:11.300 --> 00:23:12.859 and you can actually scan that and stuff. 00:23:12.860 --> 00:23:14.619 I have another talk where I go through 00:23:14.620 --> 00:23:16.219 and I start asking questions about it: 00:23:16.220 --> 00:23:18.579 I load the talk into the engine, 00:23:18.580 --> 00:23:20.099 and I ask questions against that. 00:23:20.100 --> 00:23:22.179 If I had more time, I would have done that here, 00:23:22.180 --> 00:23:26.499 but it comes down to how much time we have. That's RAG. RAG 00:23:26.500 --> 00:23:29.419 is pretty easy to do through OpenWebUI or LM Studio. 00:23:29.420 --> 00:23:31.419 It's a great way: you just 00:23:31.420 --> 00:23:34.099 point it to a folder, and it just sucks all that data in, 00:23:34.100 --> 00:23:35.499 and it'll hit that data first, 00:23:35.500 --> 00:23:36.859 if you have like helpdesk stuff and so on. 00:23:36.860 --> 00:23:39.619 For other options, there are vector databases, 00:23:39.620 --> 00:23:41.819 like if you use PostgreSQL, 00:23:41.820 --> 00:23:43.699 it has pgvector, which can do a lot of that stuff. 00:23:43.700 --> 00:23:44.739 I've not dug into that yet, 00:23:44.740 --> 00:23:46.099 but that is also on that to-do list 00:23:46.100 --> 00:23:48.459 I've got a lot of stuff planned for. Cool. 00:23:48.460 --> 00:23:51.819 So what is your experience with RAGs? 00:23:51.820 --> 00:23:54.339 I don't even know what that means. 00:23:54.340 --> 00:23:57.419 Do you know what that means? 00:23:57.420 --> 00:23:59.619 Can you read the question again? 00:23:59.620 --> 00:24:03.979 What is your experience with RAGs? RAG is great. 00:24:03.980 --> 00:24:07.459 That's Retrieval-Augmented Generation. 00:24:07.460 --> 00:24:09.739 That loads your data first, and it hits yours, 00:24:09.740 --> 00:24:11.499 and it'll actually cite it and stuff. 00:24:11.500 --> 00:24:14.659 There's a guy who wrote a RAG in 100 lines of Python, 00:24:14.660 --> 00:24:16.899 and it's an impressive piece of software. 00:24:16.900 --> 00:24:18.779 I think if you hit my site, 00:24:18.780 --> 00:24:22.099 I've got a private AI talk where I actually refer to that.
00:24:22.100 --> 00:24:25.219 But retrieval augmentation: it's easy, it's fast, 00:24:25.220 --> 00:24:26.699 it puts your data into the system. 00:24:26.700 --> 00:24:31.339 Yeah, start with that and then iterate on top of that. 00:24:31.340 --> 00:24:32.659 That's one of the great things about AI, 00:24:32.660 --> 00:24:33.619 especially private AI, 00:24:33.620 --> 00:24:37.739 is you can do whatever you want to with it 00:24:37.740 --> 00:24:43.179 and build up with it as you get more experience. 00:24:43.180 --> 00:24:44.219 Any thoughts on running things 00:24:44.220 --> 00:24:49.179 on AWS, DigitalOcean, and so on? 00:24:49.180 --> 00:24:50.619 AWS is not bad. 00:24:50.620 --> 00:24:52.659 DigitalOcean, they have some of their GPUs. 00:24:52.660 --> 00:24:54.379 I still don't like having the data 00:24:54.380 --> 00:24:57.419 leave my house, to be honest, or work, 00:24:57.420 --> 00:24:59.019 because I tend to do some stuff 00:24:59.020 --> 00:25:01.259 where I don't want it even hitting that situation. 00:25:01.260 --> 00:25:03.699 But they have pretty good stuff. 00:25:03.700 --> 00:25:05.579 Another one to consider is Oracle Cloud. 00:25:05.580 --> 00:25:09.059 Oracle has their AI infrastructure that's really well done. 00:25:09.060 --> 00:25:12.379 But, I mean, once again, then you start looking at the potential: 00:25:12.380 --> 00:25:13.779 they say your data is private, 00:25:13.780 --> 00:25:14.819 but I don't necessarily trust it. 00:25:14.820 --> 00:25:17.859 But they do have good stuff, both DigitalOcean and AWS. 00:25:17.860 --> 00:25:20.339 Oracle Cloud has the free tier, which isn't too bad, 00:25:20.340 --> 00:25:21.339 usually with a certain amount of stuff. 00:25:21.340 --> 00:25:23.179 And Google also has it, 00:25:23.180 --> 00:25:26.739 but I still tend to keep more stuff on local PCs, 00:25:26.740 --> 00:25:33.299 because I'm just paranoid that way. Gotcha. 00:25:33.300 --> 00:25:35.579 What has your experience been using AI... 00:25:35.580 --> 00:25:40.139 do you want to get into that, using AI for cybersecurity? 00:25:40.140 --> 00:25:42.019 You might have already touched on this. 00:25:42.020 --> 00:25:44.379 Yeah, really, for cybersecurity, 00:25:44.380 --> 00:25:46.259 what I've had to do is I've dumped logs into it 00:25:46.260 --> 00:25:47.299 to have it do correlation. 00:25:47.300 --> 00:25:49.859 Keep in mind, the size of that Llamafile we were using 00:25:49.860 --> 00:25:52.059 for figuring out David Bowie, writing the hello world, 00:25:52.060 --> 00:25:54.179 all that stuff, is like six gig. 00:25:54.180 --> 00:25:56.859 How does it get the entire world in six gig? 00:25:56.860 --> 00:25:59.739 I still haven't figured that out in terms of quantization. 00:25:59.740 --> 00:26:02.499 So I'm really interested in seeing the ability 00:26:02.500 --> 00:26:05.139 to take all this stuff out of all my logs, 00:26:05.140 --> 00:26:06.339 dump it all in there, 00:26:06.340 --> 00:26:08.459 and actually be able to do intelligent queries against that. 00:26:08.460 --> 00:26:10.899 Microsoft has a project called Security Copilot, 00:26:10.900 --> 00:26:12.819 which is trying to do that in the cloud. 00:26:12.820 --> 00:26:15.299 But I want to work on something to do that more locally 00:26:15.300 --> 00:26:19.019 and be able to actually drive this stuff over that. 00:26:19.020 --> 00:26:21.979 That's also one of the long-term goals. 00:26:21.980 --> 00:26:26.059 So, have we got any other questions, or? 00:26:26.060 --> 00:26:29.099 Those are the questions that I see.
00:26:29.100 --> 00:26:31.179 I want to just read out a couple of comments 00:26:31.180 --> 00:26:33.419 that I saw in IRC, though. 00:26:33.420 --> 00:26:36.699 Jay Rutabaga says, it went very well 00:26:36.700 --> 00:26:39.259 from an audience perspective. 00:26:39.260 --> 00:26:43.619 And G Gundam says, respect your commitment to privacy. 00:26:43.620 --> 00:26:45.619 And then somebody is telling us 00:26:45.620 --> 00:26:46.779 we might have skipped a question. 00:26:46.780 --> 00:26:50.019 So I'm just going to run back to my list. 00:26:50.020 --> 00:26:52.819 "Updated regularly"... "experience"... 00:26:52.820 --> 00:26:57.659 I just didn't type in the answers here, 00:26:57.660 --> 00:26:59.659 and there's a couple more questions coming in. So: 00:26:59.660 --> 00:27:04.699 is there a disparity where you'd go to paid models 00:27:04.700 --> 00:27:08.619 because they are better, and what problems, 00:27:08.620 --> 00:27:14.019 you know, would drive you to them? That's a good question. 00:27:14.020 --> 00:27:17.819 Paid models, I don't mind them. I think they're good, 00:27:17.820 --> 00:27:21.299 but I don't think they're actually economically sustainable 00:27:21.300 --> 00:27:22.659 under their current system. 00:27:22.660 --> 00:27:24.299 Because right now, if you're paying 00:27:24.300 --> 00:27:26.899 20 bucks a month for Copilot and that goes up to 200 bucks, 00:27:26.900 --> 00:27:28.499 I'm not going to be as likely to use it. 00:27:28.500 --> 00:27:29.579 You know what I mean? 00:27:29.580 --> 00:27:33.059 But it does do some things in a way that I did not expect. 00:27:33.060 --> 00:27:35.459 For example, Grok was refactoring 00:27:35.460 --> 00:27:38.019 some of my code and, in the comments, dropped an F-bomb, 00:27:38.020 --> 00:27:39.979 which I did not see coming, 00:27:39.980 --> 00:27:41.619 but the other code before, 00:27:41.620 --> 00:27:43.219 that I had gotten off GitHub, 00:27:43.220 --> 00:27:44.059 had F-bombs in it. 00:27:44.060 --> 00:27:45.899 So it was just emulating the style. 00:27:45.900 --> 00:27:47.779 But would that be something 00:27:47.780 --> 00:27:49.979 I'd want to turn in as a pull request? I don't know. 00:27:49.980 --> 00:27:52.139 But, uh, there's a lot of money 00:27:52.140 --> 00:27:53.899 going into these AIs and stuff. 00:27:53.900 --> 00:27:56.219 But in terms of the ability to get a decent one, 00:27:56.220 --> 00:27:57.979 like the Llama 3.2, 00:27:57.980 --> 00:28:01.699 and load your data into it, you can be pretty competitive. 00:28:01.700 --> 00:28:04.779 You're not going to get all the benefits, 00:28:04.780 --> 00:28:07.299 but you have more control over it. 00:28:07.300 --> 00:28:11.819 So it's this and that. 00:28:11.820 --> 00:28:13.139 It's a balancing act. 00:28:13.140 --> 00:28:15.539 Okay, and I think I see a couple more questions coming in. 00:28:15.540 --> 00:28:19.619 What is the largest parameter size for local models 00:28:19.620 --> 00:28:22.459 that you've been able to successfully run locally, 00:28:22.460 --> 00:28:26.059 and do you run into issues with limited context window size? 00:28:26.060 --> 00:28:29.659 The top AI models will tend to have a larger ceiling. 00:28:29.660 --> 00:28:32.859 Yes, yes, yes, yes, yes. 00:28:32.860 --> 00:28:37.019 By default, the context size is, I think, 1024.
00:28:37.020 --> 00:28:44.619 But I've upped it to 8192 on this box, the Pangolin, 00:28:44.620 --> 00:28:46.939 because for some reason 00:28:46.940 --> 00:28:49.459 it's just working quite well. 00:28:49.460 --> 00:28:52.219 But the largest ones I've loaded 00:28:52.220 --> 00:28:54.059 have not been that huge. 00:28:54.060 --> 00:28:55.699 That's about the biggest one I've done. 00:28:55.700 --> 00:28:57.459 That's the reason why I'm planning 00:28:57.460 --> 00:29:01.339 on breaking down and buying a Ryzen. 00:29:01.340 --> 00:29:03.619 Actually, I'm going to buy 00:29:03.620 --> 00:29:06.979 an Intel Core Ultra 285H with 96 gig of RAM. 00:29:06.980 --> 00:29:08.379 Then I should be able to load 00:29:08.380 --> 00:29:12.059 a 70-billion-parameter model on that. How fast will it run? 00:29:12.060 --> 00:29:13.819 It's going to run slow as a dog, 00:29:13.820 --> 00:29:15.819 but it's going to be cool to be able to do it. 00:29:15.820 --> 00:29:17.379 It's an AI bragging rights thing, 00:29:17.380 --> 00:29:20.019 but I mostly stick with the smaller-size models 00:29:20.020 --> 00:29:22.819 and the ones that are more quantized, 00:29:22.820 --> 00:29:26.619 because it just tends to work better for me. 00:29:26.620 --> 00:29:29.179 We've still got over 10 minutes before we're cutting away, 00:29:29.180 --> 00:29:30.179 but I'm just anticipating 00:29:30.180 --> 00:29:32.859 that we're going to be going strong at the 10-minute mark. 00:29:32.860 --> 00:29:34.899 So I'm just letting you know, 00:29:34.900 --> 00:29:37.379 we can go as long as we like here. At a certain point, 00:29:37.380 --> 00:29:41.059 I may have to jump away and check in with the next speaker, 00:29:41.060 --> 00:29:44.419 but we'll post the entirety of this, 00:29:44.420 --> 00:29:47.979 even if we aren't able to stay with it all. 00:29:47.980 --> 00:29:49.739 Okay. And we've got 10 minutes 00:29:49.740 --> 00:29:52.379 where we're still going to stay live. 00:29:52.380 --> 00:30:00.139 So, next question coming in, I see: are there free-as-in-freedom, 00:30:00.140 --> 00:30:05.739 free-as-in-FSF issues with the data? 00:30:05.740 --> 00:30:11.699 Yes, where the data is coming from is a huge question with AI. 00:30:11.700 --> 00:30:13.739 It's astonishing that you can ask questions 00:30:13.740 --> 00:30:16.899 of models when you don't know where the data is coming from. 00:30:16.900 --> 00:30:19.979 That is gonna be one of the big issues long-term. 00:30:19.980 --> 00:30:21.499 There are people who are working 00:30:21.500 --> 00:30:22.979 on trying to figure out that stuff, 00:30:22.980 --> 00:30:25.259 but, I mean, if you look at, God, 00:30:25.260 --> 00:30:27.059 I can't remember who it was. 00:30:27.060 --> 00:30:28.659 Somebody was actually out torrenting books 00:30:28.660 --> 00:30:30.939 just to be able to build them into their AI system. 00:30:30.940 --> 00:30:32.339 I think it might've been Meta. 00:30:32.340 --> 00:30:34.819 So there's a lot of that going on. 00:30:34.820 --> 00:30:38.139 The open-sourcing of this stuff is going to be tough. 00:30:38.140 --> 00:30:39.459 There are some models, 00:30:39.460 --> 00:30:41.419 like the MoMo guys have got their own license, 00:30:41.420 --> 00:30:42.739 but where they're getting their data from, 00:30:42.740 --> 00:30:45.499 I'm not sure. So that's a huge question. 00:30:45.500 --> 00:30:47.979 That's a talk in itself.
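Circling back to the context-window exchange at the start of this answer: llamafile passes through llama.cpp's server options, so the context size can be raised at launch. A sketch, with the filename again a placeholder:

```sh
# Raise the context window to 8192 tokens
# (-c / --ctx-size is the llama.cpp option llamafile accepts).
./Llama-3.2-3B-Instruct.llamafile -c 8192
```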
00:30:47.980 --> 00:30:51.979 But yeah, if you train on your RAG and your data, 00:30:51.980 --> 00:30:53.499 you know where it's coming from, 00:30:53.500 --> 00:30:54.379 and you know you have a license to it, 00:30:54.380 --> 00:30:55.139 but the other stuff is just 00:30:55.140 --> 00:30:56.739 more of a supplement 00:30:56.740 --> 00:31:01.379 if you're using a smaller model. 00:31:01.380 --> 00:31:05.419 But the comments online, I see a couple of them. 00:31:05.420 --> 00:31:08.339 I'll read them out in order here. Really interesting stuff. 00:31:08.340 --> 00:31:11.659 Thank you for your talk. Given that large AI companies 00:31:11.660 --> 00:31:14.899 are openly stealing intellectual property and copyright, 00:31:14.900 --> 00:31:18.939 and therefore eroding the authority of such laws, 00:31:18.940 --> 00:31:21.579 and maybe obscuring the truth itself, 00:31:21.580 --> 00:31:26.579 can you see a future where IP and copyright law become untenable? 00:31:26.580 --> 00:31:29.619 I think that's a great question. 00:31:29.620 --> 00:31:34.979 I'm not a lawyer, but it is really getting complicated. 00:31:34.980 --> 00:31:37.859 It is getting to the point... 00:31:37.860 --> 00:31:41.179 I played with Sora a little bit, and it generated someone 00:31:41.180 --> 00:31:42.819 where you can go like, oh, that's Jon Hamm, 00:31:42.820 --> 00:31:44.099 that's Christopher Walken. 00:31:44.100 --> 00:31:45.379 You start figuring out who the people are 00:31:45.380 --> 00:31:47.019 that they're modeling stuff after. 00:31:47.020 --> 00:31:48.979 There is an apocalypse or something 00:31:48.980 --> 00:31:52.459 going to happen right now. 00:31:52.460 --> 00:31:53.579 There is. But this is, once again, 00:31:53.580 --> 00:31:56.059 my personal opinion, and I'm not a lawyer, 00:31:56.060 --> 00:31:57.459 and I do not have money, 00:31:57.460 --> 00:31:58.859 so don't sue me. The thing is, 00:31:58.860 --> 00:32:02.899 the current administration is very pro-AI. 00:32:02.900 --> 00:32:05.499 And there's a great deal of lobbying by those groups. 00:32:05.500 --> 00:32:07.139 And it's on both sides. 00:32:07.140 --> 00:32:09.699 And it's gonna be interesting to see 00:32:09.700 --> 00:32:11.699 what happens to copyright in the next 5-10 years. 00:32:11.700 --> 00:32:13.339 I just don't know how it keeps up 00:32:13.340 --> 00:32:16.059 without there being some adjustments and stuff. 00:32:16.060 --> 00:32:20.419 Okay, and then another comment I saw: 00:32:20.420 --> 00:32:23.219 file size is not going to be a bottleneck, 00:32:23.220 --> 00:32:25.819 RAM is. You'll need 16 gigabytes of RAM 00:32:25.820 --> 00:32:28.259 to run the smallest local models 00:32:28.260 --> 00:32:31.979 and 512 gigabytes of RAM to run the larger ones. 00:32:31.980 --> 00:32:35.059 You'll need a GPU with that much memory 00:32:35.060 --> 00:32:39.099 if you want it to run quickly. Yeah. Oh no. 00:32:39.100 --> 00:32:41.259 It also depends upon how your memory is laid out. 00:32:41.260 --> 00:32:45.699 Like, example being the Core Ultra 285H 00:32:45.700 --> 00:32:47.899 I plan to buy, that has 96 gig of memory. 00:32:47.900 --> 00:32:50.499 It's unified: the GPU and the CPU share it, 00:32:50.500 --> 00:32:52.739 but they go over the same bus. 00:32:52.740 --> 00:32:55.779 So the overall bandwidth of it tends to be a bit less, 00:32:55.780 --> 00:32:57.579 but you're able to load more of it into memory.
00:32:57.580 --> 00:32:59.419 So it's able to do some additional stuff with it, 00:32:59.420 --> 00:33:00.819 as opposed to it coming off disk. 00:33:00.820 --> 00:33:03.699 It's all a balancing act. If you hit Ziskind's website, 00:33:03.700 --> 00:33:05.819 that guy's done some great work on it, 00:33:05.820 --> 00:33:07.499 trying to figure out how big a model you can do, 00:33:07.500 --> 00:33:08.619 what you can do with it. 00:33:08.620 --> 00:33:12.699 And some of the stuff seems to be not obvious, 00:33:12.700 --> 00:33:15.299 because, example being that MacBook Air, 00:33:15.300 --> 00:33:17.619 for the five minutes I can run the model, 00:33:17.620 --> 00:33:19.379 it runs it faster than a lot of other things 00:33:19.380 --> 00:33:21.339 that should be able to run it faster, 00:33:21.340 --> 00:33:24.619 just because of the way the ARM cores and the unified memory work on it. 00:33:24.620 --> 00:33:26.019 So it's a learning process. 00:33:26.020 --> 00:33:29.579 But if you want to, NetworkChuck had a great video 00:33:29.580 --> 00:33:30.939 talking about building his own system 00:33:30.940 --> 00:33:34.379 with a couple really powerful NVIDIA cards 00:33:34.380 --> 00:33:35.379 and stuff like that in it, 00:33:35.380 --> 00:33:38.859 and actually setting it up on his system as a node 00:33:38.860 --> 00:33:41.459 and using a web UI on it. So there's a lot of stuff there, 00:33:41.460 --> 00:33:43.899 but it is a process of learning how big your data is, 00:33:43.900 --> 00:33:44.899 which models you want to use, 00:33:44.900 --> 00:33:46.219 how much information you need. 00:33:46.220 --> 00:33:48.019 But it's part of the learning. 00:33:48.020 --> 00:33:52.899 And you can run models even on Raspberry Pi 5s 00:33:52.900 --> 00:33:54.499 if you want to. They'll run slow, 00:33:54.500 --> 00:33:56.459 don't get me wrong, but it's possible. 00:33:56.460 --> 00:34:02.179 Okay, and I think there are other questions coming in too, 00:34:02.180 --> 00:34:04.019 so I'll just vamp for another second. 00:34:04.020 --> 00:34:06.299 We've got about five minutes before we'll, 00:34:06.300 --> 00:34:09.739 before we'll be cutting over, 00:34:09.740 --> 00:34:13.179 but I just want to say, in case we get close for time here, 00:34:13.180 --> 00:34:14.859 how much I appreciate your talk. 00:34:14.860 --> 00:34:15.979 This is another one that I'm going to 00:34:15.980 --> 00:34:18.339 have to study after the conference. 00:34:18.340 --> 00:34:21.099 We greatly appreciate, all of us appreciate 00:34:21.100 --> 00:34:22.459 you guys putting on the conference. 00:34:22.460 --> 00:34:26.299 It's a great conference. It's well done. 00:34:26.300 --> 00:34:28.019 It's an honor to be on the stage 00:34:28.020 --> 00:34:30.899 with the brains of the project, which is you. 00:34:30.900 --> 00:34:34.699 So what else have we got, question-wise? 00:34:34.700 --> 00:34:39.499 Okay, so just scanning here. 00:34:39.500 --> 00:34:50.699 Have you used local models capable of tool calling? 00:34:50.700 --> 00:34:54.779 I'm scared of agentic. 00:34:54.780 --> 00:34:58.739 I am going to be a slow adopter of that. 00:34:58.740 --> 00:35:02.459 I want to do it, but I just don't have the, uh, 00:35:02.460 --> 00:35:04.339 intestinal fortitude right now to do it. 00:35:04.340 --> 00:35:07.179 I've had it give me the commands, 00:35:07.180 --> 00:35:08.739 but I still run the commands by hand.
00:35:08.740 --> 00:35:10.539 I'm looking into it, and, once again, 00:35:10.540 --> 00:35:14.139 it's on that list, but that's a big step for me. 00:35:14.140 --> 00:35:23.139 So. Awesome. All right. 00:35:23.140 --> 00:35:27.179 Well, maybe it's... let me just scroll through, 00:35:27.180 --> 00:35:31.539 because we might have missed one question. Oh, I see. 00:35:31.540 --> 00:35:36.899 Here was the piggyback question. 00:35:36.900 --> 00:35:38.419 Now I see the question that I missed. 00:35:38.420 --> 00:35:41.139 So this was piggybacking on the question 00:35:41.140 --> 00:35:44.859 about model updates and adding data. 00:35:44.860 --> 00:35:46.579 And will models reach out to the web 00:35:46.580 --> 00:35:47.819 if they need more info? 00:35:47.820 --> 00:35:51.779 Or have you worked with any models that work that way? 00:35:51.780 --> 00:35:55.259 No, I've not seen any models do that. 00:35:55.260 --> 00:35:57.739 There was like a group 00:35:57.740 --> 00:35:59.899 working on something like a package updater 00:35:59.900 --> 00:36:02.499 that would do different diffs on it, 00:36:02.500 --> 00:36:03.939 but models change so much, 00:36:03.940 --> 00:36:05.739 even when you make minor changes and fine-tuning, 00:36:05.740 --> 00:36:07.659 it's hard just to update them in place. 00:36:07.660 --> 00:36:10.099 So I haven't seen one, but that doesn't mean 00:36:10.100 --> 00:36:16.259 they're not out there. It's a curious topic, though. Awesome. 00:36:16.260 --> 00:36:19.539 Well, it's probably pretty good timing. 00:36:19.540 --> 00:36:21.299 Let me just scroll and make sure. 00:36:21.300 --> 00:36:23.499 And of course, before I can say that, 00:36:23.500 --> 00:36:25.899 there's one more question. So let's go ahead and have that. 00:36:25.900 --> 00:36:28.299 I want to make sure, while we're still live, though, 00:36:28.300 --> 00:36:31.299 I give you a chance to offer any closing thoughts. 00:36:31.300 --> 00:36:35.779 So what scares you most about the agentic tools? 00:36:35.780 --> 00:36:38.419 How would you think about putting a sandbox around that 00:36:38.420 --> 00:36:42.139 if you did adopt an agentic workflow? 00:36:42.140 --> 00:36:42.899 That is a great question. 00:36:42.900 --> 00:36:45.939 In terms of that, I would just control 00:36:45.940 --> 00:36:48.099 what it's able to talk to, what machines; 00:36:48.100 --> 00:36:50.059 I would actually have it be air-gapped. 00:36:50.060 --> 00:36:52.099 I work for a defense contractor, 00:36:52.100 --> 00:36:53.819 and we spend a lot of time dealing with air-gapped systems, 00:36:53.820 --> 00:36:55.979 because that's just kind of the way it works out for us. 00:36:55.980 --> 00:36:58.499 So agentic, it's just going to take a while to get trust. 00:36:58.500 --> 00:37:01.059 I want to see more stuff happening. 00:37:01.060 --> 00:37:02.819 Humans screw up stuff enough. 00:37:02.820 --> 00:37:04.819 The last thing we need is to multiply that by 1000. 00:37:04.820 --> 00:37:09.419 So in terms of that, I would be restricting what it can do. 00:37:09.420 --> 00:37:10.859 If you look at the capabilities, 00:37:10.860 --> 00:37:13.579 if I created a user and gave it permissions, 00:37:13.580 --> 00:37:15.299 I would have it locked down through sudo: 00:37:15.300 --> 00:37:17.379 what it's able to do, what the account's able to do. 00:37:17.380 --> 00:37:18.899 I would do those kinds of things, 00:37:18.900 --> 00:37:20.859 but it's going to be... it's happening.
00:37:20.860 --> 00:37:25.819 It's just, I'm going to be one of the laggards on that one. 00:37:25.820 --> 00:37:29.259 So air gap, jail, extremely locked-down environments, 00:37:29.260 --> 00:37:34.899 like we're talking about separate physical machines, not Docker. 00:37:34.900 --> 00:37:37.499 Yeah, hopefully. Right, fair. 00:37:37.500 --> 00:37:39.899 So tool calling can be read-only, 00:37:39.900 --> 00:37:42.539 such as giving models the ability to search the web 00:37:42.540 --> 00:37:43.979 before answering your question, 00:37:43.980 --> 00:37:46.219 you know, write access, execute access. 00:37:46.220 --> 00:37:49.219 I'm interested to know if local models 00:37:49.220 --> 00:37:51.419 are any good at that. 00:37:51.420 --> 00:37:55.579 Yes, local models can do a lot of that stuff. 00:37:55.580 --> 00:37:56.819 It's in their capabilities. 00:37:56.820 --> 00:37:59.019 If you load LM Studio, you can do a lot of wonderful stuff 00:37:59.020 --> 00:38:02.419 with that, or with OpenWebUI with Ollama. 00:38:02.420 --> 00:38:05.739 There's a lot of capabilities. It's amazing. 00:38:05.740 --> 00:38:08.139 OpenWebUI is actually what a lot of companies are using now 00:38:08.140 --> 00:38:10.259 to put their data behind, 00:38:10.260 --> 00:38:12.139 their curated data and stuff like that. So it works well. 00:38:12.140 --> 00:38:15.819 I can confirm that from my own professional experience. 00:38:15.820 --> 00:38:19.659 Excellent. Okay, well, our timing should be just perfect 00:38:19.660 --> 00:38:22.659 if you want to give us like a 30-second, 45-second wrap-up. 00:38:22.660 --> 00:38:24.419 Aaron, let me squeeze in mine: 00:38:24.420 --> 00:38:26.779 thank you again so much for preparing this talk 00:38:26.780 --> 00:38:30.499 and for entertaining all of our questions. 00:38:30.500 --> 00:38:33.299 Yeah, let me just thank you guys for the conference again. 00:38:33.300 --> 00:38:35.179 This is a great one. I've enjoyed a lot of it. 00:38:35.180 --> 00:38:37.339 I've only caught a couple of talks so far, 00:38:37.340 --> 00:38:41.659 but I'm looking forward to hitting the ones after this and tomorrow. 00:38:41.660 --> 00:38:44.739 But the AI stuff is coming. Get on board. 00:38:44.740 --> 00:38:46.939 Definitely recommend it. If you want to just try it out 00:38:46.940 --> 00:38:48.419 and get a little taste of it, 00:38:48.420 --> 00:38:49.779 my minimal viable product 00:38:49.780 --> 00:38:51.619 with just Llamafile and gptel 00:38:51.620 --> 00:38:53.139 will get you to the point where you start figuring it out. 00:38:53.140 --> 00:38:55.579 Gptel is an amazing thing. It just gets out of your way, 00:38:55.580 --> 00:39:00.459 and it works so well with Emacs, designed so it 00:39:00.460 --> 00:39:01.699 doesn't take your hands off the keyboard. 00:39:01.700 --> 00:39:02.499 It's just another buffer, 00:39:02.500 --> 00:39:04.059 and you just put information in there. 00:39:04.060 --> 00:39:06.979 It's quite a wonderful time, 00:39:06.980 --> 00:39:10.819 let's put it that way. That's all I got. Thank you 00:39:10.820 --> 00:39:14.339 so much once again, and we're just about to cut away.
00:39:14.340 --> 00:39:15.779 So I'll stop the recording, 00:39:15.780 --> 00:39:18.259 and you're on your own recognizance. 00:39:18.260 --> 00:39:19.699 Well, I'm gonna punch out. 00:39:19.700 --> 00:39:21.059 If anybody has any questions or anything, 00:39:21.060 --> 00:39:24.699 my email address is ajgrothe@yahoo.com, or at gmail. And 00:39:24.700 --> 00:39:26.779 thank you all for attending, 00:39:26.780 --> 00:39:29.939 and thanks again for the conference. 00:39:29.940 --> 00:39:32.579 Okay, I'm gonna go ahead and end the room there, thank you. 00:39:32.580 --> 00:39:34.100 Excellent, thanks, bye.