WEBVTT NOTE Introduction 00:00:00.000 --> 00:00:04.859 Hey, everybody. Welcome from frigid Omaha, Nebraska. 00:00:04.860 --> 00:00:06.619 I'm just going to kick off my talk here, 00:00:06.620 --> 00:00:23.899 and we'll see how it all goes. Thanks for attending. 00:00:23.900 --> 00:00:26.939 So the slides will be available on my site, https://grothe.us, 00:00:26.940 --> 00:00:29.899 in the presentation section tonight or tomorrow. 00:00:29.900 --> 00:00:33.099 This is a quick intro to one way to do private AI in Emacs. 00:00:33.100 --> 00:00:35.299 There are a lot of other ways to do it. 00:00:35.300 --> 00:00:38.899 This one is really just more or less the easiest way to do it. 00:00:38.900 --> 00:00:40.379 It's a minimal viable product 00:00:40.380 --> 00:00:42.379 to get you an idea of how to get started with it 00:00:42.380 --> 00:00:43.859 and how to give it a spin. 00:00:43.860 --> 00:00:45.819 Really hope some of you give it a shot 00:00:45.820 --> 00:00:48.179 and learn something along the way. NOTE Overview of talk 00:00:48.180 --> 00:00:50.379 So the overview of the talk 00:00:50.380 --> 00:00:54.939 broke down these basic bullet points of why private AI, 00:00:54.940 --> 00:00:58.939 what do I need to do private AI, Emacs and private AI, 00:00:58.940 --> 00:01:02.739 pieces for an AI Emacs solution, 00:01:02.740 --> 00:01:08.059 a demo of a minimal viable product, and the summary. NOTE Why private AI? 00:01:08.060 --> 00:01:10.779 Why private AI? This is pretty simple. 00:01:10.780 --> 00:01:12.099 Just read the terms and conditions 00:01:12.100 --> 00:01:14.819 for any AI system you're currently using. 00:01:14.820 --> 00:01:17.019 If you're using the free tiers, your queries, 00:01:17.020 --> 00:01:18.619 code, uploaded information 00:01:18.620 --> 00:01:20.699 is being used to train the models. 00:01:20.700 --> 00:01:22.939 In some cases, you are giving the company 00:01:22.940 --> 00:01:25.419 a perpetual license to your data. 00:01:25.420 --> 00:01:27.059 You have no control over this, 00:01:27.060 --> 00:01:29.219 except for not using the engine. 00:01:29.220 --> 00:01:30.699 And keep in mind, the terms 00:01:30.700 --> 00:01:32.179 are changing all the time on that, 00:01:32.180 --> 00:01:34.139 and they're not normally changing for our benefit. 00:01:34.140 --> 00:01:38.259 So that's not necessarily a good thing. 00:01:38.260 --> 00:01:40.339 If you're using the paid tiers, 00:01:40.340 --> 00:01:43.459 you may be able to opt out of the data collection. 00:01:43.460 --> 00:01:45.539 But keep in mind, this can change, 00:01:45.540 --> 00:01:48.619 or they may start charging for that option. 00:01:48.620 --> 00:01:51.419 Every AI company wants more and more data. 00:01:51.420 --> 00:01:53.779 They need more and more data to train their models. 00:01:53.780 --> 00:01:56.019 It is just the way it is. 00:01:56.020 --> 00:01:57.899 They need more and more information 00:01:57.900 --> 00:02:00.459 to get it more and more accurate to keep it up to date. 00:02:00.460 --> 00:02:03.219 There's been a story about Stack Overflow. 00:02:03.220 --> 00:02:05.819 It has like half the number of queries they had a year ago 00:02:05.820 --> 00:02:07.379 because people are using AI. 00:02:07.380 --> 00:02:08.579 The problem with that is now 00:02:08.580 --> 00:02:10.379 there's less data going to Stack Overflow 00:02:10.380 --> 00:02:12.979 for the AI to get. 
Vicious cycle, 00:02:12.980 --> 00:02:14.619 especially when you start looking at 00:02:14.620 --> 00:02:16.579 newer languages like Ruby and stuff like that. 00:02:16.580 --> 00:02:21.419 So it comes down to being an interesting time. 00:02:21.420 --> 00:02:24.739 Another reason to go private AI is your costs are going to vary. 00:02:24.740 --> 00:02:27.019 Right now, these services are being heavily subsidized. 00:02:27.020 --> 00:02:29.419 If you're paying Claude $20 a month, 00:02:29.420 --> 00:02:32.579 it is not costing Claude, those guys, $20 a month 00:02:32.580 --> 00:02:34.099 to host all the infrastructure, 00:02:34.100 --> 00:02:35.619 to build all these data centers. 00:02:35.620 --> 00:02:38.779 They are severely subsidizing that, 00:02:38.780 --> 00:02:41.259 very much at a loss right now. 00:02:41.260 --> 00:02:43.659 When they start charging the real costs plus a profit, 00:02:43.660 --> 00:02:45.499 it's going to change. 00:02:45.500 --> 00:02:48.019 Right now, I use a bunch of different services. 00:02:48.020 --> 00:02:50.019 I've played with Grok and a bunch of other ones. 00:02:50.020 --> 00:02:52.459 But Grok right now is like $30 a month 00:02:52.460 --> 00:02:54.139 for regular SuperGrok. 00:02:54.140 --> 00:02:56.419 When they start charging the real cost of that, 00:02:56.420 --> 00:02:59.819 it's going to go from $30 to something a great deal more, 00:02:59.820 --> 00:03:02.379 perhaps, I think, $100 or $200, 00:03:02.380 --> 00:03:04.459 or whatever really turns out to be the cost 00:03:04.460 --> 00:03:06.059 when you figure everything into it. 00:03:06.060 --> 00:03:07.539 When you start adding that cost in, 00:03:07.540 --> 00:03:10.179 a lot of people who are using public AI right now 00:03:10.180 --> 00:03:11.899 are going to have no option but to move to private AI 00:03:11.900 --> 00:03:16.019 or give up on AI overall. NOTE What do I need for private AI? 00:03:16.020 --> 00:03:18.659 What do you need to be able to do private AI? 00:03:18.660 --> 00:03:21.179 If you're going to run your own AI, 00:03:21.180 --> 00:03:23.579 you're going to need a system with either some cores, 00:03:23.580 --> 00:03:25.699 a graphics processing unit, 00:03:25.700 --> 00:03:28.339 or a neural processing unit, a GPU or an NPU. 00:03:28.340 --> 00:03:29.819 I currently have four systems 00:03:29.820 --> 00:03:32.979 I'm experimenting with and playing around with on a daily basis. 00:03:32.980 --> 00:03:37.979 I have a System76 Pangolin with an AMD Ryzen 7 7840U 00:03:37.980 --> 00:03:41.099 with a Radeon 780M integrated graphics card. 00:03:41.100 --> 00:03:42.539 It's got 32 gigs of RAM. 00:03:42.540 --> 00:03:45.259 It's a beautiful piece of hardware. I really do like it. 00:03:45.260 --> 00:03:46.499 I have my main workstation, 00:03:46.500 --> 00:03:50.579 it's an HP Z620 with dual Intel Xeons 00:03:50.580 --> 00:03:53.179 with four NVIDIA K2200 graphics cards in it. 00:03:53.180 --> 00:03:56.699 Why the four NVIDIA K2200 graphics cards in it? 00:03:56.700 --> 00:03:59.739 Because I could buy four of them on eBay for $100 00:03:59.740 --> 00:04:02.379 and it was still supported by the NVIDIA drivers for Debian. 00:04:02.380 --> 00:04:08.179 So that's why that is. A MacBook Air with an M1 processor, 00:04:08.180 --> 00:04:10.939 a very nice piece of kit I picked up a couple years ago, 00:04:10.940 --> 00:04:14.139 very cheap, but it runs AI surprisingly well, 00:04:14.140 --> 00:04:18.099 and an Acer Aspire 1 with an AMD Ryzen 5700H in it.
00:04:18.100 --> 00:04:22.099 This was my old laptop. It was a sturdy beast. 00:04:22.100 --> 00:04:24.379 It was able to do enough AI to do demos and stuff, 00:04:24.380 --> 00:04:25.859 and I liked it quite a bit for that. 00:04:25.860 --> 00:04:28.339 I'm using the Pangolin for this demonstration 00:04:28.340 --> 00:04:30.979 because it's just better. 00:04:30.980 --> 00:04:37.219 Apple's M4 chip has 38 teraflops of NPU performance. 00:04:37.220 --> 00:04:40.099 The Microsoft Copilot PCs are now requiring 00:04:40.100 --> 00:04:41.459 45 teraflops of NPU 00:04:41.460 --> 00:04:43.939 to be able to have the Copilot badge on them. 00:04:43.940 --> 00:04:48.299 And Raspberry Pi's new AI HAT is about 18 teraflops 00:04:48.300 --> 00:04:51.219 and is $70 on top of the cost of a Raspberry Pi 5. 00:04:51.220 --> 00:04:56.059 Keep in mind, Raspberry Pi recently 00:04:56.060 --> 00:04:59.499 raised the cost of their Pi 5s because of RAM pricing, 00:04:59.500 --> 00:05:00.379 which is going to be affecting 00:05:00.380 --> 00:05:02.459 a lot of these types of solutions in the near future. 00:05:02.460 --> 00:05:05.299 But there's going to be a lot of 00:05:05.300 --> 00:05:06.699 local power available in the future. 00:05:06.700 --> 00:05:08.219 That's what it really comes down to. 00:05:08.220 --> 00:05:11.179 A lot of people are going to have PCs on their desks. 00:05:11.180 --> 00:05:13.459 They're going to run a decent private AI 00:05:13.460 --> 00:05:16.347 without much issue. NOTE Emacs and private AI 00:05:16.348 --> 00:05:18.059 So for Emacs and private AI, 00:05:18.060 --> 00:05:20.139 there's a couple popular solutions. 00:05:20.140 --> 00:05:22.099 gptel, which is the one we're going to talk about. 00:05:22.100 --> 00:05:24.739 It's a simple interface. It's a minimal interface. 00:05:24.740 --> 00:05:26.579 It integrates easily into your workflow. 00:05:26.580 --> 00:05:29.019 It's just, quite honestly, chef's kiss, 00:05:29.020 --> 00:05:31.059 just a beautifully well-done piece of software. 00:05:31.060 --> 00:05:33.859 Ollama Buddy has more features, 00:05:33.860 --> 00:05:36.259 a menu interface, has quick access 00:05:36.260 --> 00:05:37.499 for things like code refactoring, 00:05:37.500 --> 00:05:38.979 text reformatting, et cetera. 00:05:38.980 --> 00:05:41.979 This is the one where you spend a little more time with it, 00:05:41.980 --> 00:05:43.939 but you also get a little bit more back from it. 00:05:43.940 --> 00:05:49.419 Ellama is another one. It has some really good features, 00:05:49.420 --> 00:05:51.059 more different capabilities, 00:05:51.060 --> 00:05:54.979 but it's a different set of rules and capabilities. 00:05:54.980 --> 00:05:59.179 Aidermacs, which is programming with your AI in Emacs. 00:05:59.180 --> 00:06:01.219 The closest thing I can come up with 00:06:01.220 --> 00:06:04.139 to compare this to is Cursor, except it's in Emacs. 00:06:04.140 --> 00:06:05.659 It's really quite well done. 00:06:05.660 --> 00:06:07.299 These are all really quite well done. 00:06:07.300 --> 00:06:08.499 There's a bunch of other projects out there. 00:06:08.500 --> 00:06:10.819 If you go out to GitHub and type Emacs AI, 00:06:10.820 --> 00:06:13.219 you'll find a lot of different options. NOTE Pieces for an AI Emacs solution 00:06:13.220 --> 00:06:18.459 So what is a minimal viable product that can be done? 00:06:18.460 --> 00:06:23.379 A minimal viable product to show what an AI Emacs solution is 00:06:23.380 --> 00:06:27.179 can be done with only two pieces of software.
00:06:27.180 --> 00:06:31.179 Llamafile, this is an amazing piece of software. 00:06:31.180 --> 00:06:32.899 This is a whole LLM contained in one file. 00:06:32.900 --> 00:06:36.059 And the same file runs on Mac OS X, 00:06:36.060 --> 00:06:39.379 Linux, Windows, and the BSDs. 00:06:39.380 --> 00:06:42.179 It's a wonderful piece of kit 00:06:42.180 --> 00:06:44.179 based on this thing the same people created 00:06:44.180 --> 00:06:45.899 called Cosmopolitan, 00:06:45.900 --> 00:06:46.779 which lets you create an executable 00:06:46.780 --> 00:06:48.699 that runs on a bunch of different systems. 00:06:48.700 --> 00:06:51.299 And gptel, which is an easy plug-in for Emacs, 00:06:51.300 --> 00:06:56.339 which we talked about in the last slide a bit. 00:06:56.340 --> 00:07:00.179 So setting up the LLM, you just have to go out 00:07:00.180 --> 00:07:03.542 and hit a page for it 00:07:03.543 --> 00:07:05.099 and do a wget of it. 00:07:05.100 --> 00:07:07.099 That's all it takes there. 00:07:07.100 --> 00:07:10.259 Chmodding it so you can actually execute the executable. 00:07:10.260 --> 00:07:12.939 And then just go ahead and actually run it. 00:07:12.940 --> 00:07:16.939 And let's go ahead and do that. 00:07:16.940 --> 00:07:18.899 I've already downloaded it because I don't want to wait. 00:07:18.900 --> 00:07:21.259 And let's just take a look at it. 00:07:21.260 --> 00:07:22.899 I've actually downloaded several of them, 00:07:22.900 --> 00:07:25.699 but let's go ahead and just run Llama 3.2 00:07:25.700 --> 00:07:31.179 with the 3 billion parameters, instruct. And that's it firing up. 00:07:31.180 --> 00:07:33.899 And it is nice enough to actually be listening on port 8080, 00:07:33.900 --> 00:07:35.339 which we'll need in a minute. 00:07:35.340 --> 00:07:43.139 So once you do that, you have to install gptel in Emacs. 00:07:43.140 --> 00:07:45.659 That's as simple as firing up Emacs, 00:07:45.660 --> 00:07:48.339 doing M-x package-install, 00:07:48.340 --> 00:07:49.779 and then just typing gptel, 00:07:49.780 --> 00:07:51.499 if you have your repository set up right, 00:07:51.500 --> 00:07:52.299 which hopefully you do. 00:07:52.300 --> 00:07:56.339 And then you just go ahead and have it. NOTE Config file 00:07:56.340 --> 00:07:58.139 You also have to set up a config file. 00:07:58.140 --> 00:08:01.739 Here's my example config file as it's currently set up: 00:08:01.740 --> 00:08:04.019 requiring, ensuring gptel is loaded, 00:08:04.020 --> 00:08:05.899 defining the Llamafile backend. 00:08:05.900 --> 00:08:07.779 You can put multiple backends into it, 00:08:07.780 --> 00:08:09.859 but I just have the one defined in this example. 00:08:09.860 --> 00:08:12.059 But it's pretty straightforward. 00:08:12.060 --> 00:08:16.739 The Llamafile local backend: a name for it, stream, protocol HTTP. 00:08:16.740 --> 00:08:20.859 If you have HTTPS set up, that's obviously preferable, 00:08:20.860 --> 00:08:22.779 but a lot of people don't for their home labs. 00:08:22.780 --> 00:08:26.379 Host is just 127.0.0.1 port 8080. 00:08:26.380 --> 00:08:30.099 Keep in mind, some of the AIs run on a different port, 00:08:30.100 --> 00:08:31.499 so you may be on 8081 00:08:31.500 --> 00:08:34.619 if you're running Open WebUI at the same time. The key: 00:08:34.620 --> 00:08:37.019 we don't need an API key because it's a local server. 00:08:37.020 --> 00:08:40.259 And the models, we can put multiple models 00:08:40.260 --> 00:08:41.339 on there if we want to.
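NOTE Example config
For reference, here is a minimal sketch of the kind of config file just described, using gptel's documented gptel-make-openai backend. The llamafile name and the model label are illustrative assumptions, not the exact ones from the demo.

;; Shell side first (the wget/chmod/run steps described above):
;;   wget <URL of a .llamafile from its download page>
;;   chmod +x Llama-3.2-3B-Instruct.llamafile    ; hypothetical filename
;;   ./Llama-3.2-3B-Instruct.llamafile           ; serves on 127.0.0.1:8080 by default

(require 'gptel)

;; Register the local Llamafile server as an OpenAI-compatible backend.
;; No API key is needed because it's a local server.
(setq gptel-backend
      (gptel-make-openai "Llamafile-local"
        :protocol "http"           ; HTTPS is preferable if you have it set up
        :host "127.0.0.1:8080"     ; use 8081 if something else owns 8080
        :stream t
        :models '(llama-3.2-3b))   ; label(s) shown in gptel's model menu
      gptel-model 'llama-3.2-3b)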
00:08:41.340 --> 00:08:43.699 So if we create one with additional stuff, 00:08:43.700 --> 00:08:45.379 like RAG and stuff like that, 00:08:45.380 --> 00:08:47.459 we can actually name those models by their domain, 00:08:47.460 --> 00:08:48.699 which is really kind of cool. 00:08:48.700 --> 00:08:52.099 But, uh, that's all it takes. NOTE Demo: Who was David Bowie? 00:08:52.100 --> 00:09:03.779 So let's go ahead and go to a quick test of it. 00:09:03.780 --> 00:09:11.019 Oops. M-x gptel. And we're going to just choose 00:09:11.020 --> 00:09:12.499 the default buffer to make things easier. 00:09:12.500 --> 00:09:15.339 Going to resize it up a bit. 00:09:15.340 --> 00:09:19.859 And usually the go-to question I go to is: who was David Bowie? 00:09:19.860 --> 00:09:24.499 This one is actually a question 00:09:24.500 --> 00:09:26.219 that's turned out to be really good 00:09:26.220 --> 00:09:28.019 for figuring out whether or not the AI is complete. 00:09:28.020 --> 00:09:31.139 This is one that some engines do well on, other ones don't. 00:09:31.140 --> 00:09:33.739 And we can just do, we can either do 00:09:33.740 --> 00:09:36.059 M-x gptel-send, 00:09:36.060 --> 00:09:37.979 or we can just do C-c and hit Enter. 00:09:37.980 --> 00:09:39.139 We'll just do C-c and Enter. 00:09:39.140 --> 00:09:43.659 And now it's going ahead and hitting our local AI system 00:09:43.660 --> 00:09:46.659 running on port 8080. And that looks pretty good, 00:09:46.660 --> 00:09:50.739 but let's go ahead and say, hey, it's set to terse mode right now. 00:09:50.740 --> 00:10:03.859 Please expand upon this. And there we go. 00:10:03.860 --> 00:10:05.379 We're getting a full description 00:10:05.380 --> 00:10:08.739 of the majority of David Bowie's life 00:10:08.740 --> 00:10:10.139 and other information about him. 00:10:10.140 --> 00:10:21.699 So very, very happy with that. NOTE Hallucinations 00:10:21.700 --> 00:10:23.539 One thing to keep in mind, when you look at things, 00:10:23.540 --> 00:10:24.699 when you're looking for hallucinations, 00:10:24.700 --> 00:10:26.899 for how accurate the AI is given how it's compressed, 00:10:26.900 --> 00:10:29.259 is it will tend to screw up on things like 00:10:29.260 --> 00:10:30.859 how many children he had and stuff like that. 00:10:30.860 --> 00:10:32.459 Let me see if it gets to that real quick. 00:10:32.460 --> 00:10:39.739 Is it not actually on this one? 00:10:39.740 --> 00:10:42.179 Alright, so that's the first question I always ask. NOTE Next question: What are sea monkeys? 00:10:42.180 --> 00:10:44.659 The next one is: what are sea monkeys? 00:10:44.660 --> 00:10:48.979 It gives you an idea of the breadth of the system. 00:10:48.980 --> 00:11:10.619 It's querying right now. Pulls it back correctly. Yes. 00:11:10.620 --> 00:11:12.339 And it's smart enough to actually detect that David Bowie 00:11:12.340 --> 00:11:15.019 even referenced sea monkeys in the song "Sea of Love," 00:11:15.020 --> 00:11:16.179 which became a hit single. 00:11:16.180 --> 00:11:18.859 So it's actually keeping the context alive, 00:11:18.860 --> 00:11:20.419 which is a very cool feature. 00:11:20.420 --> 00:11:21.459 I did not see that coming. 00:11:21.460 --> 00:11:24.139 Here's one that some people say is a really good one 00:11:24.140 --> 00:11:42.779 to ask: how many Rs in "strawberry"? 00:11:42.780 --> 00:11:46.179 All right, now she's going off the reservation. 00:11:46.180 --> 00:11:48.139 She's going in a different direction.
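NOTE Checking the strawberry answer
For the record, ground truth for that test is a one-liner in Emacs itself, using the built-in cl-lib:

(require 'cl-lib)
(cl-count ?r "strawberry")  ; => 3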
00:11:48.140 --> 00:11:49.979 Let me go ahead and reopen that again, 00:11:49.980 --> 00:11:57.179 because it went down a bad hole there for a second. NOTE Writing Hello World in Emacs Lisp 00:11:57.180 --> 00:11:58.419 Let me ask it to write hello world in Emacs Lisp. 00:11:58.420 --> 00:12:10.419 Yep, that works. So the point being here, 00:12:10.420 --> 00:12:14.939 that was like two minutes of setup. 00:12:14.940 --> 00:12:18.019 And now we have a small AI embedded inside the system. 00:12:18.020 --> 00:12:20.539 So that gives you an idea just how easy it can be. 00:12:20.540 --> 00:12:22.299 And it's just running locally on the system. 00:12:22.300 --> 00:12:25.259 We also have the default system here as well. 00:12:25.260 --> 00:12:32.579 So not that bad. NOTE Pieces for a better solution 00:12:32.580 --> 00:12:35.379 That's a basic solution, that's a basic setup 00:12:35.380 --> 00:12:37.059 that will get you to the point where you can go, like... 00:12:37.060 --> 00:12:39.859 it's a party trick, but it's a very cool party trick. 00:12:39.860 --> 00:12:42.859 The way that gptel works is it puts it into buffers, 00:12:42.860 --> 00:12:45.099 it doesn't interfere with your flow that much, 00:12:45.100 --> 00:12:47.179 it's just an additional window you can pop open 00:12:47.180 --> 00:12:49.019 to ask questions and get information from, 00:12:49.020 --> 00:12:51.459 dump code into it and have it refactored. 00:12:51.460 --> 00:12:53.339 gptel has a lot of additional options 00:12:53.340 --> 00:12:55.699 for things that are really cool for that. 00:12:55.700 --> 00:12:57.099 But if you want a better solution, 00:12:57.100 --> 00:12:59.939 I recommend Ollama or LM Studio. 00:12:59.940 --> 00:13:01.899 They're both more capable than Llamafile. 00:13:01.900 --> 00:13:03.859 They can accept a lot of different models. 00:13:03.860 --> 00:13:05.739 You can do things like RAG. 00:13:05.740 --> 00:13:09.219 You can do loading of things onto the GPU more explicitly. 00:13:09.220 --> 00:13:10.379 It can speed stuff up. 00:13:10.380 --> 00:13:13.059 One of the things about the retrieval augmentation is 00:13:13.060 --> 00:13:15.539 it will let you put your data into the system, 00:13:15.540 --> 00:13:17.779 so you can start uploading your code, your information, 00:13:17.780 --> 00:13:20.139 and actually being able to do analysis of it. 00:13:20.140 --> 00:13:23.539 Open WebUI provides more capabilities. 00:13:23.540 --> 00:13:24.859 It provides an interface that's similar 00:13:24.860 --> 00:13:25.899 to what you're used to seeing 00:13:25.900 --> 00:13:28.179 for ChatGPT and the other systems. 00:13:28.180 --> 00:13:29.419 It's really quite well done. 00:13:29.420 --> 00:13:32.539 And once again, gptel, I have to mention that 00:13:32.540 --> 00:13:34.779 because that's the one I really kind of like. 00:13:34.780 --> 00:13:36.899 And Ollama Buddy is also another really nice one. NOTE What about the license? 00:13:36.900 --> 00:13:41.019 So what about the licensing of these models, 00:13:41.020 --> 00:13:42.299 since I'm going out pulling down 00:13:42.300 --> 00:13:43.579 a model and doing this stuff? 00:13:43.580 --> 00:13:46.579 Let's take a look at a couple of highlights 00:13:46.580 --> 00:13:49.379 from the Meta Llama 3 community license. 00:13:49.380 --> 00:13:52.579 If your service exceeds 700 million monthly users, 00:13:52.580 --> 00:13:54.099 you need additional licensing. 00:13:54.100 --> 00:13:56.099 Probably not going to be a problem for most of us.
00:13:56.100 --> 00:13:58.379 There's a competition restriction. 00:13:58.380 --> 00:14:00.899 You can't use this model to enhance competing models. 00:14:00.900 --> 00:14:04.219 And there are some limitations on using the Meta trademarks. 00:14:04.220 --> 00:14:05.939 Not that big a deal. 00:14:05.940 --> 00:14:09.139 And otherwise it's a permissive one 00:14:09.140 --> 00:14:10.939 designed to encourage innovation 00:14:10.940 --> 00:14:13.779 and open development. Commercial use is allowed, 00:14:13.780 --> 00:14:15.219 but there are some restrictions on it. 00:14:15.220 --> 00:14:17.259 Yeah, you can modify the model, 00:14:17.260 --> 00:14:20.419 but you have to abide by the license terms. 00:14:20.420 --> 00:14:22.339 And you can distribute the model with derivatives. 00:14:22.340 --> 00:14:24.059 And there are some very cool ones out there. 00:14:24.060 --> 00:14:25.259 There's people who've done things 00:14:25.260 --> 00:14:29.579 to try and make the Llama be less, what's the phrase, 00:14:29.580 --> 00:14:31.939 ethical, if you're doing penetration testing research 00:14:31.940 --> 00:14:32.619 and stuff like that. 00:14:32.620 --> 00:14:34.459 It has some very nice value there. 00:14:34.460 --> 00:14:37.739 Keep in mind licenses also vary 00:14:37.740 --> 00:14:39.619 depending on the model you're using. 00:14:39.620 --> 00:14:42.419 Mistral AI has the non-production license. 00:14:42.420 --> 00:14:45.219 It's designed to keep it to research and development. 00:14:45.220 --> 00:14:46.739 You can't use it commercially. 00:14:46.740 --> 00:14:51.792 So it's designed to clearly delineate 00:14:51.793 --> 00:14:52.939 between research and development 00:14:52.940 --> 00:14:54.259 and somebody trying to actually build 00:14:54.260 --> 00:14:56.579 something on top of it. NOTE Are there open source data model options? 00:14:56.580 --> 00:14:57.979 And another question I get asked is, 00:14:57.980 --> 00:14:59.899 are there open source data model options? 00:14:59.900 --> 00:15:02.819 Yeah, but most of them are small or specialized currently. 00:15:02.820 --> 00:15:05.499 MoMo is a whole family of them, 00:15:05.500 --> 00:15:07.339 but they tend to be more specialized. 00:15:07.340 --> 00:15:09.019 It's very cool to see where it's going, 00:15:09.020 --> 00:15:11.339 and it's another thing that's just going forward. 00:15:11.340 --> 00:15:14.519 It's under the MIT license. NOTE Things to know 00:15:14.520 --> 00:15:15.819 Some things to know to help you 00:15:15.820 --> 00:15:17.499 have a better experience with this. 00:15:17.500 --> 00:15:21.059 Get Ollama and Open WebUI working by themselves, 00:15:21.060 --> 00:15:22.659 then set up your config file. 00:15:22.660 --> 00:15:24.819 I was fighting both at the same time, 00:15:24.820 --> 00:15:26.699 and it turned out I had a problem with my Ollama. 00:15:26.700 --> 00:15:28.899 I had a conflict, so that was what my problem was. 00:15:28.900 --> 00:15:32.819 Llamafile plus gptel is a great way to start experimenting, 00:15:32.820 --> 00:15:34.299 just to get you an idea of how it works 00:15:34.300 --> 00:15:36.939 and figure out how the interfaces work. Tremendous. 00:15:36.940 --> 00:15:40.739 RAG, loading documents into it, is really easy with Open WebUI. 00:15:40.740 --> 00:15:43.019 You can create models, you can put things like 00:15:43.020 --> 00:15:46.419 help desk, developers, and stuff like that, breaking it out. 00:15:46.420 --> 00:15:51.019 Hacker Noon has a "how to build a $300 AI computer" article.
00:15:51.020 --> 00:15:52.859 That's from March 2024, 00:15:52.860 --> 00:15:55.099 but it still has a lot of great information 00:15:55.100 --> 00:15:56.819 on how to benchmark the environments, 00:15:56.820 --> 00:16:01.339 what some values are, like for the Ryzen 5700U 00:16:01.340 --> 00:16:02.579 inside my Acer Aspire. 00:16:02.580 --> 00:16:04.419 That's where I got the idea of doing that. 00:16:04.420 --> 00:16:06.739 Make sure you do the ROCm stuff correctly 00:16:06.740 --> 00:16:09.899 to get the GPU extensions. But it's just really good stuff. 00:16:09.900 --> 00:16:13.059 You don't need a great GPU or CPU to get started. 00:16:13.060 --> 00:16:14.819 Smaller models like TinyLlama 00:16:14.820 --> 00:16:16.819 can run on very small systems. 00:16:16.820 --> 00:16:19.042 It gets you the ability to start playing with it 00:16:19.043 --> 00:16:21.619 and start experimenting and figure out if that's for you 00:16:21.620 --> 00:16:23.379 and to move forward with it. 00:16:23.380 --> 00:16:29.219 The AMD Ryzen AI Max+ 395 in a mini PC 00:16:29.220 --> 00:16:31.179 makes a really nice dedicated host. 00:16:31.180 --> 00:16:34.078 You used to be able to buy these for about $1200. 00:16:34.079 --> 00:16:35.579 Now with the RAM price increase, 00:16:35.580 --> 00:16:38.458 if you want to get 128 gig you're pushing two grand, 00:16:38.459 --> 00:16:40.739 so it gets a little tighter. 00:16:40.740 --> 00:16:44.099 Macs work remarkably well with AI. 00:16:44.100 --> 00:16:47.659 My MacBook Air was one of my go-tos for a while, 00:16:47.660 --> 00:16:49.779 but once I started doing anything AI, 00:16:49.780 --> 00:16:50.779 I had a five-minute window 00:16:50.780 --> 00:16:52.619 before the thermal throttling became an issue. 00:16:52.620 --> 00:16:54.619 Keep in mind that's a MacBook Air, 00:16:54.620 --> 00:16:56.659 so it doesn't have the greatest ventilation. 00:16:56.660 --> 00:16:58.339 If you get the MacBook Pros and stuff, 00:16:58.340 --> 00:17:00.139 they tend to have more ventilation, 00:17:00.140 --> 00:17:02.499 but still you're going to be pushing against that. 00:17:02.500 --> 00:17:04.939 So Mac Minis and the Mac Ultras and stuff like that 00:17:04.940 --> 00:17:06.099 tend to work really well for that. 00:17:06.100 --> 00:17:09.779 Alex Ziskind on YouTube has a channel. 00:17:09.780 --> 00:17:11.899 He does a lot of AI performance benchmarking, 00:17:11.900 --> 00:17:14.819 like "I load a 70 billion parameter model 00:17:14.820 --> 00:17:16.699 on this mini PC" and stuff like that. 00:17:16.700 --> 00:17:19.019 It's a lot of fun and interesting stuff there. 00:17:19.020 --> 00:17:21.219 And it's influencing my decision 00:17:21.220 --> 00:17:22.979 to buy my next AI-style PC. 00:17:22.980 --> 00:17:27.619 Small domain-specific LLMs are happening. 00:17:27.620 --> 00:17:29.939 An LLM that has all your code and information 00:17:29.940 --> 00:17:31.659 sounds like a really cool idea. 00:17:31.660 --> 00:17:34.299 It gives you capabilities to start training stuff 00:17:34.300 --> 00:17:35.899 that you couldn't do with, like, the big ones. 00:17:35.900 --> 00:17:38.059 Even in terms of fine-tuning and stuff, 00:17:38.060 --> 00:17:40.539 it's remarkable to see where that space is coming along 00:17:40.540 --> 00:17:41.739 in the next year or so. 00:17:41.740 --> 00:17:46.219 HuggingFace.co has pointers to tons of AI models. 00:17:46.220 --> 00:17:48.417 You'll find the one that works for you, hopefully, there.
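NOTE Pointing gptel at Ollama
If you graduate from Llamafile to Ollama, as recommended above, pointing gptel at it is a similar one-liner. A minimal sketch using gptel's documented gptel-make-ollama backend; the model tag is an assumption, so match whatever you have pulled with Ollama:

(gptel-make-ollama "Ollama-local"
  :host "localhost:11434"     ; Ollama's default port
  :stream t
  :models '(llama3.2:latest)) ; assumed tag; match your `ollama pull`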
00:17:48.418 --> 00:17:50.539 If you're doing cybersecurity, 00:17:50.540 --> 00:17:52.059 there's a whole bunch out there for that 00:17:52.060 --> 00:17:54.619 that have certain training and information in them. 00:17:54.620 --> 00:17:56.139 It's really good. 00:17:56.140 --> 00:18:00.099 One last thing to keep in mind is hallucinations are real. 00:18:00.100 --> 00:18:02.779 You will get BS back from the AI occasionally, 00:18:02.780 --> 00:18:05.179 so do validate everything you get from it. 00:18:05.180 --> 00:18:08.459 Don't be using it for court cases like some people have 00:18:08.460 --> 00:18:14.539 and run into those problems. So, that is my talk. 00:18:14.540 --> 00:18:17.219 What I would like you to get out of that is, 00:18:17.220 --> 00:18:21.859 if you haven't tried it, give gptel and Llamafile a shot. 00:18:21.860 --> 00:18:23.979 Fire up a little small AI instance, 00:18:23.980 --> 00:18:27.339 play around with it a little bit inside your Emacs, 00:18:27.340 --> 00:18:30.139 and see if it makes your life better. Hopefully it will. 00:18:30.140 --> 00:18:32.139 And I really hope you guys 00:18:32.140 --> 00:18:34.659 learned something from this talk. And thanks for listening. 00:18:34.660 --> 00:18:38.979 And the links are at the end of the talk, if you have any questions. 00:18:38.980 --> 00:18:42.739 Let me see if we got anything you want, Pat. You do. 00:18:42.740 --> 00:18:43.899 You've got a few questions. 00:18:43.900 --> 00:18:48.059 [Corwin]: Hey, this is Corwin. Thank you so much. Thank you, Aaron. 00:18:48.060 --> 00:18:50.339 What an awesome talk this was, actually. 00:18:50.340 --> 00:18:52.179 If you don't have a camera, 00:18:52.180 --> 00:18:54.339 I can get away with not having one too. 00:18:54.340 --> 00:18:56.299 [Aaron]: I've got... I'll turn the camera on. 00:18:56.300 --> 00:18:59.833 [Corwin]: Okay. All right. I'll turn mine back on. Here I come. 00:18:59.834 --> 00:19:03.139 Yeah, so there are a few questions, 00:19:03.140 --> 00:19:04.579 but first let me say thank you 00:19:04.580 --> 00:19:06.339 for a really captivating talk. 00:19:06.340 --> 00:19:10.939 I think a lot of people will be empowered from this 00:19:10.940 --> 00:19:15.259 to try to do more with less, especially locally. 00:19:15.260 --> 00:19:20.179 People are concerned about the data center footprint, 00:19:20.180 --> 00:19:23.659 environmentally concerned 00:19:23.660 --> 00:19:26.979 about the footprint of LLMs inside data centers. 00:19:26.980 --> 00:19:28.219 So just thinking about how we can 00:19:28.220 --> 00:19:32.419 put infrastructure we have at home to use 00:19:32.420 --> 00:19:34.019 and get more done with less. 00:19:34.020 --> 00:19:37.499 [Aaron]: Yeah, the data center impact's interesting, 00:19:37.500 --> 00:19:39.979 because there was a study a while ago. 00:19:39.980 --> 00:19:42.099 Someone said every time you do a Gemini query, 00:19:42.100 --> 00:19:45.019 it's like boiling a cup of water. 00:19:45.020 --> 00:19:48.619 [Corwin]: Yeah, I've heard that one too. So do you want to, you know, 00:19:48.620 --> 00:19:51.699 I don't know how much direction you want. 00:19:51.700 --> 00:19:53.859 I'd be very happy to read out the questions for you. 00:19:53.860 --> 00:19:55.219 [Aaron]: Yeah, that would be great. 00:19:55.220 --> 00:19:57.619 I'm having trouble getting to that tab. 00:19:57.620 --> 00:20:02.779 [Corwin]: Okay, I'm there, so I'll put it into our chat too, 00:20:02.780 --> 00:20:07.419 so you can follow along if you'd like.
NOTE Q: Why is the David Bowie question a good one for testing a model? e.g. does it fail in interesting ways? 00:20:07.420 --> 00:20:11.219 [Corwin]: The first question was, why is the David Bowie question 00:20:11.220 --> 00:20:12.219 a good one to start with? 00:20:12.220 --> 00:20:14.419 Does it have interesting failure conditions, 00:20:14.420 --> 00:20:16.639 or what made you choose that? 00:20:16.640 --> 00:20:21.979 [Aaron]: First off, huge fan of David Bowie. 00:20:21.980 --> 00:20:24.499 But it came down to, it really taught me a few things 00:20:24.500 --> 00:20:26.299 about how the models work, 00:20:26.300 --> 00:20:28.819 in terms of things like how many kids he had, 00:20:28.820 --> 00:20:31.779 because DeepSeek, which is a very popular Chinese model 00:20:31.780 --> 00:20:33.179 that a lot of people are using now, 00:20:33.180 --> 00:20:35.619 misidentifies him as having three daughters, 00:20:35.620 --> 00:20:38.459 and he has, like, one son and, I think, 00:20:38.460 --> 00:20:40.899 a daughter, or something like that. 00:20:40.900 --> 00:20:43.659 So there's differences on that, and it just goes over... 00:20:43.660 --> 00:20:45.299 there's a whole lot of stuff, 00:20:45.300 --> 00:20:47.779 because his story spans like 60 years, 00:20:47.780 --> 00:20:49.659 so it gives good feedback. 00:20:49.660 --> 00:20:51.539 That's the real main reason I asked that question, 00:20:51.540 --> 00:20:53.699 because I just needed one... The sea monkeys one I just picked 00:20:53.700 --> 00:20:56.579 because it was obscure. And I just always have it write... 00:20:56.580 --> 00:20:58.939 I used to have it write hello world in Forth, 00:20:58.940 --> 00:21:01.019 because I thought that was an interesting one as well. 00:21:01.020 --> 00:21:03.899 It's just picking random ones like that. 00:21:03.900 --> 00:21:06.499 One question I ask a lot of models is, 00:21:06.500 --> 00:21:09.419 what is the closest star to the Earth? 00:21:09.420 --> 00:21:12.019 Because most of them will say Alpha Centauri 00:21:12.020 --> 00:21:13.739 or Proxima Centauri and not the sun. 00:21:13.740 --> 00:21:15.899 And I have a whole 'nother talk 00:21:15.900 --> 00:21:17.899 where I just argue with the LLM, 00:21:17.900 --> 00:21:20.019 trying to say, hey, the sun is a star. 00:21:20.020 --> 00:21:26.579 And it just wouldn't accept it, so. What? 00:21:26.580 --> 00:21:30.739 Oh, I can... You're there. NOTE Q: What specific tasks do you use local AI for? 00:21:30.740 --> 00:21:34.379 [Corwin]: So what specific tasks do you like to use your local AI for? 00:21:34.380 --> 00:21:37.459 [Aaron]: I like to load a lot of my code into it 00:21:37.460 --> 00:21:39.099 and actually have it do analysis of it. 00:21:39.100 --> 00:21:42.339 I was actually going through some code 00:21:42.340 --> 00:21:45.619 I have for some pen testing, and I was having it modified 00:21:45.620 --> 00:21:47.259 to update it for the newer version, 00:21:47.260 --> 00:21:48.459 because I hate to say this, 00:21:48.460 --> 00:21:49.859 but it was written for Python 2, 00:21:49.860 --> 00:21:51.459 and I needed to update it for Python 3. 00:21:51.460 --> 00:21:53.859 And the 2to3 tool did not do all of it, 00:21:53.860 --> 00:21:56.659 but the AI actually was able to do the refactoring. 00:21:56.660 --> 00:21:58.499 It's part of my laziness. 00:21:58.500 --> 00:22:01.459 But I use that for anything I don't want to hit the web.
00:22:01.460 --> 00:22:03.259 And that's a lot of stuff when you start thinking about it, 00:22:03.260 --> 00:22:04.979 if you're doing cybersecurity research 00:22:04.980 --> 00:22:06.819 and you have your white papers 00:22:06.820 --> 00:22:08.417 and stuff like that in there. 00:22:08.418 --> 00:22:10.625 I've got a lot of that loaded into RAG 00:22:10.626 --> 00:22:16.879 in one model on my Open WebUI system. NOTE Q: Have you used any small domain-specific LLMs? What are the kinds of tasks they specialize in, and how do I find and use them? 00:22:16.880 --> 00:22:21.059 [Corwin]: Neat. Have you used, have you used 00:22:21.060 --> 00:22:25.739 any small domain-specific LLMs? What kind of tasks? 00:22:25.740 --> 00:22:30.419 If so, what kind of tasks do they specialize in? 00:22:30.420 --> 00:22:32.139 And, you know, how? 00:22:32.140 --> 00:22:34.979 [Aaron]: No, to be honest, but there are some out there, like, once again, 00:22:34.980 --> 00:22:36.779 for cybersecurity and stuff like that, 00:22:36.780 --> 00:22:39.739 that I really need to dig into. That's on my to-do list. 00:22:39.740 --> 00:22:41.699 I've got a couple weeks off at the end of the year, 00:22:41.700 --> 00:22:46.539 and that's a big part of my plan for that. NOTE Q: Are the various models updated regularly? Can you add your own data to pre-built models? 00:22:46.540 --> 00:22:49.379 [Corwin]: Are the various models updated pretty regularly? 00:22:49.380 --> 00:22:52.059 Can you add your own data to the pre-built models? 00:22:52.060 --> 00:22:56.699 [Aaron]: Yes. The models are updated pretty reasonably. 00:22:56.700 --> 00:22:59.699 You can add data to a model in a couple of different ways. 00:22:59.700 --> 00:23:01.099 You can do something called fine-tuning, 00:23:01.100 --> 00:23:03.819 which requires a really nice GPU and a lot of CPU time. 00:23:03.820 --> 00:23:05.499 You're probably not going to do that. 00:23:05.500 --> 00:23:07.419 You can do retrieval-augmented generation, 00:23:07.420 --> 00:23:09.499 which is where you load your data on top of the system 00:23:09.500 --> 00:23:11.299 and put it inside a database, 00:23:11.300 --> 00:23:12.859 and you can actually scan that and stuff. 00:23:12.860 --> 00:23:14.619 I have another talk where I go through 00:23:14.620 --> 00:23:16.219 and I start asking questions about... 00:23:16.220 --> 00:23:18.579 I load the talk into the engine 00:23:18.580 --> 00:23:20.099 and I ask questions against that. 00:23:20.100 --> 00:23:22.179 If I would have had time, I would have done that, 00:23:22.180 --> 00:23:25.796 but it comes down to how many... That's RAG. 00:23:25.797 --> 00:23:29.419 RAG is pretty easy to do through Open WebUI or LM Studio. 00:23:29.420 --> 00:23:31.419 It's a great way, you just, like, 00:23:31.420 --> 00:23:34.099 point it to a folder and it just sucks all that data in... 00:23:34.100 --> 00:23:35.499 and it'll hit that data first. 00:23:35.500 --> 00:23:36.859 You have like helpdesk and stuff and... 00:23:36.860 --> 00:23:39.619 The other options: there's vector databases, 00:23:39.620 --> 00:23:41.819 which is, like, if you use PostgreSQL, 00:23:41.820 --> 00:23:43.699 it has pgvector, which can do a lot of that stuff. 00:23:43.700 --> 00:23:44.739 I've not dug into that yet, 00:23:44.740 --> 00:23:46.099 but that is also on that to-do list. 00:23:46.100 --> 00:23:48.055 I've got a lot of stuff planned for... NOTE Q: What is your experience with RAG? Are you using them and how have they helped? 00:23:48.056 --> 00:23:51.819 [Corwin]: Cool.
So what is your experience with RAGs? 00:23:51.820 --> 00:23:54.339 I don't even know what that means. 00:23:54.340 --> 00:23:57.419 Do you know what that means? 00:23:57.420 --> 00:23:59.619 Do you remember this question again? 00:23:59.620 --> 00:24:03.979 What is your experience with RAGs? 00:24:03.980 --> 00:24:07.459 [Aaron]: RAG is great. That's retrieval-augmented generation. 00:24:07.460 --> 00:24:09.739 That loads your data first, and it hits yours, 00:24:09.740 --> 00:24:11.499 and it'll actually cite it and stuff. 00:24:11.500 --> 00:24:14.659 There's a guy who wrote a RAG in 100 lines of Python, 00:24:14.660 --> 00:24:16.899 and it's an impressive piece of software. 00:24:16.900 --> 00:24:18.779 I think if you hit one of my sites, 00:24:18.780 --> 00:24:22.099 I've got a private AI talk where I actually refer to that. 00:24:22.100 --> 00:24:25.219 But retrieval augmentation, it's easy, it's fast, 00:24:25.220 --> 00:24:26.699 it puts your data into the system. 00:24:26.700 --> 00:24:31.339 Yeah, start with that and then iterate on top of that. 00:24:31.340 --> 00:24:32.659 That's one of the great things about AI, 00:24:32.660 --> 00:24:33.619 especially private AI, 00:24:33.620 --> 00:24:35.625 is you can do whatever you want to with it 00:24:35.626 --> 00:24:38.833 and build up with it as you get more experience. NOTE Q: Thoughts on running things on AWS/digital ocean instances, etc? 00:24:38.834 --> 00:24:44.219 [Corwin]: Any thoughts on running things 00:24:44.220 --> 00:24:49.179 on AWS, DigitalOcean, and so on? 00:24:49.180 --> 00:24:50.619 [Aaron]: AWS is not bad. 00:24:50.620 --> 00:24:52.659 DigitalOcean, they have some of their GPUs. 00:24:52.660 --> 00:24:54.379 I still don't like having the data 00:24:54.380 --> 00:24:57.419 leave my house, to be honest, or at work, 00:24:57.420 --> 00:24:59.019 because I tend to do some stuff 00:24:59.020 --> 00:25:01.259 that I don't want it even hitting that situation. 00:25:01.260 --> 00:25:03.699 But they have pretty good stuff. 00:25:03.700 --> 00:25:05.579 Another one to consider is Oracle Cloud. 00:25:05.580 --> 00:25:09.059 Oracle has their AI infrastructure that's really well done. 00:25:09.060 --> 00:25:12.379 But I mean, once again, then you start looking at the potential: 00:25:12.380 --> 00:25:13.779 they say your data is private, 00:25:13.780 --> 00:25:14.819 but I don't necessarily trust it. 00:25:14.820 --> 00:25:17.859 But they do have good stuff, both DigitalOcean and AWS, 00:25:17.860 --> 00:25:20.339 and Oracle Cloud has the free service, which isn't too bad, 00:25:20.340 --> 00:25:21.339 usually a certain amount of stuff. 00:25:21.340 --> 00:25:23.179 And Google also has it, 00:25:23.180 --> 00:25:26.739 but I still tend to keep more stuff on local PCs, 00:25:26.740 --> 00:25:31.077 because I'm just paranoid that way. NOTE Q: What has your experience been using AI for cyber security applications? What do you usually use it for? 00:25:31.078 --> 00:25:35.579 [Corwin]: Gotcha. What has your experience been using AI? 00:25:35.580 --> 00:25:40.139 Do you want to get into that, using AI for cybersecurity? 00:25:40.140 --> 00:25:42.019 You might have already touched on this. 00:25:42.020 --> 00:25:44.379 [Aaron]: Yeah, really, for cybersecurity, 00:25:44.380 --> 00:25:46.259 what I've had to do is I've dumped logs 00:25:46.260 --> 00:25:47.299 to have it do correlation.
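NOTE Sketch: log correlation through gptel
Dumping a region of logs into the local model, as described here, can be scripted with gptel's documented gptel-request function. This is a hypothetical sketch, not his actual tooling; the command name my-log-correlate and the prompt wording are made up for illustration:

(require 'gptel)

(defun my-log-correlate (beg end)
  "Ask the local model to correlate the log lines in the region BEG..END."
  (interactive "r")
  (gptel-request
      (concat "Correlate these log entries and flag anything anomalous:\n\n"
              (buffer-substring-no-properties beg end))
    :callback (lambda (response info)
                (if (stringp response)
                    ;; Show the model's analysis in its own buffer.
                    (with-current-buffer (get-buffer-create "*log-analysis*")
                      (erase-buffer)
                      (insert response)
                      (display-buffer (current-buffer)))
                  (message "gptel request failed: %s"
                           (plist-get info :status))))))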
00:25:47.300 --> 00:25:49.859 Keep in mind, the size of that Llamafile we were using 00:25:49.860 --> 00:25:52.059 for figuring out David Bowie, writing the hello world, 00:25:52.060 --> 00:25:54.179 all that stuff, is like six gig. 00:25:54.180 --> 00:25:56.859 How does it get the entire world in six gig? 00:25:56.860 --> 00:25:59.739 I still haven't figured that out, in terms of quantization. 00:25:59.740 --> 00:26:02.499 So I'm really interested in seeing the ability 00:26:02.500 --> 00:26:05.139 to take all this stuff out of all my logs, 00:26:05.140 --> 00:26:06.339 dump it all in there, 00:26:06.340 --> 00:26:08.459 and actually be able to do intelligent queries against that. 00:26:08.460 --> 00:26:10.899 Microsoft has a project called Security Copilot, 00:26:10.900 --> 00:26:12.819 which is trying to do that in the cloud, 00:26:12.820 --> 00:26:15.299 but I want to work on something to do that more locally 00:26:15.300 --> 00:26:19.019 and be able to actually drive this stuff over that. 00:26:19.020 --> 00:26:24.659 That's also one of the long-term goals. 00:26:24.660 --> 00:26:26.059 [Corwin]: So we got any other questions, or? 00:26:26.060 --> 00:26:29.099 Those are the questions that I see. 00:26:29.100 --> 00:26:31.179 I want to just read out a couple of comments 00:26:31.180 --> 00:26:33.419 that I saw in IRC though. 00:26:33.420 --> 00:26:36.699 jrootabaga says, it went very well 00:26:36.700 --> 00:26:39.259 from an audience perspective. 00:26:39.260 --> 00:26:43.619 And GGundam says, respect your commitment to privacy. 00:26:43.620 --> 00:26:45.619 And then somebody is telling us 00:26:45.620 --> 00:26:46.779 we might have skipped a question, 00:26:46.780 --> 00:26:50.019 so I'm just going to run back to my list. 00:26:50.020 --> 00:26:52.819 Updated regularly, experience... 00:26:52.820 --> 00:26:57.659 I just didn't type in the answers here, 00:26:57.660 --> 00:26:59.659 and there's a couple more questions coming in, so... NOTE Q: Is there a disparity where you go to paid models because they are better and what problems would those be? 00:26:59.660 --> 00:27:04.699 Is there a disparity where you go to paid models 00:27:04.700 --> 00:27:08.619 because they are better, and what problems? 00:27:08.620 --> 00:27:14.019 You know, what would drive you to? [Aaron]: That's a good question. 00:27:14.020 --> 00:27:17.819 Paid models, I don't mind them. I think they're good, 00:27:17.820 --> 00:27:21.299 but I don't think they're actually economically sustainable 00:27:21.300 --> 00:27:22.659 under their current system. 00:27:22.660 --> 00:27:24.299 Because right now, if you're paying 00:27:24.300 --> 00:27:26.899 20 bucks a month for Copilot and that goes up to 200 bucks, 00:27:26.900 --> 00:27:28.499 I'm not going to be as likely to use it. 00:27:28.500 --> 00:27:29.579 You know what I mean? 00:27:29.580 --> 00:27:33.059 But it does do some things in a way that I did not expect. 00:27:33.060 --> 00:27:35.459 For example, Grok was refactoring 00:27:35.460 --> 00:27:38.019 some of my code and in the comments dropped an F-bomb, 00:27:38.020 --> 00:27:39.979 which I did not see coming. 00:27:39.980 --> 00:27:41.619 But the other code before, 00:27:41.620 --> 00:27:43.219 that I had gotten off GitHub, 00:27:43.220 --> 00:27:44.059 had F-bombs in it. 00:27:44.060 --> 00:27:45.899 So it was just emulating the style, 00:27:45.900 --> 00:27:47.779 but would that be something 00:27:47.780 --> 00:27:49.979 I'd want to turn in as a pull request? I don't know.
00:27:49.980 --> 00:27:52.139 But, uh, there's a lot of money 00:27:52.140 --> 00:27:53.899 going into these AIs and stuff. 00:27:53.900 --> 00:27:56.219 But in terms of the ability to get a decent one, 00:27:56.220 --> 00:27:57.979 like Llama 3.2, 00:27:57.980 --> 00:28:01.239 and load your data into it, you can be pretty competitive. 00:28:01.240 --> 00:28:02.792 You're not going to get all the benefits, 00:28:02.793 --> 00:28:04.333 but you have more control over it. 00:28:04.334 --> 00:28:11.000 So it's a balancing act. 00:28:11.001 --> 00:28:14.125 [Corwin]: Okay, and I think I see a couple more questions coming in. NOTE Q: What's the largest (in parameter size) local model you've been able to successfully run locally, and do you run into issues with limited context window size? 00:28:14.126 --> 00:28:19.619 What is the largest parameter size for local models 00:28:19.620 --> 00:28:22.459 that you've been able to successfully run locally, 00:28:22.460 --> 00:28:26.059 and do you run into issues with limited context window size? 00:28:26.060 --> 00:28:29.659 The top paid models will tend to have a larger ceiling. 00:28:29.660 --> 00:28:32.859 [Aaron]: Yes, yes, yes, yes, yes. 00:28:32.860 --> 00:28:37.019 By default, the context size is, I think, 1024. 00:28:37.020 --> 00:28:41.160 But I've upped it to 8192 on this box, the Pangolin, 00:28:41.161 --> 00:28:43.542 because, for some reason, 00:28:43.543 --> 00:28:45.208 it's just working quite well. 00:28:45.209 --> 00:28:49.750 But the largest ones I've loaded have been in the... 00:28:49.751 --> 00:28:51.333 have not been that huge. 00:28:51.334 --> 00:28:55.699 I've loaded this... the last biggest one I've done... 00:28:55.700 --> 00:28:57.459 That's the reason why I'm planning 00:28:57.460 --> 00:29:01.339 on breaking down and buying a Ryzen. 00:29:01.340 --> 00:29:03.619 Actually, I'm going to buy 00:29:03.620 --> 00:29:06.979 an Intel Core Ultra 285H with 96 gig of RAM. 00:29:06.980 --> 00:29:08.379 Then I should be able to load 00:29:08.380 --> 00:29:12.059 a 70 billion parameter model in that. How fast will it run? 00:29:12.060 --> 00:29:13.819 It's going to run slow as a dog, 00:29:13.820 --> 00:29:15.819 but it's going to be cool to be able to do it. 00:29:15.820 --> 00:29:17.379 It's an AI bragging rights thing, 00:29:17.380 --> 00:29:20.019 but I mostly stick with the smaller-size models, 00:29:20.020 --> 00:29:22.819 and the ones that are more quantized, 00:29:22.820 --> 00:29:26.619 because it just tends to work better for me. 00:29:26.620 --> 00:29:29.179 [Corwin]: We've still got over 10 minutes before we're cutting away, 00:29:29.180 --> 00:29:30.179 but I'm just anticipating 00:29:30.180 --> 00:29:32.859 that we're going to be going strong at the 10-minute mark. 00:29:32.860 --> 00:29:34.899 So I'm just letting you know 00:29:34.900 --> 00:29:37.379 we can go as long as we like here. At a certain point, 00:29:37.380 --> 00:29:41.059 I may have to jump away and check in with the next speaker, 00:29:41.060 --> 00:29:44.419 but we'll post the entirety of this, 00:29:44.420 --> 00:29:47.979 even if we aren't able to stay with it all. 00:29:47.980 --> 00:29:49.739 Okay. And we've got 10 minutes 00:29:49.740 --> 00:29:52.379 where we're still going to stay live. NOTE Q: Are there "Free" as in FSF/open source issues with the data? 00:29:52.380 --> 00:30:00.139 So next question coming in, I see: are there free as in freedom, 00:30:00.140 --> 00:30:05.739 free as in FSF, issues with the data?
00:30:05.740 --> 00:30:11.699 [Aaron]: Yes, where's the data coming from is a huge question with AI. 00:30:11.700 --> 00:30:13.739 It's astonishing you can ask questions 00:30:13.740 --> 00:30:16.899 of models that you don't know where the data is coming from. 00:30:16.900 --> 00:30:19.979 That is gonna be one of the big issues long-term. 00:30:19.980 --> 00:30:21.499 There are people who are working 00:30:21.500 --> 00:30:22.979 on trying to figure out that stuff, 00:30:22.980 --> 00:30:25.259 but it's, I mean, if you look at, God, 00:30:25.260 --> 00:30:27.059 I can't remember who it was. 00:30:27.060 --> 00:30:28.659 Somebody was actually out torrenting books 00:30:28.660 --> 00:30:30.939 just to be able to build them into their AI system. 00:30:30.940 --> 00:30:32.339 I think it might've been Meta. 00:30:32.340 --> 00:30:34.819 So there's a lot of that going on. 00:30:34.820 --> 00:30:38.139 The open sourcing of this stuff is going to be tough. 00:30:38.140 --> 00:30:39.459 There are some models, 00:30:39.460 --> 00:30:41.419 like the mobile guys have got their own license, 00:30:41.420 --> 00:30:42.739 but where they're getting their data from, 00:30:42.740 --> 00:30:45.499 I'm not sure, so that's a huge question. 00:30:45.500 --> 00:30:47.979 That's a talk in itself. 00:30:47.980 --> 00:30:51.979 But yeah, if you train on your RAG and your data, 00:30:51.980 --> 00:30:53.499 you know where it's coming from, 00:30:53.500 --> 00:30:54.379 you know you have a license to it, 00:30:54.380 --> 00:30:55.139 but the other stuff is just 00:30:55.140 --> 00:30:56.739 more of an open question 00:30:56.740 --> 00:31:01.379 if you're using a smaller model. 00:31:01.380 --> 00:31:05.419 [Corwin]: The comments online, I see a couple of them. 00:31:05.420 --> 00:31:08.339 I'll read them out in order here. Really interesting stuff. 00:31:08.340 --> 00:31:09.556 Thank you for your talk. NOTE Q: Given that large AI companies are openly stealing IP and copyright, thereby eroding the authority of such law (and eroding truth itself as well), can you see a future where IP & copyright law become untenable and what sort of onward effect might that have? 00:31:09.557 --> 00:31:11.659 Given that large AI companies 00:31:11.660 --> 00:31:14.899 are openly stealing intellectual property and copyright, 00:31:14.900 --> 00:31:18.939 and therefore eroding the authority of such laws 00:31:18.940 --> 00:31:21.579 and maybe obscuring the truth itself, 00:31:21.580 --> 00:31:26.579 can you see a future where IP and copyright law become untenable? 00:31:26.580 --> 00:31:29.619 [Aaron]: I think that's a great question. 00:31:29.620 --> 00:31:34.979 I'm not a lawyer, but it is really getting complicated. 00:31:34.980 --> 00:31:37.859 It is getting to the point... I asked a question from... 00:31:37.860 --> 00:31:41.179 I played with Sora a little bit, and it generated someone 00:31:41.180 --> 00:31:42.819 where you can go like, oh, that's Jon Hamm, 00:31:42.820 --> 00:31:44.099 that's Christopher Walken. 00:31:44.100 --> 00:31:45.379 You start figuring out who the people are 00:31:45.380 --> 00:31:47.019 they're modeling stuff after. 00:31:47.020 --> 00:31:48.979 There is an apocalypse of some sort 00:31:48.980 --> 00:31:52.459 going to happen right now. 00:31:52.460 --> 00:31:53.579 There is, but this is, once again, 00:31:53.580 --> 00:31:56.059 my personal opinion, and I'm not a lawyer, 00:31:56.060 --> 00:31:57.459 and I do not have money.
00:31:57.460 --> 00:31:58.859 So don't sue me. It's that 00:31:58.860 --> 00:32:02.899 the current administration is very pro-AI, 00:32:02.900 --> 00:32:05.499 and there's a great deal of lobbying by those groups. 00:32:05.500 --> 00:32:07.139 And it's on both sides. 00:32:07.140 --> 00:32:09.699 And it's going to be, it's gonna be interesting to see 00:32:09.700 --> 00:32:11.699 what happens to copyright in the next 5-10 years. 00:32:11.700 --> 00:32:13.339 I just don't know how it keeps up 00:32:13.340 --> 00:32:18.059 without there being some adjustments and stuff. NOTE Comment: File size is not going to be the bottleneck, your RAM is. 00:32:18.060 --> 00:32:20.419 [Corwin]: Okay, and then another comment I saw: 00:32:20.420 --> 00:32:23.219 file size is not going to be a bottleneck, 00:32:23.220 --> 00:32:25.819 RAM is. You'll need 16 gigabytes of RAM 00:32:25.820 --> 00:32:28.259 to run the smallest local models 00:32:28.260 --> 00:32:31.979 and 512 gigabytes of RAM to run the larger ones. 00:32:31.980 --> 00:32:35.059 You'll need a GPU with that much memory 00:32:35.060 --> 00:32:38.318 if you want it to run quickly. 00:32:38.319 --> 00:32:41.259 [Aaron]: Yeah. Oh no. It also depends upon how your memory is laid out. 00:32:41.260 --> 00:32:45.699 Like, example being the Core Ultra 285H 00:32:45.700 --> 00:32:47.899 I plan to buy, that has 96 gig of memory. 00:32:47.900 --> 00:32:50.499 It's unified, the GPU and the CPU share it, 00:32:50.500 --> 00:32:52.739 but they go over the same bus. 00:32:52.740 --> 00:32:55.779 So the overall bandwidth of it tends to be a bit less, 00:32:55.780 --> 00:32:57.579 but you're able to load more of it into memory, 00:32:57.580 --> 00:32:59.419 so it's able to do some additional stuff with it 00:32:59.420 --> 00:33:00.819 as opposed to coming off disk. 00:33:00.820 --> 00:33:03.699 It's all a balancing act. If you hit Ziskind's website, 00:33:03.700 --> 00:33:05.819 that guy's done some great work on it, 00:33:05.820 --> 00:33:07.499 trying to figure out how big a model you can do, 00:33:07.500 --> 00:33:08.619 what you can do with it. 00:33:08.620 --> 00:33:12.699 And some of the stuff seems to be not obvious, 00:33:12.700 --> 00:33:15.299 because, example being that MacBook Air: 00:33:15.300 --> 00:33:17.619 for the five minutes I can run the model, 00:33:17.620 --> 00:33:19.379 it runs it faster than a lot of other things 00:33:19.380 --> 00:33:21.339 that should be able to run it faster, 00:33:21.340 --> 00:33:24.619 just because of the way the ARM cores and the unified memory work on it. 00:33:24.620 --> 00:33:26.019 So it's a learning process. 00:33:26.020 --> 00:33:29.579 But if you want to, NetworkChuck had a great video 00:33:29.580 --> 00:33:30.939 talking about building his own system 00:33:30.940 --> 00:33:34.379 with a couple really powerful NVIDIA cards 00:33:34.380 --> 00:33:35.379 and stuff like that in it, 00:33:35.380 --> 00:33:38.859 and actually setting it up on his system as a node 00:33:38.860 --> 00:33:41.459 and using a web UI on it. So there's a lot of stuff there, 00:33:41.460 --> 00:33:43.899 but it is a process of learning how big your data is, 00:33:43.900 --> 00:33:44.899 which models you want to use, 00:33:44.900 --> 00:33:46.219 how much information you need. 00:33:46.220 --> 00:33:49.579 But it's part of the learning. 00:33:49.580 --> 00:33:52.899 And you can run models even on Raspberry Pi 5s 00:33:52.900 --> 00:33:54.499 if you want to; they'll run slow.
00:33:54.500 --> 00:33:59.339 Don't get me wrong, but they're possible. 00:33:59.340 --> 00:34:02.179 [Corwin]: Okay, and I think there's other questions coming in too, 00:34:02.180 --> 00:34:04.019 so I'll just vamp for another second. 00:34:04.020 --> 00:34:06.299 We've got about five minutes before we'll, 00:34:06.300 --> 00:34:09.739 before we'll be cutting over, 00:34:09.740 --> 00:34:13.179 but I just want to say, in case we get close for time here, 00:34:13.180 --> 00:34:14.859 how much I appreciate your talk. 00:34:14.860 --> 00:34:15.979 This is another one that I'm going to 00:34:15.980 --> 00:34:18.339 have to study after the conference. 00:34:18.340 --> 00:34:21.099 [Aaron]: We greatly appreciate, all of us appreciate 00:34:21.100 --> 00:34:22.459 you guys putting on the conference. 00:34:22.460 --> 00:34:26.299 It's a great conference. It's well done. 00:34:26.300 --> 00:34:28.019 [Corwin]: It's an honor to be on the stage 00:34:28.020 --> 00:34:33.124 with the brains of the project, which is you. 00:34:33.125 --> 00:34:34.699 [Aaron]: So what else we got? Question-wise. 00:34:34.700 --> 00:34:46.899 [Corwin]: Okay, so just scanning here. NOTE Q: Have you used local models capable of tool-calling? 00:34:46.900 --> 00:34:50.699 Have you used local models capable of tool calling? 00:34:50.700 --> 00:34:54.779 [Aaron]: I'm scared of agentic. 00:34:54.780 --> 00:34:58.739 I'm going to be a slow adopter of that. 00:34:58.740 --> 00:35:02.459 I want to do it, but I just don't have the, uh, 00:35:02.460 --> 00:35:04.339 intestinal fortitude right now to do it. 00:35:04.340 --> 00:35:07.179 I've had it give me the commands, 00:35:07.180 --> 00:35:08.739 but I still run the commands by hand. 00:35:08.740 --> 00:35:10.539 I'm looking into it, and once again, 00:35:10.540 --> 00:35:20.899 it's on that list, but that's a big step for me. 00:35:20.900 --> 00:35:23.139 [Corwin]: So. Awesome. All right. 00:35:23.140 --> 00:35:27.179 Well, maybe it's, let me just scroll through, 00:35:27.180 --> 00:35:31.539 because we might have missed one question. Oh, I see. 00:35:31.540 --> 00:35:36.899 Here was the piggyback question. 00:35:36.900 --> 00:35:38.419 Now I see the question that I missed. 00:35:38.420 --> 00:35:41.139 So this was piggybacking on the question 00:35:41.140 --> 00:35:44.859 about model updates and adding data. NOTE Q: Will the models reach out to the web if they need to for more info? 00:35:44.860 --> 00:35:46.579 And will models reach out to the web 00:35:46.580 --> 00:35:47.819 if they need more info? 00:35:47.820 --> 00:35:52.479 Or have you worked with any models that work that way? 00:35:52.480 --> 00:35:55.259 [Aaron]: No, I've not seen any models do that. 00:35:55.260 --> 00:35:57.739 There was, like, a group 00:35:57.740 --> 00:35:59.899 working on something like a package updater 00:35:59.900 --> 00:36:02.499 that would do different diffs on it, 00:36:02.500 --> 00:36:03.939 but models change so much, 00:36:03.940 --> 00:36:05.739 even when you make minor changes and fine-tuning, 00:36:05.740 --> 00:36:07.659 it's hard just to update them in place. 00:36:07.660 --> 00:36:10.099 So I haven't seen one, but that doesn't mean 00:36:10.100 --> 00:36:15.713 they're not out there. Curious topic though. 00:36:15.714 --> 00:36:16.259 [Corwin]: Awesome. 00:36:16.260 --> 00:36:19.539 Well, it's probably pretty good timing. 00:36:19.540 --> 00:36:21.299 Let me just scroll and make sure.
00:36:21.300 --> 00:36:23.499 And of course, before I can say that, 00:36:23.500 --> 00:36:25.899 there's one more question. So let's go ahead and have that. 00:36:25.900 --> 00:36:28.299 I want to make sure while we're still live, though, 00:36:28.300 --> 00:36:31.299 that I give you a chance to offer any closing thoughts. NOTE Q: What scares you most about agentic tools? How would you think about putting a sandbox around it if you adopt an agentic workflow? 00:36:31.300 --> 00:36:35.779 So what scares you most about the agentic tools? 00:36:35.780 --> 00:36:38.419 How would you think about putting a sandbox around that 00:36:38.420 --> 00:36:41.619 if you did adopt an agentic workflow? 00:36:41.620 --> 00:36:42.899 [Aaron]: That is a great question. 00:36:42.900 --> 00:36:45.939 In terms of that, I would just control 00:36:45.940 --> 00:36:48.099 what it's able to talk to, which machines. 00:36:48.100 --> 00:36:50.059 I would actually have it be air-gapped. 00:36:50.060 --> 00:36:52.099 I work for a defense contractor, 00:36:52.100 --> 00:36:53.819 and we spend a lot of time dealing with air-gapped systems, 00:36:53.820 --> 00:36:55.979 because that's just kind of the way it works out for us. 00:36:55.980 --> 00:36:58.499 So agentic stuff is just going to take a while to earn trust. 00:36:58.500 --> 00:37:01.059 I want to see more stuff happening. 00:37:01.060 --> 00:37:02.819 Humans screw stuff up enough. 00:37:02.820 --> 00:37:04.819 The last thing we need is to multiply that by 1,000. 00:37:04.820 --> 00:37:09.419 So in terms of that, I would be restricting what it can do. 00:37:09.420 --> 00:37:10.859 If you look at the capabilities: 00:37:10.860 --> 00:37:13.579 if I created a user for it and gave it permissions, 00:37:13.580 --> 00:37:15.299 I would lock down through sudo 00:37:15.300 --> 00:37:17.379 what it's able to do, what the account's able to do. 00:37:17.380 --> 00:37:18.899 I would do those kinds of things, 00:37:18.900 --> 00:37:20.859 but it's happening. 00:37:20.860 --> 00:37:25.819 I'm just going to be one of the laggards on that one. 00:37:25.820 --> 00:37:29.259 So: air gap, jail, extremely locked-down environments, 00:37:29.260 --> 00:37:34.899 like we're talking about separate physical machines, not Docker. 00:37:34.900 --> 00:37:36.577 Yeah, hopefully. NOTE Q: Tool calling can be read-only, such as giving models the ability to search the web before answering your question. (No write access or execute access) I'm interested to know if local models are any good at calling tools, though. 00:37:36.578 --> 00:37:39.899 [Corwin]: Right, fair. So tool calling can be read-only, 00:37:39.900 --> 00:37:42.539 such as giving models the ability to search the web 00:37:42.540 --> 00:37:43.979 before answering your question, 00:37:43.980 --> 00:37:46.219 with no write access or execute access. 00:37:46.220 --> 00:37:49.219 I'm interested to know if local models 00:37:49.220 --> 00:37:51.419 are any good at that. 00:37:51.420 --> 00:37:55.579 [Aaron]: Yes, local models can do a lot of that stuff. 00:37:55.580 --> 00:37:56.819 It's within their capabilities. 00:37:56.820 --> 00:37:59.019 If you load LM Studio, you can do a lot of wonderful stuff 00:37:59.020 --> 00:38:02.419 with that, or with Open WebUI with Ollama. 00:38:02.420 --> 00:38:05.739 There are a lot of capabilities. It's amazing. 00:38:05.740 --> 00:38:08.139 Open WebUI is actually what a lot of companies are using now 00:38:08.140 --> 00:38:10.259 to put their data behind.
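NOTE Sketch: a read-only tool with gptel

Here is a minimal sketch of such a read-only tool in Emacs Lisp, assuming a recent gptel with tool-calling support and a local backend and model that can call tools. The tool name is illustrative, and the exact argument spec may differ between gptel versions, so check gptel's documentation.

(require 'gptel)
(require 'url)

;; Register a tool that can only fetch a page: no write access,
;; no execute access.
(gptel-make-tool
 :name "read_url"
 :description "Fetch the raw contents of a URL (read-only)."
 :args (list '(:name "url"
               :type string
               :description "The URL to fetch"))
 :function (lambda (url)
             ;; Synchronous fetch with a 10-second timeout; returns
             ;; the response text and does nothing else.
             (with-current-buffer (url-retrieve-synchronously url t t 10)
               (prog1 (buffer-substring-no-properties (point-min) (point-max))
                 (kill-buffer)))))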
00:38:10.260 --> 00:38:12.139 Their curated data and stuff like that. It works well. 00:38:12.140 --> 00:38:15.819 I can confirm that from my own professional experience. 00:38:15.820 --> 00:38:16.915 Excellent. 00:38:16.916 --> 00:38:19.659 [Corwin]: Okay, well, our timing should be just perfect 00:38:19.660 --> 00:38:22.659 if you want to give us like a 30-second, 45-second wrap-up, 00:38:22.660 --> 00:38:24.419 Aaron. Let me squeeze in mine: 00:38:24.420 --> 00:38:26.779 thank you again so much for preparing this talk 00:38:26.780 --> 00:38:30.499 and for entertaining all of our questions. 00:38:30.500 --> 00:38:33.299 [Aaron]: Yeah, let me just thank you guys for the conference again. 00:38:33.300 --> 00:38:35.179 This is a great one. I've enjoyed a lot of it. 00:38:35.180 --> 00:38:37.339 I've only caught a couple of talks so far, 00:38:37.340 --> 00:38:41.659 but I'm looking forward to hitting the ones after this and tomorrow. NOTE Wrapping up 00:38:41.660 --> 00:38:44.739 But the AI stuff is coming. Get on board. 00:38:44.740 --> 00:38:46.939 Definitely recommend it. If you want to just try it out 00:38:46.940 --> 00:38:48.419 and get a little taste of it, 00:38:48.420 --> 00:38:49.779 my minimal viable product 00:38:49.780 --> 00:38:51.619 with just Llamafile and gptel 00:38:51.620 --> 00:38:53.139 will get you to the point where you can start figuring things out. 00:38:53.140 --> 00:38:55.579 gptel is an amazing thing. It just gets out of your way, 00:38:55.580 --> 00:39:00.459 and it works so well with Emacs's design because 00:39:00.460 --> 00:39:01.699 it doesn't take your hands off the keyboard. 00:39:01.700 --> 00:39:02.499 It's just another buffer, 00:39:02.500 --> 00:39:04.059 and you just put information in there. 00:39:04.060 --> 00:39:06.979 It's quite a wonderful time, 00:39:06.980 --> 00:39:10.501 let's put it that way. That's all I got. 00:39:10.502 --> 00:39:14.339 [Corwin]: Thank you so much once again, and we've just cut away. 00:39:14.340 --> 00:39:15.779 So I'll stop the recording, 00:39:15.780 --> 00:39:18.259 and you're on your own recognizance. 00:39:18.260 --> 00:39:19.699 [Aaron]: Well, I'm gonna punch out. 00:39:19.700 --> 00:39:21.059 If anybody has any questions or anything, 00:39:21.060 --> 00:39:24.699 my email address is ajgrothe@yahoo.com, or the same at gmail. 00:39:24.700 --> 00:39:26.779 Thank you all for attending, 00:39:26.780 --> 00:39:29.939 and thanks again for the conference. 00:39:29.940 --> 00:39:32.579 Okay, I'm gonna go ahead and end the room there. Thank you. 00:39:32.580 --> 00:39:34.100 Excellent, thanks, bye.
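NOTE Sketch: pointing gptel at a local Llamafile

For reference, a minimal sketch of the Llamafile-plus-gptel setup mentioned in the wrap-up, assuming a Llamafile is already running and serving its OpenAI-compatible API on the default localhost:8080. Adjust host and port to your setup; the model name is a placeholder, since the server only has the one model loaded.

(require 'gptel)

;; Llamafile speaks the OpenAI-compatible API, so gptel's generic
;; OpenAI backend works against it over plain HTTP on localhost.
(setq gptel-model 'test
      gptel-backend (gptel-make-openai "llamafile"
                      :host "localhost:8080"
                      :protocol "http"
                      :stream t
                      :models '(test)))

With that in place, M-x gptel opens a chat buffer against the local model, and nothing leaves your machine.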