Large language models are cool, fun, and genuinely helpful, and they are something I have wanted for a long time. I may not have realized exactly what I was looking for, but I have been seeking my own bot ever since high school, when I tried to run a chatbot that would speak for me over MSN Messenger while I was away.

Local Llama: An AI Under My Own Roof

I have wanted to run my own chatbot for a long, long time. My first attempt was a pitiful chatbot that you could add to MSN Messenger way back in 2008. It was nothing more than a bunch of canned responses with a limited set of variations. It was not very impressive, and I dropped it soon after. Every once in a while I would hear about some new chatbot, try it out, and inevitably be disappointed.

Then one day ChatGPT appeared on the scene. LLMs had existed before this, obviously, but they weren't exactly available to the public. OpenAI and ChatGPT were exactly what I wanted. It could talk to you and tell you stories, it could help you code, and it could answer any question you had. It was like having a college professor at your beck and call at any time you wanted.

Then something wonderful happened. Meta 'accidentally' open sourced LLaMA, their in-house LLM. Suddenly anyone could run a large language model chatbot for themselves. Chatbots didn't suck anymore, and now anyone could have one.

I have always had a passion for computer hardware and have built many gaming computers for myself and others over the years. As a lifelong hardware enthusiast, I have a collection of PC hardware: old parts from computers I have built over the years, special editions of graphics cards that I can't bear to part with, and anything cool that I could get my hands on. This is just a long way of saying that I had more compute power at my fingertips than I really knew what to do with. I have used it to mine cryptocurrency, I have used it to build computers for family, and I have donated it to Folding@home.

These language models needed a lot of raw compute and memory in order to run at anything other than a crawl. They also needed a large pool of VRAM. A couple of old Titan XPs were the perfect choice. I used them for crypto mining in the past, but they have since become obsolete for that purpose.
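To put rough numbers on that VRAM requirement, here is a back-of-the-envelope sketch in Python. It counts only the memory needed for the model weights. It assumes 4-bit quantization and 12 GB of VRAM per card, both of which are assumptions for illustration. Real usage is higher once the context cache and other overhead are included.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


CARD_VRAM_GB = 12  # assumed VRAM per card
NUM_CARDS = 2
BITS = 4           # assumed quantization level

total_vram = CARD_VRAM_GB * NUM_CARDS

# Common model sizes, in billions of parameters.
for size in (7, 13, 33, 65):
    need = weight_memory_gb(size, BITS)
    verdict = "fits" if need < total_vram else "does not fit"
    print(f"{size}B at {BITS}-bit: ~{need:.1f} GB of weights, {verdict} in {total_vram} GB")
```

Under these assumptions a 33B model needs about 16.5 GB for its weights, which leaves only a few GB across both cards for everything else. A 65B model needs about 32.5 GB, which is more than two cards like these can hold.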
With the right tool, I could combine the two cards' VRAM to hold a larger LLM. That tool is the Oobabooga webui, graciously provided by the open source community for hosting LLM text generators such as chatbots.

I played around with a good number of open source models from HuggingFace.co, testing each one with a fairly standard set of questions, such as: "Make me a meal plan for this week with recipes and generate a grocery list with as much overlap between recipes as possible. Here are some foods I like. Mexican, Japanese, Pasta, and Pizza. I work late and don't have much time to cook.", "How do I write a PowerShell script that can scan a directory and print out a list of duplicate files?" and "Tell me about some random historical event." This worked great to show me the capabilities of a model right away. I liked the Guanaco model and the Airoboros model. I eventually settled on the Airoboros 33 billion parameter model, which just barely fit inside the combined 24GB of VRAM in the two Titan XP cards. I have also been able to run 65B models from system RAM, but that was so much slower that I went back to the 33B model.

Eventually I settled on models uploaded by Tom Jobbins. The models he quantized and uploaded just seemed to be the most consistently stable and high quality. I found that some models hallucinated much more than others but were also more clever and verbose. Others were fairly accurate when answering questions and formatted their responses better, but they were also very terse and still prone to some hallucinations. So if I was going to deal with hallucinations and inaccuracies anyway, I figured I might as well choose an extra creative model that would give more fun responses. I used the API from Oobabooga to crank up the randomness of the model. That way its output would be extra creative, at the cost of more frequently giving flat-out incorrect information.
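The "randomness" knob is what most text generation tools call temperature, and I am assuming that is the setting involved here. The sketch below is not the Oobabooga API. It is a small standalone Python illustration, with made-up token scores, of how dividing a model's raw scores by a temperature before the softmax sharpens or flattens the probabilities it samples from.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [4.0, 2.0, 1.0]

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Higher temperatures move probability away from the top choice and toward the less likely tokens, which is where both the extra creativity and the extra mistakes come from.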
For prompts relating to subjective information or just conversation, this worked out great!

My Life as a Teenage Large Language Model

Finally getting my own chatbot was a great incentive to learn how to do all of this. I wanted my own bot that I could talk to over Discord, and I wanted other people to be able to interact with it. To accomplish this I used another open source tool: a simple Discord bot created by Dolyfin on GitHub.

I first needed to create a bot for people to interact with on Discord, and it needed a name. I thought for a short while about which friendly AI my friends and I would want to interact and have fun with. It came to me pretty quickly: XJ-9. I watched and enjoyed My Life as a Teenage Robot whenever I could catch it on cable TV. My parents didn't have cable at our house, so I frequently had to go over to Grandma and Grandpa's house or a cousin's house to watch it or any of the other cartoons on Nickelodeon or Cartoon Network.

I wrote a persona for the LLM using descriptions of XJ-9 from official sources and various wiki websites. The tool written by Dolyfin also supports image generation through the Stable Diffusion API, so I loaded that up as well and gave my chatbot the ability to generate images from text prompts. I activated the bot, and it really was as easy as that. My very own version of Jenny Wakeman was ready to chat with anybody on my Discord server.
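For anyone curious what a persona amounts to in practice, it is usually just text placed ahead of the conversation. The sketch below is a hypothetical illustration in Python, not the code from Dolyfin's bot. The function name, the prompt layout, the user name, and the one-line persona are all placeholders made up for the example.

```python
def build_prompt(persona, history, user_name, user_message, bot_name="XJ-9"):
    """Assemble a plain-text prompt: persona first, then the chat so far."""
    lines = [persona.strip(), ""]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"{user_name}: {user_message}")
    lines.append(f"{bot_name}:")  # the model continues writing from here
    return "\n".join(lines)


# Placeholder persona; a real one would be much longer.
persona = "XJ-9, also known as Jenny Wakeman, is a friendly teenage robot."
history = [("Alex", "Hi Jenny!"), ("XJ-9", "Hey! What's up?")]

print(build_prompt(persona, history, "Alex", "Tell me a story."))
```

The model only ever sees this one block of text, so the persona shapes every reply simply by always being at the top of it.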

It turned out to be exactly what I wanted. It was clever and creative. It could hold a conversation and even tell a story!