Editor’s Note: Thursday is the Fourth of July US holiday, and Your Humble Correspondent is taking the week of the 8th off to hang with the fam. While we’re on vacation we’ll schedule issues of the Best of Perfecting Equilibrium from our archives; new content will resume July 18. Have a great holiday!
This piece originally ran March 28, 2024
I’ve gone on and on about how Large Language Models — so-called AIs — are crippled by training on bad data. Garbage In; Garbage Out! So let’s build a local LLM and feed it good data for a specific purpose. Welcome to Virtual Grad Student! We’re going to set up a Large Language Model to run locally, feed it a clean dataset, and then make it available to authors as a virtual writer’s assistant that can, for example, pull together a few paragraphs of background on Roman aqueduct architecture. Today we’re walking through h2oGPT options.
Links:
The h2oGPT open source Large Language Model
Common Corpus, the largest public domain dataset for training LLMs
Part 1 of Build Your Own AI