This tutorial shows how to prepare data and then fine-tune a language model on Replicate. Replicate is a commercial service, so we’ll use an API key and an authentication proxy server, much as with OpenAI.

1. Getting an API key from Replicate

At present, usage is covered by credits that Replicate has generously made available to NYU.

2. Setting up an authentication proxy server

Follow the same steps as documented here for OpenAI, with two differences: the repository URL is https://github.com/gohai/replicate-auth-proxy, and the environment variable to create on Glitch is named REPLICATE_API_TOKEN.
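To illustrate why the token lives in an environment variable on the proxy rather than in client code, here is a minimal sketch of what an authentication proxy does with each incoming request. This is not the code of replicate-auth-proxy; the function name build_upstream_headers is hypothetical, and the sketch assumes the upstream API accepts the token in an "Authorization: Bearer" header.

```python
import os


def build_upstream_headers(client_headers, env=os.environ):
    """Return the headers a proxy would forward upstream (hypothetical helper).

    The secret token is read from the REPLICATE_API_TOKEN environment
    variable, so it never has to appear in client-side code.
    """
    token = env.get("REPLICATE_API_TOKEN")
    if not token:
        raise RuntimeError("REPLICATE_API_TOKEN is not set")

    # Drop any Authorization or Host header sent by the client;
    # the proxy supplies its own credentials.
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in ("authorization", "host")
    }
    # Assumption: the upstream API expects a Bearer token.
    headers["Authorization"] = f"Bearer {token}"
    return headers


if __name__ == "__main__":
    # Example with a placeholder token instead of a real one.
    print(build_upstream_headers(
        {"Content-Type": "application/json"},
        env={"REPLICATE_API_TOKEN": "placeholder-token"},
    ))
```

Because the client only ever talks to the proxy, the key can be rotated on Glitch without changing any sketch that uses it.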

3. Preparing the dataset

4. Uploading the dataset to GitHub