There are two types of finetuning that we support:
- LoRA finetuning
- Full finetuning
If you are not familiar with the trade-offs between the two, we recommend LoRA finetuning for most use cases. Here are the key characteristics of each type:

LoRA finetuning:
- Requires less computational resources than full finetuning, making it more accessible.
- Retains the base model’s generalization capabilities while still allowing customization to specific tasks.
- Faster to train, as it only adjusts a small fraction of the model’s weights.
- May not match the performance of full finetuning on the finetuned task, especially for tasks that differ significantly from the base model’s training data.
- The improvements are limited to the layers where LoRA is applied, which might not be sufficient for all tasks.
- Cheaper to host: the cost is the same as for the base model.

Full finetuning:
- Can lead to higher performance on the finetuned task, since all model weights are adjusted to the specific dataset.
- More flexible, as it allows the model to deviate more significantly from the base model’s behavior.
- Requires significantly more computational resources and time to train than LoRA finetuning.
- Carries a higher risk of overfitting to the finetuning dataset, especially if the dataset is small or not diverse.
- May lose some of the base model’s generalization capabilities, making the model less effective on tasks outside the finetuned domain.
- More expensive to host: we host your model separately on a dedicated GPU instance through Modal, so expect 10-20 second cold-start times and pay-per-second billing.
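To make the resource and hosting differences above concrete, here is a minimal NumPy sketch of the idea behind LoRA: the base weight matrix stays frozen and only a small low-rank update is trained. The layer dimensions and rank below are illustrative assumptions, not the values any particular model or service uses.

```python
import numpy as np

# Hypothetical layer sizes, for illustration only.
d_out, d_in, r = 1024, 1024, 8

# Full finetuning updates every entry of the weight matrix W.
full_trainable = d_out * d_in  # 1,048,576 trainable parameters

# LoRA freezes W and trains only a low-rank update B @ A.
# B starts at zero so the adapter initially leaves W unchanged.
B = np.zeros((d_out, r))
A = np.random.randn(r, d_in) * 0.01
lora_trainable = B.size + A.size  # 16,384 parameters, ~1.6% of full

# The effective weight is W + B @ A. Because the adapter can be
# merged into W after training, serving cost matches the base model.
W = np.random.randn(d_out, d_in)
W_adapted = W + B @ A
```

This is why LoRA trains faster and hosts at base-model cost, while full finetuning, which updates all of `W`, needs dedicated capacity to serve.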