There are two types of finetuning that we support:

  1. LoRA finetuning
  2. Full finetuning

If you are not familiar with the finetuning types, we recommend LoRA finetuning for most use cases. Here are the key characteristics of each:

LoRA Finetuning:

  • Requires less computational resources compared to full finetuning, making it more accessible.
  • Retains the base model’s generalization capabilities while still allowing for customization to specific tasks.
  • Faster to train, as it only adjusts a small fraction of the model’s weights.
  • May not achieve as high performance on the finetuned task as full finetuning, especially for tasks that are significantly different from the base model’s training data.
  • The improvements are limited to the layers where LoRA is applied, which might not be sufficient for all tasks.
  • Cheaper to host: serving costs the same as the base model.
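
To make the "small fraction of the model's weights" point concrete, here is a minimal, self-contained sketch of the LoRA idea (illustrative only, not this service's API): instead of updating a full weight matrix `W`, LoRA trains a low-rank pair `B @ A`, and the dimensions and rank below are arbitrary example values.

```python
import numpy as np

# Instead of updating a full weight matrix W (d_out x d_in),
# LoRA learns a low-rank update B @ A with rank r << min(d_out, d_in).
d_out, d_in, r = 768, 768, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen base weights
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, low-rank
B = np.zeros((d_out, r))                    # trainable, initialized to zero
alpha = 16                                  # LoRA scaling hyperparameter

def lora_forward(x):
    # Base model output plus the scaled low-rank correction.
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size            # what full finetuning would train
lora_params = A.size + B.size   # what LoRA trains for this layer
print(f"full finetuning trains {full_params:,} params")
print(f"LoRA trains {lora_params:,} params "
      f"({lora_params / full_params:.1%} of full)")
```

Because `B` starts at zero, the adapted model initially behaves exactly like the base model, and at serving time the adapter can be merged into (or applied alongside) the shared base weights, which is why hosting cost matches the base model.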

Full Finetuning:

  • Can lead to higher performance on the finetuned task since all model weights are adjusted to the specific dataset.
  • More flexible, as it allows the model to deviate more significantly from the base model’s behavior.
  • Requires significantly more computational resources and time to train compared to LoRA finetuning.
  • There’s a higher risk of overfitting to the finetuning dataset, especially if the dataset is small or not diverse.
  • May lose some of the base model’s generalization capabilities, making the model less effective on tasks outside of the finetuned domain.
  • More expensive to host: expect cold-start times of 10–20 seconds and pay-per-second billing, since we host your model separately on a dedicated GPU instance through Modal.