Pre-training
Training a model on large-scale, typically unlabeled data before fine-tuning. For example, GPT-1 was pre-trained with a next-token prediction objective on BooksCorpus.
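To make the next-token prediction objective concrete, here is a minimal sketch in Python. It is an illustration only: a count-based bigram model stands in for the neural network (GPT-1 itself is a Transformer), and the toy corpus, the function names train_bigram and next_token_loss, and the smoothing parameter alpha are assumptions introduced for this example. The quantity being measured is the same in spirit: the average negative log-likelihood of each token given what came before it, computed from raw text with no labels.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token (hypothetical helper)."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts


def next_token_loss(counts, tokens, vocab_size, alpha=1.0):
    """Average negative log-likelihood of each token given the previous token.

    Uses add-alpha smoothing so unseen continuations get non-zero probability.
    """
    total = 0.0
    n = 0
    for cur, nxt in zip(tokens, tokens[1:]):
        row = counts.get(cur, Counter())
        prob = (row[nxt] + alpha) / (sum(row.values()) + alpha * vocab_size)
        total -= math.log(prob)
        n += 1
    return total / n


# Toy "corpus": the text itself supplies the targets, so no labels are needed.
corpus = "the cat sat on the mat and the cat slept".split()
vocab = set(corpus)

model = train_bigram(corpus)
loss = next_token_loss(model, corpus, len(vocab))

# Baseline: the loss of a model that guesses uniformly over the vocabulary.
uniform = math.log(len(vocab))

print(f"bigram loss:  {loss:.3f}")
print(f"uniform loss: {uniform:.3f}")
```

The bigram loss comes out below the uniform baseline because the counts capture some regularity in the text. Pre-training a neural language model minimizes the same kind of loss, but with a far more expressive model and far more data.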
Training a language model on large, diverse, unlabeled text, such as web crawls (Common Crawl), books, and Wikipedia. This teaches the model general language patterns. Pre-training requires massive compute, but it is typically done once; the resulting model is then reused, so downstream tasks can build on the same learned knowledge.
The initial training phase where a language model learns from large amounts of unlabeled data. After pre-training, the model can be fine-tuned on specific tasks or used zero-shot.