Combined loss (L₃)
The total fine-tuning loss: L₃ = L_task + λ · L_language_model. The λ weight (0.5 in the paper) keeps the language modelling objective active during fine-tuning, acting as a regulariser.
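
As a rough illustration of how the two terms combine, here is a minimal sketch in plain Python. The function names (`cross_entropy`, `combined_loss`) and the choice to average the language-modelling loss over token positions are assumptions made for this example, not details taken from the paper.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def combined_loss(task_logits, task_label, lm_logits, lm_targets, lam=0.5):
    """L3 = L_task + lam * L_language_model (hypothetical helper).

    task_logits: class scores for one example
    lm_logits:   one list of vocabulary scores per token position
    lm_targets:  the next-token index at each position
    """
    l_task = cross_entropy(task_logits, task_label)
    # Assumption: mean over positions; a sum would only rescale lam.
    l_lm = sum(
        cross_entropy(scores, tgt) for scores, tgt in zip(lm_logits, lm_targets)
    ) / len(lm_targets)
    return l_task + lam * l_lm
```

Setting `lam=0` recovers plain task fine-tuning, while larger values pull the model more strongly towards its original language-modelling behaviour.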