Pre-training and Fine-tuning BERT: Energy and Carbon Considerations
Haverford College. Department of Computer Science
Artificial intelligence systems are becoming more powerful and are making more decisions on people's behalf than most would have anticipated. Their impact is significant, and it is important to evaluate these systems on more than classification accuracy alone. For example, some algorithms discriminate against certain populations, which motivates efforts to build fairness-aware models. Likewise, as models grow more complex, humans often cannot understand why they behave the way they do; because these systems make consequential decisions, understanding how they reach their conclusions matters, which motivates interpretability as an evaluation criterion. Furthermore, more complex models typically require longer training and inference times and therefore more computation, and the energy they consume has a significant environmental impact. Consequently, it is crucial to account for environmental costs as well. In this thesis, we evaluate energy usage in Natural Language Processing (NLP) tasks. Despite the popularity of model fine-tuning in the NLP community, existing work on quantifying energy costs and associated carbon emissions has mostly focused on the pre-training of language models (e.g., Strubell et al. 2019; Patterson, Gonzalez, Le, et al. 2021; Luccioni et al. 2022). For this reason, we investigate this prevalent yet understudied workload by comparing the energy costs of pre-training and fine-tuning, and by comparing energy costs across different fine-tuning tasks, datasets, and hardware infrastructure settings.