Pre-training and Fine-tuning BERT: Energy and Carbon Considerations

Date
2023
Department
Haverford College. Department of Computer Science
Type
Thesis
Language
eng
Access Restrictions
Dark Archive until 2024-01-01, afterwards Open Access
Abstract
Artificial intelligence systems are becoming more powerful and are making more decisions on people's behalf than many anticipated. Given this impact, it is important to evaluate models on more than classification accuracy. For example, some algorithms discriminate against certain populations, which motivates efforts to build fairness-aware systems. At the same time, as models grow more complex, humans often cannot tell why a model behaves the way it does; because these systems make consequential decisions, understanding how they reach their conclusions matters, which leads to the measure of interpretability. Furthermore, more complex models usually require longer training and inference times and therefore more computation, and the energy they consume has a significant environmental impact. Consequently, it is crucial to account for environmental costs as well. In this thesis, we evaluate energy usage for Natural Language Processing (NLP) tasks. Despite the popularity of model fine-tuning in the NLP community, existing work on quantifying energy costs and associated carbon emissions has mostly focused on pre-training of language models (e.g., Strubell et al. 2019; Patterson, Gonzalez, Le, et al. 2021; Luccioni et al. 2022). For this reason, we investigate this prevalent yet understudied workload by comparing the energy costs of pre-training and fine-tuning, and by comparing energy costs across different fine-tuning tasks, datasets, and hardware infrastructure settings.
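
The comparisons described in the abstract require instrumenting each training run to record energy use and estimated emissions. As a minimal sketch of what such instrumentation might look like (not the thesis's actual measurement code), the snippet below wraps a run with the open-source codecarbon EmissionsTracker; fine_tune() and the project name are hypothetical placeholders standing in for a real BERT fine-tuning loop.

    from codecarbon import EmissionsTracker

    def fine_tune():
        """Hypothetical placeholder for a BERT fine-tuning run (e.g., one GLUE task)."""
        # Stand-in workload so the sketch runs as-is; swap in the real training loop.
        return sum(i * i for i in range(10_000_000))

    # Track energy consumption and estimated emissions around the workload.
    tracker = EmissionsTracker(project_name="bert-finetune-energy")
    tracker.start()
    try:
        fine_tune()
    finally:
        emissions_kg = tracker.stop()  # estimated kg CO2-eq for the tracked interval

    print(f"Estimated emissions: {emissions_kg:.6f} kg CO2-eq")

Repeating such a wrapper per task, dataset, and hardware configuration would yield directly comparable per-run energy and emissions figures.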