In practice, models are usually trained many times during research and development.
Date of original paper | Energy consumption (kWh) | Carbon footprint (lbs of CO2e) | Cloud compute cost (USD) | |
---|---|---|---|---|
Transformer (65M parameters) | Jun, 2017 | 27 | 26 | $41-$140 |
Transformer (213M parameters) | Jun, 2017 | 201 | 192 | $289-$981 |
ELMo | Feb, 2018 | 275 | 262 | $433-$1,472 |
BERT (110M parameters) | Oct, 2018 | 1,507 | 1,438 | $3,751-$12,571 |
Transformer (213M parameters) w/ neural architecture search | Jan, 2019 | 656,347 | 626,155 | $942,973-$3,201,722 |
GPT-2 | Feb, 2019 | - | - | $12,902-$43,008 |