TL;DR: Training Clay v1.5 was "carbon neutral" on a market basis, yet it still emitted ~10 tonnes of CO₂e. Moreover, lowering emissions during geoAI training is a climate distraction compared to understanding geoembeddings.
A year ago we trained Clay model v1.5, still one of the most capable geoAI models today: open source, open data, open license. At the time we promised to publish its emissions. I just updated the docs, but I'm sharing this longer post because the accounting proved harder, and had deeper pragmatic implications, than I expected.
This post is about the emissions footprint, not the "handprint" of impact (who has used the model, and for what). That post is coming.
The Numbers
In June, AWS added monthly "location-based" metrics (Scope 1+2), which report emissions using the average carbon intensity of the local grid. The month we did most of Clay's training (in Ohio) totaled 15 tCO₂e, which AWS offset, making it carbon neutral on a "market-based" basis. That figure covers everything that month: from pre-training tests to post-training large-scale inference runs embedding all NAIP and lots of Sentinel imagery (released on Source Cooperative). Yes, we should have used a separate account or some other method to isolate the reporting.
Training Clay alongside embedding inference makes reporting training emissions much harder. We also learned that inference itself can emit a lot. Specifically, GPUs are very energy-efficient but hard to feed and scale, while CPUs are plentiful and cheap, so using them for inference is tempting — but for the same AI compute, CPU inference can produce up to 10× more emissions (see MLPerf benchmarks). AI for Earth doesn't really have "chain of thought" inference, but unless you're careful, you'll emit more making Earth embeddings than you did training the model itself. This further strengthens the case for leveraging embeddings, not just open models.
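The CPU-vs-GPU gap is easy to see with a back-of-envelope energy calculation. All throughput, power, and grid-intensity numbers below are illustrative assumptions for the sketch, not measurements from our runs or from MLPerf:

```python
# Illustrative back-of-envelope: emissions for embedding 1M items on GPU vs CPU.
# Power, throughput, and grid-intensity values are assumptions, not measurements.

GRID_INTENSITY_KG_PER_KWH = 0.45  # assumed average grid carbon intensity (kg CO2e/kWh)

def inference_emissions_kg(n_items: int, items_per_sec: float, power_watts: float) -> float:
    """Emissions = runtime (h) * power (kW) * grid intensity (kg CO2e/kWh)."""
    hours = n_items / items_per_sec / 3600
    kwh = hours * power_watts / 1000
    return kwh * GRID_INTENSITY_KG_PER_KWH

N = 1_000_000
gpu = inference_emissions_kg(N, items_per_sec=500.0, power_watts=400.0)  # assumed GPU figures
cpu = inference_emissions_kg(N, items_per_sec=20.0, power_watts=150.0)   # assumed CPU figures

print(f"GPU: {gpu:.2f} kg CO2e, CPU: {cpu:.2f} kg CO2e, ratio: {cpu/gpu:.1f}x")
```

Under these assumed numbers the CPU path emits roughly 9× more for the same embeddings, which is why "cheap and plentiful" CPUs can quietly dominate a project's footprint.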
Taking all this together, we estimate Clay v1.5's training emissions at roughly 10–12 tCO₂e: less than the flights for our company retreat, less than 1% of GPT-3's training emissions, less than the lifetime sequestration of 12 trees, and less than $250 to offset with gold-standard credits. That is remarkably small on the relative scale of AI. It also suggests that understanding embeddings in practice, not training efficiency, merits the most geoAI attention.
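As a sanity check on the tree and offset comparisons, here is the arithmetic, using assumed (commonly cited, but not authoritative) figures for per-tree sequestration and gold-standard credit prices:

```python
# Back-of-envelope check of the comparisons above. The per-tree sequestration
# and offset-price figures are rough assumptions, not authoritative values.

clay_t = 11.0                   # midpoint of the 10-12 tCO2e training estimate
tree_lifetime_t = 1.0           # assumed ~1 tCO2e sequestered per tree over its lifetime
offset_price_usd_per_t = 20.0   # assumed gold-standard credit price, USD per tCO2e

trees_equivalent = clay_t / tree_lifetime_t        # comes in under the 12 trees in the text
offset_cost_usd = clay_t * offset_price_usd_per_t  # comes in under the $250 in the text

print(f"trees equivalent: {trees_equivalent:.0f}")
print(f"offset cost: ${offset_cost_usd:.0f}")
```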
A Challenge to the Field
To our knowledge, no other large geoAI models have yet published their emissions. Could we get estimates for TerraMind, Google AEF, Prithvi, or Tessera? My rough guesses: MOSAIKS ≪ Tessera < Clay < Prithvi < TerraMind ~ AEF. And all combined likely less than 10% of GPT-3, the model that birthed the ChatGPT revolution.
It would be great if other model providers shared their numbers, even roughly, so we can all build smarter and more responsibly, and possibly double down on geoembeddings.
Originally posted on LinkedIn.