Meta SAM 3 Meets Earth Observation: Tracking the Planet in Real-Time

The world of geospatial AI just got a powerful new tool. Meta’s latest Segment Anything Model 3 (SAM 3), a foundation model for detection, segmentation, and tracking in images and video, is opening up exciting possibilities for analyzing satellite imagery. See the model page and GitHub.

While object detection in Earth Observation (EO) is a well-established field, and the integration of Promptable Concept Segmentation (PCS) is exciting, the real breakthrough is that SAM 3 is built for video tracking. By treating a sequence of satellite images as a video, we can leverage SAM 3’s temporal memory to monitor a changing world.

The Experiment: Tracking Wildfire Smoke

A natural starting point for this “video-like” analysis is the Geostationary Operational Environmental Satellite (GOES) series. Unlike sun-synchronous satellites (like Sentinel-2) that pass over once every few days, GOES-16 and GOES-17, operated by NOAA and NASA, provide a constant view of the Western Hemisphere, refreshing every few minutes.

I spent a few hours of the Thanksgiving holiday exploring this concept, using the 2020 California Creek Fire as a case study for tracking wildfire smoke. Using Google Earth Engine (GEE), and inspired by this brilliant blog post by Justin Braaten, I generated true- and false-color animations of the wildfire.

The Workflow: Using GEE, I generated animations of the wildfire at different frame rates. I then fed the GOES animations into SAM 3, experimenting with different frame rates, resolutions, and text prompts.
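As a rough sketch of the animation-to-frames step (the resampling used to vary frame rate and resolution; `gif_to_frames` is my own helper name, and the GEE export itself is not shown), something like the following works on the GIF that GEE produces:

```python
from PIL import Image, ImageSequence
import numpy as np

def gif_to_frames(path, target_size=None, frame_step=1):
    """Load an animated GIF (e.g., a GOES animation exported from GEE)
    into a list of RGB numpy arrays, optionally dropping frames to
    lower the effective frame rate and resizing for the model input."""
    frames = []
    with Image.open(path) as im:
        for i, frame in enumerate(ImageSequence.Iterator(im)):
            if i % frame_step:
                continue  # skip frames to lower the frame rate
            f = frame.convert("RGB")
            if target_size:
                f = f.resize(target_size)
            frames.append(np.asarray(f))
    return frames
```

The resulting frame list can then be packed into whatever video format the segmentation model expects.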

The Result: Using the simple text prompt "smoke", SAM 3 achieved impressive zero-shot segmentation. It successfully maintained the “identity” of the smoke plume as it drifted and expanded across the Sierra Nevada, demonstrating its potential utility for near-real-time disaster monitoring (Figure 1).
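One simple way to sanity-check that the tracker kept a single identity is to measure mask overlap between consecutive frames. This diagnostic is my own addition, not part of SAM 3; `mask_iou` and `identity_consistency` are hypothetical helper names applied to whatever per-frame masks the model returns:

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def identity_consistency(masks, threshold=0.3):
    """Fraction of consecutive frame pairs whose masks overlap above
    `threshold` -- a crude proxy for 'the tracker kept the same plume'
    rather than re-detecting a new object each frame."""
    pairs = list(zip(masks, masks[1:]))
    ok = sum(mask_iou(a, b) >= threshold for a, b in pairs)
    return ok / len(pairs) if pairs else 1.0
```

A drifting plume should score near 1.0; a score that collapses toward 0 suggests the tracker lost the object between frames.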

Fig. 1. SAM 3 segmentation (prompt: ‘smoke’) of a GOES-17 false-color animation of the 2020 California Creek Fire on September 5th. Animation generated in Google Earth Engine using the script from Justin Braaten’s blog post.

Other Applications & Limitations

The results above highlight SAM 3’s massive potential for specialized EO workflows that leverage its ‘video DNA’ and open-vocabulary capabilities. While I haven’t fully tested these yet, I can envision similar workflows using GOES data for diverse, dynamic disaster response without custom retraining: tracking volcanic ash plumes, monitoring offshore oil spills, or even mapping ice-sheet drift.

Finer-scale applications could involve tracking ship movement, or flooding in urban and coastal areas. Although untested, these seem feasible to me if high-resolution satellite imagery is available frequently and consistently enough to construct an animation of the movement.

Note on Resolution: My attempts to explore SAM 3’s applications with Sentinel-2 (10m resolution) were less successful. Because Sentinel-2’s revisit time is measured in days, the “movement” between frames is too disjointed for SAM 3’s temporal tracker. These applications likely require sub-meter resolution and frequent data to maintain visual consistency.

The Misalignment Challenge

Despite the success with smoke tracking, my second experiment, prompting SAM 3 to detect growth changes in a Sentinel-2 time series, highlighted a critical scientific hurdle: SAM 3 is optimized for “mechanical” movement, but Earth processes are often “transformational.”

When I tested an annual Sentinel-2 (10 m) time series of agricultural fields, the results were unsatisfactory: SAM 3 didn’t highlight any changes. Its tracker struggled because the “change” wasn’t an object moving; it was a pixel-level transformation of the land itself. For SAM 3, the “identity” of the field was lost as it turned from bare soil to canopy (Table 1).

SAM 3 was trained on objects like cars, people, and animals: objects that move mechanically across a scene while keeping a relatively consistent visual signature. In contrast, many EO tasks involve biological or phenological transformation, such as crop growth in agriculture. A crop field doesn’t “move”; it transforms from bare soil to vibrant green over months, and its color, texture, and spectral signature change entirely. The “object” (the field) remains at the same coordinates, but its visual identity is fluid, transforming over a longer period of time (seasons).
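To make the transformation concrete, a toy NDVI computation over a simulated green-up shows how completely the spectral signature changes while the pixel stays put (the band values below are illustrative, not real Sentinel-2 samples):

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

# Illustrative monthly reflectance for one field pixel, greening up
# from bare soil to full canopy. The pixel never moves, but its
# spectral signature transforms completely.
red = np.array([0.20, 0.15, 0.08, 0.04])  # red reflectance falls
nir = np.array([0.25, 0.35, 0.45, 0.50])  # NIR reflectance rises

print(np.round(ndvi(nir, red), 2))  # NDVI climbs from ~0.11 to ~0.85
```

A tracker built around appearance consistency sees four very different “objects” at the same coordinates; a phenology-aware model would see one field in four growth stages.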

Current models optimized for rapid movement often struggle with these slow-motion transformations. We need further experimentation to determine if SAM 3 can be “tricked” into understanding these longer timescales, or if we need a new class of Geospatial Foundation Models specifically “cooked” for phenology.

Table 1. SAM 3’s video training assumptions vs. Earth Observation time-series.

| Feature | SAM 3 Training Logic (Video) | Earth Observation Reality (Time-Series) |
|---|---|---|
| Movement | Mechanical or spatial displacement: a car moves from A to B. | Varies from mechanical to stationary growth: a crop field stays at A, but changes from brown to green. |
| Appearance | Consistent signature: a person looks like a person across 100 frames. | Transformational: the spectral signature undergoes a phenological shift; a forest or crop changes texture, color, and leaf density seasonally. |
| Timescale | Milliseconds to seconds: changes are rapid and frame-to-frame. | Days to months: meaningful change happens over weeks, often with cloud gaps. |

The Opportunities and Road Ahead

Despite the challenges, the immediate value of SAM 3 in the geospatial workflow is undeniable. Its zero-shot, open-vocabulary segmentation can serve as an adaptable pre-processing engine, freeing up domain experts to focus on the unique temporal and spectral complexities of our planet.

Researchers like Qiusheng Wu (SAMGeo) and Aliaksandr Hancharenka (segment-anything-eo) have already paved the way for integrating these models into geospatial environments. Check out the new interactive version for SAM 3 by Qiusheng Wu.

Foundation models like SAM 3 have incredible immediate value for many geospatial workflows. Their segmentation power can be used for rapid data labeling: use text or visual prompting to quickly generate high-quality segmentation masks for virtually any object (like a specific building type or a logging area), drastically speeding up the creation of custom datasets. SAM 3 could also act as the “teacher” model, rapidly labeling massive datasets that are then used to train smaller, specialized models fine-tuned for the unique spectral complexities of EO data.
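A minimal sketch of the mask-to-label step in such a teacher-student pipeline (the SAM 3 call itself is omitted; `masks_to_labels` is a hypothetical helper, and in practice the input masks would come from the model’s text or visual prompting):

```python
import numpy as np

def masks_to_labels(masks, class_name):
    """Turn a stack of boolean instance masks into a single integer
    label map plus a small metadata record -- the kind of weak label
    a 'teacher' model could mass-produce to train a smaller student."""
    label = np.zeros(masks[0].shape, dtype=np.int32)
    for i, m in enumerate(masks, start=1):
        label[m] = i  # each instance gets its own integer id
    meta = {"class": class_name, "instances": len(masks)}
    return label, meta
```

The label map and metadata can then be written out in whatever annotation format the downstream training framework expects.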

SAM 3 and AI super-resolution: In many EO applications, the bottleneck is spatial resolution. Using AI super-resolution tools, such as anicha.earth, we can now sharpen Sentinel-2 data to 1–2 m resolution, an immediate advantage for object detection.
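The pipeline shape is simple; here plain bicubic upsampling stands in for the learned super-resolution step, which is of course far more sophisticated (`upsample` is my own stand-in function, not any tool’s API):

```python
from PIL import Image
import numpy as np

def upsample(arr, factor=5):
    """Upsample an RGB image array by `factor` (e.g., 10 m -> 2 m
    pixels). A learned super-resolution model would replace this
    bicubic stand-in before the frames are animated and segmented."""
    im = Image.fromarray(arr)
    out = im.resize((im.width * factor, im.height * factor), Image.BICUBIC)
    return np.asarray(out)
```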

For live tracking, the success of the smoke-tracking application has been impressive, yet detecting finer changes in Sentinel-2 imagery remains a hurdle. While the phenological challenges are real, we haven’t reached the ceiling of what resolution can do, and whether applying AI super-resolution before animation enhances SAM 3 tracking is an exciting experimental question we have yet to explore.

Looking Ahead: While SAM 3 has demonstrated impressive, albeit specific, utility in satellite time-series animations, the fundamental challenge of biological transformation remains. This might be where the next generation of Geospatial Foundation Models (GeoFMs) steps in to unlock a new tier of reliable, planetary-scale monitoring for agriculture and environmental change.

#GeoAI #FoundationModels #Geospatial #AI #EO #SAM #ObjectTracking #Segmentation #GOES #RemoteSensing

Esmaeel Adrah