Methodology

Talking about space

Data sources

Two corpora were assembled to analyze how outer space has been framed across different public arenas.

For news, headlines were collected from The New York Times via the NYT Article Search API, covering 1950–2024. A set of space-related search phrases was used to fetch articles spanning space exploration, policy, competition, science, and commercialization. Only articles where the phrase appeared in the headline were retained. The final corpus comprises 6,442 headlines.

For politics, space-relevant excerpts were drawn from the American Presidency Project at UC Santa Barbara, the most comprehensive freely accessible archive of U.S. presidential documents. The same search phrases were used to identify documents, and only the specific paragraphs where a phrase appeared were extracted. Where a single document matched multiple phrases in different paragraphs, all matching paragraphs were merged into one excerpt. The final corpus comprises 1,408 excerpts covering 1960–2024. The 1950s are excluded from the analysis due to insufficient data.
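
The paragraph-level extraction described above can be sketched roughly as follows. This is an illustration rather than the project's actual code: the function name extract_excerpt, the blank-line paragraph splitting, and the case-insensitive substring matching are all assumptions.

```python
def extract_excerpt(document, phrases):
    """Return the matching paragraphs of one document merged into one excerpt.

    Assumptions (not stated in the methodology): paragraphs are separated
    by blank lines, and a phrase "appears" when it occurs as a
    case-insensitive substring of the paragraph.
    """
    lowered = [phrase.lower() for phrase in phrases]
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    matching = [
        p for p in paragraphs
        if any(phrase in p.lower() for phrase in lowered)
    ]
    # A document with no matching paragraph contributes no excerpt.
    return "\n\n".join(matching) if matching else None


# Hypothetical document and phrases, for illustration only.
doc = (
    "Our space program will reach the Moon.\n\n"
    "Taxes remain a concern.\n\n"
    "The space race is not over."
)
excerpt = extract_excerpt(doc, ["space program", "space race"])
```

Here the first and third paragraphs are kept and merged, while the unrelated middle paragraph is dropped.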

Thematic & actor classification

Each text was classified across a set of thematic frames and actor types using the semantic embedding model nomic-embed-text-v1.5. The model converts both the texts and the thematic and actor anchors into numerical vectors in a high-dimensional space, where meaning is encoded as geometric position and similarity is measured by the angle between vectors.
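
As a rough illustration of angle-based similarity, the sketch below scores one text vector against several anchor vectors using cosine similarity. It assumes the vectors have already been produced by the embedding model. The function name and the tiny three-dimensional vectors are made up for readability and stand in for real embeddings.

```python
import numpy as np


def cosine_similarities(text_vec, anchor_vecs):
    """Cosine of the angle between one text vector and each anchor vector."""
    text_vec = np.asarray(text_vec, dtype=float)
    anchor_vecs = np.asarray(anchor_vecs, dtype=float)
    # Scale everything to unit length so the dot product equals the cosine.
    text_unit = text_vec / np.linalg.norm(text_vec)
    anchor_units = anchor_vecs / np.linalg.norm(anchor_vecs, axis=1, keepdims=True)
    return anchor_units @ text_unit


# Toy vectors (hypothetical); real embeddings have hundreds of dimensions.
text = [0.9, 0.1, 0.3]
anchors = [
    [1.0, 0.0, 0.2],  # stand-in for one thematic anchor
    [0.0, 1.0, 0.1],  # stand-in for another
]
scores = cosine_similarities(text, anchors)
```

A higher score means a smaller angle between the text and that anchor, so in this toy case the text sits closer to the first anchor than to the second.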

Each thematic frame and actor type is represented by a single dense anchor string made up of descriptive terms, capturing what that type looks like across different historical periods and vocabulary registers. The six thematic frames are:

    • Space race & rivalry: competition between nations and powers for space dominance and strategic advantage
    • Spaceflight & missions: operational activities, programs, missions, and execution of space exploration
    • Studying the cosmos: scientific research and discovery about space and celestial phenomena
    • Risk & hazards: dangers, failures, disasters, and hazards related to space activities and infrastructure
    • Markets & commerce: commercial opportunities, private companies, and economic dimensions of space
    • Meaning & identity: cultural, philosophical, and existential significance of space exploration

The main actor sets differ between the two corpora, reflecting the different rhetorical logics of journalism and political speech. In news, five actor types are tracked: astronauts, the national state, geopolitical rivals, the private sector, and the scientific community. In political speech, four actor types are tracked: the national state, geopolitical rivals, international partnerships, and the private sector.

Because all texts in the corpus are already about space, raw similarity scores for both themes and actors tend to cluster closely together. Each document's scores are therefore z-score normalized relative to its own mean and standard deviation across all six themes (and, for actors, across the actor types tracked in that corpus). The question is then not which theme or actor a document is close to in absolute terms, but which one it stands out in relative to all the others. A softmax transformation then converts these relative scores into a probability distribution summing to 1.0.
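
The normalization step can be sketched as below for a single document. Several details are assumptions not specified above: the function name, the use of the population standard deviation, a softmax temperature of 1, and the fallback to a uniform distribution when all scores are identical. The similarity values are invented for illustration.

```python
import numpy as np


def relative_scores(similarities):
    """Z-score one document's similarities, then apply a softmax."""
    sims = np.asarray(similarities, dtype=float)
    std = sims.std()  # population standard deviation (assumed)
    if std == 0:
        # All scores identical: no category stands out, so return uniform.
        return np.full(sims.shape, 1.0 / sims.size)
    z = (sims - sims.mean()) / std
    # Subtracting the max before exponentiating avoids numerical overflow.
    exp = np.exp(z - z.max())
    return exp / exp.sum()


# Hypothetical raw similarities of one headline to the six thematic anchors.
raw = [0.61, 0.58, 0.66, 0.57, 0.59, 0.60]
probs = relative_scores(raw)
```

Although the raw values differ by less than 0.1, the third theme ends up with the clearly largest share after normalization, which is the point of scoring each document relative to itself.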


Talking to space

The Talking to space section uses a different type of data: a catalogue of cultural objects and messages sent beyond Earth since the beginning of the space age.

This dataset is based directly on:

Paul E. Quast, "A Profile of Humanity: The Cultural Signature of Earth's Inhabitants Beyond the Atmosphere," in Speaking Beyond Earth: Perspectives on Messaging Across Deep Space and Cosmic Time (McFarland, 2024).

Each object is plotted by year and categorized by type: METI interstellar radio messages, send-your-name campaigns, time capsules, space race deposits, cultural and advertisement messages, and others.