8 datasets found

The GDELT Project
A realtime database of global human society for open research
kaggle.com
zip
Updated Feb 12, 2019
Cite
The GDELT Project (2019). The GDELT Project [Dataset]. https://www.kaggle.com/gdelt/gdelt
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset authored and provided by
The GDELT Project
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. The 2015 data alone record nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existence and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact, and even forecast this vast archive of human society?

Content

The GDELT 2.0 event database offers a wealth of features, including events reported in articles published in 65 live-translated languages, measurements of 2,300 emotions and themes, high-resolution views of the non-Western world, relevant imagery, videos, and social media embeds, quotes, names, amounts, and more.

You may find these code books helpful:
GDELT Global Knowledge Graph Codebook V2.1 (PDF)
GDELT Event Codebook V2.0 (PDF)

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. Fork this kernel to get started and learn how to safely manage analyzing large BigQuery datasets.
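As a rough illustration of querying safely with the BigQuery Python client, the sketch below builds a column-limited query and checks the scan size with a dry run before executing it. The table ID `gdelt-bq.gdeltv2.events`, the selected column names, and the helper names `build_query` and `run_with_byte_cap` are assumptions made for this example and are not taken from this listing; substitute the table you actually intend to query. Running the second function requires the `google-cloud-bigquery` package and valid credentials.

```python
# Assumed table ID, for illustration only; replace with the table you need.
EXAMPLE_TABLE = "gdelt-bq.gdeltv2.events"


def build_query(table: str, limit: int = 10) -> str:
    """Return a small query that reads only a few named columns."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # BigQuery bills by the columns it reads, so naming columns (instead of
    # SELECT *) is what keeps the scan small; LIMIT only trims the output.
    return (
        "SELECT SQLDATE, Actor1CountryCode, EventCode, AvgTone\n"
        f"FROM `{table}`\n"
        f"LIMIT {int(limit)}"
    )


def run_with_byte_cap(sql: str, max_bytes: int):
    """Dry-run the query, then execute it only if it scans at most max_bytes."""
    # Imported lazily so build_query works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    dry_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    dry_job = client.query(sql, job_config=dry_config)
    if dry_job.total_bytes_processed > max_bytes:
        raise RuntimeError(
            f"query would scan {dry_job.total_bytes_processed} bytes; "
            f"cap is {max_bytes}"
        )
    # maximum_bytes_billed makes BigQuery itself reject an over-budget query.
    run_config = bigquery.QueryJobConfig(maximum_bytes_billed=max_bytes)
    return list(client.query(sql, job_config=run_config).result())
```

A call such as `run_with_byte_cap(build_query(EXAMPLE_TABLE, 5), 10**9)` would refuse to run anything that scans more than about 1 GB.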

Acknowledgements

You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to the website (https://www.gdeltproject.org/).
CUTLER-GDELT Datasets
zenodo.org
explore.openaire.eu
xz
Updated Nov 26, 2020
Cite
Jun Sun (2020). CUTLER-GDELT Datasets [Dataset]. http://doi.org/10.5281/zenodo.4286472
Available download formats: xz
Unique identifier
https://doi.org/10.5281/zenodo.4286472
Dataset updated
Nov 26, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jun Sun
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
GDELT (https://www.gdeltproject.org/) is a project that monitors news from all over the world and in more than 100 languages in order to gather data about current events. This is a subset of the GDELT dataset that is relevant to the CUTLER project.

This dataset is used by UNIKO and USTUTT to analyse social events and public news in the four pilot cities, and it helps policy makers in those cities understand media and public sentiment regarding these events and news. The data could also be useful to researchers studying how online news spreads.
gdelt-gkg-march2020-v2
huggingface.co
Updated Mar 10, 2020
Cite
Don Branson (2020). gdelt-gkg-march2020-v2 [Dataset]. https://huggingface.co/datasets/dwb2023/gdelt-gkg-march2020-v2
Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
Dataset updated
Mar 10, 2020
Authors
Don Branson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for dwb2023/gdelt-gkg-march2020-v2

Dataset Details

Dataset Description

This dataset contains GDELT Global Knowledge Graph (GKG) data covering March 10-22, 2020, during the early phase of the COVID-19 pandemic. It captures global event interactions, actor relationships, and contextual narratives to support temporal, spatial, and thematic analysis.

Curated by: dwb2023

Dataset Sources

Repository: http://data.gdeltproject.org/gdeltv2 GKG… See the full description on the dataset page: https://huggingface.co/datasets/dwb2023/gdelt-gkg-march2020-v2.
Global Database of Events, Language, and Tone (GDELT Project)
data.wu.ac.at
datadiscoverystudio.org
Updated Jul 18, 2014
Cite
World Wide Human Geography Data Working Group (2014). Global Database of Events, Language, and Tone (GDELT Project) [Dataset]. https://data.wu.ac.at/schema/data_gov/ZWZhYTQxZjItZjk0MC00M2JlLWJkM2UtYWU2Yjg0MzNkZTRh
Dataset updated
Jul 18, 2014
Dataset provided by
World Wide Human Geography Data Working Group
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
The Global Database of Events, Language, and Tone (GDELT Project) monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, counts, themes, sources, and events driving our global society every second of every day, creating a free open platform for computing on the entire world.
Liberia Conflict Points
ebola-nga.opendata.arcgis.com
Updated Dec 4, 2014
Cite
National Geospatial-Intelligence Agency (2014). Liberia Conflict Points [Dataset]. https://ebola-nga.opendata.arcgis.com/content/eb7b77ec1b294e1393a4f099eae33f6e
Dataset updated
Dec 4, 2014
Dataset authored and provided by
National Geospatial-Intelligence Agency (http://www.nga.mil/)
Description
UNCLASSIFIED - Conflict Events in Liberia (2004-2014)

Founded in 1847, the country of Liberia was one of the first democratic African nations. Decades of inequality at the hands of newly freed slaves and their descendants over the indigenous populations came to a head in 1989 kicking off a fourteen year long civil war whose ramifications are still felt today. According to Healthcare Technologies for the World Traveler, there are “no domestic or transnational terrorist organizations known to be operating in Liberia”. The same source attributes this to a strong UNMIL (United Nations Mission in Liberia) presence. With 15,000 UN soldiers in Liberia, it is one of the UN’s most expensive peacekeeping operations. That being said, there is still a fair amount of civil unrest that has only been amplified by the Ebola crisis currently gripping the region. As stated previously, there is a significant lack of rebel or terrorist organization presence since the end of the country’s civil war. The creation and perpetuation of a strong democratic system as well as a significant UN military presence should prevent such penetration in the future. However with the rapid spread of Ebola in the country causing widespread fear among Liberians there is a major potential for increased riots, protests, and other forms of civil discourse. This could lead to government crackdowns and violations of basic human rights.

Attribute Table Field Descriptions

ISO3 - International Organization for Standardization 3-digit country code
ADM0_NAME - Administration level zero identification / name
ADM1_NAME - Administration level one identification / name
ADM2_NAME - Administration level two identification / name
LOCATION - Location of Conflict Event
ACTOR1 - First actor involved in conflict event
ACTOR2 - Second actor involved in conflict event
EVENT_TYPE - Classification of conflict event
DATE - Date of conflict event
YEAR - Year of conflict event
SPA_ACC - Spatial accuracy of site location (1 – high, 2 – medium, 3 – low)
ORG_SOURCE - Original source of conflict event report
NUM_DTH - Number of reported deaths during conflict event
NUM_INJ - Number of reported injuries during conflict event
COMMENTS - Comments or notes regarding the conflict event
SOURCE_DT - Source one creation date
SOURCE - Source one
SOURCE2_DT - Source two creation date
SOURCE2 - Source two

Collection

Conflict Points were compiled from the GDELT, ACLED and GTD conflict databases, three authorities in the monitoring and recording of instances of conflict across the globe. Consistent naming conventions for geographic locations were attempted but name variants may exist which can include historical or less widespread interpretations.

The data included herein have not been derived from a registered survey and should be considered approximate unless otherwise defined. While rigorous steps have been taken to ensure the quality of each dataset, DigitalGlobe is not responsible for the accuracy and completeness of data compiled from outside sources.

Sources (HGIS)

"ACLED (1997 – 2014)." ACLED. September 2014. Accessed October 2014. http://www.acleddata.com.
"The GDELT Project." Data: Querying, Analyzing and Downloading. September 1, 2014. Accessed September 25, 2014. www.gdeltproject.org/.
"Search the Database." Global Terrorism Database. August 1, 2014. Accessed September 25, 2014. http://www.start.umd.edu.

Sources (Metadata)

"Liberia Profile." BBC News. September 18, 2014. Accessed September 25, 2014. http://www.bbc.com.
"Liberia: War, Conflict & Peace." Insight on Conflict. January 1, 2014. Accessed September 25, 2014. http://www.insightonconflict.org.
"Country Risk Report." Healthcare Technologies for the World Traveler (HTH Worldwide). September 25, 2014. Accessed September 25, 2014. http://www.hthworldwide.com.
Brooks, Cholo. "LIBERIA: The Rise of Terrorism In Africa." Global News Network (GNN) Liberia. February 26, 2014. Accessed September 25, 2014. http://www.gnnliberia.com.
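To make the attribute table easier to work with, the following sketch maps a subset of the listed fields onto a typed record. It assumes the layer has been exported to CSV with the field names as column headers, which this listing does not confirm; the class name `ConflictPoint`, the helper `parse_row`, and the single sample row are made up for illustration and do not come from the dataset.

```python
import csv
import io
from dataclasses import dataclass

# SPA_ACC codes as given in the field descriptions.
SPATIAL_ACCURACY = {1: "high", 2: "medium", 3: "low"}


@dataclass
class ConflictPoint:
    iso3: str
    adm1_name: str
    location: str
    event_type: str
    year: int
    spa_acc: int
    num_dth: int
    num_inj: int


def parse_row(row: dict) -> ConflictPoint:
    """Convert one CSV row (keyed by field name) into a typed record."""
    return ConflictPoint(
        iso3=row["ISO3"],
        adm1_name=row["ADM1_NAME"],
        location=row["LOCATION"],
        event_type=row["EVENT_TYPE"],
        year=int(row["YEAR"]),
        spa_acc=int(row["SPA_ACC"]),
        # Treat an empty count as zero (an assumption about the export).
        num_dth=int(row["NUM_DTH"] or 0),
        num_inj=int(row["NUM_INJ"] or 0),
    )


# Made-up sample row, not real data from the dataset.
SAMPLE_CSV = (
    "ISO3,ADM1_NAME,LOCATION,EVENT_TYPE,YEAR,SPA_ACC,NUM_DTH,NUM_INJ\n"
    "LBR,Montserrado,Monrovia,Riots/Protests,2014,1,,3\n"
)

points = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for p in points:
    print(p.location, p.year, SPATIAL_ACCURACY[p.spa_acc], p.num_dth, p.num_inj)
```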
Geospatiality_data
zenodo.org
bin, zip
Updated Jan 21, 2025
Cite
Johannes Mast (2025). Geospatiality_data [Dataset]. http://doi.org/10.5281/zenodo.14329235
Available download formats: bin, zip
Unique identifier
https://doi.org/10.5281/zenodo.14329235
Dataset updated
Jan 21, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Johannes Mast
Description
This repository contains code and data for reproducing the study "Geospatiality: The effect of topics on the presence of geolocation in English text data".

The study analyzed the frequency of geolocations in texts across several distinct datasets from different sources. These sources were:

Twitter (X)

Reddit

Stackexchange

GDELT

IA-Americana

Nairaland

For each source, a dataset was acquired and tested for the presence of geolocations in the texts, as well as annotated with topic-labels.
The scripts use as inputs the data from the zip files in the data directory. Files need to be unzipped before running the scripts. Note that usernames have been anonymized.

E_Modeling.R Applies the mixed modeling approach described in the article.

F1_Analyze_FracGeo.R produces figures and tables visualising FracGeo, the fraction of geolocated text items per supertopic and dataset (Table 3 and Figure 3).

F2_Explore_Variables.R analyses FracGeo across timesteps, authors, and text length (Figure 4).

F3_Analyze_Models.R analyses the fixed effects of the GLMM models for each dataset, and compares their correlation across datasets (Table 4, Figure 5, and Appendices A1-A6).

F4_Validate.R compares the georeferences and supertopic assignments of the models to the human annotations (Appendix 9 and Table 5).
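FracGeo is described above as the fraction of geolocated text items per supertopic and dataset. A minimal Python sketch of that computation follows; it is not the repository's R implementation, and the function name `frac_geo`, the tuple layout, and the sample supertopic labels are hypothetical.

```python
from collections import defaultdict


def frac_geo(items):
    """Return {(dataset, supertopic): fraction of items that are geolocated}.

    `items` is an iterable of (dataset, supertopic, is_geolocated) tuples.
    """
    totals = defaultdict(int)
    geolocated = defaultdict(int)
    for dataset, supertopic, is_geolocated in items:
        key = (dataset, supertopic)
        totals[key] += 1
        if is_geolocated:
            geolocated[key] += 1
    return {key: geolocated[key] / totals[key] for key in totals}


# Made-up items for illustration only.
sample_items = [
    ("Reddit", "Sports", True),
    ("Reddit", "Sports", False),
    ("GDELT", "Politics", True),
]
fractions = frac_geo(sample_items)
print(fractions)
```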

The file topic_taxonomy.xlsx contains the topic taxonomy which matches topics to site-specific categories (e.g. subreddits, subforums, stackexchange sites). For users without access to MS office, the file can be loaded using open scripting languages, for example R:

library(openxlsx2)
path <- "../2_Data_Processing/Topic_taxonomy.xlsx"
tax_reddit <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Reddit")
tax_Stackexchange <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Stackexchange")
tax_Nairaland <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Nairaland")
tax_GDELT <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_GDELT")
Analyzing the Capacity Gap between China and the United States as Trade...
figshare.com
xlsx
Updated May 9, 2025
Cite
Minhua Lu (2025). Analyzing the Capacity Gap between China and the United States as Trade Powers [Dataset]. http://doi.org/10.6084/m9.figshare.28844381.v2
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.28844381.v2
Dataset updated
May 9, 2025
Dataset provided by
figshare
Authors
Minhua Lu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
China, United States
Description
All datasets used in this study are sourced from publicly available international organization databases. These databases adhere to the principle of open access and have no third-party usage restrictions or special constraints. Therefore, all data can be directly obtained from the official websites of the corresponding databases. The detailed information and links of these databases are as follows:

United Nations Conference on Trade and Development (UNCTAD) database: https://unctad.org/statistics
UN Comtrade database: https://comtrade.un.org/
World Trade Organization (WTO) database: https://www.wto.org/english/res_e/statis_e/statis_e.htm
Trade Remedy Information Network of the Ministry of Commerce of the People’s Republic of China: https://cacs.mofcom.gov.cn/cacscms/view/statistics/ckajtj
Global Database of Events, Language, and Tone (GDELT) database: https://www.gdeltproject.org/

If researchers encounter any issues during the data acquisition process, they are welcome to contact the corresponding author, Minhua Lu, at the email address lumark@shisu.edu.cn
Data from: The impact of news exposure on collective attention in the United...
data.niaid.nih.gov
Updated Mar 2, 2020
Cite
Michele Tizzoni (2020). The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3603915
Dataset updated
Mar 2, 2020
Dataset provided by
Ciro Cattuto
Michele Tizzoni
André Panisson
Daniela Paolotti
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
This repository contains the data of the study "The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic".

Epidemiological data

The folder zika_USA_weekly_cases_2016.zip contains weekly ZIKV incidence counts reported by the US Centers for Disease Control and Prevention in 2016, by state. Data were extracted from reports made publicly available by the CDC at: https://zenodo.org/record/584136#.Xk07-RNKjOQ

Web news data

The file news_GDELT_data.csv.gz contains all news items extracted from the GDELT platform (https://www.gdeltproject.org/) matching TAX_DISEASE_ZIKA as a Theme, and United_States as a Location in the GDELT platform.

TV closed captions

The file zika_TV_mentions_dataframe.csv contains all the TV news items of 2016 matching the word "Zika" in the TV News Archive (https://archive.org/details/tv).

Wikipedia pageview counts

Dataset 1: wikipedia_dataset1_zika_daily_pageview_usa.csv

Content of each line of the dataset: day, pageview_count

The dataset contains the daily number of pageview counts of 128 different Wikipedia pages related to the Zika virus (aggregated and summed to total) originated in the United States, from January 1st to December 31st, 2016.

Dataset 2: wikipedia_dataset2_zika_daily_pageview_bystate.zip

Content of each line of the dataset: day, pageview_count, state

The dataset contains the daily number of pageview counts of 128 different Wikipedia pages related to the Zika virus (aggregated and summed to total) originated in the United States, disaggregated by state, from January 1st to December 31st, 2016.

Dataset 3: wikipedia_dataset3_zika_pagecount_by_city.csv

Content of each line of the dataset: US_city, pageview_count_Zika,pageview_count_total

The dataset contains the total number of pageview counts of 128 different Wikipedia pages related to the Zika virus (pageview_count_Zika) originated in 788 cities (US_city) of the United States with a population larger than 40,000 in 2016. The dataset also contains the total number of pageview counts to all Wikipedia pages (all Wikipedia projects, pageview_count_total) originated in 788 cities (US_city) of the United States with a population larger than 40,000 in 2016.
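Because Dataset 3 gives both Zika-related and total pageviews per city, a natural derived quantity is the share of each city's pageviews that went to Zika pages. The sketch below computes it under several assumptions not confirmed by the description: the file is comma-separated, has no header row, and stores counts as integers. The function name `zika_share_by_city` and the two sample rows (with invented city names and counts) are illustrative only.

```python
import csv
import io

# Column order as given in the dataset description.
FIELDS = ["US_city", "pageview_count_Zika", "pageview_count_total"]


def zika_share_by_city(lines):
    """Map each city to pageview_count_Zika / pageview_count_total."""
    reader = csv.DictReader(lines, fieldnames=FIELDS, skipinitialspace=True)
    shares = {}
    for row in reader:
        total = int(row["pageview_count_total"])
        if total > 0:  # skip cities with no recorded pageviews
            shares[row["US_city"]] = int(row["pageview_count_Zika"]) / total
    return shares


# Invented rows standing in for the real file contents.
sample = io.StringIO("Springfield,50,10000\nRiverton, 5, 2000\n")
shares = zika_share_by_city(sample)
for city, share in sorted(shares.items()):
    print(f"{city}: {share:.4%}")
```

For the real file, an open file handle can be passed in place of the in-memory sample.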