8 datasets found
  1. The GDELT Project

    • kaggle.com
    zip
    Updated Feb 12, 2019
    The GDELT Project (2019). The GDELT Project [Dataset]. https://www.kaggle.com/gdelt/gdelt
    Available download formats: zip (0 bytes)
    Dataset updated: Feb 12, 2019
    Dataset authored and provided by: The GDELT Project
    License: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. The 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existence and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact with, and even forecast this vast archive of human society?

    Content

    GDELT 2.0 offers a wealth of features in its event database, including events reported in articles published in 65 live-translated languages, measurements of 2,300 emotions and themes, high-resolution views of the non-Western world, relevant imagery, videos, and social media embeds, quotes, names, amounts, and more.

    You may find these code books helpful:
    GDELT Global Knowledge Graph Codebook V2.1 (PDF)
    GDELT Event Codebook V2.0 (PDF)

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. Fork this kernel to get started and to learn how to analyze large BigQuery datasets safely.

    Acknowledgements

    You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to the website (https://www.gdeltproject.org/).

  2. CUTLER-GDELT Datasets

    • zenodo.org
    • explore.openaire.eu
    xz
    Updated Nov 26, 2020
    Jun Sun (2020). CUTLER-GDELT Datasets [Dataset]. http://doi.org/10.5281/zenodo.4286472
    Available download formats: xz
    Dataset updated: Nov 26, 2020
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Jun Sun
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GDELT (https://www.gdeltproject.org/) is a project that monitors news from all over the world and in more than 100 languages in order to gather data about current events. This is a subset of the GDELT dataset that is relevant to the CUTLER project.

    This dataset is used by UNIKO and USTUTT to analyse social events and public news in the four pilot cities, and it helps policy makers in those cities understand media and public sentiment regarding these events and news. The data could also be useful to researchers studying how online news spreads.

  3. gdelt-gkg-march2020-v2

    • huggingface.co
    Updated Mar 10, 2020
    + more versions
    Don Branson (2020). gdelt-gkg-march2020-v2 [Dataset]. https://huggingface.co/datasets/dwb2023/gdelt-gkg-march2020-v2
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated: Mar 10, 2020
    Authors: Don Branson
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically
    Description

    Dataset Card for dwb2023/gdelt-gkg-march2020-v2

    Dataset Details

    Dataset Description

    This dataset contains GDELT Global Knowledge Graph (GKG) data covering March 10-22, 2020, during the early phase of the COVID-19 pandemic. It captures global event interactions, actor relationships, and contextual narratives to support temporal, spatial, and thematic analysis.

    Curated by: dwb2023

    Dataset Sources

    Repository: http://data.gdeltproject.org/gdeltv2 GKG… See the full description on the dataset page: https://huggingface.co/datasets/dwb2023/gdelt-gkg-march2020-v2.

  4. Global Database of Events, Language, and Tone (GDELT Project)

    • data.wu.ac.at
    • datadiscoverystudio.org
    Updated Jul 18, 2014
    + more versions
    World Wide Human Geography Data Working Group (2014). Global Database of Events, Language, and Tone (GDELT Project) [Dataset]. https://data.wu.ac.at/schema/data_gov/ZWZhYTQxZjItZjk0MC00M2JlLWJkM2UtYWU2Yjg0MzNkZTRh
    Dataset updated: Jul 18, 2014
    Dataset provided by: World Wide Human Geography Data Working Group
    License: U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The Global Database of Events, Language, and Tone (GDELT Project) monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, counts, themes, sources, and events driving our global society every second of every day, creating a free open platform for computing on the entire world.

  5. Liberia Conflict Points

    • ebola-nga.opendata.arcgis.com
    Updated Dec 4, 2014
    National Geospatial-Intelligence Agency (2014). Liberia Conflict Points [Dataset]. https://ebola-nga.opendata.arcgis.com/content/eb7b77ec1b294e1393a4f099eae33f6e
    Dataset updated: Dec 4, 2014
    Dataset authored and provided by: National Geospatial-Intelligence Agency (http://www.nga.mil/)
    Description

    UNCLASSIFIED - Conflict Events in Liberia (2004-2014)

    Founded in 1847, Liberia was one of the first democratic African nations. Decades of inequality imposed on the indigenous populations by newly freed slaves and their descendants came to a head in 1989, setting off a fourteen-year civil war whose ramifications are still felt today. According to Healthcare Technologies for the World Traveler, there are “no domestic or transnational terrorist organizations known to be operating in Liberia”. The same source attributes this to a strong UNMIL (United Nations Mission in Liberia) presence. With 15,000 UN soldiers deployed, UNMIL is one of the UN’s most expensive peacekeeping operations. That said, there is still a fair amount of civil unrest, which has only been amplified by the Ebola crisis currently gripping the region. As noted above, rebel and terrorist organizations have been largely absent since the end of the civil war, and the creation and perpetuation of a strong democratic system, together with a significant UN military presence, should prevent their return in the future. However, with the rapid spread of Ebola causing widespread fear among Liberians, there is major potential for increased riots, protests, and other forms of civil disorder. This could lead to government crackdowns and violations of basic human rights.

    Attribute Table Field Descriptions

    • ISO3 - International Organization for Standardization 3-letter country code
    • ADM0_NAME - Administration level zero identification / name
    • ADM1_NAME - Administration level one identification / name
    • ADM2_NAME - Administration level two identification / name
    • LOCATION - Location of conflict event
    • ACTOR1 - First actor involved in conflict event
    • ACTOR2 - Second actor involved in conflict event
    • EVENT_TYPE - Classification of conflict event
    • DATE - Date of conflict event
    • YEAR - Year of conflict event
    • SPA_ACC - Spatial accuracy of site location (1 – high, 2 – medium, 3 – low)
    • ORG_SOURCE - Original source of conflict event report
    • NUM_DTH - Number of reported deaths during conflict event
    • NUM_INJ - Number of reported injuries during conflict event
    • COMMENTS - Comments or notes regarding the conflict event
    • SOURCE_DT - Source one creation date
    • SOURCE - Source one
    • SOURCE2_DT - Source two creation date
    • SOURCE2 - Source two

    Collection

    Conflict Points were compiled from the GDELT, ACLED and GTD conflict databases, three authorities in the monitoring and recording of instances of conflict across the globe. Consistent naming conventions for geographic locations were attempted, but name variants may exist, including historical or less widespread interpretations. The data included herein have not been derived from a registered survey and should be considered approximate unless otherwise defined. While rigorous steps have been taken to ensure the quality of each dataset, DigitalGlobe is not responsible for the accuracy and completeness of data compiled from outside sources.

    Sources (HGIS)

    • "ACLED (1997 – 2014)." ACLED. September 2014. Accessed October 2014. http://www.acleddata.com.
    • "The GDELT Project." Data: Querying, Analyzing and Downloading. September 1, 2014. Accessed September 25, 2014. www.gdeltproject.org/.
    • "Search the Database." Global Terrorism Database. August 1, 2014. Accessed September 25, 2014. http://www.start.umd.edu.

    Sources (Metadata)

    • "Liberia Profile." BBC News. September 18, 2014. Accessed September 25, 2014. http://www.bbc.com.
    • "Liberia: War, Conflict & Peace." Insight on Conflict. January 1, 2014. Accessed September 25, 2014. http://www.insightonconflict.org.
    • "Country Risk Report." Healthcare Technologies for the World Traveler (HTH Worldwide). September 25, 2014. Accessed September 25, 2014. http://www.hthworldwide.com.
    • Brooks, Cholo. "LIBERIA: The Rise of Terrorism In Africa." Global News Network (GNN) Liberia. February 26, 2014. Accessed September 25, 2014. http://www.gnnliberia.com.
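
    The attribute table fields lend themselves to simple filtering. The following is a minimal R sketch, assuming the attribute table has been exported into a data frame whose columns carry the field names listed above (the export step itself is not described in the source). The helpers decode_spa_acc, high_accuracy_points and casualty_totals are hypothetical and are not part of the NGA release.

```r
# Hypothetical helpers, not part of the NGA dataset.
# Decode SPA_ACC codes using the documented scheme: 1 = high, 2 = medium, 3 = low.
decode_spa_acc <- function(code) {
  factor(code, levels = c(1, 2, 3), labels = c("high", "medium", "low"))
}

# Keep only conflict points whose location has high spatial accuracy.
# `points` is assumed to be a data.frame with at least a SPA_ACC column.
high_accuracy_points <- function(points) {
  points$SPA_ACC_LABEL <- decode_spa_acc(points$SPA_ACC)
  keep <- !is.na(points$SPA_ACC_LABEL) & points$SPA_ACC_LABEL == "high"
  points[keep, ]
}

# Sum reported deaths (NUM_DTH) and injuries (NUM_INJ); missing values are skipped.
casualty_totals <- function(points) {
  c(deaths = sum(points$NUM_DTH, na.rm = TRUE),
    injuries = sum(points$NUM_INJ, na.rm = TRUE))
}
```

    Because SPA_ACC is documented only as a three-level code, the sketch treats any other value as missing rather than guessing its meaning.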

  6. Geospatiality_data

    • zenodo.org
    bin, zip
    Updated Jan 21, 2025
    Johannes Mast (2025). Geospatiality_data [Dataset]. http://doi.org/10.5281/zenodo.14329235
    Available download formats: bin, zip
    Dataset updated: Jan 21, 2025
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Johannes Mast
    Description

    This repository contains code and data for reproducing the study "Geospatiality: The effect of topics on the presence of geolocation in English text data".

    The study analyzed the frequency of geolocations in texts across several distinct datasets from different sources. These sources were:

    For each source, a dataset was acquired, tested for the presence of geolocations in the texts, and annotated with topic labels.
    The scripts take as input the data in the zip files in the data directory. The files need to be unzipped before running the scripts. Note that usernames have been anonymized.

    • E_Modeling.R applies the mixed modeling approach described in the article.
    • F1_Analyze_FracGeo.R produces figures and tables visualising FracGeo, the fraction of geolocated text items per supertopic and dataset (Table 3 and Figure 3).
    • F2_Explore_Variables.R analyses FracGeo across timesteps, authors, and text length (Figure 4).
    • F3_Analyze_Models.R analyses the fixed effects of the GLMM models for each dataset and compares their correlation across datasets (Table 4, Figure 5, and Appendices A1-A6).
    • F4_Validate.R compares the georeferences and supertopic assignments of the models to the human annotations (Appendix 9 and Table 5).
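
    FracGeo, as described for F1_Analyze_FracGeo.R, is a proportion: for a given dataset and supertopic, the number of geolocated text items divided by the total number of text items. A minimal R sketch of that computation follows. The function frac_geo and the column names dataset, supertopic and is_geolocated are hypothetical; this is not the code from the repository's scripts.

```r
# Hypothetical sketch, not the repository's implementation.
# FracGeo per (dataset, supertopic) = mean of a logical "is this item geolocated?" flag,
# since the mean of TRUE/FALSE values equals the fraction of TRUE values.
frac_geo <- function(items) {
  # `items` is assumed to have columns: dataset, supertopic, is_geolocated (logical).
  out <- stats::aggregate(is_geolocated ~ dataset + supertopic,
                          data = items, FUN = mean)
  names(out)[names(out) == "is_geolocated"] <- "FracGeo"
  out
}
```

    The result has one row per dataset and supertopic combination present in the input, with the fraction in the FracGeo column.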

    The file topic_taxonomy.xlsx contains the topic taxonomy, which matches topics to site-specific categories (e.g. subreddits, subforums, Stack Exchange sites). For users without access to MS Office, the file can be loaded with open-source scripting languages, for example R:

    library(openxlsx2)
    
    # Relative path to the topic taxonomy workbook
    path <- "../2_Data_Processing/Topic_taxonomy.xlsx"
    
    # Read each source's taxonomy sheet into its own data frame
    tax_reddit <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Reddit")
    tax_Stackexchange <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Stackexchange")
    tax_Nairaland <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Nairaland")
    tax_GDELT <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_GDELT")
    
    
  7. Analyzing the Capacity Gap between China and the United States as Trade...

    • figshare.com
    xlsx
    Updated May 9, 2025
    Minhua Lu (2025). Analyzing the Capacity Gap between China and the United States as Trade Powers [Dataset]. http://doi.org/10.6084/m9.figshare.28844381.v2
    Available download formats: xlsx
    Dataset updated: May 9, 2025
    Dataset provided by: figshare
    Authors: Minhua Lu
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically
    Area covered: China, United States
    Description

    All datasets used in this study come from publicly available databases of international organizations. These databases follow the principle of open access and have no third-party usage restrictions or special constraints, so all data can be obtained directly from the official websites of the corresponding databases. The databases and their links are:

    • United Nations Conference on Trade and Development (UNCTAD) database: https://unctad.org/statistics
    • UN Comtrade database: https://comtrade.un.org/
    • World Trade Organization (WTO) database: https://www.wto.org/english/res_e/statis_e/statis_e.htm
    • Trade Remedy Information Network of the Ministry of Commerce of the People’s Republic of China: https://cacs.mofcom.gov.cn/cacscms/view/statistics/ckajtj
    • Global Database of Events, Language, and Tone (GDELT) database: https://www.gdeltproject.org/

    Researchers who encounter any issues while acquiring the data are welcome to contact the corresponding author, Minhua Lu, at lumark@shisu.edu.cn.

  8. Data from: The impact of news exposure on collective attention in the United...

    • data.niaid.nih.gov
    Updated Mar 2, 2020
    Michele Tizzoni (2020). The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3603915
    Dataset updated: Mar 2, 2020
    Dataset provided by: Ciro Cattuto, Michele Tizzoni, André Panisson, Daniela Paolotti
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically
    Area covered: United States
    Description

    This repository contains the data of the study "The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic".

    Epidemiological data

    The archive zika_USA_weekly_cases_2016.zip contains weekly ZIKV incidence counts by state, as reported by the US Centers for Disease Control and Prevention (CDC) in 2016. Data were extracted from reports made publicly available by the CDC at: https://zenodo.org/record/584136#.Xk07-RNKjOQ

    Web news data

    The file news_GDELT_data.csv.gz contains all news items from the GDELT platform (https://www.gdeltproject.org/) tagged with TAX_DISEASE_ZIKA as a Theme and United_States as a Location.
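
    The theme condition above amounts to a membership test on a GKG theme field. A minimal R sketch, assuming that GKG theme fields are semicolon-delimited lists and that V2EnhancedThemes entries append a character offset after a comma (check both against the GDELT Global Knowledge Graph Codebook V2.1). The helper has_theme is hypothetical, the column layout of news_GDELT_data.csv.gz is not documented here, and the Location condition is not covered.

```r
# Hypothetical helper: does each GKG themes string contain `theme`?
# Assumed formats: "THEME;THEME" (V1Themes) or "THEME,offset;THEME,offset" (V2EnhancedThemes).
has_theme <- function(themes_field, theme) {
  entries <- strsplit(themes_field, ";", fixed = TRUE)
  vapply(entries, function(e) {
    # Strip any ",offset" suffix so only the theme name is compared.
    theme_names <- sub(",.*$", "", e)
    # Exact match, so TAX_DISEASE_ZIKA does not match longer theme names.
    theme %in% theme_names
  }, logical(1))
}
```

    Exact matching is used instead of a substring search so that a theme such as TAX_DISEASE_ZIKA does not accidentally match any longer theme name that begins with it; missing fields yield FALSE.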

    TV closed captions

    The file zika_TV_mentions_dataframe.csv contains all TV news items from 2016 matching the word "Zika" in the TV News Archive (https://archive.org/details/tv).

    Wikipedia pageview counts

    Dataset 1: wikipedia_dataset1_zika_daily_pageview_usa.csv

    Content of each line of the dataset: day, pageview_count

    The dataset contains the daily pageview counts for 128 different Wikipedia pages related to the Zika virus (aggregated and summed into a single total), originating in the United States, from January 1st to December 31st, 2016.

    Dataset 2: wikipedia_dataset2_zika_daily_pageview_bystate.zip

    Content of each line of the dataset: day, pageview_count, state

    The dataset contains the daily pageview counts for 128 different Wikipedia pages related to the Zika virus (aggregated and summed into a single total), originating in the United States and disaggregated by state, from January 1st to December 31st, 2016.

    Dataset 3: wikipedia_dataset3_zika_pagecount_by_city.csv

    Content of each line of the dataset: US_city, pageview_count_Zika, pageview_count_total

    The dataset contains the total pageview counts for 128 different Wikipedia pages related to the Zika virus (pageview_count_Zika) originating in 788 cities (US_city) of the United States with a population larger than 40,000 in 2016. The dataset also contains the total pageview counts for all Wikipedia pages (all Wikipedia projects, pageview_count_total) originating in the same 788 cities.
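
    The three Wikipedia layouts can be related with a couple of aggregations. A minimal R sketch, assuming the files have been read into data frames whose columns are named as listed above (whether the CSV files include a header row is not stated). The helpers national_daily and zika_share_by_city are hypothetical. Summing Dataset 2 over states should approximate Dataset 1, but the source does not say whether the two match exactly.

```r
# Hypothetical helpers for the Zika pageview files (column names as documented).

# Roll per-state daily counts (Dataset 2: day, pageview_count, state)
# up to national daily totals, i.e. the Dataset 1 layout (day, pageview_count).
national_daily <- function(by_state) {
  stats::aggregate(pageview_count ~ day, data = by_state, FUN = sum)
}

# For Dataset 3, the share of each city's total Wikipedia pageviews
# that went to the 128 Zika-related pages.
zika_share_by_city <- function(by_city) {
  data.frame(
    US_city = by_city$US_city,
    zika_share = by_city$pageview_count_Zika / by_city$pageview_count_total
  )
}
```

    Normalizing by pageview_count_total makes cities of different sizes comparable, which raw Zika pageview counts alone would not.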
