https://creativecommons.org/publicdomain/zero/1.0/
The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. The 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existence and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact with, and even forecast this vast archive of human society?
GDELT 2.0 has a wealth of features in the event database which includes events reported in articles published in 65 live translated languages, measurements of 2,300 emotions and themes, high resolution views of the non-Western world, relevant imagery, videos, and social media embeds, quotes, names, amounts, and more.
You may find these code books helpful:
GDELT Global Knowledge Graph Codebook V2.1 (PDF)
GDELT Event Codebook V2.0 (PDF)
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. [Fork this kernel to get started][98] to learn how to safely manage analyzing large BigQuery datasets.
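One common way to keep large BigQuery queries under control is to ask for a dry-run estimate of the bytes a query would scan before actually running it. The following is a minimal sketch of that idea, not the method used by the linked kernel. The helper names (`bytes_to_gib`, `build_row_count_query`, `estimate_query_gib`, `run_if_cheap`) and the `max_gib` threshold are hypothetical; the table ID must be supplied by the caller, and the `google-cloud-bigquery` package plus valid credentials are assumed to be available when the query helpers are called.

```python
def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return num_bytes / 2**30


def build_row_count_query(table_id: str) -> str:
    """Build a row-count query for a fully qualified table ID.

    The table ID is quoted with backticks because BigQuery project
    names may contain hyphens.
    """
    return f"SELECT COUNT(*) AS n FROM `{table_id}`"


def estimate_query_gib(client, sql: str) -> float:
    """Dry-run the query and return the estimated GiB it would scan."""
    # Imported lazily so the pure helpers above work without the library.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return bytes_to_gib(job.total_bytes_processed)


def run_if_cheap(client, sql: str, max_gib: float = 1.0):
    """Run the query only if the dry-run estimate is within max_gib.

    max_gib is an illustrative threshold, not a documented limit.
    """
    estimated = estimate_query_gib(client, sql)
    if estimated > max_gib:
        raise RuntimeError(
            f"Query would scan about {estimated:.2f} GiB, above {max_gib} GiB"
        )
    return list(client.query(sql).result())
```

A typical call would create `client = bigquery.Client()` and pass it to `run_if_cheap` together with a query built for the table of interest. Note that a `LIMIT` clause does not reduce the bytes scanned, which is why the estimate is checked first.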
You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to the website (https://www.gdeltproject.org/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GDELT (https://www.gdeltproject.org/) is a project that monitors news from all over the world and in more than 100 languages in order to gather data about current events. This is a subset of the GDELT dataset that is relevant to the CUTLER project.
This dataset is used by UNIKO and USTUTT to analyse social events and public news in the four pilot cities and helps policy makers in city pilots to understand media and public sentiments regarding these events and news. This data could also be useful to researchers doing research on online news spread behaviour.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for dwb2023/gdelt-gkg-march2020-v2
Dataset Details
Dataset Description
This dataset contains GDELT Global Knowledge Graph (GKG) data covering March 10-22, 2020, during the early phase of the COVID-19 pandemic. It captures global event interactions, actor relationships, and contextual narratives to support temporal, spatial, and thematic analysis.
Curated by: dwb2023
Dataset Sources
Repository: http://data.gdeltproject.org/gdeltv2 GKG… See the full description on the dataset page: https://huggingface.co/datasets/dwb2023/gdelt-gkg-march2020-v2.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The Global Database of Events, Language, and Tone (GDELT Project) monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, counts, themes, sources, and events driving our global society every second of every day, creating a free open platform for computing on the entire world.
UNCLASSIFIED - Conflict Events in Liberia (2004-2014)
Founded in 1847, Liberia was one of the first democratic African nations. Decades of inequality imposed on the indigenous populations by newly freed slaves and their descendants came to a head in 1989, kicking off a fourteen-year civil war whose ramifications are still felt today. According to Healthcare Technologies for the World Traveler, there are “no domestic or transnational terrorist organizations known to be operating in Liberia”. The same source attributes this to a strong UNMIL (United Nations Mission in Liberia) presence. With 15,000 UN soldiers in Liberia, it is one of the UN’s most expensive peacekeeping operations. That said, there is still a fair amount of civil unrest, which has only been amplified by the Ebola crisis currently gripping the region. As stated previously, there has been a significant lack of rebel or terrorist organization presence since the end of the country’s civil war. The creation and perpetuation of a strong democratic system, as well as a significant UN military presence, should prevent such penetration in the future. However, with the rapid spread of Ebola causing widespread fear among Liberians, there is major potential for increased riots, protests, and other forms of civil unrest. This could lead to government crackdowns and violations of basic human rights.
Attribute Table Field Descriptions
ISO3 - International Organization for Standardization 3-digit country code
ADM0_NAME - Administration level zero identification / name
ADM1_NAME - Administration level one identification / name
ADM2_NAME - Administration level two identification / name
LOCATION - Location of Conflict Event
ACTOR1 - First actor involved in conflict event
ACTOR2 - Second actor involved in conflict event
EVENT_TYPE - Classification of conflict event
DATE - Date of conflict event
YEAR - Year of conflict event
SPA_ACC - Spatial accuracy of site location (1 – high, 2 – medium, 3 – low)
ORG_SOURCE - Original source of conflict event report
NUM_DTH - Number of reported deaths during conflict event
NUM_INJ - Number of reported injuries during conflict event
COMMENTS - Comments or notes regarding the conflict event
SOURCE_DT - Source one creation date
SOURCE - Source one
SOURCE2_DT - Source two creation date
SOURCE2 - Source two
Collection
Conflict Points were compiled from the GDELT, ACLED and GTD conflict databases, three authorities in the monitoring and recording of instances of conflict across the globe. Consistent naming conventions for geographic locations were attempted but name variants may exist which can include historical or less widespread interpretations.
The data included herein have not been derived from a registered survey and should be considered approximate unless otherwise defined. While rigorous steps have been taken to ensure the quality of each dataset, DigitalGlobe is not responsible for the accuracy and completeness of data compiled from outside sources.
Sources (HGIS)
"ACLED (1997 – 2014)." ACLED. September 2014. Accessed October 2014. http://www.acleddata.com.
"The GDELT Project." Data: Querying, Analyzing and Downloading. September 1, 2014. Accessed September 25, 2014. www.gdeltproject.org/.
"Search the Database." Global Terrorism Database. August 1, 2014. Accessed September 25, 2014. http://www.start.umd.edu.
Sources (Metadata)
"Liberia Profile." BBC News. September 18, 2014. Accessed September 25, 2014. http://www.bbc.com.
"Liberia: War, Conflict & Peace." Insight on Conflict. January 1, 2014. Accessed September 25, 2014. http://www.insightonconflict.org.
"Country Risk Report." Healthcare Technologies for the World Traveler (HTH Worldwide). September 25, 2014. Accessed September 25, 2014. http://www.hthworldwide.com.
Brooks, Cholo. "LIBERIA: The Rise of Terrorism In Africa." Global News Network (GNN) Liberia. February 26, 2014. Accessed September 25, 2014. http://www.gnnliberia.com.
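As a rough illustration of how the attribute fields of the Liberia conflict dataset might be used, the sketch below totals reported deaths (NUM_DTH) per YEAR, keeping only events whose spatial accuracy (SPA_ACC) is at or better than a chosen level. It assumes the attribute table has been exported to CSV with the field names listed in that entry; the export format, the helper name `deaths_by_year`, and the three sample rows are made up for illustration and are not real records from the dataset.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; not real records from the dataset.
SAMPLE_CSV = """ISO3,ADM1_NAME,EVENT_TYPE,YEAR,SPA_ACC,NUM_DTH,NUM_INJ
LBR,Montserrado,Protest,2011,1,2,5
LBR,Nimba,Clash,2012,3,4,0
LBR,Montserrado,Protest,2014,1,1,3
"""


def deaths_by_year(rows, max_spa_acc=1):
    """Sum NUM_DTH per YEAR for events with SPA_ACC <= max_spa_acc.

    SPA_ACC uses 1 for high, 2 for medium and 3 for low accuracy,
    so a lower threshold keeps only better-located events.
    """
    totals = defaultdict(int)
    for row in rows:
        if int(row["SPA_ACC"]) <= max_spa_acc:
            totals[int(row["YEAR"])] += int(row["NUM_DTH"])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(deaths_by_year(rows))                 # high-accuracy events only
print(deaths_by_year(rows, max_spa_acc=3))  # all events
```

Filtering on SPA_ACC before aggregating is one design choice among several; an analysis at the national level could reasonably keep all accuracy levels.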
This repository contains code and data for reproducing the study Geospatiality: The effect of topics on the presence of geolocation in English text data.
The study analyzed the frequency of geolocations in texts across several distinct datasets from different sources. These sources were:
For each source, a dataset was acquired and tested for the presence of geolocations in the texts, as well as annotated with topic-labels.
The scripts use as inputs the data from the zip files in the data directory. Files need to be unzipped before running the scripts. Note that usernames have been anonymized.
E_Modeling.R applies the mixed modeling approach described in the article.
F1_Analyze_FracGeo.R produces figures and tables visualising FracGeo, the fraction of geolocated text items per supertopic and dataset (Table 3 and Figure 3).
F2_Explore_Variables.R analyses FracGeo across timesteps, authors, and text length (Figure 4).
F3_Analyze_Models.R analyses the fixed effects of the GLMM models for each dataset, and compares their correlation across datasets (Table 4, Figure 5, and Appendices A1-A6).
F4_Validate.R compares the georeferences and supertopic assignments of the models to the human annotations (Appendix 9 and Table 5).
The file topic_taxonomy.xlsx contains the topic taxonomy, which matches topics to site-specific categories (e.g. subreddits, subforums, stackexchange sites). For users without access to MS Office, the file can be loaded using open scripting languages, for example R:
library(openxlsx2)
path <- "../2_Data_Processing/Topic_taxonomy.xlsx"
tax_reddit <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Reddit")
tax_Stackexchange <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Stackexchange")
tax_Nairaland <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Nairaland")
tax_GDELT <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_GDELT")
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All datasets used in this study are sourced from publicly available international organization databases. These databases adhere to the principle of open access and have no third-party usage restrictions or special constraints. Therefore, all data can be directly obtained from the official websites of the corresponding databases. The detailed information and links of these databases are as follows:
United Nations Conference on Trade and Development (UNCTAD) database: https://unctad.org/statistics
UN Comtrade database: https://comtrade.un.org/
World Trade Organization (WTO) database: https://www.wto.org/english/res_e/statis_e/statis_e.htm
Trade Remedy Information Network of the Ministry of Commerce of the People’s Republic of China: https://cacs.mofcom.gov.cn/cacscms/view/statistics/ckajtj
Global Database of Events, Language, and Tone (GDELT) database: https://www.gdeltproject.org/
If researchers encounter any issues during the data acquisition process, they are welcome to contact the corresponding author, Minhua Lu, at the email address lumark@shisu.edu.cn
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data of the study "The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic".
Epidemiological data
The folder zika_USA_weekly_cases_2016.zip contains weekly ZIKV incidence counts reported by the US Centers for Disease Control and Prevention in 2016, by state. Data were extracted from reports made publicly available by the CDC at: https://zenodo.org/record/584136#.Xk07-RNKjOQ
Web news data
The file news_GDELT_data.csv.gz contains all news items extracted from the GDELT platform (https://www.gdeltproject.org/) matching TAX_DISEASE_ZIKA as a Theme, and United_States as a Location in the GDELT platform.
TV closed captions
The file zika_TV_mentions_dataframe.csv contains all the TV news items of 2016 matching the word "Zika" in the TV News Archive (https://archive.org/details/tv).
Wikipedia pageview counts
Dataset 1: wikipedia_dataset1_zika_daily_pageview_usa.csv
Content of each line of the dataset: day, pageview_count
The dataset contains the daily number of pageviews, originating in the United States, of 128 different Wikipedia pages related to the Zika virus (summed across all pages), from January 1st to December 31st, 2016.
Dataset 2: wikipedia_dataset2_zika_daily_pageview_bystate.zip
Content of each line of the dataset: day, pageview_count, state
The dataset contains the daily number of pageviews, originating in the United States and disaggregated by state, of 128 different Wikipedia pages related to the Zika virus (summed across all pages), from January 1st to December 31st, 2016.
Dataset 3: wikipedia_dataset3_zika_pagecount_by_city.csv
Content of each line of the dataset: US_city, pageview_count_Zika,pageview_count_total
The dataset contains the total number of pageviews of 128 different Wikipedia pages related to the Zika virus (pageview_count_Zika) originating in 788 cities (US_city) of the United States with a population larger than 40,000 in 2016. The dataset also contains the total number of pageviews of all Wikipedia pages (all Wikipedia projects, pageview_count_total) originating in the same 788 cities.
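Because Dataset 3 reports both Zika-related and total pageviews per city, one natural derived quantity is the share of a city's Wikipedia traffic that went to Zika pages. The sketch below shows that calculation under stated assumptions: the column names follow the line format described for Dataset 3, the helper name `zika_share` is hypothetical, and the three rows are invented placeholders rather than values from the file. It is not presented as the normalization used in the study.

```python
import csv
import io

# Invented placeholder rows; the real file lists 788 US cities.
SAMPLE_CSV = """US_city,pageview_count_Zika,pageview_count_total
CityA,120,60000
CityB,30,10000
CityC,0,0
"""


def zika_share(rows):
    """Return {city: Zika pageviews / total pageviews}.

    Cities with zero total pageviews map to None to avoid
    dividing by zero.
    """
    shares = {}
    for row in rows:
        zika = int(row["pageview_count_Zika"])
        total = int(row["pageview_count_total"])
        shares[row["US_city"]] = zika / total if total > 0 else None
    return shares


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for city, share in zika_share(rows).items():
    print(city, share)
```

Dividing by total pageviews makes cities of different sizes comparable, which raw Zika pageview counts alone would not.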