42 datasets found
  1. Climate Change: Earth Surface Temperature Data

    • kaggle.com
    • redivis.com
    zip
    Updated May 1, 2017
    Cite
    Berkeley Earth (2017). Climate Change: Earth Surface Temperature Data [Dataset]. https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
    Explore at:
    Available download formats: zip (88843537 bytes)
    Dataset updated
    May 1, 2017
    Dataset authored and provided by
    Berkeley Earth (http://berkeleyearth.org/)
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Earth
    Description

    Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.

    Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-term study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.

    Given this complexity, there is a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCRUT.

    We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.

    In this dataset, we have included several files:

    Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):

    • Date: starts in 1750 for average land temperature and in 1850 for max and min land temperatures and global ocean and land temperatures
    • LandAverageTemperature: global average land temperature in Celsius
    • LandAverageTemperatureUncertainty: the 95% confidence interval around the average
    • LandMaxTemperature: global average maximum land temperature in Celsius
    • LandMaxTemperatureUncertainty: the 95% confidence interval around the maximum land temperature
    • LandMinTemperature: global average minimum land temperature in Celsius
    • LandMinTemperatureUncertainty: the 95% confidence interval around the minimum land temperature
    • LandAndOceanAverageTemperature: global average land and ocean temperature in Celsius
    • LandAndOceanAverageTemperatureUncertainty: the 95% confidence interval around the global average land and ocean temperature
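
As a rough illustration of how these columns might be read, the sketch below parses a few rows with Python's standard csv module and turns each monthly average and its uncertainty into a lower and an upper bound. It assumes the header names match the column names listed above, that the date column is called `dt`, and that each uncertainty value is the half-width of the 95% interval. The sample rows are made-up placeholders, not real measurements.

```python
import csv
import io

# Placeholder rows for illustration only; the values are NOT real measurements.
# Assumed header: a date column `dt`, other names as listed in the description.
SAMPLE = """dt,LandAverageTemperature,LandAverageTemperatureUncertainty
1900-01-01,1.0,0.5
1900-02-01,3.0,0.5
1900-03-01,,
1901-01-01,2.0,0.25
"""


def yearly_land_average(csv_text):
    """Return {year: (mean, mean_lower, mean_upper)} over months with data."""
    by_year = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        avg = row["LandAverageTemperature"]
        unc = row["LandAverageTemperatureUncertainty"]
        if not avg or not unc:
            continue  # skip months with missing values
        year = int(row["dt"][:4])
        value, half_width = float(avg), float(unc)
        by_year.setdefault(year, []).append(
            (value, value - half_width, value + half_width)
        )
    result = {}
    for year, rows in by_year.items():
        n = len(rows)
        # Average the value, lower bound and upper bound column by column.
        result[year] = tuple(sum(r[i] for r in rows) / n for i in range(3))
    return result


yearly = yearly_land_average(SAMPLE)
```

Averaging the monthly bounds gives only a rough band around the yearly mean, not a proper confidence interval for it.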

    Other files include:

    • Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)
    • Global Average Land Temperature by State (GlobalLandTemperaturesByState.csv)
    • Global Land Temperatures By Major City (GlobalLandTemperaturesByMajorCity.csv)
    • Global Land Temperatures By City (GlobalLandTemperaturesByCity.csv)

    The raw data comes from the Berkeley Earth data page.

  2. Total population worldwide 1950-2100

    • statista.com
    • ai-chatbox.pro
    Updated Feb 24, 2025
    Cite
    Statista (2025). Total population worldwide 1950-2100 [Dataset]. https://www.statista.com/statistics/805044/total-population-worldwide/
    Explore at:
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    The world population surpassed eight billion people in 2022, having doubled from its figure less than 50 years previously. Looking forward, it is projected that the world population will reach nine billion in 2038 and 10 billion in 2060, but it will peak at around 10.3 billion in the 2080s before going into decline.

    Regional variations

    The global population has seen rapid growth since the early 1800s, due to advances in areas such as food production, healthcare, water safety, education, and infrastructure. However, these changes did not occur at a uniform time or pace across the world. Broadly speaking, the first regions to undergo their demographic transitions were Europe, North America, and Oceania, followed by Latin America and Asia (although Asia's development saw the greatest variation due to its size), while Africa was the last continent to undergo this transformation. Because of these differences, many so-called "advanced" countries are now experiencing population decline, particularly in Europe and East Asia, while the fastest population growth rates are found in Sub-Saharan Africa. In fact, the roughly two billion difference in population between now and the 2080s' peak will be found in Sub-Saharan Africa, which will rise from 1.2 billion to 3.2 billion in this time (although populations in other continents will also fluctuate).

    Changing projections

    The United Nations releases its World Population Prospects report every 1-2 years, and this is widely considered the foremost demographic dataset in the world. However, recent years have seen a notable decline in projections of when the global population will peak, and at what number. Previous reports in the 2010s had suggested a peak of over 11 billion people, and that population growth would continue into the 2100s. However, an earlier and lower peak is now projected. Reasons for this include a more rapid population decline in East Asia and Europe, particularly China, as well as a prolonged development arc in Sub-Saharan Africa.

  3. Wonders of the World Image Dataset

    • kaggle.com
    Updated May 3, 2022
    Cite
    Bala Baskar (2022). Wonders of the World Image Dataset [Dataset]. https://www.kaggle.com/datasets/balabaskar/wonders-of-the-world-image-classification
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 3, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bala Baskar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    The New 7 Wonders of the World was a campaign started in 2000 to choose Wonders of the World from a selection of 200 existing monuments. The popularity poll via free Web-based voting and small amounts of telephone voting was led by Canadian-Swiss Bernard Weber and organized by the New 7 Wonders Foundation (N7W) based in Zurich, Switzerland, with winners announced on 7 July 2007 in Lisbon, at Estádio da Luz. The poll was considered unscientific partly because it was possible for people to cast multiple votes.

    Context

    If we someday plan a world tour, there will surely be a bucket list of wonders and places around the world that we wish to visit. Here, we have a set of "Wonders of the World" images scraped from Google Images. Let us use our deep learning skills to build a multiclass classifier that identifies the place in each image.

    Data Preparation

    This dataset contains a total of 3846 images placed in folders, with each folder representing one of the top new wonders of the world. Below is the list of wonders with images extracted from Google Images.

    • Venezuela Angel Falls
    • Taj Mahal
    • Stonehenge
    • Statue of Liberty
    • Chichen Itza
    • Christ the Redeemer
    • Pyramids of Giza
    • Eiffel Tower
    • Great Wall of China
    • Burj Khalifa
    • Roman Colosseum
    • Machu Picchu
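
A folder-per-class layout like the one described above can be turned into (image, label) pairs before any model is trained. The sketch below is one minimal way to do that with the standard library. The folder names, file names and accepted extensions are assumptions for illustration, not details taken from the dataset.

```python
import tempfile
from pathlib import Path

# Assumed image extensions; the actual files may use others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labelled_images(root):
    """Return sorted (image_path, label) pairs, using each folder name as the label."""
    pairs = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image in sorted(class_dir.iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((image, class_dir.name))
    return pairs


# Tiny self-contained demo: a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for folder, name in [
        ("taj_mahal", "a.jpg"),
        ("stonehenge", "b.png"),
        ("stonehenge", "notes.txt"),  # ignored: not an image extension
    ]:
        (Path(tmp) / folder).mkdir(exist_ok=True)
        (Path(tmp) / folder / name).touch()
    demo = [(path.name, label) for path, label in list_labelled_images(tmp)]
```

Sorting both the folders and the files keeps the pairing deterministic, which helps when splitting into training and validation sets later.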
  4. Question Decomposition Meaning Dataset

    • opendatabay.com
    Updated Jul 7, 2025
    Cite
    Datasimple (2025). Question Decomposition Meaning Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/51c7d209-b1e2-4218-bdf1-c935416c3ca4
    Explore at:
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    Welcome to BreakData, an innovative dataset designed for exploring language understanding [1]. This dataset provides a wealth of information concerning question decomposition, operators, splits, sources, and allowed tokens, enabling precise question answering [1]. It offers deep insights into human language comprehension and interpretation, proving highly valuable for researchers developing sophisticated AI technologies [1]. The goal of BreakData is to facilitate the development of advanced natural language processing (NLP) models, applicable in various areas such as automated customer support, healthcare chatbots, or automated marketing campaigns [1].

    Columns

    Based on the QDMR Lexicon: Source and Allowed Tokens file, the dataset includes the following columns:

    • source: This string column indicates the origin of the question [2].
    • allowed_tokens: This string column specifies the tokens permitted for the question [2].

    The dataset also comprises other files, such as QDMR files which include questions or statements from common domains like healthcare or banking, requiring interpretation based on a series of operators [3]. These files necessitate the identification of keywords, entities (e.g., time references, monetary amounts, Boolean values), and relationships between them [3]. Additionally, LogicalForms files contain logical forms that serve as building blocks for linking ideas across different sets of incoming variables [3].

    Distribution

    The BreakData dataset is typically provided in CSV format [1, 4]. It is structured into nine distinct files, which include QDMR_train.csv, QDMR_validation.csv, QDMR-highlevel_train.csv, QDMR-highlevel_test.csv, logicalforms_train.csv, logicalforms_validation.csv, QDMRlexicon_train.csv, QDMRLexicon_test.csv, and QDHMLexiconHighLevelTest.csv [1]. While the dataset's structure is clear, specific numbers for rows or records within each file are not detailed in the provided information. The current version of the dataset is 1.0 [5].

    Usage

    This dataset presents an excellent opportunity to explore and comprehend the intricacies of language understanding [1]. It is ideal for training models for a variety of natural language processing (NLP) activities, including:

    • Question answering systems [1].
    • Text analytics [1].
    • Automated dialogue systems [1].
    • Developing advanced NLP models to analyse questions using decompositions, operators, and splits [6].
    • Training machine learning algorithms to predict the semantic meaning of questions based on their decomposition and split [6].
    • Conducting text analytics by utilising the allowed tokens dataset to map how people communicate specific concepts across different contexts or topics [6].
    • Optimising machine decisions for human-like interactions, leading to improved decision-making in applications like automated customer support, healthcare advice, and marketing campaigns [1, 3].

    Coverage

    The BreakData dataset covers a global region [5]. Its content is drawn from common domains such as healthcare and banking, featuring questions and statements that require linguistic analysis [1, 3]. There are no specific notes on time range or demographic scope beyond these general domains.

    License

    CC0

    Who Can Use It

    This dataset is primarily intended for:

    • Researchers developing sophisticated models to advance AI technologies [1].
    • Data scientists and AI/ML engineers looking to train models for natural language understanding tasks [1].
    • Those interested in analysing existing questions or commands with accurate decompositions and operators [1].
    • Developers of machine learning models powered by NLP for seamless inference and improved results in customer engagement [3].

    Dataset Name Suggestions

    • BreakData Language Decomposition
    • Question Decomposition Meaning Dataset
    • NLP Language Understanding Hub
    • Semantic Question Analysis Data
    • BreakData NLP Foundation

    Attributes

    Original Data Source: Break (Question Decomposition Meaning)

  5. World Religion Project - Global Religion Dataset

    • thearda.com
    + more versions
    Cite
    The Association of Religion Data Archives, World Religion Project - Global Religion Dataset [Dataset]. http://doi.org/10.17605/OSF.IO/J7BCM
    Explore at:
    Dataset provided by
    Association of Religion Data Archives
    Dataset funded by
    The University of California, Davis
    The John Templeton Foundation
    Description

    The World Religion Project (WRP) aims to provide detailed information about religious adherence worldwide since 1945. It contains data about the number of adherents by religion in each of the states in the international system. These numbers are given for every half-decade period (1945, 1950, etc., through 2010). Percentages of the states' populations that practice a given religion are also provided. (Note: These percentages are expressed as decimals, ranging from 0 to 1, where 0 indicates that 0 percent of the population practices a given religion and 1 indicates that 100 percent of the population practices that religion.) Some of the religions (as detailed below) are divided into religious families. To the extent data are available, the breakdown of adherents within a given religion into religious families is also provided.

    The project was developed in three stages. The first stage consisted of the formation of a religion tree. A religion tree is a systematic classification of major religions and of religious families within those major religions. To develop the religion tree we prepared a comprehensive literature review, the aim of which was (i) to define a religion, (ii) to find tangible indicators of a given religion or of religious families within a major religion, and (iii) to identify existing efforts at classifying world religions. (Please see the original survey instrument to view the structure of the religion tree.) The second stage consisted of the identification of major data sources of religious adherence and the collection of data from these sources according to the religion tree classification. This created a dataset that included multiple records for some states for a given point in time. It also contained multiple missing data for specific states, specific time periods and specific religions. The third stage consisted of cleaning the data, reconciling discrepancies of information from different sources and imputing data for the missing cases.

    The Global Religion Dataset: This dataset uses a religion-by-five-year unit. It aggregates the number of adherents of a given religion and religious group globally by five-year periods.
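
To make the religion-by-five-year unit concrete, the sketch below sums state-level adherent counts into one global total per (year, religion) pair and expresses each total as a decimal between 0 and 1. The record layout, the religion names and the numbers are hypothetical placeholders. The share is computed against the summed adherents for that year, which is a simplification and not necessarily the denominator the project uses.

```python
from collections import defaultdict

# Hypothetical state-level records: (year, state, religion, adherents).
# Field layout and numbers are placeholders, not values from the dataset.
RECORDS = [
    (1945, "A", "religion_x", 600),
    (1945, "A", "religion_y", 400),
    (1945, "B", "religion_x", 300),
    (1950, "A", "religion_x", 700),
    (1950, "B", "religion_y", 300),
]


def global_totals(records):
    """Sum adherents per (year, religion) across all states."""
    totals = defaultdict(int)
    for year, _state, religion, adherents in records:
        totals[(year, religion)] += adherents
    return dict(totals)


def shares(totals):
    """Express each total as a 0-to-1 share of that year's summed adherents."""
    per_year = defaultdict(int)
    for (year, _religion), count in totals.items():
        per_year[year] += count
    return {key: count / per_year[key[0]] for key, count in totals.items()}


totals = global_totals(RECORDS)
fractions = shares(totals)
```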

  6. USGS Group on Earth Observations (GEO) Global Agricultural Monitoring (GLAM)...

    • catalog.data.gov
    • datasets.ai
    • +5 more
    Updated Apr 10, 2025
    + more versions
    Cite
    DOI/USGS/EROS (2025). USGS Group on Earth Observations (GEO) Global Agricultural Monitoring (GLAM) Ethiopia [Dataset]. https://catalog.data.gov/dataset/usgs-group-on-earth-observations-geo-global-agricultural-monitoring-glam-ethiopia
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Ethiopia
    Description

    The objective of GEO is to fulfil a vision of a world where decisions and actions are informed by coordinated, comprehensive and sustained Earth Observation (EO). This is being pursued mainly through the added value of co-ordinating existing institutions, organised communities, space agencies, in-situ monitoring agencies, scientific institutions, research centres, universities, modelling centres, technology developers and other groups that deal with one or more aspects of EO. To reach this overarching goal, GEO focuses on capacity development in three dimensions: infrastructure, individuals and institutions. In the field of agriculture, the general goal is to promote the utilization of Earth observations for advancing sustainable agriculture, aquaculture and fisheries. Key issues include early warning, risk assessment, food security, market efficiency and combating desertification. (Source: http://www.research-europe.com/index.php/2011/08/joao-soares-secretariat-expert-for-agriculture-group-on-earth-observations/)

  7. Census MAF/TIGER database

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Jun 6, 2011
    + more versions
    Cite
    Earth Data Analysis Center (2011). Census MAF/TIGER database [Dataset]. http://gstore.unm.edu/apps/rgis/datasets/3b22b9ab-1b9d-468b-9a46-112cc6e9c653/metadata/FGDC-STD-001-1998.html
    Explore at:
    Available download formats: zip (1), gml (5), geojson (5), json (5), kml (5), csv (5), shp (5), xls (5)
    Dataset updated
    Jun 6, 2011
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    Jan 2010
    Area covered
    Luna County (35029), West Bounding Coordinate -109.050173 East Bounding Coordinate -108.208087 North Bounding Coordinate 32.777842 South Bounding Coordinate 31.332172
    Description

    The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement. Physical changes in street patterns caused by highway construction, new development, and so forth, may require boundary revisions. In addition, census tracts occasionally are split due to population growth, or combined as a result of substantial population decline. Census tract boundaries generally follow visible and identifiable features. 
They may follow legal boundaries such as minor civil division (MCD) or incorporated place boundaries in some States and situations to allow for census tract-to-governmental unit relationships where the governmental boundaries tend to remain unchanged between censuses. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.
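
The three reserved code ranges described above map directly onto a small lookup. The sketch below assumes the tract code is passed as the four-digit base number (ignoring any suffix), and the category labels are names invented here for illustration.

```python
def classify_tract_code(code):
    """Map a 2010 census tract base code (e.g. 9401) to a category.

    Only the three special ranges named in the description are distinguished;
    every other code is reported as "standard".
    """
    if 9400 <= code <= 9499:
        return "american_indian_area"
    if 9800 <= code <= 9899:
        return "special_land_use"
    if 9900 <= code <= 9998:
        return "water_only"
    return "standard"


labels = [classify_tract_code(c) for c in (1, 9400, 9499, 9850, 9998, 9999)]
```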

  8. USGS Group on Earth Observations (GEO) Global Agricultural Monitoring (GLAM)...

    • data.nasa.gov
    • s.cnmilf.com
    • +3 more
    Updated Mar 31, 2025
    + more versions
    Cite
    nasa.gov (2025). USGS Group on Earth Observations (GEO) Global Agricultural Monitoring (GLAM) Uganda [Dataset]. https://data.nasa.gov/dataset/usgs-group-on-earth-observations-geo-global-agricultural-monitoring-glam-uganda
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Area covered
    Uganda
    Description

    The objective of GEO is to fulfil a vision of a world where decisions and actions are informed by coordinated, comprehensive and sustained Earth Observation (EO). This is being pursued mainly through the added value of co-ordinating existing institutions, organised communities, space agencies, in-situ monitoring agencies, scientific institutions, research centres, universities, modelling centres, technology developers and other groups that deal with one or more aspects of EO. To reach this overarching goal, GEO focuses on capacity development in three dimensions: infrastructure, individuals and institutions. In the field of agriculture, the general goal is to promote the utilisation of Earth observations for advancing sustainable agriculture, aquaculture and fisheries. Key issues include early warning, risk assessment, food security, market efficiency and combating desertification. (Source: http://www.research-europe.com/index.php/2011/08/joao-soares-secretariat-expert-for-agriculture-group-on-earth-observations/)

  9. USGS Group on Earth Observations (GEO) Global Agricultural Monitoring (GLAM)...

    • s.cnmilf.com
    • datasets.ai
    • +4 more
    Updated Apr 11, 2025
    + more versions
    Cite
    DOI/USGS/EROS (2025). USGS Group on Earth Observations (GEO) Global Agricultural Monitoring (GLAM) Russia [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/usgs-group-on-earth-observations-geo-global-agricultural-monitoring-glam-russia
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    The objective of GEO is to fulfil a vision of a world where decisions and actions are informed by coordinated, comprehensive and sustained Earth Observation (EO). This is being pursued mainly through the added value of co-ordinating existing institutions, organized communities, space agencies, in-situ monitoring agencies, scientific institutions, research centres, universities, modelling centres, technology developers and other groups that deal with one or more aspects of EO. To reach this overarching goal, GEO focuses on capacity development in three dimensions: infrastructure, individuals and institutions. In the field of agriculture, the general goal is to promote the utilisation of Earth observations for advancing sustainable agriculture, aquaculture and fisheries. Key issues include early warning, risk assessment, food security, market efficiency and combating desertification. (Source: http://www.research-europe.com/index.php/2011/08/joao-soares-secretariat-expert-for-agriculture-group-on-earth-observations/)

  10. Mass Killings in America, 2006 - present

    • data.world
    csv, zip
    Updated Jul 15, 2025
    Cite
    The Associated Press (2025). Mass Killings in America, 2006 - present [Dataset]. https://data.world/associatedpress/mass-killings-public
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Jul 15, 2025
    Authors
    The Associated Press
    Time period covered
    Jan 1, 2006 - Jul 4, 2025
    Area covered
    Description

    THIS DATASET WAS LAST UPDATED AT 2:11 AM EASTERN ON JULY 15

    OVERVIEW

    2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.

    In all, there were 45 mass killings, defined as incidents in which four or more people are killed, excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.

    A total of 229 people died in mass killings in 2019.

    The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.

    One-third of the offenders died at the scene of the killing or soon after, half from suicides.

    About this Dataset

    The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.

    The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.

    This data will be updated periodically and can be used as an ongoing resource to help cover these events.

    Using this Dataset

    To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:

    Mass killings by year

    Mass shootings by year

    To get these counts just for your state:

    Filter killings by state
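
The query names above describe simple grouped counts. As a sketch of the same idea outside the data.world interface, the function below counts incidents per year with optional state and shooting filters. The field names (`year`, `state`, `is_shooting`) and the rows are hypothetical and do not reflect the database's actual schema or contents.

```python
from collections import Counter

# Hypothetical incident rows; field names and values are placeholders only.
INCIDENTS = [
    {"year": 2018, "state": "TX", "is_shooting": True},
    {"year": 2019, "state": "TX", "is_shooting": True},
    {"year": 2019, "state": "OH", "is_shooting": False},
    {"year": 2019, "state": "OH", "is_shooting": True},
]


def counts_by_year(incidents, state=None, shootings_only=False):
    """Count incidents per year, optionally restricted to one state or to shootings."""
    counter = Counter()
    for row in incidents:
        if state is not None and row["state"] != state:
            continue
        if shootings_only and not row["is_shooting"]:
            continue
        counter[row["year"]] += 1
    return dict(counter)


killings = counts_by_year(INCIDENTS)
shootings = counts_by_year(INCIDENTS, shootings_only=True)
ohio = counts_by_year(INCIDENTS, state="OH")
```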

    Definition of "mass murder"

    Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.

    This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”

    Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
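
The core of this definition can be written as a single predicate. The sketch below uses a hypothetical record type whose field names are invented for illustration. It covers the four-victim threshold, the 24-hour window, the intent requirement and the geographic limit, and it does not model the seven-day rule for counting spree victims.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical record; these field names are not taken from the database."""
    victims_killed: int   # deaths excluding offender(s) and unborn children
    hours_spanned: float  # time between the first and last killing
    intentional: bool     # False for e.g. DUI or accidental-fire deaths
    in_us: bool           # within the 50 states or Washington, D.C.


def is_mass_murder(incident):
    """Apply the stated definition: four or more intentional killings within 24 hours."""
    return (
        incident.intentional
        and incident.in_us
        and incident.victims_killed >= 4
        and incident.hours_spanned <= 24
    )


checks = [
    is_mass_murder(Incident(4, 2.0, True, True)),    # meets every criterion
    is_mass_murder(Incident(3, 2.0, True, True)),    # too few victims
    is_mass_murder(Incident(5, 48.0, True, True)),   # spans more than 24 hours
    is_mass_murder(Incident(6, 1.0, False, True)),   # not intentional
]
```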

    Methodology

    Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.

    Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.

    In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.

    Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.

    Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.

    This project started at USA TODAY in 2012.

    Contacts

    Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.

  11. World Coronavirus COVID-19 Cases

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Mar 9, 2020
    + more versions
    Cite
    TRADING ECONOMICS (2020). World Coronavirus COVID-19 Cases [Dataset]. https://tradingeconomics.com/world/coronavirus-cases
    Explore at:
    Available download formats: csv, excel, xml, json
    Dataset updated
    Mar 9, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 4, 2020 - May 17, 2023
    Area covered
    World
    Description

    The World Health Organization reported 766,440,796 coronavirus cases since the epidemic began. In addition, countries reported 6,932,591 coronavirus deaths. This dataset provides World Coronavirus Cases: actual values, historical data, forecast, chart, statistics, economic calendar and news.

  12. JHU Coronavirus COVID-19 Global Cases, by country

    • kaggle.com
    zip
    Updated May 18, 2020
    Cite
    Google BigQuery (2020). JHU Coronavirus COVID-19 Global Cases, by country [Dataset]. https://www.kaggle.com/bigquery/covid19-jhu-csse
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    May 18, 2020
    Dataset provided by
    BigQuery https://cloud.google.com/bigquery
    Authors
    Google BigQuery
    Description

    Overview

    This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). This database was created in response to the Coronavirus public health emergency to track reported cases in real-time. The data include the location and number of confirmed COVID-19 cases, deaths and recoveries for all affected countries, aggregated at the appropriate province or state. It was developed to enable researchers, public health authorities and the general public to track the outbreak as it unfolds. Additional information is available in the blog post, Mapping 2019-nCoV (https://systems.jhu.edu/research/public-health/ncov/), and included data sources are listed here: https://github.com/CSSEGISandData/COVID-19

    Sample Query 1

    How many confirmed COVID-19 cases were there in the US, by state? This query determines the total number of cases by province in February. A "province_state" can refer to any subset of the US in this particular dataset, including a county or state.

        SELECT
          province_state,
          confirmed AS feb_confirmed_cases
        FROM `bigquery-public-data.covid19_jhu_csse.summary`
        WHERE country_region = "US"
          AND date = '2020-02-29'
        ORDER BY feb_confirmed_cases DESC

    Sample Query 2

    Which countries with the highest number of confirmed cases have the most per capita? This query joins the Johns Hopkins dataset with the World Bank's global population data to determine which countries among those with the highest total number of confirmed cases have the most confirmed cases per capita.

        WITH country_pop AS (
          SELECT
            IF(country = "United States", "US",
               IF(country = "Iran, Islamic Rep.", "Iran", country)) AS country,
            year_2018
          FROM `bigquery-public-data.world_bank_global_population.population_by_country`
        )

        SELECT
          cases.date AS date,
          cases.country_region AS country_region,
          SUM(cases.confirmed) AS total_confirmed_cases,
          SUM(cases.confirmed) / AVG(country_pop.year_2018) * 100000 AS confirmed_cases_per_100000
        FROM `bigquery-public-data.covid19_jhu_csse.summary` cases
        JOIN country_pop
          ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
        WHERE cases.country_region = "US"
          AND country_pop.country = "US"
          AND cases.date = DATE_SUB(current_date(), INTERVAL 1 day)
        GROUP BY country_region, date

        UNION ALL

        SELECT
          cases.date AS date,
          cases.country_region AS country_region,
          SUM(cases.confirmed) AS total_confirmed_cases,
          SUM(cases.confirmed) / AVG(country_pop.year_2018) * 100000 AS confirmed_cases_per_100000
        FROM `bigquery-public-data.covid19_jhu_csse.summary` cases
        JOIN country_pop
          ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
        WHERE cases.country_region = "France"
          AND country_pop.country = "France"
          AND cases.date = DATE_SUB(current_date(), INTERVAL 1 day)
        GROUP BY country_region, date

        UNION ALL

        SELECT
          cases.date AS date,
          cases.country_region AS country_region,
          SUM(cases.confirmed) AS total_confirmed_cases,
          SUM(cases.confirmed) / AVG(country_pop.year_2018) * 100000 AS confirmed_cases_per_100000
        FROM `bigquery-public-data.covid19_jhu_csse.summary` cases
        JOIN country_pop
          ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
        WHERE cases.country_region = "China"
          AND country_pop.country = "China"
          AND cases.date = DATE_SUB(current_date(), INTERVAL 1 day)
        GROUP BY country_region, date

        UNION ALL

        SELECT
          cases.date AS date,
          cases.country_region AS country_region,
          cases.confirmed AS total_confirmed_cases,
          cases.confirmed / country_pop.year_2018 * 100000 AS confirmed_cases_per_100000
        FROM `bigquery-public-data.covid19_jhu_csse.summary` cases
        JOIN country_pop
          ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
        WHERE cases.country_region IN ("Italy", "Spain", "Germany", "Iran")
          AND cases.date = DATE_SUB(current_date(), INTERVAL 1 day)
        ORDER BY confirmed_cases_per_100000 DESC
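
    The per-capita arithmetic and the country-name mapping used in Sample Query 2 can be sketched outside BigQuery as well. The following is a minimal Python illustration, not part of the dataset: the function and dictionary names are hypothetical, and only the two name substitutions that appear in the query are included.

```python
# Name substitutions taken from the query's country_pop step:
# World Bank country names on the left, JHU country_region names on the right.
WORLD_BANK_TO_JHU = {
    "United States": "US",
    "Iran, Islamic Rep.": "Iran",
}


def normalize_country(name: str) -> str:
    """Map a World Bank country name to the JHU spelling (hypothetical helper)."""
    return WORLD_BANK_TO_JHU.get(name, name)


def cases_per_100k(confirmed: int, population: int) -> float:
    """Confirmed cases per 100,000 residents, as in confirmed / population * 100000."""
    if population <= 0:
        raise ValueError("population must be positive")
    return confirmed / population * 100_000
```

    Names that are not in the mapping pass through unchanged, which mirrors the final branch of the nested IF in the query.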

    Dataset source

    JHU CSSE

    Update frequency

    Daily

  13. World Death Rate (1950-2025)

    • macrotrends.net
    csv
    Updated Jun 30, 2025
    Cite
    MACROTRENDS (2025). World Death Rate (1950-2025) [Dataset]. https://www.macrotrends.net/global-metrics/countries/wld/world/death-rate
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    MACROTRENDS
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1950 - Dec 31, 2025
    Area covered
    World
    Description

    Historical chart and dataset showing World death rate by year from 1950 to 2025.

  14. Global Green Economy Index (GGEI)

    • kaggle.com
    Updated May 8, 2024
    Cite
    Jeremy Tamanini (2024). Global Green Economy Index (GGEI) [Dataset]. https://www.kaggle.com/datasets/jeremytamanini/global-green-economy-index-ggei
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 8, 2024
    Dataset provided by
    Kaggle
    Authors
    Jeremy Tamanini
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    For the first time, the full results from the Global Green Economy Index (GGEI) are available in the public domain. Historically, only the aggregate results have been publicly accessible. The full dataset has been paywalled and accessible to our subscribers only. But the way in which we release GGEI data to the public is changing. Read on for a quick explanation for how and why.

    First, the how. The GGEI file publicly accessible today represents the dataset officially compiled in 2022. It contains the full results for each of the 18 indicators in the GGEI for 160 countries, across the four main dimensions of climate change & social equity, sector decarbonization, markets & ESG investment and the environment. Some (not all) of these data points have since been updated, as new datasets have been published. The GGEI is a dynamic model, updating in real time as new data becomes available. Our subscribing clients will still receive the most timely version of the model, along with any customizations they may request.

    Now, the why. First and foremost, there is huge demand among academic researchers globally for the full GGEI dataset. Academic inquiry around the green transition, sustainable development, ESG investing, and green energy systems has exploded over the past several years. We receive hundreds of inquiries annually from these students and researchers to access the full GGEI dataset. Making it publicly accessible as we are today makes it easier for these individuals and institutions to use the GGEI to promote learning and green progress within their institutions.

    More broadly, the landscape for data has changed significantly. A decade ago when the GGEI was first published, datasets existed more in silos and users might subscribe to one specific dataset like the GGEI to answer a specific question. But today, data usage in the sustainability space has become much more of a system, whereby myriad data sources are synthesized into increasingly sophisticated models, often fueled by artificial intelligence. Making the GGEI more accessible will accelerate how this perspective on the global green economy can be integrated into these systems.

  15. TROPIS - Tree Growth and Permanent Plot Information System database

    • cmr.earthdata.nasa.gov
    Updated Apr 20, 2017
    Cite
    (2017). TROPIS - Tree Growth and Permanent Plot Information System database [Dataset]. https://cmr.earthdata.nasa.gov/search/concepts/C1214155153-SCIOPS
    Explore at:
    Dataset updated
    Apr 20, 2017
    Time period covered
    Jan 1, 1970 - Present
    Area covered
    Description

    TROPIS is the acronym for the Tree Growth and Permanent Plot Information System sponsored by CIFOR to promote more effective use of existing data and knowledge about tree growth.

        TROPIS is concerned primarily with information about permanent plots and tree
        growth in both planted and natural forests throughout the world. It has five
        components:

        - a network of people willing to share permanent plot data and tree
          growth information;
        - an index to people and institutions with permanent plots;
        - a database management system to promote more efficient data management;
        - a method to find comparable sites elsewhere, so that observations
          can be supplemented or contrasted with other data; and
        - an inference system to allow growth estimates to be made in the
          absence of empirical data.

        TROPIS is about people and information. The core of TROPIS is an
        index to people and their plots maintained in a relational
        database. The database is designed to fulfil two primary needs:

        - to provide for efficient cross-checking, error-checking and
          updating; and
        - to facilitate searches for plots matching a wide range
          of specified criteria, including (but not limited to) location, forest
          type, taxa, plot area, measurement history.
    
        The database is essentially hierarchical: the key element of the
        database is the informant. Each informant may contribute information
        on many plot series, each of which has consistent objectives. In turn,
        each series may comprise many plots, each of which may have a
        different location or different size. Each plot may contain many
        species. A series may be a thinning or spacing experiment, some
        species or provenance trials, a continuous forest inventory system, or
        any other aggregation of plots convenient to the informant. Plots need
        not be current. Abandoned plots may be included provided that the
        location is known and the plot data remain accessible. In addition to
        details of the informant, we try to record details of additional
        contact people associated with plots, to maintain continuity when
        people transfer or retire. Thus the relational structure may appear
        complex, but ensures data integrity.
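
        The hierarchy described above (informant, plot series, plot, species) can be
        pictured with a small data-structure sketch. This is only an illustration in
        Python under stated assumptions: the class names, field names and the search
        helper are hypothetical and are not taken from the actual TROPIS schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Plot:
    location: str
    area_ha: float                      # hypothetical unit for plot size
    species: List[str] = field(default_factory=list)
    current: bool = True                # abandoned plots may still be listed


@dataclass
class PlotSeries:
    objective: str                      # e.g. a thinning or spacing experiment
    plots: List[Plot] = field(default_factory=list)


@dataclass
class Informant:
    name: str
    series: List[PlotSeries] = field(default_factory=list)
    contacts: List[str] = field(default_factory=list)  # additional contact people


def plots_with_taxon(informants: List[Informant], taxon: str) -> Iterator[Tuple[Informant, Plot]]:
    """Yield (informant, plot) pairs for every plot that lists the given taxon."""
    for informant in informants:
        for plot_series in informant.series:
            for plot in plot_series.plots:
                if taxon in plot.species:
                    yield informant, plot
```

        A search for a taxon walks the hierarchy from the informant down to the
        plots, which matches the description of the informant as the key element.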
    
        At present, searches are possible only via mail, fax or email requests
        to the TROPIS co-ordinator at CIFOR. Self-service on-line searching
        will also be available in 1997. Clients may search for plots with
        specified taxa, locations, silvicultural treatment, or other specified
        criteria and combinations. TROPIS currently contains references to
        over 10,000 plots with over 2,000 species contributed by 100
        individuals world-wide.
    
        This database will help CIFOR as well as other users to make more
        efficient use of existing information, and to develop appropriate and
        effective techniques and policies for sustainable forest management
        world-wide.
    
        TROPIS is supported by the Government of Japan.
    
        This information is from the CIFOR web site.
    
  16. Census MAF/TIGER database

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Jun 6, 2011
    Cite
    Earth Data Analysis Center (2011). Census MAF/TIGER database [Dataset]. http://gstore.unm.edu/apps/rgisarchive/datasets/878cbbf9-b240-4e3a-97c4-0c3e68a52e48/metadata/FGDC-STD-001-1998.html
    Explore at:
    Available download formats: shp(5), json(5), zip(1), kml(5), csv(5), xls(5), geojson(5), gml(5)
    Dataset updated
    Jun 6, 2011
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    Jan 2010
    Area covered
    Valencia County; Socorro County (35053); West Bounding Coordinate -107.204675, East Bounding Coordinate -106.410974, North Bounding Coordinate 34.958064, South Bounding Coordinate 34.436993
    Description

    The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement. Physical changes in street patterns caused by highway construction, new development, and so forth, may require boundary revisions. In addition, census tracts occasionally are split due to population growth, or combined as a result of substantial population decline. Census tract boundaries generally follow visible and identifiable features. 
They may follow legal boundaries such as minor civil division (MCD) or incorporated place boundaries in some States and situations to allow for census tract-to-governmental unit relationships where the governmental boundaries tend to remain unchanged between censuses. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.
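
    The reserved tract code ranges described above lend themselves to a small lookup. The sketch below is illustrative only: the function name and the category labels are hypothetical, and it assumes the tract code is supplied as the four-digit basic code (without any suffix) as an integer.

```python
from typing import Optional


def special_tract_category(tract_code: int) -> Optional[str]:
    """Return a label for the reserved 2010 tract code ranges, or None otherwise.

    Assumes tract_code is the four-digit basic tract code as an integer.
    """
    if 9400 <= tract_code <= 9499:
        # Majority American Indian population, or mostly reservation / trust land
        return "american_indian_area"
    if 9800 <= tract_code <= 9899:
        # Little or no population; large special land use area
        return "special_land_use"
    if 9900 <= tract_code <= 9998:
        # Water area only, no land area
        return "water_only"
    return None
```

    Codes outside the three reserved ranges return None, since the description assigns no special meaning to them.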

  17. Census MAF/TIGER database

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Jun 6, 2011
    Cite
    Earth Data Analysis Center (2011). Census MAF/TIGER database [Dataset]. https://gstore.unm.edu/apps/rgisarchive/datasets/5375df01-f194-433e-b224-c3c2a6ce1758/metadata/FGDC-STD-001-1998.html
    Explore at:
    Available download formats: json(5), kml(5), shp(5), xls(5), csv(5), zip(1), gml(5), geojson(5)
    Dataset updated
    Jun 6, 2011
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    Jan 2010
    Area covered
    Rio Arriba County (35039); West Bounding Coordinate -106.058364, East Bounding Coordinate -105.200117, North Bounding Coordinate 36.995991, South Bounding Coordinate 36.013014
    Description

    The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement. Physical changes in street patterns caused by highway construction, new development, and so forth, may require boundary revisions. In addition, census tracts occasionally are split due to population growth, or combined as a result of substantial population decline. Census tract boundaries generally follow visible and identifiable features. 
They may follow legal boundaries such as minor civil division (MCD) or incorporated place boundaries in some States and situations to allow for census tract-to-governmental unit relationships where the governmental boundaries tend to remain unchanged between censuses. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.

  18. Bernalillo County 2010 Census Tracts

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Jun 6, 2011
    Cite
    Earth Data Analysis Center (2011). Bernalillo County 2010 Census Tracts [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/be59f00d-fb91-423f-9d8b-a46cd1b07f26/metadata/FGDC-STD-001-1998.html
    Explore at:
    Available download formats: zip(1), geojson(5), gml(5), shp(5), csv(5), kml(5), xls(5), json(5)
    Dataset updated
    Jun 6, 2011
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    Jan 2010
    Area covered
    Cibola County (35006); West Bounding Coordinate -107.19617, East Bounding Coordinate -106.149575, North Bounding Coordinate 35.219639, South Bounding Coordinate 34.869024
    Description

    The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement. Physical changes in street patterns caused by highway construction, new development, and so forth, may require boundary revisions. In addition, census tracts occasionally are split due to population growth, or combined as a result of substantial population decline. Census tract boundaries generally follow visible and identifiable features. 
They may follow legal boundaries such as minor civil division (MCD) or incorporated place boundaries in some States and situations to allow for census tract-to-governmental unit relationships where the governmental boundaries tend to remain unchanged between censuses. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.

  19. CORONAVIRUS DEATHS by Country Dataset

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Mar 4, 2020
    Cite
    TRADING ECONOMICS (2020). CORONAVIRUS DEATHS by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/coronavirus-deaths
    Explore at:
    Available download formats: csv, excel, xml, json
    Dataset updated
    Mar 4, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    World
    Description

    This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.

  20. Lord of the Rings International Audience Research Project: World...

    • beta.ukdataservice.ac.uk
    • datacatalogue.cessda.eu
    Updated 2006
    Cite
    M. Barker; E. Mathijs (2006). Lord of the Rings International Audience Research Project: World Questionnaire Dataset, 2003-2004 [Dataset]. http://doi.org/10.5255/ukda-sn-5179-1
    Explore at:
    Dataset updated
    2006
    Dataset provided by
    UK Data Service https://ukdataservice.ac.uk/
    DataCite https://www.datacite.org/
    Authors
    M. Barker; E. Mathijs
    Description

    This is a mixed methods study, comprising both qualitative and quantitative material. The aim of this project was to use the opportunity afforded by the release of the final part of the film trilogy of Lord of the Rings to gather materials allowing an exploration of

    • the role of fantasy, especially film fantasy, in the lives of different kinds of audience
    • the understanding they have of the 'location' (real or imaginary) of the author J.R.R. Tolkien's world, and its relation to their lived world
    • the role played in their responses by perceptions of the story's original 'Englishness', its New Zealand landscapes, and its Hollywood financing and marketing
    • the part played by all kinds of prefigurative processes in shaping responses in advance
    Within these broad aims, the objectives were to gather, over a fifteen month period, three large bodies of materials: three months of marketing, publicity, merchandising, and media coverage of the film prior to its release; responses from across the world to a questionnaire, available online with added paper-completed ones; and a set of follow-up interviews with individuals chosen for their exemplification of emergent patterns. This body of materials and data was to be organised in a way which permits both quantitative and qualitative exploration. The only materials currently deposited at the UK Data Archive (UKDA) are the questionnaire responses, which are held in a Microsoft 'Access 2000' database.

    Further information about the study may be found at the Lord of the Rings Research Project web site.


Cite
Berkeley Earth (2017). Climate Change: Earth Surface Temperature Data [Dataset]. https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data

Climate Change: Earth Surface Temperature Data

Exploring global temperatures since 1750

Explore at:
13 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (88843537 bytes)
Dataset updated
May 1, 2017
Dataset authored and provided by
Berkeley Earth http://berkeleyearth.org/
License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Area covered
Earth
Description

Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.


Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.

Given this complexity, there are a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCrut.

We have repackaged the data from a newer compilation put together by the Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.

In this dataset, we have included several files:

Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):

  • Date: starts in 1750 for average land temperature and 1850 for max and min land temperatures and global ocean and land temperatures
  • LandAverageTemperature: global average land temperature in celsius
  • LandAverageTemperatureUncertainty: the 95% confidence interval around the average
  • LandMaxTemperature: global average maximum land temperature in celsius
  • LandMaxTemperatureUncertainty: the 95% confidence interval around the maximum land temperature
  • LandMinTemperature: global average minimum land temperature in celsius
  • LandMinTemperatureUncertainty: the 95% confidence interval around the minimum land temperature
  • LandAndOceanAverageTemperature: global average land and ocean temperature in celsius
  • LandAndOceanAverageTemperatureUncertainty: the 95% confidence interval around the global average land and ocean temperature
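
As a rough illustration of how the average and uncertainty columns listed above might be combined, the sketch below reads rows with Python's standard csv module and derives lower and upper bounds for the land average. It assumes that the uncertainty value is a half-width to be added to and subtracted from the average, and that missing measurements appear as empty fields; neither is confirmed by the description. The function name is hypothetical.

```python
import csv
from typing import Iterable, Iterator, Tuple


def land_average_bounds(rows: Iterable[dict]) -> Iterator[Tuple[float, float]]:
    """Yield (lower, upper) bounds for each row that has both values present.

    Assumption: the uncertainty column is a half-width around the average.
    """
    for row in rows:
        avg = row.get("LandAverageTemperature")
        unc = row.get("LandAverageTemperatureUncertainty")
        if not avg or not unc:
            continue  # skip rows with missing measurements
        a, u = float(avg), float(unc)
        yield a - u, a + u


def read_bounds(path: str) -> list:
    """Read a CSV file such as GlobalTemperatures.csv and return all bounds."""
    with open(path, newline="") as handle:
        return list(land_average_bounds(csv.DictReader(handle)))
```

Calling `read_bounds("GlobalTemperatures.csv")` would return one pair per usable row, provided the file is in the working directory.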

Other files include:

  • Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)
  • Global Average Land Temperature by State (GlobalLandTemperaturesByState.csv)
  • Global Land Temperatures By Major City (GlobalLandTemperaturesByMajorCity.csv)
  • Global Land Temperatures By City (GlobalLandTemperaturesByCity.csv)

The raw data comes from the Berkeley Earth data page.
