Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.
Even more than with other data sets that Kaggle has featured, there's a huge amount of data cleaning and preparation that goes into putting together a long-term study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.
Given this complexity, a range of organizations collate climate trends data. The three most cited land and ocean temperature data sets are NOAA's MLOST, NASA's GISTEMP and the UK's HadCRUT.
We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example, by country). Berkeley Earth publishes both the source data and the code for the transformations it applied. It also uses methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.
In this dataset, we have included several files:
Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):
Other files include:
The raw data comes from the Berkeley Earth data page.
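One of the conveniences described above is slicing the repackaged data into subsets such as individual countries. As a minimal sketch of that workflow, the snippet below builds a tiny in-memory table mimicking the assumed schema of the per-country file (a `dt` date column, an `AverageTemperature` in °C, and a `Country` column; verify the column names against the actual CSV) and computes an annual mean for one country:

```python
import pandas as pd

# Toy rows mimicking the assumed schema of the per-country temperature file;
# real data would be loaded with pd.read_csv(...) instead.
data = pd.DataFrame({
    "dt": ["1900-01-01", "1900-02-01", "1900-01-01", "1900-02-01"],
    "AverageTemperature": [3.1, 4.2, 25.0, 26.1],
    "Country": ["Norway", "Norway", "Brazil", "Brazil"],
})
data["dt"] = pd.to_datetime(data["dt"])

# Slice to one country and average by year -- the kind of subset the
# Berkeley Earth packaging makes easy.
norway = data[data["Country"] == "Norway"]
annual = norway.groupby(norway["dt"].dt.year)["AverageTemperature"].mean()
print(annual.loc[1900])  # mean of 3.1 and 4.2 -> 3.65
```

The same `groupby` pattern extends naturally to decade-level averages or anomaly calculations once the full file is loaded.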
The world population surpassed eight billion people in 2022, having doubled from its figure less than 50 years previously. Looking forward, the world population is projected to reach nine billion in 2038 and 10 billion in 2060, peaking at around 10.3 billion in the 2080s before going into decline.

Regional variations. The global population has grown rapidly since the early 1800s, thanks to advances in areas such as food production, healthcare, water safety, education, and infrastructure; however, these changes did not occur at a uniform time or pace across the world. Broadly speaking, the first regions to undergo their demographic transitions were Europe, North America, and Oceania, followed by Latin America and Asia (although Asia's development saw the greatest variation due to its size), while Africa was the last continent to undergo this transformation. Because of these differences, many so-called "advanced" countries are now experiencing population decline, particularly in Europe and East Asia, while the fastest population growth rates are found in Sub-Saharan Africa. In fact, the roughly two billion difference between today's population and the 2080s peak will come from Sub-Saharan Africa, which will rise from 1.2 billion to 3.2 billion over this period (although populations in other continents will also fluctuate).

Changing projections. The United Nations releases its World Population Prospects report every 1-2 years, and it is widely considered the foremost demographic dataset in the world. However, recent years have seen a notable decline in projections of when the global population will peak, and at what number. Previous reports in the 2010s had suggested a peak of over 11 billion people, with population growth continuing into the 2100s; a sooner and lower peak is now projected. Reasons for this include more rapid population decline in East Asia and Europe, particularly China, as well as a prolonged development arc in Sub-Saharan Africa.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The New 7 Wonders of the World was a campaign started in 2000 to choose Wonders of the World from a selection of 200 existing monuments. The popularity poll via free Web-based voting and small amounts of telephone voting was led by Canadian-Swiss Bernard Weber and organized by the New 7 Wonders Foundation (N7W) based in Zurich, Switzerland, with winners announced on 7 July 2007 in Lisbon, at Estádio da Luz. The poll was considered unscientific partly because it was possible for people to cast multiple votes.
If we someday plan to go on a world tour, there is bound to be a bucket list of wonders and places around the world that we wish to visit. Here we have a set of "Wonders of the World" images scraped from Google Images. Let us use our deep learning skills to build a multiclass classifier that identifies the place shown in each image.
This dataset contains a total of 3,846 images placed in folders, with each folder representing one of the new wonders of the world. Below is the list of wonders whose images were extracted from Google Images.
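For a folder-per-class layout like this, the first step of a multiclass pipeline is mapping folder names to integer labels. The sketch below fabricates a tiny example tree (the folder names here are illustrative, not the dataset's exact directory names) and builds the mapping the way common loaders such as Keras' `image_dataset_from_directory` infer it, i.e. in sorted class-name order:

```python
import pathlib
import tempfile

# Fabricate a miniature folder-per-class tree; with the real dataset you would
# point `root` at the downloaded image directory instead.
root = pathlib.Path(tempfile.mkdtemp())
for wonder in ["taj_mahal", "colosseum", "machu_picchu"]:
    d = root / wonder
    d.mkdir()
    (d / "img_001.jpg").touch()

# Class names sorted alphabetically -> integer labels, as common image
# loaders do by default.
classes = sorted(p.name for p in root.iterdir() if p.is_dir())
label_of = {name: i for i, name in enumerate(classes)}

# (file path, label) pairs ready to feed a training pipeline.
samples = [(str(f), label_of[f.parent.name]) for f in root.rglob("*.jpg")]
print(label_of)  # {'colosseum': 0, 'machu_picchu': 1, 'taj_mahal': 2}
```

From here, any image-classification framework can consume the `(path, label)` pairs directly.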
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Welcome to BreakData, an innovative dataset designed for exploring language understanding [1]. This dataset provides a wealth of information concerning question decomposition, operators, splits, sources, and allowed tokens, enabling precise question answering [1]. It offers deep insights into human language comprehension and interpretation, proving highly valuable for researchers developing sophisticated AI technologies [1]. The goal of BreakData is to facilitate the development of advanced natural language processing (NLP) models, applicable in various areas such as automated customer support, healthcare chatbots, or automated marketing campaigns [1].
Based on the QDMR Lexicon: Source and Allowed Tokens file, the dataset includes the following columns:
* source: This string column indicates the origin of the question [2].
* allowed_tokens: This string column specifies the tokens permitted for the question [2].
The dataset also comprises other files, such as QDMR files which include questions or statements from common domains like healthcare or banking, requiring interpretation based on a series of operators [3]. These files necessitate the identification of keywords, entities (e.g., time references, monetary amounts, Boolean values), and relationships between them [3]. Additionally, LogicalForms files contain logical forms that serve as building blocks for linking ideas across different sets of incoming variables [3].
The BreakData dataset is typically provided in CSV format [1, 4]. It is structured into nine distinct files: QDMR_train.csv, QDMR_validation.csv, QDMR-highlevel_train.csv, QDMR-highlevel_test.csv, logicalforms_train.csv, logicalforms_validation.csv, QDMRlexicon_train.csv, QDMRLexicon_test.csv, and QDHMLexiconHighLevelTest.csv [1]. While the dataset's structure is clear, specific row or record counts for each file are not detailed in the provided information. The current version of the dataset is 1.0 [5].
This dataset presents an excellent opportunity to explore and comprehend the intricacies of language understanding [1]. It is ideal for training models for a variety of natural language processing (NLP) activities, including:
* Question answering systems [1].
* Text analytics [1].
* Automated dialogue systems [1].
* Developing advanced NLP models to analyse questions using decompositions, operators, and splits [6].
* Training machine learning algorithms to predict the semantic meaning of questions based on their decomposition and split [6].
* Conducting text analytics by utilising the allowed tokens dataset to map how people communicate specific concepts across different contexts or topics [6].
* Optimising machine decisions for human-like interactions, leading to improved decision-making in applications like automated customer support, healthcare advice, and marketing campaigns [1, 3].
The BreakData dataset covers a global region [5]. Its content is drawn from common domains such as healthcare and banking, featuring questions and statements that require linguistic analysis [1, 3]. There are no specific notes on time range or demographic scope beyond these general domains.
CC0
This dataset is primarily intended for:
* Researchers developing sophisticated models to advance AI technologies [1].
* Data scientists and AI/ML engineers looking to train models for natural language understanding tasks [1].
* Those interested in analysing existing questions or commands with accurate decompositions and operators [1].
* Developers of machine learning models powered by NLP for seamless inference and improved results in customer engagement [3].
Original Data Source: Break (Question Decomposition Meaning)
The World Religion Project (WRP) aims to provide detailed information about religious adherence worldwide since 1945. It contains data about the number of adherents by religion in each of the states in the international system. These numbers are given for every half-decade period (1945, 1950, etc., through 2010). Percentages of the states' populations that practice a given religion are also provided. (Note: These percentages are expressed as decimals, ranging from 0 to 1, where 0 indicates that 0 percent of the population practices a given religion and 1 indicates that 100 percent of the population practices that religion.) Some of the religions (as detailed below) are divided into religious families. To the extent data are available, the breakdown of adherents within a given religion into religious families is also provided.
The project was developed in three stages. The first stage consisted of the formation of a religion tree. A religion tree is a systematic classification of major religions and of religious families within those major religions. To develop the religion tree we prepared a comprehensive literature review, the aim of which was (i) to define a religion, (ii) to find tangible indicators of a given religion or of religious families within a major religion, and (iii) to identify existing efforts at classifying world religions. (Please see the original survey instrument to view the structure of the religion tree.) The second stage consisted of the identification of major data sources of religious adherence and the collection of data from these sources according to the religion tree classification. This created a dataset that included multiple records for some states at a given point in time. It also contained many missing values for specific states, specific time periods and specific religions. The third stage consisted of cleaning the data, reconciling discrepancies between sources and imputing data for the missing cases.
The Global Religion Dataset: This dataset uses a religion-by-five-year unit. It aggregates the number of adherents of a given religion and religious group globally by five-year periods.
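The 0-to-1 proportion convention noted above is easy to trip over when combining shares with raw population counts. A tiny worked example (the figures below are hypothetical, purely to illustrate the convention):

```python
# WRP stores religious shares as proportions in [0, 1], not percentages.
population = 10_000_000   # hypothetical state population
share = 0.62              # i.e. 62% of the population practices this religion

# Adherent count is simply proportion times population.
adherents = round(population * share)
print(adherents)               # 6200000

# Formatting the proportion for display as a percentage:
pct_display = f"{share:.0%}"   # "62%"
```

Multiplying by 100 before the population product (a common mistake when the column is assumed to hold percentages) would inflate counts a hundredfold.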
The objective of GEO is to fulfil a vision of a world where decisions and actions are informed by coordinated, comprehensive and sustained Earth Observation (EO). This is being pursued mainly through the added value of co-ordinating existing institutions, organised communities, space agencies, in-situ monitoring agencies, scientific institutions, research centres, universities, modelling centres, technology developers and other groups that deal with one or more aspects of EO. To reach this overarching goal, GEO focuses on capacity development in three dimensions: infrastructure, individuals and institutions. In the field of agriculture, the general goal is to promote the utilization of Earth observations for advancing sustainable agriculture, aquaculture and fisheries. Key issues include early warning, risk assessment, food security, market efficiency and combating desertification. (Source: http://www.research-europe.com/index.php/2011/08/joao-soares-secretariat-expert-for-agriculture-group-on-earth-observations/)
The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement. Physical changes in street patterns caused by highway construction, new development, and so forth, may require boundary revisions. In addition, census tracts occasionally are split due to population growth, or combined as a result of substantial population decline. Census tract boundaries generally follow visible and identifiable features.
They may follow legal boundaries such as minor civil division (MCD) or incorporated place boundaries in some States and situations to allow for census tract-to-governmental unit relationships where the governmental boundaries tend to remain unchanged between censuses. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.
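The special code ranges quoted above amount to a small classification rule. A minimal sketch, treating the integer base of a tract code (real tract codes can carry a two-digit decimal suffix, which is ignored here; the category strings are paraphrases, not official terminology):

```python
def tract_code_type(code: int) -> str:
    """Classify the integer base of a 2010 census tract code
    per the special ranges described above."""
    if 9400 <= code <= 9499:
        return "majority American Indian / reservation or trust land"
    if 9800 <= code <= 9899:
        return "special land use (park, military, industrial)"
    if 9900 <= code <= 9998:
        return "water-only"
    return "standard"

print(tract_code_type(9801))  # special land use (park, military, industrial)
print(tract_code_type(1204))  # standard
```

Such a helper is handy when flagging non-residential tracts before computing per-tract population statistics.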
THIS DATASET WAS LAST UPDATED AT 2:11 AM EASTERN ON JULY 15
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as incidents in which four or more people are killed, excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, similar to prior years. Although far less common, the nine public mass shootings during the year were the deadliest type of mass murder, resulting in the deaths of 73 people, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half of them by suicide.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety, may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
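The queries themselves are not reproduced here, and the database's actual table and column names are not documented in this description. As a hedged sketch of the shape such queries take, the snippet below builds a throwaway SQLite table (the `incidents` table and its columns are assumptions for illustration) and computes per-year counts nationwide, then a filtered count for one state:

```python
import sqlite3

# In-memory stand-in for the real database; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (year INTEGER, state TEXT, shooting INTEGER)")
conn.executemany("INSERT INTO incidents VALUES (?, ?, ?)", [
    (2019, "TX", 1), (2019, "OH", 1), (2019, "CA", 0), (2018, "FL", 1),
])

# Mass killings and mass shootings per year, nationwide.
rows = conn.execute("""
    SELECT year, COUNT(*) AS killings, SUM(shooting) AS shootings
    FROM incidents
    GROUP BY year
    ORDER BY year
""").fetchall()
print(rows)  # [(2018, 1, 1), (2019, 3, 2)]

# The per-state version is the same query with an added filter.
tx = conn.execute(
    "SELECT COUNT(*) FROM incidents WHERE year = 2019 AND state = 'TX'"
).fetchone()[0]
print(tx)  # 1
```

Against the real database, only the table/column names and the connection would change; the GROUP BY pattern is the same.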
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
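The inclusion rule above reduces to a simple predicate. As a literal encoding of the stated definition (function and parameter names are ours, not the project's):

```python
def is_mass_killing(victims_killed: int, span_hours: float,
                    intentional: bool = True) -> bool:
    """Four or more victims (excluding offenders) intentionally killed
    within a 24-hour period, per the definition stated above."""
    return intentional and victims_killed >= 4 and span_hours <= 24

print(is_mass_killing(4, 24))                      # True: meets the threshold
print(is_mass_killing(3, 1))                       # False: under four victims
print(is_mass_killing(5, 30))                      # False: beyond 24 hours
print(is_mass_killing(6, 2, intentional=False))    # False: e.g. DUI or accidental fire
```

Note that the method, motive, victim-offender relationship, and number of locations deliberately play no role in the predicate, mirroring the text.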
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The World Health Organization has reported 766,440,796 coronavirus cases since the epidemic began. In addition, countries have reported 6,932,591 coronavirus deaths. This dataset provides World Coronavirus Cases: actual values, historical data, forecasts, charts, statistics, an economic calendar and news.
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). This database was created in response to the Coronavirus public health emergency to track reported cases in real-time. The data include the location and number of confirmed COVID-19 cases, deaths and recoveries for all affected countries, aggregated at the appropriate province or state. It was developed to enable researchers, public health authorities and the general public to track the outbreak as it unfolds. Additional information is available in the blog post, Mapping 2019-nCoV (https://systems.jhu.edu/research/public-health/ncov/), and included data sources are listed here: https://github.com/CSSEGISandData/COVID-19
How many confirmed COVID-19 cases were there in the US, by state?
This query determines the total number of confirmed cases by province or state as of February 29, 2020. In this particular dataset, a "province_state" can refer to any subset of the US, including a county or state.
SELECT
  province_state,
  confirmed AS feb_confirmed_cases
FROM
  `bigquery-public-data.covid19_jhu_csse.summary`
WHERE
  country_region = "US"
  AND date = '2020-02-29'
ORDER BY
  feb_confirmed_cases DESC
Which countries with the highest number of confirmed cases have the most per capita? This query joins the Johns Hopkins dataset with the World Bank's global population data to determine which countries among those with the highest total number of confirmed cases have the most confirmed cases per capita.
WITH country_pop AS (
  SELECT
    IF(country = "United States", "US",
       IF(country = "Iran, Islamic Rep.", "Iran", country)) AS country,
    year_2018
  FROM
    `bigquery-public-data.world_bank_global_population.population_by_country`
)
SELECT
  cases.date AS date,
  cases.country_region AS country_region,
  SUM(cases.confirmed) AS total_confirmed_cases,
  SUM(cases.confirmed) / AVG(country_pop.year_2018) * 100000 AS confirmed_cases_per_100000
FROM
  `bigquery-public-data.covid19_jhu_csse.summary` cases
JOIN
  country_pop ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
WHERE
  cases.country_region = "US"
  AND country_pop.country = "US"
  AND cases.date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
  country_region, date
UNION ALL
SELECT
  cases.date AS date,
  cases.country_region AS country_region,
  SUM(cases.confirmed) AS total_confirmed_cases,
  SUM(cases.confirmed) / AVG(country_pop.year_2018) * 100000 AS confirmed_cases_per_100000
FROM
  `bigquery-public-data.covid19_jhu_csse.summary` cases
JOIN
  country_pop ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
WHERE
  cases.country_region = "France"
  AND country_pop.country = "France"
  AND cases.date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
  country_region, date
UNION ALL
SELECT
  cases.date AS date,
  cases.country_region AS country_region,
  SUM(cases.confirmed) AS total_confirmed_cases,
  SUM(cases.confirmed) / AVG(country_pop.year_2018) * 100000 AS confirmed_cases_per_100000
FROM
  `bigquery-public-data.covid19_jhu_csse.summary` cases
JOIN
  country_pop ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
WHERE
  cases.country_region = "China"
  AND country_pop.country = "China"
  AND cases.date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
  country_region, date
UNION ALL
SELECT
  cases.date AS date,
  cases.country_region AS country_region,
  cases.confirmed AS total_confirmed_cases,
  cases.confirmed / country_pop.year_2018 * 100000 AS confirmed_cases_per_100000
FROM
  `bigquery-public-data.covid19_jhu_csse.summary` cases
JOIN
  country_pop ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
WHERE
  cases.country_region IN ("Italy", "Spain", "Germany", "Iran")
  AND cases.date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
ORDER BY
  confirmed_cases_per_100000 DESC
JHU CSSE
Daily
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Historical chart and dataset showing World death rate by year from 1950 to 2025.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
For the first time, the full results from the Global Green Economy Index (GGEI) are available in the public domain. Historically, only the aggregate results have been publicly accessible. The full dataset has been paywalled and accessible to our subscribers only. But the way in which we release GGEI data to the public is changing. Read on for a quick explanation for how and why.
First, the how. The GGEI file publicly accessible today represents the dataset officially compiled in 2022. It contains the full results for each of the 18 indicators in the GGEI for 160 countries, across the four main dimensions of climate change & social equity, sector decarbonization, markets & ESG investment, and the environment. Some (not all) of these data points have since been updated as new datasets have been published. The GGEI is a dynamic model, updating in real time as new data becomes available. Our subscribing clients will still receive this most timely version of the model, along with any customizations they may request.
Now, the why. First and foremost, there is huge demand among academic researchers globally for the full GGEI dataset. Academic inquiry around the green transition, sustainable development, ESG investing, and green energy systems has exploded over the past several years. We receive hundreds of inquiries annually from students and researchers seeking access to the full GGEI dataset. Making it publicly accessible, as we are today, makes it easier for these individuals and institutions to use the GGEI to promote learning and green progress within their institutions.
More broadly, the landscape for data has changed significantly. A decade ago, when the GGEI was first published, datasets existed more in silos, and users might subscribe to one specific dataset like the GGEI to answer a specific question. But today, data usage in the sustainability space has become much more of a system, whereby myriad data sources are synthesized into increasingly sophisticated models, often fueled by artificial intelligence. Making the GGEI more accessible will accelerate how this perspective on the global green economy can be integrated into these systems.
TROPIS is the acronym for the Tree Growth and Permanent Plot Information System sponsored by CIFOR to promote more effective use of existing data and knowledge about tree growth.
TROPIS is concerned primarily with information about permanent plots and tree growth in both planted and natural forests throughout the world. It has five components:
- a network of people willing to share permanent plot data and tree growth information;
- an index to people and institutions with permanent plots;
- a database management system to promote more efficient data management;
- a method to find comparable sites elsewhere, so that observations can be supplemented or contrasted with other data; and
- an inference system to allow growth estimates to be made in the absence of empirical data.
TROPIS is about people and information. The core of TROPIS is an index to people and their plots maintained in a relational database. The database is designed to fulfil two primary needs: to provide for efficient cross-checking, error-checking and updating; and to facilitate searches for plots matching a wide range of specified criteria, including (but not limited to) location, forest type, taxa, plot area and measurement history.
The database is essentially hierarchical: the key element of the database is the informant. Each informant may contribute information on many plot series, each of which has consistent objectives. In turn, each series may comprise many plots, each of which may have a different location or a different size. Each plot may contain many species. A series may be a thinning or spacing experiment, a species or provenance trial, a continuous forest inventory system, or any other aggregation of plots convenient to the informant. Plots need not be current. Abandoned plots may be included provided that the location is known and the plot data remain accessible. In addition to details of the informant, we try to record details of additional contact people associated with plots, to maintain continuity when people transfer or retire. Thus the relational structure may appear complex, but it ensures data integrity.
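The informant → series → plots → species hierarchy described above can be sketched as a set of nested records. A minimal illustration (class and field names are ours, not TROPIS's actual schema):

```python
from dataclasses import dataclass, field

# Hypothetical encoding of the TROPIS hierarchy: an informant contributes
# series, each series comprises plots, each plot lists species.
@dataclass
class Plot:
    location: str
    area_ha: float
    species: list[str] = field(default_factory=list)

@dataclass
class Series:
    objective: str
    plots: list[Plot] = field(default_factory=list)

@dataclass
class Informant:
    name: str
    series: list[Series] = field(default_factory=list)

# Illustrative data only.
trial = Series("spacing experiment",
               [Plot("Sumatra", 0.25, ["Shorea leprosula"])])
who = Informant("J. Smith", [trial])
print(who.series[0].plots[0].species)  # ['Shorea leprosula']
```

In the relational database this nesting becomes foreign keys (series → informant, plot → series), which is what enables the cross-checking and criteria searches described above.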
At present, searches are possible only via mail, fax or email requests to the TROPIS co-ordinator at CIFOR. Self-service on-line searching will also be available in 1997. Clients may search for plots with specified taxa, locations, silvicultural treatments, or other specified criteria and combinations. TROPIS currently contains references to over 10,000 plots with over 2,000 species contributed by 100 individuals world-wide.
This database will help CIFOR as well as other users to make more efficient use of existing information, and to develop appropriate and effective techniques and policies for sustainable forest management world-wide.
TROPIS is supported by the Government of Japan.
This information is from the CIFOR web site.
The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line File is designed to stand alone as an independent data set, or the files can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement. Physical changes in street patterns caused by highway construction, new development, and so forth may require boundary revisions. In addition, census tracts occasionally are split due to population growth, or combined as a result of substantial population decline. Census tract boundaries generally follow visible and identifiable features.
They may follow legal boundaries, such as minor civil division (MCD) or incorporated place boundaries, in some states and situations to allow for census tract-to-governmental unit relationships where the governmental boundaries tend to remain unchanged between censuses. State and county boundaries are always census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or whose area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for census tracts that contained only water area and no land area.
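The special code ranges described above lend themselves to a small lookup helper. The sketch below is illustrative only: `tract_code_category` is a hypothetical function name, it takes the 4-digit base tract number (suffixes ignored), and the ranges are copied directly from the 2010 Census conventions stated in the text.

```python
def tract_code_category(code: int) -> str:
    """Classify a 2010 census tract base code by the special
    ranges described in the TIGER/Line documentation above.
    Hypothetical helper, not part of the TIGER/Line files."""
    if 9400 <= code <= 9499:
        # Majority American Indian population, or area primarily
        # covered by reservations / off-reservation trust lands
        return "american_indian"
    if 9800 <= code <= 9899:
        # Little or no population; large special land use area
        # (National Park, military installation, business park)
        return "special_land_use"
    if 9900 <= code <= 9998:
        # Water area only, no land area
        return "water_only"
    return "standard"
```

Checking an arbitrary code against these ranges is then a one-line call, e.g. `tract_code_category(9805)` returns `"special_land_use"`.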
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for coronavirus deaths reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, and the reported unit.
This is a mixed methods study, comprising both qualitative and quantitative material. The aim of this project was to use the opportunity afforded by the release of the final part of the film trilogy of Lord of the Rings to gather materials allowing an exploration of