The world population surpassed eight billion people in 2022, having doubled from its figure less than 50 years previously. Looking forward, it is projected that the world population will reach nine billion in 2038 and 10 billion in 2060, but it will peak at around 10.3 billion in the 2080s before going into decline.
Regional variations
The global population has seen rapid growth since the early 1800s, due to advances in areas such as food production, healthcare, water safety, education, and infrastructure; however, these changes did not occur at a uniform time or pace across the world. Broadly speaking, the first regions to undergo their demographic transitions were Europe, North America, and Oceania, followed by Latin America and Asia (although Asia's development saw the greatest variation due to its size), while Africa was the last continent to undergo this transformation. Because of these differences, many so-called "advanced" countries are now experiencing population decline, particularly in Europe and East Asia, while the fastest population growth rates are found in Sub-Saharan Africa. In fact, the roughly two billion difference in population between now and the 2080s peak will be found in Sub-Saharan Africa, which will rise from 1.2 billion to 3.2 billion in this time (although populations in other continents will also fluctuate).
Changing projections
The United Nations releases its World Population Prospects report every one to two years, and this is widely considered the foremost demographic dataset in the world. However, recent years have seen a notable decline in projections of when the global population will peak, and at what number. Previous reports in the 2010s had suggested a peak of over 11 billion people and that population growth would continue into the 2100s; however, an earlier and lower peak is now projected. Reasons for this include a more rapid population decline in East Asia and Europe, particularly China, as well as a prolonged development arc in Sub-Saharan Africa.
These data were produced by the WorldPop Research Group at the University of Southampton. This work was part of the GRID3 project with funding from the Bill and Melinda Gates Foundation and the United Kingdom’s Department for International Development (OPP1182408). Project partners included the United Nations Population Fund, Center for International Earth Science Information Network in the Earth Institute at Columbia University, and the Flowminder Foundation. These data may be distributed using a Creative Commons Attribution Share-Alike 4.0 License. Contact release@worldpop.org for more information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Forest Proximate People" (FPP) dataset is one of the data layers contributing to the development of indicator #13, “number of forest-dependent people in extreme poverty,” of the Collaborative Partnership on Forests (CPF) Global Core Set of forest-related indicators (GCS). The FPP dataset provides an estimate of the number of people living in or within 5 kilometers of forests (forest-proximate people) for the year 2019 with a spatial resolution of 100 meters at a global level.
For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: A new methodology and global estimates. Background Paper to The State of the World’s Forests 2022 report. Rome, FAO.
Contact points:
Maintainer: Leticia Pina
Maintainer: Sarah E. Castle
Data lineage:
The FPP data are generated using Google Earth Engine. Forests are defined by the Copernicus Global Land Cover (CGLC) (Buchhorn et al. 2020) classification system’s definition of forests: tree cover ranging from 15-100%, with or without understory of shrubs and grassland, and including both open and closed forests. Any area classified as forest sized ≥ 1 ha in 2019 was included in this definition. Population density was defined by the WorldPop global population data for 2019 (WorldPop 2018). High density urban populations were excluded from the analysis. High density urban areas were defined as any contiguous area with a total population (using 2019 WorldPop data for population) of at least 50,000 people and comprised of pixels all of which met at least one of two criteria: either the pixel a) had at least 1,500 people per square km, or b) was classified as “built-up” land use by the CGLC dataset (where “built-up” was defined as land covered by buildings and other manmade structures) (Dijkstra et al. 2020). Using these datasets, any rural people living in or within 5 kilometers of forests in 2019 were classified as forest proximate people. Euclidean distance was used as the measure to create a 5-kilometer buffer zone around each forest cover pixel. The scripts for generating the forest-proximate people and the rural-urban datasets using different parameters or for different years are published and available to users. For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: a new methodology and global estimates. Background Paper to The State of the World’s Forests 2022. Rome, FAO.
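To make the lineage concrete, here is a minimal Python sketch of the same logic on synthetic rasters; it is not the published Google Earth Engine script. It assumes 100-meter pixels (so the 5-kilometer cutoff corresponds to 50 pixels) and uses a Euclidean distance transform in place of GEE's buffer operation.

```python
# A simplified, synthetic-data sketch of the FPP logic; the published
# workflow runs in Google Earth Engine on CGLC and WorldPop rasters.
import numpy as np
from scipy.ndimage import distance_transform_edt

rng = np.random.default_rng(0)
forest = rng.random((200, 200)) < 0.10                    # forest mask
population = rng.poisson(2.0, (200, 200)).astype(float)   # people per pixel
urban = rng.random((200, 200)) < 0.05                     # high-density urban mask

# Euclidean distance (in pixels) from every cell to the nearest forest
# cell; forest cells themselves get distance 0.
dist_to_forest = distance_transform_edt(~forest)

# Forest-proximate people: rural population within 5 km (50 pixels at
# 100 m resolution) of forest, excluding high-density urban areas.
proximate = (dist_to_forest <= 50) & ~urban
print(f"Forest-proximate people (synthetic): {population[proximate].sum():,.0f}")
```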
References:
Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.E., Herold, M., Fritz, S., 2020. Copernicus Global Land Service: Land Cover 100m: collection 3 epoch 2019. Globe.
Dijkstra, L., Florczyk, A.J., Freire, S., Kemper, T., Melchiorri, M., Pesaresi, M. and Schiavina, M., 2020. Applying the degree of urbanisation to the globe: A new harmonised definition reveals a different picture of global urbanisation. Journal of Urban Economics, p.103312.
WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University, 2018. Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00645
Online resources:
GEE asset for "Forest proximate people - 5km cutoff distance"
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000 entries).
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames ambassadors help in many countries.
Wiki: A wiki allows users to view the data and quickly fix errors and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.
Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.
Given this complexity, there are a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA's MLOST, NASA's GISTEMP, and the UK's HadCRUT.
We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.
In this dataset, we have included several files:
Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):
Other files include:
The raw data comes from the Berkeley Earth data page.
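As a starting point, here is a minimal pandas sketch for loading the main file; it assumes the first column holds the record date and that the remaining columns are numeric monthly series, so check the column names in your copy.

```python
# A minimal loading sketch; column positions/names are assumptions.
import pandas as pd

df = pd.read_csv("GlobalTemperatures.csv", parse_dates=[0])
df = df.set_index(df.columns[0])
print(df.columns.tolist())                          # inspect available series
annual = df.resample("YS").mean(numeric_only=True)  # monthly -> annual means
print(annual.tail())
```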
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 2 rows and is filtered where the book is Origins : how the Earth shaped human history. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
What happens in the vast stretches of the world's oceans - both wondrous and worrisome - has too often been out of sight, out of mind. The sea represents the last major scientific frontier on planet Earth - a place where expeditions continue to discover not only new species, but even new phyla. The role of these species in the ecosystem, where they sit in the tree of life, and how they respond to environmental changes really do constitute mysteries of the deep. Despite technological advances that now allow people to access, exploit or affect nearly all parts of the ocean, we still understand very little of the ocean's biodiversity and how it is changing under our influence. The goal of the research presented here is to estimate and visualize, for the first time, the global impact humans are having on the ocean's ecosystems. Our analysis, published in Science, February 15, 2008 (http://doi.org/10.1126/science.1149345), shows that over 40% of the world's oceans are heavily affected by human activities and few if any areas remain untouched. This dataset contains raw stressor data from 17 different human activities that directly or indirectly have an impact on the ecological communities in the ocean's ecosystems. For more information on specific datasets, see the methods section. All data are projected in WGS 1984 Mollweide.
The Macquarie Island Station Area GIS Dataset is a topographic and facilities database covering Australia's Macquarie Island Station and its immediate environs. The database includes all man-made and natural features within the operational area of the station proper. Attributes are held for many facilities including buildings, site services, communications, fuel storage, aeronautical and management zones. The spatial data have been compiled from low level aerial photography, ground surveys and engineering plans. Detailed attribution of hydraulic site services includes make, size and engineering plan number.
The dataset conforms to the SCAR Feature Catalogue which includes data quality information.
The data are included in the data available for download from a Related URL below. Data described by this metadata record has Dataset_id = 25. Each feature has a Qinfo number which, when entered at the 'Search datasets & quality' tab, provides data quality information for the feature.
Changes have occurred at the station since this dataset was produced. For example some buildings and other structures have been removed and some added. As a result the data available for download from a Related URL below is updated with new data having different Dataset_id(s).
Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
The Gridded Population of the World, Version 4 (GPWv4): Population Density, Revision 11 consists of estimates of human population density (number of persons per square kilometer) based on counts consistent with national censuses and population registers, for the years 2000, 2005, 2010, 2015, and 2020. A proportional allocation gridding algorithm, utilizing approximately 13.5 million national and sub-national administrative units, was used to assign population counts to 30 arc-second grid cells. The population density rasters were created by dividing the population count raster for a given target year by the land area raster. The data files were produced as global rasters at 30 arc-second (~1 km at the equator) resolution.
Purpose: To provide estimates of population density for the years 2000, 2005, 2010, 2015, and 2020, based on counts consistent with national censuses and population registers, as raster data to facilitate data integration.
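As described above, each density raster is simply the population count raster divided by the land area raster. The following is a minimal sketch of that step; the file names are hypothetical placeholders, not SEDAC's actual distribution names.

```python
# Hypothetical file names; the operation mirrors the description above:
# density (persons per km^2) = population count / land area.
import numpy as np
import rasterio

with rasterio.open("gpw_v4_count_2020.tif") as counts, \
     rasterio.open("gpw_v4_land_area.tif") as area:
    pop = counts.read(1).astype("float64")
    km2 = area.read(1).astype("float64")
    profile = counts.profile

density = np.where(km2 > 0, pop / km2, np.nan)   # guard against zero land area

profile.update(dtype="float32", nodata=float("nan"))
with rasterio.open("gpw_v4_density_2020.tif", "w", **profile) as dst:
    dst.write(density.astype("float32"), 1)
```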
Recommended Citation(s)*: Center for International Earth Science Information Network - CIESIN - Columbia University. 2018. Gridded Population of the World, Version 4 (GPWv4): Population Density, Revision 11. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/H49C6VHW. Accessed DAY MONTH YEAR.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 2 rows and is filtered where the book is Between heaven and earth : the religious worlds people make and the scholars who study them. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset and the validation are fully described in a Nature Scientific Data Descriptor https://www.nature.com/articles/s41597-019-0265-5
If you want to use this dataset in an interactive environment, then use this link https://mybinder.org/v2/gh/GeographerAtLarge/TravelTime/HEAD
The following text is a summary of the information in the above Data Descriptor.
The dataset is a suite of global travel-time accessibility indicators for the year 2015, at approximately one-kilometre spatial resolution for the entire globe. The indicators show an estimated (and validated) land-based travel time to the nearest city and nearest port for a range of city and port sizes.
The datasets are in GeoTIFF format and are suitable for use in Geographic Information Systems and statistical packages for mapping access to cities and ports and for spatial and statistical analysis of the inequalities in access by different segments of the population.
These maps represent a unique global representation of physical access to essential services offered by cities and ports.
The datasets:
travel_time_to_cities_x.tif (x ranges from 1 to 12): the value of each pixel is the estimated travel time in minutes to the nearest urban area in 2015. There are 12 data layers based on different sets of urban areas, defined by their population in the year 2015 (see the PDF report).
travel_time_to_ports_x (x ranges from 1 to 5): the value of each pixel is the estimated travel time in minutes to the nearest port in 2015. There are 5 data layers based on different port sizes.
Format: Raster Dataset, GeoTIFF, LZW compressed
Unit: Minutes
Data type: 16-bit unsigned integer (UInt16)
No data value: 65535
Flags: None
Spatial resolution: 30 arc seconds
Spatial extent: Upper left -180, 85; Lower left -180, -60; Upper right 180, 85; Lower right 180, -60
Spatial Reference System (SRS): EPSG:4326 - WGS84 - Geographic Coordinate System (lat/long)
Temporal resolution: Single year (2015)
Temporal extent: 2015. Updates may follow for future years, but these are dependent on the availability of updated inputs on travel times and city locations and populations.
Methodology: Travel time to the nearest city or port was estimated using an accumulated cost function (accCost) in the gdistance R package (van Etten, 2018). This function requires two input datasets: (i) a set of locations to estimate travel time to, and (ii) a transition matrix that represents the cost or time to travel across a surface.
The set of locations were based on populated urban areas in the 2016 version of the Joint Research Centre’s Global Human Settlement Layers (GHSL) datasets (Pesaresi and Freire, 2016) that represent low density (LDC) urban clusters and high density (HDC) urban areas (https://ghsl.jrc.ec.europa.eu/datasets.php). These urban areas were represented by points, spaced at 1km distance around the perimeter of each urban area.
Marine ports were extracted from the 26th edition of the World Port Index (NGA, 2017), which contains the location and physical characteristics of approximately 3,700 major ports and terminals. Ports are represented as single points.
The transition matrix was based on the friction surface (https://map.ox.ac.uk/research-project/accessibility_to_cities) from the 2015 global accessibility map (Weiss et al, 2018).
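As an illustration only (the authors' own R implementation is described under Code below), the following Python sketch reproduces the idea of an accumulated cost function with scikit-image's MCP_Geometric, using a synthetic friction surface and hypothetical target cells.

```python
# A conceptual analogue of gdistance::accCost, not the authors' code.
import numpy as np
from skimage.graph import MCP_Geometric

friction = np.full((100, 100), 1.0)   # minutes to cross each cell (synthetic)
friction[40:60, :] = 5.0              # a slow band, e.g. rough terrain

targets = [(10, 10), (80, 90)]        # hypothetical city cells (row, col)
mcp = MCP_Geometric(friction)
travel_time, _ = mcp.find_costs(starts=targets)  # accumulated cost to nearest target

print("Travel time at (50, 50):", round(travel_time[50, 50], 1), "minutes")
```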
Code: The R code used to generate the 12 travel time maps is included in the zip file that can be downloaded with these data layers. The processing zones are also available.
Validation: The underlying friction surface was validated by comparing travel times between 47,893 pairs of locations against journey times from a Google API. Our estimated journey times were generally shorter than those from the Google API. Across the tiles, the median journey time from our estimates was 88 minutes with an interquartile range of 48 to 143 minutes, while the median journey time estimated by the Google API was 106 minutes with an interquartile range of 61 to 167 minutes. Across all tiles, the differences were skewed to the left, and our travel time estimates were shorter than those reported by the Google API in 72% of the tiles. The median difference was −13.7 minutes with an interquartile range of −35.5 to 2.0 minutes, while the absolute difference was 30 minutes or less for 60% of the tiles and 60 minutes or less for 80% of the tiles. The median percentage difference was −16.9% with an interquartile range of −30.6% to 2.7%, while the absolute percentage difference was 20% or less in 43% of the tiles and 40% or less in 80% of the tiles.
This process and results are included in the validation zip file.
Usage Notes: The accessibility layers can be visualised and analysed in many Geographic Information Systems or remote sensing software packages such as QGIS, GRASS, ENVI, ERDAS or ArcMap, and also in statistical and modelling packages such as R or MATLAB. They can also be used in cloud-based tools for geospatial analysis such as Google Earth Engine.
The nine layers represent travel times to human settlements of different population ranges. Two or more layers can be combined into one layer by recording the minimum pixel value across the layers. For example, a map of travel time to the nearest settlement of 5,000 to 50,000 people could be generated by taking the minimum of the three layers that represent the travel time to settlements with populations between 5,000 and 10,000, 10,000 and 20,000 and, 20,000 and 50,000 people.
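A minimal sketch of that combination rule follows. The layer indices chosen for the three population ranges are assumptions (check the PDF report for the actual mapping), and pixels that are no-data (65535) in all inputs remain no-data in the output, since 65535 is the maximum value.

```python
# Combine layers by taking the per-pixel minimum travel time.
# Which index corresponds to which population range is an assumption here.
import numpy as np
import rasterio

layer_files = ["travel_time_to_cities_9.tif",    # 5,000-10,000 people (assumed)
               "travel_time_to_cities_10.tif",   # 10,000-20,000 (assumed)
               "travel_time_to_cities_11.tif"]   # 20,000-50,000 (assumed)

arrays = []
for path in layer_files:
    with rasterio.open(path) as src:
        arrays.append(src.read(1))
        profile = src.profile

combined = np.minimum.reduce(arrays)   # nearest settlement across all three

with rasterio.open("travel_time_5k_to_50k.tif", "w", **profile) as dst:
    dst.write(combined, 1)
```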
The accessibility layers also permit user-defined hierarchies that go beyond computing the minimum pixel value across layers. A user-defined complete hierarchy can be generated when the union of all categories adds up to the global population, and the intersection of any two categories is empty. Everything else is up to the user in terms of logical consistency with the problem at hand.
The accessibility layers are relative measures of the ease of access from a given location to the nearest target. While the validation demonstrates that they do correspond to typical journey times, they cannot be taken to represent actual travel times. Errors in the friction surface will be accumulated as part of the accumulated cost function, and it is likely that locations that are further away from targets will have a greater divergence from a plausible travel time than those that are closer to the targets. Care should be taken when referring to travel time to the larger cities when the locations of interest are extremely remote, although they will still be plausible representations of relative accessibility. Furthermore, a key assumption of the model is that all journeys will use the fastest mode of transport and take the shortest path.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
BreakData Welcome to BreakData, an innovative dataset devoted to exploring language understanding. This dataset contains a wealth of information related to question decomposition, operators, splits, sources, and allowed tokens, and can be used to answer questions with precision. With deep insights into how humans comprehend and interpret language, BreakData provides immense value for researchers developing sophisticated models that can help advance AI technologies. Our goal is to enable the development of more complex natural language processing that can be used in applications such as automated customer support, chatbots for health-care advice, or automated marketing campaigns. Dive into this intriguing dataset now and discover how your work could change the world!
How to use the dataset: This dataset provides an exciting opportunity to explore and understand the complexities of language understanding. With this dataset, you can train models for natural language processing (NLP) activities such as question answering, text analytics, automated dialog systems, and more.
In order to make most effective use of the BreakData dataset, it’s important to know how it is organized and what types of data are included in each file. The BreakData dataset is broken down into nine different files:
QDMR_train.csv
QDMR_validation.csv
QDMR-highlevel_train.csv
QDMR-highlevel_test.csv
logicalforms_train.csv
logicalforms_validation.csv
QDMRlexicon_train.csv
QDMRLexicon_test.csv
QDMRLexiconHighLevelTest.csv
Each file contains a different set of data that can be used to train models for natural language understanding tasks, or to analyze existing questions and commands by breaking them down into their component parts and understanding the relationships between those parts:
1) The QDMR files include questions or statements from common domains such as health care or banking that need to be interpreted according to a series of operators (elements such as verbs). This task requires identifying keywords in the statement or question that indicate variables and their values, so any model trained on these datasets will need to accurately identify entities such as time references (dates/times), monetary amounts, and Boolean values (yes/no), as well as the relationships between those entities, while following the rule set of the specific domain language. Modeling such complex, context-dependent queries requires linguistic analysis in multiple steps and rigorous training on this kind of data.
2) The LogicalForms files include logical forms containing the building blocks (elements such as operators) for linking ideas together across different sets of incoming variables.
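A minimal loading sketch for these files is below, using QDMR_train.csv as the example; the column names are not guaranteed by the description above, so inspect them before building a pipeline.

```python
# Load one of the decomposition files and inspect its structure;
# column contents (question text, decomposition, operators) are assumptions.
import pandas as pd

train = pd.read_csv("QDMR_train.csv")
print(train.columns.tolist())
print(train.head())
```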
Research Ideas:
Developing advanced natural language processing models to analyze questions using decompositions, operators, and splits.
Training a machine learning algorithm to predict the semantic meaning of questions based on their decomposition and split.
Conducting advanced text analytics by using the allowed tokens dataset to map out how people communicate specific concepts in different contexts or topics.
CC0
Original Data Source: Break (Question Decomposition Meaning)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The High Resolution Settlement Layer (HRSL) provides estimates of human population distribution at a resolution of 1 arc-second (approximately 30 m) for the year 2015. The population estimates are based on recent census data and high-resolution (0.5 m) satellite imagery from DigitalGlobe. The population grids provide detailed delineation of settlements in both urban and rural areas, which is useful for many research areas, from disaster response and humanitarian planning to the development of communications infrastructure. The settlement extent data were developed by the Connectivity Lab at Facebook, using computer vision techniques to classify blocks of optical satellite data as settled (containing buildings) or not. The Center for International Earth Science Information Network (CIESIN) at the Earth Institute, Columbia University used proportional allocation to distribute population data from subnational census data to the settlement extents. Citation: Facebook Connectivity Lab and Center for International Earth Science Information Network - CIESIN - Columbia University. 2016. High Resolution Settlement Layer (HRSL). Source imagery for HRSL © 2016 DigitalGlobe.
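The allocation step can be illustrated with a toy example: in its simplest form, proportional allocation spreads an administrative unit's census total evenly across the pixels classified as settled. This sketch is a simplification of CIESIN's actual procedure.

```python
# Toy proportional allocation: one admin unit, uniform share per settled pixel.
import numpy as np

settled = np.array([[0, 1, 1],
                    [0, 1, 0],
                    [1, 0, 0]], dtype=bool)   # settlement extents (toy)
census_total = 1000.0                          # people in this admin unit

population = np.zeros(settled.shape)
population[settled] = census_total / settled.sum()
print(population)   # each of the 4 settled pixels receives 250 people
```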
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Surface Earth System Analysis and Modeling Environment (SESAME) Human-Earth Atlas includes hundreds of variables capturing both human and non-human aspects of the Earth system on two common spatial grids of 1- and 0.25-degree resolution. The Atlas is structured by common spheres, and many variables resolve changes over time. Many of the national-level tabular human system variables are downscaled to spatial grids using dasymetric mapping, accounting for country boundary changes over time. An associated software toolbox allows users to add raster, point, line, polygon, and tabular datasets, transforming them onto a standardized spatial grid at the desired resolution as well as to work conveniently with jurisdictional (e.g. country) data.
File Description:
atlas: Contains netCDF files at 1-degree resolution.
atlas_p25: Contains selected netCDF files at 0.25-degree resolution.
genscripts: Original Jupyter notebook scripts used to generate the atlas.
SESAME_Atlas_Documentation_v1.pdf: Documentation file for the SESAME Human-Earth Atlas.
SESAME_Human-Earth_Atlas_v1.xlsx: Comprehensive summary and documentation for the SESAME Human-Earth Atlas, including details on pre- and post-processing steps.
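Since the Atlas is distributed as netCDF, a minimal xarray sketch for browsing one grid is below; the file name is a hypothetical placeholder for one of the files in the atlas directory.

```python
# Open one 1-degree Atlas file and list its contents (file name assumed).
import xarray as xr

ds = xr.open_dataset("atlas/example_variable.nc")
print(ds.data_vars)   # variables on the common 1-degree grid
print(ds.coords)      # lat/lon (and time, where the variable resolves change)
```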
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Australian Antarctic Data Centre's Mawson Station GIS data were originally mapped from March 1996 aerial photography. Refer to the metadata record 'Mawson Station GIS Dataset'. Since then, various features have been added to this data as structures have been removed, moved or established. Some of these features have been surveyed. These surveys have metadata records from which the report describing the survey can be downloaded. However, other features have been 'eyed in' as more accurate data were not available. The eyeing in has been done based on advice from Australian Antarctic Division staff, using as a guide sources such as an aerial photograph, an engineering plan, a map or a sketch. GPS data or measurements using a measuring tape may also have been used.
The data are included in the data available for download from a Related URL below. The data conform to the SCAR Feature Catalogue which includes data quality information. See a Related URL below. Data described by this metadata record has Dataset_id = 119. Each feature has a Qinfo number which, when entered at the 'Search datasets and quality' tab, provides data quality information for the feature.
This point GIS dataset shows the locations of the fire hydrants and fire hose reels at Mawson station, Antarctica. The data are formatted according to the SCAR Feature Catalogue (see Related URL below). Enter the Qinfo number of any feature at the 'Search datasets and quality' tab to search for data quality information about the feature: for example, the source of the data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset links to the Long Term Ecological Research (LTER) Florida Coastal Everglades (FCE) Core Research Data Table of Contents (DTOC). The DTOC contains links to 173 individual datasets, which may be queried from the DTOC page. FCE Core Research Data are long-term data sets that address FCE LTER objectives and hypotheses, and that are supported primarily by LTER funds. All data are provided with accompanying metadata. Metadata includes details about the data, including how, when and by whom a particular set of data was collected, and information regarding the data format. The FCE practice of dataset versioning has been discontinued as of March 2013. All long-term data will have new data appended to the file and the accompanying metadata will be updated. FCE data may be freely downloaded with as few restrictions as possible. Consultation or collaboration with the original investigators is strongly encouraged. Please keep the dataset originator informed of any plans to use the dataset, and include the dataset's proper citation and Digital Object Identifier (DOI) found under 'How to cite these data' on the dataset's summary table. Resources in this dataset: Resource Title: GeoData catalog record. File Name: Web Page, url: https://geodata.nal.usda.gov/geonetwork/srv/eng/catalog.search#/metadata/FlCoastalEverglades_eaa_2015_March_19_1527
The High-resolution Urban Meteorology for Impacts Dataset, HUMID, will be useful for studies examining spatial variability of near surface meteorology and the impacts of urban heat islands across many disciplines including epidemiology, ecology, and climatology. We have explicitly included representation of spatial meteorological variability over urban areas in the contiguous United States (CONUS) as compared to other observation-only gridded meteorology products by employing the High-Resolution Land Data Assimilation System (HRLDAS), which accounts for the fine-scale impacts of spatiotemporally varying land surfaces on weather. Further, we include in situ meteorological observations such as local mesonets to bias correct the HRLDAS output, creating a model-observation fusion product. The data spans 1 January 1981 to 31 December 2018, covering all of CONUS at 1 km grid spacing. The dataset includes daily maximum, minimum, and mean values for a variety of temperature estimates such as 2 m temperature, skin temperature, urban temperatures, as well as specific humidity and surface energy budget terms.
The full variable list with corresponding file and variable metadata is in this file [https://rda.ucar.edu/OS/web/datasets/d314008/docs/humid_dataset_readme.pdf].
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains a list of the locations of sixteen soil samples taken from the vicinity of the Casey EPH fuel tank on 13/02/2012 and 20/03/2012. Soil samples were taken for Total Petroleum Hydrocarbon (TPH) analysis and will be submitted to Analytical Services Tasmania for said analysis. The investigation is related to a leak detected from the threaded unions on the Casey EPH fuel tank supply line (refer to IHIS incident report 2684). Samples 100114-100120 were taken at selected locations within the recognisable spill area and down-gradient of the site on 13/02/2012 by Dan Wilkins (Scientific Officer, Terrestrial and Nearshore Ecosystems, Science Branch). Samples 99319-99332 were taken from a 5 m grid sampling pattern on 20/03/2012 by Johan Mets (Plant Operator), acting under the direction of Dan Wilkins. Frozen conditions prevented samples being obtained from recommended depths (i.e. under the road base).
Fields in the dataset:
STD: Sample Tracking Database number (unique identifier)
Easting: Easting (UTM 49S)
Northing: Northing (UTM 49S)
Sample Depth: Depth of sample beneath soil surface (where recorded)
Comment: Comment on location of sample and any observation about hydrocarbon
Sample Date: Date of sample collection in dd/mm/yyyy format
Sampler: Name of sample collector
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a unique and comprehensive corpus for natural language processing tasks, specifically text summarization tools for validating reward models from OpenAI. It contains summaries of text from the TL;DR, CNN, and Daily Mail datasets, along with additional information including the choices made by workers when summarizing the text, batch information that differentiates the summaries created by different workers, and dataset split attributes. All of this data allows users to train state-of-the-art natural language processing systems with real-world data in order to create reliable, concise summaries from long-form text. This collection enables developers to explore cutting-edge summarization research while benchmarking their results directly against human-generated summaries.
How to use the dataset: This dataset provides a comprehensive corpus of human-generated summaries for text from the TL;DR, CNN, and Daily Mail datasets to help machine learning models understand and evaluate natural language processing. The dataset contains training and validation data to optimize machine learning tasks.
To use this dataset for summarization tasks:
Gather information about the text you would like to summarize by looking at the info column entries in the two .csv files (train and validation). Choose the summary you want from the choice column of either file, based on your preference for worker or batch summarization. Review entries in the selected summary's corresponding summaries columns for alternative options with similar content but different word choices or styles. Look through the split, worker, and batch information for each choice before selecting one to use as your desired summary, according to the accuracy or clarity of its content.
Research Ideas:
Training a natural language processing model to automatically generate summaries of text, using summary and choice data from this dataset.
Evaluating OpenAI's reward model for natural language processing on the validation data in order to improve accuracy and performance.
Analyzing the worker and batch information to assess trends among workers or batches that could indicate bias or other issues affecting summarization accuracy.
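A minimal sketch of the loading-and-inspection step described above follows; the file names reflect the train/validation split mentioned in the first step, and the column names (info, choice, summaries, split, worker, batch) are assumptions to verify against your copy.

```python
# Load both splits and inspect the columns referenced above
# (file and column names are assumed, not confirmed by the source).
import pandas as pd

train = pd.read_csv("train.csv")
validation = pd.read_csv("validation.csv")
print(train.shape, validation.shape)
print(train.columns.tolist())
```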
Original Data Source: OpenAI Summarization Corpus