Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.
Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-term study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the time of observation affected the measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.
Given this complexity, a range of organizations collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP, and the UK’s HadCRUT.
We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.
In this dataset, we have included several files:
Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):
Other files include:
The raw data comes from the Berkeley Earth data page.
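As a quick-start sketch in R: the global file can be loaded and inspected as below. Only the file name GlobalTemperatures.csv comes from the list above; the column names used here (dt, LandAverageTemperature) are assumptions and should be checked against the actual header.

```r
# Minimal sketch: load the global series and inspect it before analysis.
# Column names below are assumptions; check str(gt) output first.
gt <- read.csv("GlobalTemperatures.csv", stringsAsFactors = FALSE)
str(gt)                       # list the available columns
gt$dt <- as.Date(gt$dt)       # assumed date column
plot(gt$dt, gt$LandAverageTemperature, type = "l",
     xlab = "Year", ylab = "Land average temperature (C)")
```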
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 2 rows and is filtered where the book is Between heaven and earth : the religious worlds people make and the scholars who study them. It features 7 columns including author, publication date, language, and book publisher.
This dataset contains human population density for the state of California and a small portion of western Nevada for the year 2000. The population density is based on US Census Bureau data and has a cell size of 990 meters.
The purpose of the dataset is to provide a consistent statewide human density GIS layer for display, analysis and modeling purposes.
The state of California, and a very small portion of western Nevada, was divided into pixels 990 meters on each side (a cell size of 0.98 km2). For each pixel, the US Census Bureau data was clipped, the total human population was calculated, and that population was divided by the pixel area to get human density (people/km2) for each pixel.
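The per-pixel computation reduces to a division by the constant cell area. A minimal sketch in R of that logic (not the published workflow; the input file name is a hypothetical placeholder for a raster of people per cell):

```r
# Sketch of the per-pixel density calculation described above.
# The input file name is hypothetical.
library(raster)
pop <- raster("ca_population_counts_990m.tif")  # people per 990 m cell
cell_area_km2 <- (990 / 1000)^2                 # 0.98 km2 per pixel
density <- pop / cell_area_km2                  # people per km2
writeRaster(density, "ca_population_density_990m.tif", overwrite = TRUE)
```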
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the White Earth population over the last 20 plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of White Earth across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing and, if there is a change, when the population peaked, or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.
Key observations
In 2023, the population of White Earth was 93, unchanged from 2022. Previously, in 2022, the White Earth population was 93, a decline of 4.12% compared to a population of 97 in 2021. Over the last 20 plus years, between 2000 and 2023, the population of White Earth increased by 28 (from 65 in 2000). In this period, the peak population was 99, in the year 2020. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are thus subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for White Earth Population by Year. You can refer to it here.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Initially, the format of this dataset was .json, so I converted it to .csv for ease of data processing.
"Online articles from the 25 most popular news sites in Vietnam in July 2022, suitable for practicing Natural Language Processing in Vietnamese.
Online news outlets are an unavoidable part of our society today due to their easy access, mostly free. Their effects on the way communities think and act is becoming a concern for a multitude of groups of people, including legislators, content creators, and marketers, just to name a few. Aside from the effects, what is being written on the news should be a good reflection of people’s will, attention, and even cultural standard.
In Vietnam, even though journalists have received much criticism, especially in recent years, news outlets still receive a lot of traffic (27%) compared to other methods to receive information."
Original Data Source: Vietnamese Online News .csv dataset
Tiny solid and liquid particles suspended in the atmosphere are called aerosols. Windblown dust, sea salts, volcanic ash, smoke from wildfires, and pollution from factories are all examples of aerosols. Depending upon their size, type, and location, aerosols can either cool the surface, or warm it. They can help clouds to form, or they can inhibit cloud formation. And if inhaled, some aerosols can be harmful to people’s health.
https://creativecommons.org/publicdomain/zero/1.0/
This graph was retrieved from the internet:
The "Richest People in the World - 2024" dataset provides a detailed overview of the wealthiest individuals globally for the year 2024. This dataset includes crucial information about the top executives, their net worth, and the countries they are based in, offering valuable insights for economic analysis, market research, and financial studies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Forest Proximate People" (FPP) dataset is one of the data layers contributing to the development of indicator #13, “number of forest-dependent people in extreme poverty,” of the Collaborative Partnership on Forests (CPF) Global Core Set of forest-related indicators (GCS). The FPP dataset provides an estimate of the number of people living in or within 5 kilometers of forests (forest-proximate people) for the year 2019 with a spatial resolution of 100 meters at a global level.
For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: A new methodology and global estimates. Background Paper to The State of the World’s Forests 2022 report. Rome, FAO.
Contact points:
Maintainer: Leticia Pina
Maintainer: Sarah E. Castle
Data lineage:
The FPP data are generated using Google Earth Engine. Forests are defined by the Copernicus Global Land Cover (CGLC) (Buchhorn et al. 2020) classification system’s definition of forests: tree cover ranging from 15-100%, with or without an understory of shrubs and grassland, and including both open and closed forests. Any area classified as forest sized ≥ 1 ha in 2019 was included in this definition.

Population density was defined by the WorldPop global population data for 2019 (WorldPop 2018). High-density urban populations were excluded from the analysis. High-density urban areas were defined as any contiguous area with a total population (using 2019 WorldPop data) of at least 50,000 people and composed of pixels all of which met at least one of two criteria: either the pixel a) had at least 1,500 people per square km, or b) was classified as “built-up” land use by the CGLC dataset (where “built-up” was defined as land covered by buildings and other manmade structures) (Dijkstra et al. 2020).

Using these datasets, any rural people living in or within 5 kilometers of forests in 2019 were classified as forest-proximate people. Euclidean distance was used as the measure to create a 5-kilometer buffer zone around each forest cover pixel. The scripts for generating the forest-proximate people and the rural-urban datasets using different parameters or for different years are published and available to users.

For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: a new methodology and global estimates. Background Paper to The State of the World’s Forests 2022. Rome, FAO.
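The published scripts run in Google Earth Engine; the classification logic can nonetheless be illustrated with a small raster-based sketch in R. This is a conceptual illustration only, not the authors' script, and all file names are hypothetical stand-ins for the CGLC, WorldPop, and urban-area inputs described above.

```r
# Conceptual sketch of the forest-proximate-people classification,
# not the published GEE script. All file names are hypothetical.
library(raster)
forest <- raster("cglc_forest_2019.tif")    # 1 = forest (>= 1 ha), NA elsewhere
pop    <- raster("worldpop_2019.tif")       # people per pixel
urban  <- raster("high_density_urban.tif")  # 1 = high-density urban, NA elsewhere

rural_pop <- mask(pop, urban, inverse = TRUE)   # exclude high-density urban people
dist_to_forest <- distance(forest)              # Euclidean distance (m) to forest
fpp <- rural_pop * (dist_to_forest <= 5000)     # keep people within the 5 km buffer
cellStats(fpp, "sum")                           # total forest-proximate people
```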
References:
Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.E., Herold, M., Fritz, S., 2020. Copernicus Global Land Service: Land Cover 100m: collection 3 epoch 2019. Globe.
Dijkstra, L., Florczyk, A.J., Freire, S., Kemper, T., Melchiorri, M., Pesaresi, M. and Schiavina, M., 2020. Applying the degree of urbanisation to the globe: A new harmonised definition reveals a different picture of global urbanisation. Journal of Urban Economics, p.103312.
WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University, 2018. Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00645
Online resources:
GEE asset for "Forest proximate people - 5km cutoff distance"
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
BreakData

Welcome to BreakData, an innovative and cutting-edge dataset devoted to exploring language understanding. This dataset contains a wealth of information related to question decomposition, operators, splits, sources, and allowed tokens, and can be used to answer questions with precision. With deep insights into how humans comprehend and interpret language, BreakData provides immense value for researchers developing sophisticated models that can help advance AI technologies. Our goal is to enable the development of more complex natural language processing that can be used in applications such as automated customer support, chatbots for health care advice, or automated marketing campaigns. Dive into this intriguing dataset now and discover how your work could change the world!
How to use the dataset

This dataset provides an exciting opportunity to explore and understand the complexities of language understanding. With this dataset, you can train models for natural language processing (NLP) activities such as question answering, text analytics, automated dialog systems, and more.
To make the most effective use of the BreakData dataset, it’s important to know how it is organized and what types of data are included in each file. The BreakData dataset is broken down into nine different files:
QDMR_train.csv
QDMR_validation.csv
QDMR-highlevel_train.csv
QDMR-highlevel_test.csv
logicalforms_train.csv
logicalforms_validation.csv
QDMRlexicon_train.csv
QDMRLexicon_test.csv
QDMRLexiconHighLevelTest.csv
Each file contains a different set of data that can be used to train your models for natural language understanding tasks, or to analyze existing questions and commands by decomposing them, with accurate decompositions and operators, into their component parts and understanding the relationships between those parts:
1) The QDMR files include questions or statements from common domains, such as health care or banking, that need to be interpreted according to a series of operators (elements such as verbs). This task requires identifying keywords in the statement or question text that indicate variables and their values, so any model trained on these datasets will need to accurately identify entities such as time references (dates/times), monetary amounts, and Boolean values (yes/no), as well as the relationships between those entities, all while following the rule set of the specific domain language. Rigorous training on this kind of data lets models interpret complex, context-dependent queries that require multi-step linguistic analysis.
2) The LogicalForms files include logical forms containing the building blocks (elements such as operators) for linking ideas together across different sets of incoming variables. A minimal sketch for loading these files follows below.
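As a starting point, here is a small R sketch that loads the splits listed above and reports their shape. Only the file names come from the list; column names vary by file and should be inspected rather than assumed.

```r
# Minimal sketch: load the Break splits listed above and inspect them.
# Only the file names come from the dataset description; check the
# columns before building a model.
files <- c("QDMR_train.csv", "QDMR_validation.csv",
           "QDMR-highlevel_train.csv", "QDMR-highlevel_test.csv",
           "logicalforms_train.csv", "logicalforms_validation.csv")
for (f in files) {
  d <- read.csv(f, stringsAsFactors = FALSE)
  cat(f, ":", nrow(d), "rows x", ncol(d), "columns\n")
  print(names(d))  # available columns (e.g. question text, decomposition)
}
```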
Research Ideas

Developing advanced natural language processing models to analyze questions using decompositions, operators, and splits. Training a machine learning algorithm to predict the semantic meaning of questions based on their decomposition and split. Conducting advanced text analytics by using the allowed tokens dataset to map out how people communicate specific concepts in different contexts or topics.
CC0
Original Data Source: Break (Question Decomposition Meaning)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset features three gridded population datasets of Germany on a 10m grid. The units are people per grid cell.
Datasets
DE_POP_VOLADJ16: This dataset was produced by disaggregating national census counts to 10m grid cells based on a weighted dasymetric mapping approach. Building density, building height, and building type datasets were used as underlying covariates, with an adjusted volume for multi-family residential buildings.
DE_POP_TDBP: This dataset is considered the best product. It is based on a dasymetric mapping approach that disaggregated municipal census counts to 10m grid cells using the same three underlying covariate layers.
DE_POP_BU: This dataset is based on a bottom-up gridded population estimate. Building density, building height, and building type layers were used to compute a living floor area dataset on a 10m grid. Using federal statistics on the average living floor area per capita, this bottom-up estimate was created.
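The bottom-up logic reduces to dividing living floor area by a per-capita average. A minimal sketch in R, where the file name and the per-capita constant are hypothetical placeholders, not the published values:

```r
# Sketch of the DE_POP_BU bottom-up logic, not the published workflow.
# The file name and per-capita constant are hypothetical placeholders.
library(raster)
floor_area <- raster("living_floor_area_10m.tif")  # m2 of living floor per 10 m cell
m2_per_person <- 45                                # placeholder; use federal statistics
pop_bu <- floor_area / m2_per_person               # people per cell
writeRaster(pop_bu, "DE_POP_BU_sketch.tif", overwrite = TRUE)
```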
Please refer to the related publication for details.
Temporal extent
The building density layer is based on Sentinel-2 time series data from 2018 and Sentinel-1 time series data from 2017 (doi: http://doi.org/10.1594/PANGAEA.920894)
The building height layer is representative for ca. 2015 (doi: 10.5281/zenodo.4066295)
The building types layer is based on Sentinel-2 time series data from 2018 and Sentinel-1 time series data from 2017 (doi: 10.5281/zenodo.4601219)
The underlying census data is from 2018.
Data format
The data come in tiles of 30x30km (see shapefile). The projection is EPSG:3035. The images are compressed GeoTiff files (*.tif). There is a mosaic in GDAL Virtual format (*.vrt), which can readily be opened in most Geographic Information Systems.
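For example, the mosaic can be opened directly from the VRT in R. The .vrt file name below is an assumption; use the file shipped with the data.

```r
# Minimal sketch: open the GDAL Virtual mosaic and check its projection.
# The .vrt file name is an assumption; use the file shipped with the data.
library(raster)
de_pop <- raster("DE_POP_TDBP.vrt")
crs(de_pop)        # should report EPSG:3035 (ETRS89 / LAEA Europe)
plot(de_pop)       # quick visual check of the mosaic
```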
Further information
For further information, please see the publication or contact Franz Schug (franz.schug@geo.hu-berlin.de).
A web-visualization of this dataset is available here.
Publication
Schug, F., Frantz, D., van der Linden, S., & Hostert, P. (2021). Gridded population mapping for Germany based on building density, height and type from Earth Observation data using census disaggregation and bottom-up estimates. PLOS ONE. DOI: 10.1371/journal.pone.0249044
Acknowledgements
Census data were provided by the German Federal Statistical Offices.
Funding
This dataset was produced with funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (MAT_STOCKS, grant agreement No 741950).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Approximately 70%–75% of people worldwide have no formally registered land rights. Fit-For-Purpose Land Administration was introduced to address this problem and focuses on delineating visible cadastral boundaries from earth observation imagery. Recent studies have shown the potential of deep learning models to extract these visible cadastral boundaries automatically. However, studies are limited by the small size and geographical coverage of available datasets and by the lack of information about which cadastral boundaries are visible, i.e., associated with a physical object boundary. To overcome these problems, we present CadastreVision, a benchmark dataset containing cadastral reference data and corresponding multi-resolution earth observation imagery from The Netherlands, with a spatial resolution ranging from 0.1 m to 10 m. The ratio between visible and non-visible cadastral boundaries is essential to evaluate the potential automation level in cadastral boundary extraction from earth observation images and to interpret results obtained by deep learning models. We investigate this ratio using a novel analysis pipeline that overlays cadastral reference data with visible topographic object boundaries. Our results show that approximately 72% of the total length of cadastral boundaries in The Netherlands is visible. CadastreVision will enable new developments in cadastral boundary delineation and future endeavours to investigate knowledge transfer to data-scarce areas.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1,000, or seats of administrative divisions (ca. 80,000 entries).

Sources and Contributions

Sources: GeoNames aggregates over a hundred different data sources. Ambassadors: GeoNames ambassadors help in many countries. Wiki: A wiki allows users to view the data and quickly fix errors and add missing places. Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

Enrichment: add country name.
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
This dataset contains information about the world's forest area changes since 1990. Studying global forest area changes is crucial for assessing environmental health, informing conservation strategies, and understanding the impact of human activities on biodiversity and climate regulation.
Hydra 3 Pulse was an artistic experiment intended to inspire people on Earth. Dataset provided by the ESDC. Please refer to the dataset's landing page at http://esdcdoi.esac.esa.int/doi/html/data/hre/hreda/86a9cd477bfb4253b0d7f0dba5c1d951.html
https://en.wikipedia.org/wiki/Public_domain
Countries distinguish between metropolitan (homeland) and independent and semi-independent portions of sovereign states. If you want to see the dependent overseas regions broken out (like in ISO codes; see France for example), use map units instead. Each country is coded with a world region that roughly follows the United Nations setup. Countries are coded with standard ISO and FIPS codes. French INSEE codes are also included. Includes some thematic data from the United Nations, the U.S. Central Intelligence Agency, and elsewhere.

Data source: Admin0-Countries
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Non-Hispanic population of White Earth by race. It includes the distribution of the Non-Hispanic population of White Earth across various race categories as identified by the Census Bureau. The dataset can be utilized to understand the Non-Hispanic population distribution of White Earth across relevant racial categories.
Key observations
With a zero Hispanic population, White Earth is 100% Non-Hispanic. Among the Non-Hispanic population, the largest racial group is White alone with a population of 76 (100% of the total Non-Hispanic population).
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are thus subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for White Earth Population by Race & Ethnicity. You can refer to it here.
The world population surpassed eight billion people in 2022, having doubled from its figure less than 50 years previously. Looking forward, it is projected that the world population will reach nine billion in 2038 and 10 billion in 2060, but it will peak around 10.3 billion in the 2080s before it then goes into decline.

Regional variations

The global population has seen rapid growth since the early 1800s, due to advances in areas such as food production, healthcare, water safety, education, and infrastructure; however, these changes did not occur at a uniform time or pace across the world. Broadly speaking, the first regions to undergo their demographic transitions were Europe, North America, and Oceania, followed by Latin America and Asia (although Asia's development saw the greatest variation due to its size), while Africa was the last continent to undergo this transformation. Because of these differences, many so-called "advanced" countries are now experiencing population decline, particularly in Europe and East Asia, while the fastest population growth rates are found in Sub-Saharan Africa. In fact, the roughly two billion difference in population between now and the 2080s' peak will be found in Sub-Saharan Africa, which will rise from 1.2 billion to 3.2 billion in this time (although populations in other continents will also fluctuate).

Changing projections

The United Nations releases its World Population Prospects report every 1-2 years, and this is widely considered the foremost demographic dataset in the world. However, recent years have seen a notable decline in projections of when the global population will peak, and at what number. Previous reports in the 2010s had suggested a peak of over 11 billion people and that population growth would continue into the 2100s; however, a sooner and lower peak is now projected. Reasons for this include a more rapid population decline in East Asia and Europe, particularly China, as well as a prolonged development arc in Sub-Saharan Africa.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset and the validation are fully described in a Nature Scientific Data Descriptor https://www.nature.com/articles/s41597-019-0265-5
If you want to use this dataset in an interactive environment, then use this link https://mybinder.org/v2/gh/GeographerAtLarge/TravelTime/HEAD
The following text is a summary of the information in the above Data Descriptor.
The dataset is a suite of global travel-time accessibility indicators for the year 2015, at approximately one-kilometre spatial resolution for the entire globe. The indicators show an estimated (and validated) land-based travel time to the nearest city and nearest port for a range of city and port sizes.
The datasets are in GeoTIFF format and are suitable for use in Geographic Information Systems and statistical packages for mapping access to cities and ports and for spatial and statistical analysis of the inequalities in access by different segments of the population.
These maps represent a unique global representation of physical access to essential services offered by cities and ports.
The datasets

travel_time_to_cities_x.tif (where x has values from 1 to 12): The value of each pixel is the estimated travel time in minutes to the nearest urban area in 2015. There are 12 data layers based on different sets of urban areas, defined by their population in the year 2015 (see PDF report).
travel_time_to_ports_x.tif (where x ranges from 1 to 5): The value of each pixel is the estimated travel time to the nearest port in 2015. There are 5 data layers based on different port sizes.
Format: Raster Dataset, GeoTIFF, LZW compressed
Unit: Minutes
Data type: 16-bit unsigned integer (consistent with the no-data value of 65535)
No data value: 65535
Flags: None
Spatial resolution: 30 arc seconds
Spatial extent:
Upper left: -180, 85
Lower left: -180, -60
Upper right: 180, 85
Lower right: 180, -60
Spatial Reference System (SRS): EPSG:4326 - WGS84 - Geographic Coordinate System (lat/long)
Temporal resolution: 2015
Temporal extent: 2015. Updates may follow for future years, but these are dependent on the availability of updated inputs on travel times and city locations and populations.
Methodology

Travel time to the nearest city or port was estimated using an accumulated cost function (accCost) in the gdistance R package (van Etten, 2018). This function requires two input datasets: (i) a set of locations to estimate travel time to, and (ii) a transition matrix that represents the cost or time to travel across a surface.
The set of locations were based on populated urban areas in the 2016 version of the Joint Research Centre’s Global Human Settlement Layers (GHSL) datasets (Pesaresi and Freire, 2016) that represent low density (LDC) urban clusters and high density (HDC) urban areas (https://ghsl.jrc.ec.europa.eu/datasets.php). These urban areas were represented by points, spaced at 1km distance around the perimeter of each urban area.
Marine ports were extracted from the 26th edition of the World Port Index (NGA, 2017), which contains the location and physical characteristics of approximately 3,700 major ports and terminals. Ports are represented as single points.
The transition matrix was based on the friction surface (https://map.ox.ac.uk/research-project/accessibility_to_cities) from the 2015 global accessibility map (Weiss et al, 2018).
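The core of this workflow can be sketched in R as follows. This is a minimal illustration under assumed inputs, not the published code that ships with the data (see the Code section below); the file names are placeholders.

```r
# Minimal illustration of the accumulated-cost workflow, not the published
# code included in the download. File names are placeholders.
library(gdistance)   # provides transition(), geoCorrection(), accCost()
library(raster)

friction <- raster("friction_surface_2015.tif")   # minutes required to cross each pixel
# Conductance is the reciprocal of the friction (cost) surface
trans <- transition(friction, function(x) 1 / mean(x), directions = 8)
trans <- geoCorrection(trans)                     # adjust for geographic distortion

targets <- read.csv("port_points.csv")            # placeholder point file (lon, lat)
# Accumulated least-cost travel time (minutes) from every pixel to the nearest target
access <- accCost(trans, as.matrix(targets[, c("lon", "lat")]))
writeRaster(access, "travel_time_sketch.tif", datatype = "INT2U", overwrite = TRUE)
```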
Code

The R code used to generate the 12 travel time maps is included in the zip file that can be downloaded with these data layers. The processing zones are also available.
Validation

The underlying friction surface was validated by comparing travel times between 47,893 pairs of locations against journey times from a Google API. Our estimated journey times were generally shorter than those from the Google API. Across the tiles, the median journey time from our estimates was 88 minutes within an interquartile range of 48 to 143 minutes while the median journey time estimated by the Google API was 106 minutes within an interquartile range of 61 to 167 minutes. Across all tiles, the differences were skewed to the left and our travel time estimates were shorter than those reported by the Google API in 72% of the tiles. The median difference was −13.7 minutes within an interquartile range of −35.5 to 2.0 minutes while the absolute difference was 30 minutes or less for 60% of the tiles and 60 minutes or less for 80% of the tiles. The median percentage difference was −16.9% within an interquartile range of −30.6% to 2.7% while the absolute percentage difference was 20% or less in 43% of the tiles and 40% or less in 80% of the tiles.
This process and results are included in the validation zip file.
Usage Notes

The accessibility layers can be visualised and analysed in many Geographic Information Systems or remote sensing software such as QGIS, GRASS, ENVI, ERDAS or ArcMap, and also by statistical and modelling packages such as R or MATLAB. They can also be used in cloud-based tools for geospatial analysis such as Google Earth Engine.
The nine layers represent travel times to human settlements of different population ranges. Two or more layers can be combined into one layer by recording the minimum pixel value across the layers. For example, a map of travel time to the nearest settlement of 5,000 to 50,000 people could be generated by taking the minimum of the three layers that represent the travel time to settlements with populations between 5,000 and 10,000, between 10,000 and 20,000, and between 20,000 and 50,000 people.
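In R, for instance, this per-pixel minimum is a short calc over a stack. The layer file names below are placeholders following the naming pattern above, and the mapping of x-indices to population ranges is an assumption; check the PDF report for the actual mapping.

```r
# Sketch: combine three travel-time layers by taking the per-pixel minimum.
# File names and the index-to-population mapping are assumptions.
library(raster)
s <- stack("travel_time_to_cities_5.tif",   # 5,000-10,000 people (assumed)
           "travel_time_to_cities_6.tif",   # 10,000-20,000 people (assumed)
           "travel_time_to_cities_7.tif")   # 20,000-50,000 people (assumed)
combined <- calc(s, fun = min)              # minutes to nearest settlement of 5,000-50,000
```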
The accessibility layers also permit user-defined hierarchies that go beyond computing the minimum pixel value across layers. A user-defined complete hierarchy can be generated when the union of all categories adds up to the global population, and the intersection of any two categories is empty. Everything else is up to the user in terms of logical consistency with the problem at hand.
The accessibility layers are relative measures of the ease of access from a given location to the nearest target. While the validation demonstrates that they do correspond to typical journey times, they cannot be taken to represent actual travel times. Errors in the friction surface will be accumulated by the accumulated cost function, and it is likely that locations that are further away from targets will have a greater divergence from a plausible travel time than those that are closer to the targets. Care should be taken when referring to travel time to the larger cities when the locations of interest are extremely remote, although the layers will still be plausible representations of relative accessibility. Furthermore, a key assumption of the model is that all journeys will use the fastest mode of transport and take the shortest path.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Context

Social media has come to dominate the Internet: people use it to get the latest news, useful resources, even life partners, and what not. In a world where social media plays a big role in delivering news, we must also recognize that news which affects our sentiments is going to spread like wildfire. Based on the headline and the title, the date given, and the social media platforms, you have to predict how the news has affected human sentiment scores. You have to predict the columns “SentimentTitle” and “SentimentHeadline”.
Content

This is a subset of the dataset of the same name available in the UCI Machine Learning Repository. The collected data relates to a period of 8 months, between November 2015 and July 2016, accounting for about 100,000 news items on four different topics: economy, microsoft, obama, and palestine.
Dataset Information

The attributes of the dataset are:
IDLink (numeric): Unique identifier of news items
Title (string): Title of the news item according to the official media sources
Headline (string): Headline of the news item according to the official media sources
Source (string): Original news outlet that published the news item
Topic (string): Query topic used to obtain the items from the official media sources
Publish-Date (timestamp): Date and time of the news item's publication
Facebook (numeric): Final value of the news item's popularity according to the social media source Facebook
Google-Plus (numeric): Final value of the news item's popularity according to the social media source Google+
LinkedIn (numeric): Final value of the news item's popularity according to the social media source LinkedIn
SentimentTitle: Sentiment score of the title; the higher the score, the more positive the sentiment, and vice versa. (Target Variable 1)
SentimentHeadline: Sentiment score of the text in the news item's headline; the higher the score, the more positive the sentiment. (Target Variable 2)
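A minimal R sketch to load the table and look at the two targets. The CSV file name is an assumption; the column names follow the attribute list above.

```r
# Minimal sketch: load the news items and summarise the two target variables.
# The CSV file name is an assumption; column names follow the list above.
news <- read.csv("News_Final.csv", stringsAsFactors = FALSE)
summary(news[, c("SentimentTitle", "SentimentHeadline")])
table(news$Topic)   # economy, microsoft, obama, palestine
```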
Original Data Source: News Popularity in Multiple Social Media Platforms
This dataset includes news media reports on the evolution of societal values and governance networks on water resources in China over 1946-2017. The newspaper used in this dataset is People’s Daily. The database includes tracking of water issues published in the newspaper, reflecting societal values on water resources as either economic-development oriented or environmental-sustainability oriented, as well as the governance actors mentioned along with the events.