The world population surpassed eight billion people in 2022, having doubled from its figure less than 50 years previously. Looking forward, it is projected that the world population will reach nine billion in 2038, and 10 billion in 2060, but it will peak around 10.3 billion in the 2080s before it then goes into decline. Regional variations The global population has seen rapid growth since the early 1800s, due to advances in areas such as food production, healthcare, water safety, education, and infrastructure, however, these changes did not occur at a uniform time or pace across the world. Broadly speaking, the first regions to undergo their demographic transitions were Europe, North America, and Oceania, followed by Latin America and Asia (although Asia's development saw the greatest variation due to its size), while Africa was the last continent to undergo this transformation. Because of these differences, many so-called "advanced" countries are now experiencing population decline, particularly in Europe and East Asia, while the fastest population growth rates are found in Sub-Saharan Africa. In fact, the roughly two billion difference in population between now and the 2080s' peak will be found in Sub-Saharan Africa, which will rise from 1.2 billion to 3.2 billion in this time (although populations in other continents will also fluctuate). Changing projections The United Nations releases their World Population Prospects report every 1-2 years, and this is widely considered the foremost demographic dataset in the world. However, recent years have seen a notable decline in projections when the global population will peak, and at what number. Previous reports in the 2010s had suggested a peak of over 11 billion people, and that population growth would continue into the 2100s, however a sooner and shorter peak is now projected. Reasons for this include a more rapid population decline in East Asia and Europe, particularly China, as well as a prolongued development arc in Sub-Saharan Africa.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.
Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.
Given this complexity, there are a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCrut.
We have repackaged the data from a newer compilation put together by the Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.
In this dataset, we have include several files:
Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):
Other files include:
The raw data comes from the Berkeley Earth data page.
The increased world population is among the fierce problems the world is facing right now and it will get uncontrolled in the coming future if proper steps for its betterment were not taken immediately. This world has observed the fastest growth during the 20th century. In the 1950s world population was 2.7 billion, By the end of this year it will cross 8 billion. This dataset is uploaded with the assumption to use your Data Science, Machine learning, and Predictive analytics skills and answer the following questions. 1. Which countries have the highest growth rate. 2. What are the densely populated countries in the world. 3. Keeping in view all the variables in mind which countries should take serious steps to control their population.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of adm div (ca 80.000)Sources and ContributionsSources : GeoNames is aggregating over hundred different data sources. Ambassadors : GeoNames Ambassadors help in many countries. Wiki : A wiki allows to view the data and quickly fix error and add missing places. Donations and Sponsoring : Costs for running GeoNames are covered by donations and sponsoring.Enrichment:add country name
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This Hotel Dataset: Rates, Reviews & Amenities(6k+) dataset includes hotel rates, guest reviews, and available amenities from two popular travel websites, TripAdvisor and Booking.com. The dataset can be used to analyze trends and insights in the hospitality industry, and inform decisions related to pricing, marketing, and customer service. Booking.com: Founded in 1996 in Amsterdam, Booking.com has grown from a small Dutch start-up to one of the world’s leading digital travel companies. Part of Booking Holdings Inc. (NASDAQ: BKNG), Booking.com’s mission is to make it easier for everyone to experience the world.
By investing in technology that takes the friction out of travel, Booking.com seamlessly connects millions of travelers to memorable experiences, a variety of transportation options, and incredible places to stay – from homes to hotels, and much more. As one of the world’s largest travel marketplaces for both established brands and entrepreneurs of all sizes, Booking.com enables properties around the world to reach a global audience and grow their businesses.
Booking.com is available in 43 languages and offers more than 28 million reported accommodation listings, including over 6.6 million homes, apartments, and other unique places to stay. Wherever you want to go and whatever you want to do, Booking.com makes it easy and supports you with 24/7 customer support. Tripadvisor, the world's largest travel guidance platform*, helps hundreds of millions of people each month** become better travelers, from planning to booking to taking a trip. Travelers across the globe use the Tripadvisor site and app to discover where to stay, what to do and where to eat based on guidance from those who have been there before. With more than 1 billion reviews and opinions of nearly 8 million businesses, travelers turn to Tripadvisor to find deals on accommodations, book experiences, reserve tables at delicious restaurants and discover great places nearby. As a travel guidance company available in 43 markets and 22 languages, Tripadvisor makes planning easy no matter the trip type. The subsidiaries of Tripadvisor, Inc. (Nasdaq: TRIP), own and operate a portfolio of travel media brands and businesses, operating under various websites and apps.
Soil is a key natural resource that provides the foundation of basic ecosystem services. Soil determines the types of farms and forests that can grow on a landscape. Soil filters water. Soil helps regulate the Earth's climate by storing large amounts of carbon. Activities that degrade soils reduce the value of the ecosystem services that soil provides. For example, since 1850 35% of human caused green house gas emissions are linked to land use change. The Soil Science Society of America is a good source of of additional information.Dataset SummaryThis layer provides access to a 30 arc-second (roughly 1 km) cell-sized raster with attributes describing the basic properties of soil derived from the Harmonized World Soil Database v 1.2. The values in this layer are for the dominant soil in each mapping unit (sequence field = 1).Attributes in this layer include:Soil Phase 1 and Soil Phase 2 - Phases identify characteristics of soils important for land use or management. Soils may have up to 2 phases with phase 1 being more important than phase 2.Other Properties - provides additional information important for agriculture.Additionally, 3 class description fields were added by Esri based on the document Harmonized World Soil Database Version 1.2 for use in web map pop-ups:Soil Phase 1 DescriptionSoil Phase 2 DescriptionOther Properties DescriptionThe layer is symbolized with the Soil Unit Name field.The document Harmonized World Soil Database Version 1.2 provides more detail on the soil properties attributes contained in this layer.Other attributes contained in this layer include:Soil Mapping Unit Name - the name of the spatially dominant major soil groupSoil Mapping Unit Symbol - a two letter code for labeling the spatially dominant major soil group in thematic mapsData Source - the HWSD is an aggregation of datasets. The data sources are the European Soil Database (ESDB), the 1:1 million soil map of China (CHINA), the Soil and Terrain Database Program (SOTWIS), and the Digital Soil Map of the World (DSMW).Percentage of Mapping Unit covered by dominant componentMore information on the Harmonized World Soil Database is available here.Other layers created from the Harmonized World Soil Database are available on ArcGIS Online:World Soils Harmonized World Soil Database - Bulk DensityWorld Soils Harmonized World Soil Database – ChemistryWorld Soils Harmonized World Soil Database - Exchange CapacityWorld Soils Harmonized World Soil Database – HydricWorld Soils Harmonized World Soil Database – TextureThe authors of this data set request that projects using these data include the following citation:FAO/IIASA/ISRIC/ISSCAS/JRC, 2012. Harmonized World Soil Database (version 1.2). FAO, Rome, Italy and IIASA, Laxenburg, Austria.What can you do with this layer?This layer is suitable for both visualization and analysis. It can be used in ArcGIS Online in web maps and applications and can be used in ArcGIS Desktop.This layer has query, identify, and export image services available. This layer is restricted to a maximum area of 16,000 x 16,000 pixels - an area 4,000 kilometers on a side or an area approximately the size of Europe. The source data for this layer are available here.This layer is part of a larger collection of landscape layers that you can use to perform a wide variety of mapping and analysis tasks.The Living Atlas of the World provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.Geonet is a good resource for learning more about landscape layers and the Living Atlas of the World. To get started follow these links:Living Atlas Discussion GroupSoil Data Discussion GroupThe Esri Insider Blog provides an introduction to the Ecophysiographic Mapping project.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product (GDP) in the United States was worth 29184.89 billion US dollars in 2024, according to official data from the World Bank. The GDP value of the United States represents 27.49 percent of the world economy. This dataset provides - United States GDP - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Labeled datasets are useful in machine learning research.
This public dataset contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.
Tables: 1) annotations_bbox 2) dict 3) images 4) labels
Update Frequency: Quarterly
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:open_images
https://cloud.google.com/bigquery/public-data/openimages
APA-style citation: Google Research (2016). The Open Images dataset [Image urls and labels]. Available from github: https://github.com/openimages/dataset.
Use: The annotations are licensed by Google Inc. under CC BY 4.0 license.
The images referenced in the dataset are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.
Banner Photo by Mattias Diesel from Unsplash.
Which labels are in the dataset? Which labels have "bus" in their display names? How many images of a trolleybus are in the dataset? What are some landing pages of images with a trolleybus? Which images with cherries are in the training set?
Success.ai’s LinkedIn Data Solutions offer unparalleled access to a vast dataset of 700 million public LinkedIn profiles and 70 million LinkedIn company records, making it one of the most comprehensive and reliable LinkedIn datasets available on the market today. Our employee data and LinkedIn data are ideal for businesses looking to streamline recruitment efforts, build highly targeted lead lists, or develop personalized B2B marketing campaigns.
Whether you’re looking for recruiting data, conducting investment research, or seeking to enrich your CRM systems with accurate and up-to-date LinkedIn profile data, Success.ai provides everything you need with pinpoint precision. By tapping into LinkedIn company data, you’ll have access to over 40 critical data points per profile, including education, professional history, and skills.
Key Benefits of Success.ai’s LinkedIn Data: Our LinkedIn data solution offers more than just a dataset. With GDPR-compliant data, AI-enhanced accuracy, and a price match guarantee, Success.ai ensures you receive the highest-quality data at the best price in the market. Our datasets are delivered in Parquet format for easy integration into your systems, and with millions of profiles updated daily, you can trust that you’re always working with fresh, relevant data.
API Integration: Our datasets are easily accessible via API, allowing for seamless integration into your existing systems. This ensures that you can automate data retrieval and update processes, maintaining the flow of fresh, accurate information directly into your applications.
Global Reach and Industry Coverage: Our LinkedIn data covers professionals across all industries and sectors, providing you with detailed insights into businesses around the world. Our geographic coverage spans 259M profiles in the United States, 22M in the United Kingdom, 27M in India, and thousands of profiles in regions such as Europe, Latin America, and Asia Pacific. With LinkedIn company data, you can access profiles of top companies from the United States (6M+), United Kingdom (2M+), and beyond, helping you scale your outreach globally.
Why Choose Success.ai’s LinkedIn Data: Success.ai stands out for its tailored approach and white-glove service, making it easy for businesses to receive exactly the data they need without managing complex data platforms. Our dedicated Success Managers will curate and deliver your dataset based on your specific requirements, so you can focus on what matters most—reaching the right audience. Whether you’re sourcing employee data, LinkedIn profile data, or recruiting data, our service ensures a seamless experience with 99% data accuracy.
Key Use Cases:
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
This data for global, regional (EU-27), and country-specific (G20 member countries) energy and emission pathways required to achieve a defined carbon budget of under 450 Gt/CO2, developed to limit the mean global temperature rise to 1.5°C, over 50% likelihood. The data were calculated with the 1.5°C sectorial pathways of the One Earth Climate Model—an integrated energy assessment model devised at the University of Technology Sydney (UTS). The data consist of the following six zip-folder datasets (refer to Section 2 for an explanation of the data): 1. Appendix folder: Each file contains one worksheet, which summarizes the overall 1.5°C scenario. 2. Sector folder (XLSX): Each file contains one worksheet, which summarizes the industry sectors analysed. 3. Sector folder (CSV): The data contained are the same as those described in point 2. 4. Sector emissions folder: Each file contains one worksheet, which summarizes the total annual emissions for each industry sector. 5. Scope emissions folder (XLSX): Each file contains one worksheet, which summarizes the total annual emissions for each industry sector—with the additional specificity of emission scope. 6. Scope emissions folder (CSV): The data contained are the same as those described in point 5. Methods The data consist of the following six zipped dataset folders, each containing 21 separate files for each of the areas assessed. 1. Appendix zip folder: contains 21 XLSX files. Each file contains one worksheet, which summarizes the overall 1.5 °C scenario. This tab is called the ‘Appendix’ and contains: electricity generation (TWh/a), transport—final energy (PJ/a), heat supply and air conditioning (PJ/a), installed capacity (GW), final energy demand (PJ/a), energy-related CO2 emissions (million tons/a), and primary energy demand (PJ/a). 2. Sector zip folder (XLSX): contains 21 XLSX files. Each file contains one worksheet, which summarizes the industry sectors analysed. Key industry metrics are provided, such as the energy and carbon intensities of the GICS sectors analysed. Due to industry specificity—and the choice of methodology—the units of data vary between the different sectors. 3. Sector zip folder (CSV): contains 21 CSV files. The data contained are the same as those described in point 2. However, the data have been organized in a database layout and saved in the CSV file format, significantly improving data parsing. 4. Sector emission zip folder: contains 21 XLSX files. Each file contains one worksheet, which summarizes the total annual emissions (MtCO2/a) for each industry sector. 5. Scope emissions zip folder (XLSX): contains 21 XLSX files. Each file contains one worksheet, which summarizes the total annual emissions (MtCO2/a) for each industry sector—and specifies the emission scopes. This tab also provides an additional breakdown of emissions into the categories of CO2 and total GHG emissions. Two accounting methodologies are presented: (i) the OECM approach, which defines Scope 1 emissions as those related to heat and energy use; and (ii) the production-centric approach, which places the emission burden of other non-energy and Scope 3 emissions on the producer, because they are categorized as Scope 1 emissions. 6. Scope emissions zip folder (CSV): contains 21 CSV files. The data contained are the same as those described in point 5. However, the data have been organized in a database layout and saved in the CSV file format to improve data parsing. The six datasets are summarized in Table 1, with further information on the data presented in the following sub-sections. Table 1: Overview of the data files/datasets
Label
Name of data file/dataset
File types
Data repository and identifier (DOI or accession number)
Dataset 1
Appendix
XLSX
https://doi.org/10.5061/dryad.cz8w9gj82
Dataset 2
Sector_XLSX
XLSX
https://doi.org/10.5061/dryad.cz8w9gj82
Dataset 3
Sector_CSV
CSV
https://doi.org/10.5061/dryad.cz8w9gj82
Dataset 4
Sector_Emission
XLSX
https://doi.org/10.5061/dryad.cz8w9gj82
Dataset 5
Scope_Emission_XLSX
XLSX
https://doi.org/10.5061/dryad.cz8w9gj82
Dataset 6
Scope_Emission_CSV
CSV
https://doi.org/10.5061/dryad.cz8w9gj82
1.1. Description of data parameters The datasets contain the following scenario input parameters: 1. Market development: current and assumed development of the demand by sector, such as cement produced, passenger kilometers travelled, or assumed market volume in US$2015 gross domestic product (GDP). 2. Energy intensity—activity based: energy use per unit of service and/or product; for example, in megajoules (MJ) per passenger kilometer travelled (MJ/pkm), MJ per ton of steel (MJ/ton steel), aluminum, or cement. 3. Energy intensity—finance based: energy use per unit of investment in MJ per US$ GDP (MJ/$GDP) contributed by, for example, the forestry or agricultural sector. The dataset contains the following scenario output parameters: 4. Carbon intensity: current and future carbon intensities per unit of product or service; for example, in tons of CO2 per ton of steel produced (tCO2/ton steel) or grams of carbon dioxide per passenger kilometer (gCO2/pkm). 5. Scope 1, 2, and 3 emissions: datasets for each of the industry sectors and countries analysed. In addition to the emissions data, the deviations of the emissions from those of the year 2019 are provided. 6. Country scenarios: complete country scenario datasets of historical data (2012, 2015–2020) and future projections (2025–2050 in 5-year increments). Energy demand and supply data by technology, fuel, and sector are provided, including the overall energy and carbon emissions balance of the country analysed. 1.2. Geographic resolution: country data provided The dataset contains data for the following 21 countries and regions: · Regions: global, EU-27 · Countries: G20 member countries—Canada, USA, Mexico, Brazil, Argentina, Germany, France, Italy, United Kingdom, Türkiye, Russian Federation, Saudi Arabia, South Africa, Indonesia, India, China, Japan, South Korea, and Australia 1.3. Sectorial resolution: industry sector data provided The dataset contains data for the following industry sectors: Agriculture & food processing, forestry & wood products, chemical industry, aluminum industry, construction and buildings, water utilities, textile & leather industry, steel industry, cement industry, transport sector (aviation: freight & passenger transport; shipping: freight & passenger transport; and road transport: freight & passenger transport). 1.4. Time resolution The scenario data are provided for the years 2017, 2018, 2019, 2020, 2025, 2030, 2035, 2040, 2045, and 2050.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product (GDP) in World was worth 111326.37 billion US dollars in 2024, according to official data from the World Bank. This dataset includes a chart with historical data for World GDP.
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
This dataset contains one million real-world conversations with 25 state-of-the-art LLMs. It is collected from 210K unique IP addresses in the wild on the Vicuna demo and Chatbot Arena website from April to August 2023. Each sample includes a conversation ID, model name, conversation text in OpenAI API JSON format, detected language tag, and OpenAI moderation API tag. User consent is obtained through the "Terms of use"… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/lmsys-chat-1m.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Leaf Area Index (LAI) is a fundamental vegetation structural variable that drives energy and mass exchanges between the plant and the atmosphere. Moderate-resolution (300m – 7km) global LAI data products have been widely applied to track global vegetation changes, drive Earth system models, monitor crop growth and productivity, etc. Yet, cutting-edge applications in climate adaptation, hydrology, and sustainable agriculture require LAI information at higher spatial resolution (< 100m) to model and understand heterogeneous landscapes.
This dataset was built to assist a machine-learning-based approach for mapping LAI from 30m-resolution Landsat images across the contiguous US (CONUS). The data was derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) Version 6 LAI/FPAR, Landsat Collection 1 surface reflectance, and NLCD Land Cover datasets over 2006 – 2018 using Google Earth Engine. Each record/sample/row includes a MODIS LAI value, corresponding Landsat surface reflectance in green, red, NIR, SWIR1 bands, a land cover (biome) type, geographic location, and other auxiliary information. Each sample represents a MODIS LAI pixel (500m) within which a single biome type dominates 90% of the area. The spatial homogeneity of the samples was further controlled by a screening process based on the coefficient of variation of the Landsat surface reflectance. In total, there are approximately 1.6 million samples, stratified by biome, Landsat sensor, and saturation status from the MODIS LAI algorithm. This dataset can be used to train machine learning models and generate LAI maps for Landsat 5, 7, 8 surface reflectance images within CONUS. Detailed information on the sample generation and quality control can be found in the related journal article. Resources in this dataset:Resource Title: README. File Name: LAI_train_samples_CONUS_README.txtResource Description: Description and metadata of the main datasetResource Software Recommended: Notepad,url: https://www.microsoft.com/en-us/p/windows-notepad/9msmlrh6lzf3?activetab=pivot:overviewtab Resource Title: LAI_training_samples_CONUS. File Name: LAI_train_samples_CONUS_v0.1.1.csvResource Description: This CSV file consists of the training samples for estimating Leaf Area Index based on Landsat surface reflectance images (Collection 1 Tire 1). Each sample has a MODIS LAI value and corresponding surface reflectance derived from Landsat pixels within the MODIS pixel.
Contact: Yanghui Kang (kangyanghui@gmail.com)
Column description
UID: Unique identifier. Format: LATITUDE_LONGITUDE_SENSOR_PATHROW_DATE
Landsat_ID: Landsat image ID
Date: Landsat image date in "YYYYMMDD"
Latitude: Latitude (WGS84) of the MODIS LAI pixel center
Longitude: Longitude (WGS84) of the MODIS LAI pixel center
MODIS_LAI: MODIS LAI value in "m2/m2"
MODIS_LAI_std: MODIS LAI standard deviation in "m2/m2"
MODIS_LAI_sat: 0 - MODIS Main (RT) method used no saturation; 1 - MODIS Main (RT) method with saturation
NLCD_class: Majority class code from the National Land Cover Dataset (NLCD)
NLCD_frequency: Percentage of the area cover by the majority class from NLCD
Biome: Biome type code mapped from NLCD (see below for more information)
Blue: Landsat surface reflectance in the blue band
Green: Landsat surface reflectance in the green band
Red: Landsat surface reflectance in the red band
Nir: Landsat surface reflectance in the near infrared band
Swir1: Landsat surface reflectance in the shortwave infrared 1 band
Swir2: Landsat surface reflectance in the shortwave infrared 2 band
Sun_zenith: Solar zenith angle from the Landsat image metadata. This is a scene-level value.
Sun_azimuth: Solar azimuth angle from the Landsat image metadata. This is a scene-level value.
NDVI: Normalized Difference Vegetation Index computed from Landsat surface reflectance
EVI: Enhanced Vegetation Index computed from Landsat surface reflectance
NDWI: Normalized Difference Water Index computed from Landsat surface reflectance
GCI: Green Chlorophyll Index = Nir/Green - 1
Biome code
1 - Deciduous Forest
2 - Evergreen Forest
3 - Mixed Forest
4 - Shrubland
5 - Grassland/Pasture
6 - Cropland
7 - Woody Wetland
8 - Herbaceous Wetland
Reference Dataset: All data was accessed through Google Earth Engine Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment. MODIS Version 6 Leaf Area Index/FPAR 4-day L5 Global 500m Myneni, R., Y. Knyazikhin, T. Park. MOD15A2H MODIS/Terra Leaf Area Index/FPAR 8-Day L4 Global 500m SIN Grid V006. 2015, distributed by NASA EOSDIS Land Processes DAAC, https://doi.org/10.5067/MODIS/MOD15A2H.006 Landsat 5/7/8 Collection 1 Surface Reflectance Landsat Level-2 Surface Reflectance Science Product courtesy of the U.S. Geological Survey. Masek, J.G., Vermote, E.F., Saleous N.E., Wolfe, R., Hall, F.G., Huemmrich, K.F., Gao, F., Kutler, J., and Lim, T-K. (2006). A Landsat surface reflectance dataset for North America, 1990–2000. IEEE Geoscience and Remote Sensing Letters 3(1):68-72. http://dx.doi.org/10.1109/LGRS.2005.857030. Vermote, E., Justice, C., Claverie, M., & Franch, B. (2016). Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sensing of Environment. http://dx.doi.org/10.1016/j.rse.2016.04.008. National Land Cover Dataset (NLCD) Yang, Limin, Jin, Suming, Danielson, Patrick, Homer, Collin G., Gass, L., Bender, S.M., Case, Adam, Costello, C., Dewitz, Jon A., Fry, Joyce A., Funk, M., Granneman, Brian J., Liknes, G.C., Rigge, Matthew B., Xian, George, A new generation of the United States National Land Cover Database—Requirements, research priorities, design, and implementation strategies: ISPRS Journal of Photogrammetry and Remote Sensing, v. 146, p. 108–123, at https://doi.org/10.1016/j.isprsjprs.2018.09.006 Resource Software Recommended: Microsoft Excel,url: https://www.microsoft.com/en-us/microsoft-365/excel
The Earth Surface Mineral Dust Source Investigation (EMIT) instrument measures surface mineralogy, targeting the Earth’s arid dust source regions. EMIT is installed on the International Space Station. EMIT uses imaging spectroscopy to take measurements of sunlit regions of interest between 52° N latitude and 52° S latitude. An interactive map showing the regions being investigated, current and forecasted data coverage, and additional data resources can be found on the VSWIR Imaging Spectroscopy Interface for Open Science (VISIONS) EMIT Open Data Portal.In addition to its primary objective described above, EMIT has demonstrated the capacity to characterize methane (CH4) and carbon dioxide (CO2) point-source emissions by measuring gas absorption features in the shortwave infrared bands. The EMIT Level 2B Carbon Dioxide Enhancement Data (EMITL2BCO2ENH) Version 2 data product is a total vertical column enhancement estimate of carbon dioxide in parts per million meter (ppm m) based on an adaptive matched filter approach. EMITL2BCO2ENH provides per-pixel carbon dioxide enhancement data used to identify carbon dioxide plume complexes, per-pixel carbon dioxide uncertainty due to sensor noise, and per-pixel carbon dioxide sensitivity that can be used to remove bias from the enhancement data. The EMITL2BCO2ENH Version 2 data product includes methane enhancement granules for all captured scenes, regardless of carbon dioxide plume complex identification. Each granule contains three Cloud Optimized GeoTIFF (COG) files at a spatial resolution of 60 meters (m): Carbon Dioxide Enhancement (EMIT_L2B_CO2ENH), Carbon Dioxide Uncertainty (EMIT_L2B_CO2UNCERT), and Carbon Dioxide Sensitivity (EMIT_L2B_CO2SENS). The EMITL2BCO2ENH COG files contain carbon dioxide enhancement data based primarily on EMITL1BRAD radiance values.Each granule is approximately 75 kilometers (km) by 75 km, nominal at the equator, with some granules near the end of an orbit segment reaching 150 km in length.Known Issues Data acquisition gap: From September 13, 2022, through January 6, 2023, a power issue outside of EMIT caused a pause in operations. Due to this shutdown, no data were acquired during that timeframe.Improvements/Changes from Previous Versions Carbon dioxide uncertainty and sensitivity variables have been added. For more details on the uncertainty variable, see Section 6 of the Algorithm Theoretical Basis Document (ATBD) and Section 4.2.2 for details on the sensitivity variable. Enhancement, uncertainty, and sensitivity data are now included for all granules, including those without plume complexes. Version 1 of this product only included enhancement data for granules where plumes were present. The matched filter used to produce carbon dioxide enhancement data has been improved by adjusting the channels used to those that fall within 500-1340 nanometer (nm), 1500-1790 nm, or 1950-2450 nm. More details can be found in Section 4.2.3 of the ATBD.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets were used to evaluate the main controls on last ~6 million years erosion rate variability of the northwestern Himalaya. The Earth’s climate has been cooling during the last ~15 million years and started fluctuating between cold and warm periods since ~2-3 million years ago. Many researchers think that these long-term climatic changes were accompanied by changes in continental erosion. However, quantifying erosion rates in the geological past is challenging, and previous studies reached contrasting conclusions. In this study, we quantified erosion rates in the north-western Indian Himalaya over the past 6 million years by measuring in situ-produced cosmogenic 10Be in exhumed older foreland basin sediments. The 10Be is produced by cosmic rays in minerals at the Earth's surface, and its abundance indicates erosion rates. Our reconstructed erosion rates show a quasi-cyclic pattern with a periodicity of ~1 million year and a gradual increase towards the present. We suggest that both patterns—cyclicity and gradual increase—are unrelated to climatic changes. Instead, we propose that the growth of the Himalaya by repeatedly scraping off rocks from the Indian plate (basal accretion), resulted in changes of its topography that were accompanied by changes in erosion rates. In this scenario, basal accretion episodically changes rock-uplift patterns, which brings landscapes out of equilibrium and results in quasi-cyclic variations in erosion rates. We used numerical landscape evolution simulations to demonstrate that this hypothesis is physically plausible. Datasets provided here includes summary of the location, depositional age, and stratigraphic position of 41 Siwalik sandstone samples collected from the Haripur section in Himachal Pradesh, India (Dataset S1); 10Be analysis results of Siwalik samples (2021-006_Mandal-et-al_Dataset-S1); sample location and 10Be analysis results of modern river sands from the Yamuna River and its tributaries near the Dehradun Basin (2021-006_Mandal-et-al_Dataset-S2); input parameters for the calculation of paleoerosion rates (2021-006_Mandal-et-al_Dataset-S3); and reconstructed 10Be paleoconcentrations and paleoerosion rates (Dataset S4). Moreover, the data include a compilation of published magnetostratigraphy-derived sediment accumulation rates in the late Cenozoic Himalayan foreland basin (2021-006_Mandal-et-al_Dataset-S5). We also include a movie (2021-006_Mandal-et-al_Movie-S1) that is a complete numerical landscape evolution model run with four consecutive accretion cycles of equal magnitude. For more information (for e.g., sampling method, analytical procedure, and data processing) please refer to the associated data description file and the main article (Mandal et al., 2021).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As one of the research directions at OLIVES Lab @ Georgia Tech, we focus on the robustness of data-driven algorithms under diverse challenging conditions where trained models can possibly be depolyed. To achieve this goal, we introduced a large-sacle (>2M images) traffic sign recognition dataset (CURE-TSR) which is among the most comprehensive datasets with controlled synthetic challenging conditions. Traffic sign images in the CURE-TSR dataset were cropped from the CURE-TSD dataset, which includes around 1.7 million real-world and simulator images with more than 2 million traffic sign instances. Real-world images were obtained from the BelgiumTS video sequences and simulated images were generated with the Unreal Engine 4 game development tool. Sign types include speed limit, goods vehicles, no overtaking, no stopping, no parking, stop, bicycle, hump, no left, no right, priority to, no entry, yield, and parking. Unreal and real sequences were processed with state-of-the-art visual effect software Adobe(c) After Effects to simulate challenging conditions, which include rain, snow, haze, shadow, darkness, brightness, blurriness, dirtiness, colorlessness, sensor and codec errors. Please refer to our GitHub page for code, papers, and more information.
Instructions:
The name format of the provided images are as follows: "sequenceType_signType_challengeType_challengeLevel_Index.bmp"
sequenceType: 01 - Real data 02 - Unreal data
signType: 01 - speed_limit 02 - goods_vehicles 03 - no_overtaking 04 - no_stopping 05 - no_parking 06 - stop 07 - bicycle 08 - hump 09 - no_left 10 - no_right 11 - priority_to 12 - no_entry 13 - yield 14 - parking
challengeType: 00 - No challenge 01 - Decolorization 02 - Lens blur 03 - Codec error 04 - Darkening 05 - Dirty lens 06 - Exposure 07 - Gaussian blur 08 - Noise 09 - Rain 10 - Shadow 11 - Snow 12 - Haze
challengeLevel: A number in between [01-05] where 01 is the least severe and 05 is the most severe challenge.
Index: A number shows different instances of traffic signs in the same conditions.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you use the dataset, cite the paper: https://doi.org/10.1016/j.eswa.2022.117541
The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the heftiest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, while accompanied by environmental disaster events information. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, Textblob, Flair, and LDA.
The following columns are in the dataset:
➡ created_at: The timestamp of the tweet. ➡ id: The unique id of the tweet. ➡ lng: The longitude the tweet was written. ➡ lat: The latitude the tweet was written. ➡ topic: Categorization of the tweet in one of ten topics namely, seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined. ➡ sentiment: A score on a continuous scale. This scale ranges from -1 to 1 with values closer to 1 being translated to positive sentiment, values closer to -1 representing a negative sentiment while values close to 0 depicting no sentiment or being neutral. ➡ stance: That is if the tweet supports the belief of man-made climate change (believer), if the tweet does not believe in man-made climate change (denier), and if the tweet neither supports nor refuses the belief of man-made climate change (neutral). ➡ gender: Whether the user that made the tweet is male, female, or undefined. ➡ temperature_avg: The temperature deviation in Celsius and relative to the January 1951-December 1980 average at the time and place the tweet was written. ➡ aggressiveness: That is if the tweet contains aggressive language or not.
Since Twitter forbids making public the text of the tweets, in order to retrieve it you need to do a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. Just the 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existance and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact, and even forecast this vast archive of human society?
GDELT 2.0 has a wealth of features in the event database which includes events reported in articles published in 65 live translated languages, measurements of 2,300 emotions and themes, high resolution views of the non-Western world, relevant imagery, videos, and social media embeds, quotes, names, amounts, and more.
You may find these code books helpful:
GDELT Global Knowledge Graph Codebook V2.1 (PDF)
GDELT Event Codebook V2.0 (PDF)
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]
. [Fork this kernel to get started][98] to learn how to safely manage analyzing large BigQuery datasets.
You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to the website (https://www.gdeltproject.org/).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Documented March 19, 2023
!!NEW!!!
GeoDAR reservoirs were registered to the drainage network! Please see the auxiliary data "GeoDAR-TopoCat" at https://zenodo.org/records/7750736. "GeoDAR-TopoCat" contains the drainage topology (reaches and upstream/downstream relationships) and catchment boundary for each reservoir in GeoDAR, based on the algorithm used for Lake-TopoCat (doi:10.5194/essd-15-3483-2023).
Documented April 1, 2022
Citation
Wang, J., Walter, B. A., Yao, F., Song, C., Ding, M., Maroof, A. S., Zhu, J., Fan, C., McAlister, J. M., Sikder, M. S., Sheng, Y., Allen, G. H., Crétaux, J.-F., and Wada, Y.: GeoDAR: georeferenced global dams and reservoirs database for bridging attributes and geolocations. Earth System Science Data, 14, 1869–1899, 2022, https://doi.org/10.5194/essd-14-1869-2022.
Please cite the reference above (which was fully peer-reviewed), NOT the preprint version. Thank you.
Contact
Dr. Jida Wang, jidawang@ksu.edu, gdbruins@ucla.edu
Data description and components
Data folder “GeoDAR_v10_v11” (.zip) contains two consecutive, peer-reviewed versions (v1.0 and v1.1) of the Georeferenced global Dams And Reservoirs (GeoDAR) dataset:
As by-products of GeoDAR harmonization, folder “GeoDAR_v10_v11” also contains:
Attribute description
Attribute |
Description and values |
v1.0 dams (file name: GeoDAR_v10_dams; format: comma-separated values (csv) and point shapefile) | |
id_v10 |
Dam ID for GeoDAR version 1.0 (type: integer). Note this is not the same as the International Code in ICOLD WRD but is linked to the International Code via encryption. |
lat |
Latitude of the dam point in decimal degree (type: float) based on datum World Geodetic System (WGS) 1984. |
lon |
Longitude of the dam point in decimal degree (type: float) on WGS 1984. |
geo_mtd |
Georeferencing method (type: text). Unique values include “geo-matching CanVec”, “geo-matching LRD”, “geo-matching MARS”, “geo-matching NID”, “geo-matching ODC”, “geo-matching ODM”, “geo-matching RSB”, “geocoding (Google Maps)”, and “Wada et al. (2017)”. Refer to Table 2 in Wang et al. (2022) for abbreviations. |
qa_rank |
Quality assurance (QA) ranking (type: text). Unique values include “M1”, “M2”, “M3”, “C1”, “C2”, “C3”, “C4”, and “C5”. The QA ranking provides a general measure for our georeferencing quality. Refer to Supplementary Tables S1 and S3 in Wang et al. (2022) for more explanation. |
rv_mcm |
Reservoir storage capacity in million cubic meters (type: float). Values are only available for large dams in Wada et al. (2017). Capacity values of other WRD records are not released due to ICOLD’s proprietary restriction. Also see Table S4 in Wang et al. (2022). |
val_scn |
Validation result (type: text). Unique values include “correct”, “register”, “mismatch”, “misplacement”, and “Google Maps”. Refer to Table 4 in Wang et al. (2022) for explanation. |
val_src |
Primary validation source (type: text). Values include “CanVec”, “Google Maps”, “JDF”, “LRD”, “MARS”, “NID”, “NPCGIS”, “NRLD”, “ODC”, “ODM”, “RSB”, and “Wada et al. (2017)”. Refer to Table 2 in Wang et al. (2022) for abbreviations. |
qc |
Roles and name initials of co-authors/participants during data quality control (QC) and validation. Name initials are given to each assigned dam or region and are listed generally in chronological order for each role. Collation and harmonization of large dams in Wada et al. (2017) (see Table S4 in Wang et al. (2022)) were performed by JW, and this information is not repeated in the qc attribute for a reduced file size. Although we tried to track the name initials thoroughly, the lists may not be always exhaustive, and other undocumented adjustments and corrections were most likely performed by JW. |
v1.1 dams (file name: GeoDAR_v11_dams; format: comma-separated values (csv) and point shapefile) | |
id_v11 |
Dam ID for GeoDAR version 1.1 (type: integer). Note this is not the same as the International Code in ICOLD WRD but is linked to the International Code via encryption. |
id_v10 |
v1.0 ID of this dam/reservoir (as in id_v10) if it is also included in v1.0 (type: integer). |
id_grd_v13 |
GRanD ID of this dam if also included in GRanD v1.3 (type: integer). |
lat |
Latitude of the dam point in decimal degree (type: float) on WGS 1984. Value may be different from that in v1.0. |
lon |
Longitude of the dam point in decimal degree (type: float) on WGS 1984. Value may be different from that in v1.0. |
geo_mtd |
Same as the value of geo_mtd in v1.0 if this dam is included in v1.0. |
qa_rank |
Same as the value of qa_rank in v1.0 if this dam is included in v1.0. |
val_scn |
Same as the value of val_scn in v1.0 if this dam is included in v1.0. |
val_src |
Same as the value of val_src in v1.0 if this dam is included in v1.0. |
rv_mcm_v10 |
Same as the value of rv_mcm in v1.0 if this dam is included in v1.0. |
rv_mcm_v11 |
Reservoir storage capacity in million cubic meters (type: float). Due to ICOLD’s proprietary restriction, provided values are limited to dams in Wada et al. (2017) and GRanD v1.3. If a dam is in both Wada et al. (2017) and GRanD v1.3, the value from the latter (if valid) takes precedence. |
har_src |
Source(s) to harmonize the dam points. Unique values include “GeoDAR v1.0 alone”, “GRanD v1.3 and GeoDAR 1.0”, “GRanD v1.3 and other ICOLD”, and “GRanD v1.3 alone”. Refer to Table 1 in Wang et al. (2022) for more details. |
pnt_src |
Source(s) of the dam point spatial coordinates. Unique values include “GeoDAR v1.0”, “original |
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
According to our latest research, the global synthetic data video generator market size reached USD 1.32 billion in 2024 and is anticipated to grow at a robust CAGR of 38.7% from 2025 to 2033. By the end of 2033, the market is projected to reach USD 18.59 billion, driven by rapid advancements in artificial intelligence, the growing need for high-quality training data for machine learning models, and increasing adoption across industries such as autonomous vehicles, healthcare, and surveillance. The surge in demand for data privacy, coupled with the necessity to overcome data scarcity and bias in real-world datasets, is significantly fueling the synthetic data video generator market's growth trajectory.
One of the primary growth factors for the synthetic data video generator market is the escalating demand for high-fidelity, annotated video datasets required to train and validate AI-driven systems. Traditional data collection methods are often hampered by privacy concerns, high costs, and the sheer complexity of obtaining diverse and representative video samples. Synthetic data video generators address these challenges by enabling the creation of large-scale, customizable, and bias-free datasets that closely mimic real-world scenarios. This capability is particularly vital for sectors such as autonomous vehicles and robotics, where the accuracy and safety of AI models depend heavily on the quality and variety of training data. As organizations strive to accelerate innovation and reduce the risks associated with real-world data collection, the adoption of synthetic data video generation technologies is expected to expand rapidly.
Another significant driver for the synthetic data video generator market is the increasing regulatory scrutiny surrounding data privacy and compliance. With stricter regulations such as GDPR and CCPA coming into force, organizations face mounting challenges in using real-world video data that may contain personally identifiable information. Synthetic data offers an effective solution by generating video datasets devoid of any real individuals, thereby ensuring compliance while still enabling advanced analytics and machine learning. Moreover, synthetic data video generators empower businesses to simulate rare or hazardous events that are difficult or unethical to capture in real life, further enhancing model robustness and preparedness. This advantage is particularly pronounced in healthcare, surveillance, and automotive industries, where data privacy and safety are paramount.
Technological advancements and increasing integration with cloud-based platforms are also propelling the synthetic data video generator market forward. The proliferation of cloud computing has made it easier for organizations of all sizes to access scalable synthetic data generation tools without significant upfront investments in hardware or infrastructure. Furthermore, the continuous evolution of generative adversarial networks (GANs) and other deep learning techniques has dramatically improved the realism and utility of synthetic video data. As a result, companies are now able to generate highly realistic, scenario-specific video datasets at scale, reducing both the time and cost required for AI development. This democratization of synthetic data technology is expected to unlock new opportunities across a wide array of applications, from entertainment content production to advanced surveillance systems.
From a regional perspective, North America currently dominates the synthetic data video generator market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The strong presence of leading AI technology providers, robust investment in research and development, and early adoption by automotive and healthcare sectors are key contributors to North America's market leadership. Europe is also witnessing significant growth, driven by stringent data privacy regulations and increased focus on AI-driven innovation. Meanwhile, Asia Pacific is emerging as a high-growth region, fueled by rapid digital transformation, expanding IT infrastructure, and increasing investments in autonomous systems and smart city projects. Latin America and Middle East & Africa, while still nascent, are expected to experience steady uptake as awareness and technological capabilities continue to grow.
The synthetic data video generator market by comp
The world population surpassed eight billion people in 2022, having doubled from its figure less than 50 years previously. Looking forward, it is projected that the world population will reach nine billion in 2038, and 10 billion in 2060, but it will peak around 10.3 billion in the 2080s before it then goes into decline. Regional variations The global population has seen rapid growth since the early 1800s, due to advances in areas such as food production, healthcare, water safety, education, and infrastructure, however, these changes did not occur at a uniform time or pace across the world. Broadly speaking, the first regions to undergo their demographic transitions were Europe, North America, and Oceania, followed by Latin America and Asia (although Asia's development saw the greatest variation due to its size), while Africa was the last continent to undergo this transformation. Because of these differences, many so-called "advanced" countries are now experiencing population decline, particularly in Europe and East Asia, while the fastest population growth rates are found in Sub-Saharan Africa. In fact, the roughly two billion difference in population between now and the 2080s' peak will be found in Sub-Saharan Africa, which will rise from 1.2 billion to 3.2 billion in this time (although populations in other continents will also fluctuate). Changing projections The United Nations releases their World Population Prospects report every 1-2 years, and this is widely considered the foremost demographic dataset in the world. However, recent years have seen a notable decline in projections when the global population will peak, and at what number. Previous reports in the 2010s had suggested a peak of over 11 billion people, and that population growth would continue into the 2100s, however a sooner and shorter peak is now projected. Reasons for this include a more rapid population decline in East Asia and Europe, particularly China, as well as a prolongued development arc in Sub-Saharan Africa.