100+ datasets found

Global Summary of the Month, version 1.0
data.cnra.ca.gov
data.wu.ac.at
csv, kmz, pdf
Updated Mar 1, 2023
+ more versions
Cite
National Oceanic and Atmospheric Administration (2023). Global Summary of the Month, version 1.0 [Dataset]. https://data.cnra.ca.gov/dataset/global-summary-of-the-month-version-1-0
Explore at:
Available download formats: pdf, csv, kmz
Dataset updated
Mar 1, 2023
Dataset authored and provided by
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
Description
The global summaries data set contains a monthly (GSOM) resolution of meteorological elements (max temp, snow, etc.) from 1763 to present, with weekly updates. The major parameters are: monthly mean maximum, mean minimum and mean temperatures; monthly total precipitation and snowfall; departure from normal of the mean temperature and total precipitation; monthly heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; and extreme daily temperature and precipitation amounts. The primary source data set is the Global Historical Climatology Network (GHCN)-Daily data set. The global summaries data set also contains a yearly (GSOY) resolution of meteorological elements. See associated resources for more information. This data is not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias-corrected temperature data in GHCN-Monthly, which will not be available in GSOM and GSOY. The GSOM and GSOY data set is going to replace the legacy DSI-3220 and expand to include non-U.S. (a.k.a. global) stations. DSI-3220 only included National Weather Service (NWS) COOP Published, or "Published in CD", sites.
Summary data file containing play mode, median and mean of speech length in...
datasetcatalog.nlm.nih.gov
Updated Apr 21, 2023
Cite
Craig, Hugh; Colyvas, Kim; Egan, Gabriel (2023). Summary data file containing play mode, median and mean of speech length in words, with play metadata. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001001019
Explore at:
Dataset updated
Apr 21, 2023
Authors
Craig, Hugh; Colyvas, Kim; Egan, Gabriel
Description
Summary data file containing play mode, median and mean of speech length in words, with play metadata.
Guidelines for Computing Summary Statistics for Data-Sets Containing...
datahub.bvcentre.ca
Updated Jun 3, 2024
Cite
(2024). Guidelines for Computing Summary Statistics for Data-Sets Containing Non-Detects - Dataset - BVRC DataHub [Dataset]. https://datahub.bvcentre.ca/dataset/guidelines-for-computing-summary-statistics-for-data-sets-containing-non-detects
Explore at:
Dataset updated
Jun 3, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
INTRODUCTION As part of its responsibilities, the BC Ministry of Environment monitors water quality in the province’s streams, rivers, and lakes. Often, it is necessary to compile statistics involving concentrations of contaminants or other compounds. Quite often the instruments used cannot measure concentrations below certain values. These observations are called non-detects or less-thans. Non-detects pose a difficulty when it is necessary to compute statistical measures such as the mean, the median, and the standard deviation for a data set. The way non-detects are handled can affect the quality of any statistics generated. Non-detects, or censored data, are found in many fields such as medicine, engineering, biology, and environmetrics. In such fields, it is often the case that the measurements of interest are below some threshold. Dealing with non-detects is a significant issue, and statistical tools using survival or reliability methods have been developed. Basically, there are three approaches for treating data containing censored values: 1. substitution, which gives poor results and therefore is not recommended in the literature; 2. maximum likelihood estimation, which requires an assumption of some distributional form; and 3. nonparametric methods, which assess the shape of the data based on observed percentiles rather than a strict distributional form. This document provides guidance on how to record censored data, and on when and how to use certain analysis methods when the percentage of censored observations is less than 50%. The methods presented in this document are: 1. substitution; 2. Kaplan-Meier, as part of nonparametric methods; 3. a lognormal model based on maximum likelihood estimation; and 4. robust regression on order statistics, which is a semiparametric method. Statistical software suitable for survival or reliability analysis is available for dealing with censored data. This software has been widely used in medical and engineering environments. In this document, methods are illustrated with both R and JMP software packages, when possible. JMP often requires some intermediate steps to obtain summary statistics with most of the methods described in this document. R, with the NADA package, is usually straightforward. The package NADA was developed specifically for computing statistics with non-detects in environmental data based on Helsel (2005b). The data used to illustrate the methods described for computing summary statistics for non-detects are either simulated or based on information acquired from the B.C. Ministry of Environment. This document is strongly based on the book Nondetects And Data Analysis written by Dennis R. Helsel in 2005 (Helsel, 2005b).
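The guideline itself works in R (NADA) and JMP. As a rough illustration of why substitution is discouraged, the Python sketch below shows how the mean shifts depending on which value replaces the non-detects. The concentrations, the detection limit, and the function name `substituted_mean` are made up for this example and do not come from the guideline.

```python
from statistics import mean


def substituted_mean(values, detection_limit, fraction):
    """Mean after replacing each non-detect (None) with fraction * detection_limit."""
    filled = [fraction * detection_limit if v is None else v for v in values]
    return mean(filled)


# Hypothetical concentrations (mg/L); None marks a non-detect below the limit.
concentrations = [None, None, 0.8, 1.2, None, 2.5, 0.6, 1.9]
limit = 0.5  # hypothetical detection limit

# Common substitution choices: zero, half the limit, and the limit itself.
for fraction in (0.0, 0.5, 1.0):
    result = substituted_mean(concentrations, limit, fraction)
    print(f"substitute {fraction} x limit -> mean = {result:.4f}")
```

The three results differ only because of an arbitrary choice of substitute, which is the weakness that Kaplan-Meier, maximum likelihood, and regression on order statistics are meant to avoid.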
Global Summary of the Month (GSOM), Version 1
catalog.data.gov
s.cnmilf.com
+1more
Updated Sep 19, 2023
+ more versions
Cite
NOAA National Centers for Environmental Information (Point of Contact) (2023). Global Summary of the Month (GSOM), Version 1 [Dataset]. https://catalog.data.gov/dataset/global-summary-of-the-month-gsom-version-12
Explore at:
Dataset updated
Sep 19, 2023
Dataset provided by
National Centers for Environmental Information (https://www.ncei.noaa.gov/)
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
Description
This Global Summaries dataset, known as GSOM for Monthly, contains a monthly resolution of meteorological elements from 1763 to present with updates applied weekly. The major parameters are: monthly mean maximum, mean minimum and mean temperatures; monthly total precipitation and snowfall; departure from normal of the mean temperature and total precipitation; monthly heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; extreme daily temperature and precipitation amounts; number of days with fog; and number of days with thunderstorms. The primary input data source is the Global Historical Climatology Network - Daily (GHCN-Daily) dataset. The Global Summaries datasets also include a yearly resolution of meteorological elements in the GSOY (for Yearly) dataset. See associated resources for more information. These datasets are not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias corrected temperature data in GHCN-Monthly, which are not available in GSOM and GSOY. The GSOM and GSOY datasets replace the legacy U.S. COOP Summaries (DSI-3220), and have been expanded to include non-U.S. (global) stations. U.S. COOP Summaries (DSI-3220) only includes National Weather Service (NWS) COOP Published, or "Published in CD", sites.
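Heating and cooling degree days are among the parameters listed above. The Python sketch below shows one common way such totals are derived from daily mean temperatures. The 65 °F base temperature and the function name `degree_days` are assumptions for illustration; the dataset documentation should be consulted for the definition GSOM actually uses.

```python
def degree_days(daily_mean_temps_f, base_f=65.0):
    """Return (heating, cooling) degree-day totals for a list of daily means in °F.

    base_f is an assumed base temperature, not a value taken from the listing.
    """
    heating = sum(max(0.0, base_f - t) for t in daily_mean_temps_f)
    cooling = sum(max(0.0, t - base_f) for t in daily_mean_temps_f)
    return heating, cooling


# Hypothetical daily mean temperatures (°F) for five days.
temps = [50.0, 60.0, 65.0, 70.0, 80.0]
hdd, cdd = degree_days(temps)
print(f"heating degree days: {hdd}, cooling degree days: {cdd}")
```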
Summary statistics by region
catalog.data.gov
data.usgs.gov
+2more
Updated Nov 19, 2025
+ more versions
Cite
U.S. Geological Survey (2025). Summary statistics by region [Dataset]. https://catalog.data.gov/dataset/summary-statistics-by-region
Explore at:
Dataset updated
Nov 19, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
This dataset tabulates summary statistics for estimates of amounts of copper in undiscovered porphyry copper deposits by world region and for the globe. Data are reported by region, by aggregation method (randomized and sorted), and by selected statistics. These include quantiles, means, standard deviations, standard error of the mean, and upper and lower 95% of the mean.
Human Values, Purpose & Meaning: data summary (2025)
humanclarityinstitute.com
Updated Nov 25, 2025
Cite
Human Clarity Institute (2025). Human Values, Purpose & Meaning: data summary (2025) [Dataset]. https://humanclarityinstitute.com/data/human-values-purpose-meaning-data-2025/
Explore at:
Dataset updated
Nov 25, 2025
Dataset provided by
Human Clarity Institute
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This page summarises how people understand their core values, how meaningful life feels, and how digital noise and AI shape decisions, focus and long-term direction.
Summary for Policymakers of the Working Group I Contribution to the IPCC...
catalogue.ceda.ac.uk
data-search.nerc.ac.uk
Updated Mar 9, 2024
+ more versions
Cite
Erich Fischer; Mathias Hauser (2024). Summary for Policymakers of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure SPM.5 (v20210809) [Dataset]. https://catalogue.ceda.ac.uk/uuid/2787230b963942009e452255a3880609
Explore at:
Dataset updated
Mar 9, 2024
Dataset provided by
Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
Authors
Erich Fischer; Mathias Hauser
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1850 - Dec 31, 2100
Area covered
Earth
Variables measured
latitude, longitude
Description
Data for Figure SPM.5 from the Summary for Policymakers (SPM) of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).

Figure SPM.5 shows changes in annual mean surface temperatures, precipitation, and total column soil moisture.

How to cite this dataset

When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates:

IPCC, 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 3−32, doi:10.1017/9781009157896.001.

Figure subpanels

The figure has four panels with 11 maps. All data is provided, except for panel a1.

List of data provided

This dataset contains:

Annual mean temperature change (°C) (relative to 1850-1900)

Annual mean precipitation change (%) (relative to 1850-1900)

Annual mean soil moisture change (standard deviation of interannual variability) (relative to 1850-1900)

The data is given for global warming levels (GWLs), namely +1.0°C (temperature only), +1.5°C, +2.0°C, and +4.0°C.

Data provided in relation to figure

Panel a: - Data file: Panel_a2_Simulated_temperature_change_at_1C.nc, simulated annual mean temperature change (°C) at 1°C global warming relative to 1850-1900 (right).

Panel b: - Data file: Panel_b1_Simulated_temperature_change_at_1_5C.nc, simulated annual mean temperature change (°C) at 1.5°C global warming relative to 1850-1900 (left). - Data file: Panel_b2_Simulated_temperature_change_at_2C.nc, simulated annual mean temperature change (°C) at 2.0°C global warming relative to 1850-1900 (center). - Data file: Panel_b3_Simulated_temperature_change_at_4C.nc, simulated annual mean temperature change (°C) at 4.0°C global warming relative to 1850-1900 (right).

Panel c: - Data file: Panel_c1_Simulated_precipitation_change_at_1_5C.nc, simulated annual mean precipitation change (%) at 1.5°C global warming relative to 1850-1900 (left). - Data file: Panel_c2_Simulated_precipitation_change_at_2C.nc, simulated annual mean precipitation change (%) at 2.0°C global warming relative to 1850-1900 (center). - Data file: Panel_c3_Simulated_precipitation_change_at_4C.nc, simulated annual mean precipitation change (%) at 4.0°C global warming relative to 1850-1900 (right).

Panel d: - Data file: Figure_SPM5_d1_cmip6_SM_tot_change_at_1_5C.nc, simulated annual mean total column soil moisture change (standard deviation) at 1.5°C global warming relative to 1850-1900 (left). - Data file: Figure_SPM5_d2_cmip6_SM_tot_change_at_2C.nc, simulated annual mean total column soil moisture change (standard deviation) at 2.0°C global warming relative to 1850-1900 (center). - Data file: Figure_SPM5_d3_cmip6_SM_tot_change_at_4C.nc, simulated annual mean total column soil moisture change (standard deviation) at 4.0°C global warming relative to 1850-1900 (right).

Sources of additional information

The following weblink is provided in the Related Documents section of this catalogue record:

Link to the report webpage, which includes the component containing the figure (Summary for Policymakers), the Technical Summary (Figures TS.3 and TS.5) and the Supplementary Material for Chapters 1, 4 and 11, which contains details on the input data used in Tables 1.SM.1 (Figure 1.14), 4.SM.1 (Figures 4.31 and 4.32) and 11.SM.9 (Figure 11.19).
Data Set of Extracted Summary Statistics from Equipment Sensor Data
data.europa.eu
data.niaid.nih.gov
unknown
Updated Jan 24, 2021
Cite
Zenodo (2021). Data Set of Extracted Summary Statistics from Equipment Sensor Data [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4462777?locale=en
Explore at:
Available download formats: unknown (149706)
Dataset updated
Jan 24, 2021
Dataset authored and provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set was generated in accordance with the semiconductor industry and contains values of summary statistics from sensor recordings of the high-precision and high-tech production equipment. Basically, the semiconductor production consists of hundreds of process steps performing physical and chemical operations on so-called wafers, i.e. slices based on semiconductor material. In the production chain, each process equipment is equipped with several sensors recording physical parameters like gas flow, temperature, voltage, etc., resulting in so-called sensor data. Out of the sensor data, values of summary statistics are extracted. These are values like mean, standard deviation and gradients. To keep the entire production as stable as possible, these values are used to monitor the whole production in order to intervene in case of deviations. After the production, each device on the wafer is tested in the most careful way resulting in so-called wafer test data. In some cases, suspicious patterns occur in the wafer test data potentially leading to failure. In this case the root cause must be found in the production chain. For this purpose, the given data is provided. The aim is to find correlations between the wafer test data and the values of summary statistics in order to identify the root cause. The given data is divided into four data sets: "XTrain.csv", "YTrain.csv", "XTest.csv" and "YTest.csv". "XTrain.csv" and "XTest.csv" represent the values of summary statistics originating in the production chain separated for the purpose of training and validating a statistical model. Included are 114 observations of 77 parameters (values of summary statistics). The "YTrain.csv" and "YTest.csv" contain the corresponding wafer test data (144 observations of one parameter).
NOAA Global Surface Summary of Day
registry.opendata.aws
Updated Apr 20, 2018
+ more versions
Cite
NOAA (2018). NOAA Global Surface Summary of Day [Dataset]. https://registry.opendata.aws/noaa-gsod/
Explore at:
Dataset updated
Apr 20, 2018
Dataset provided by
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
Description
Global Surface Summary of the Day is derived from The Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries. The online data files begin with 1929 and are at the time of this writing at the Version 8 software level. Over 9000 stations' data are typically available. The daily elements included in the dataset (as available from each station) are:
Mean temperature (.1 Fahrenheit)
Mean dew point (.1 Fahrenheit)
Mean sea level pressure (.1 mb)
Mean station pressure (.1 mb)
Mean visibility (.1 miles)
Mean wind speed (.1 knots)
Maximum sustained wind speed (.1 knots)
Maximum wind gust (.1 knots)
Maximum temperature (.1 Fahrenheit)
Minimum temperature (.1 Fahrenheit)
Precipitation amount (.01 inches)
Snow depth (.1 inches)
Indicator for occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel Cloud.

Global summary of day data for 18 surface meteorological elements are derived from the synoptic/hourly observations contained in USAF DATSAV3 Surface data and Federal Climate Complex Integrated Surface Hourly (ISH). Historical data are generally available for 1929 to the present, with data from 1973 to the present being the most complete. For some periods, one or more countries' data may not be available due to data restrictions or communications problems. In deriving the summary of day data, a minimum of 4 observations for the day must be present (this allows for stations which report 4 synoptic observations per day). Since the data are converted to constant units (e.g., knots), slight rounding error from the originally reported values may occur (e.g., 9.9 instead of 10.0). The mean daily values described below are based on the hours of operation for the station. For some stations/countries, the visibility will sometimes 'cluster' around a value (such as 10 miles) due to the practice of not reporting visibilities greater than certain distances. The daily extremes and totals (maximum wind gust, precipitation amount, and snow depth) will only appear if the station reports the data sufficiently to provide a valid value. Therefore, these three elements will appear less frequently than other values. Also, these elements are derived from the stations' reports during the day, and may comprise a 24-hour period which includes a portion of the previous day. The data are reported and summarized based on Greenwich Mean Time (GMT, 0000Z - 2359Z) since the original synoptic/hourly data are reported and based on GMT.
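The two rules described above, the minimum of 4 observations per day and the rounding that follows unit conversion, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the use of None for a missing report, and the sample values are hypothetical and are not taken from the GSOD processing software.

```python
from statistics import mean

MIN_OBS = 4  # minimum number of observations per day, as stated in the description


def daily_mean(observations, min_obs=MIN_OBS):
    """Mean of the valid observations, rounded to 0.1, or None if too few are present."""
    valid = [x for x in observations if x is not None]
    if len(valid) < min_obs:
        return None  # no summary value is produced for this day
    return round(mean(valid), 1)


def ms_to_knots(speed_ms):
    """Convert metres per second to knots (1 knot = 1852 m per hour), rounded to 0.1."""
    return round(speed_ms * 3600 / 1852, 1)


# Hypothetical temperature reports for one day; None marks a missing report.
print(daily_mean([10.0, 12.0, None, 11.0]))  # only 3 valid observations
print(daily_mean([10.0, 12.0, 11.0, 13.0]))  # 4 valid observations
print(ms_to_knots(5.1))                      # value after conversion and rounding
```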
Data from: Global Summary of the Year (GSOY), Version 1
catalog.data.gov
s.cnmilf.com
Updated Sep 19, 2023
+ more versions
Cite
NOAA National Centers for Environmental Information (Point of Contact) (2023). Global Summary of the Year (GSOY), Version 1 [Dataset]. https://catalog.data.gov/dataset/global-summary-of-the-year-gsoy-version-12
Explore at:
Dataset updated
Sep 19, 2023
Dataset provided by
National Centers for Environmental Information (https://www.ncei.noaa.gov/)
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
Description
This Global Summaries dataset, known as GSOY for Yearly, contains a yearly resolution of meteorological elements from 1763 to present with updates applied weekly. The major parameters are: average annual temperature, average annual minimum and maximum temperatures; total annual precipitation and snowfall; departure from normal of the mean temperature and total precipitation; heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; extreme annual minimum and maximum temperatures; number of days with fog; and number of days with thunderstorms. The primary input data source is the Global Historical Climatology Network - Daily (GHCN-Daily) dataset. The Global Summaries datasets also include a monthly resolution of meteorological elements in the GSOM (for Monthly) dataset. See associated resources for more information. These datasets are not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias corrected temperature data in GHCN-Monthly, which are not available in GSOM and GSOY. The GSOM and GSOY datasets replace the legacy U.S. COOP Summaries (DSI-3220), and have been expanded to include non-U.S. (global) stations. U.S. COOP Summaries (DSI-3220) only includes National Weather Service (NWS) COOP Published, or "Published in CD", sites.
Data from: A dataset to model Levantine landcover and land-use change...
zenodo.org
data.niaid.nih.gov
+1more
zip
Updated Dec 16, 2023
Cite
Michael Kempf (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.10396148
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10396148
Dataset updated
Dec 16, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Michael Kempf
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 16, 2023
Area covered
Levant
Description
Overview

This dataset is the repository for the following paper submitted to Data in Brief:

Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

The Data in Brief article contains the supplement information and is the related data paper to:

Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

Description/abstract

The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which has strained neighbouring countries like Jordan due to the influx of Syrian refugees and has increased population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

Folder structure

The main folder after download contains all data, in which the following subfolders are stored as zipped files:

“code” stores the above described 9 code chunks to read, extract, process, analyse, and visualize the data.

“MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

“mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

“yield_productivity” contains .csv files of yield information for all countries listed above.

“population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

“GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.

“built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolder which contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5 year intervals, e.g., “Levant_built_up_1975.tif”.

Code structure

1_MODIS_NDVI_hdf_file_extraction.R

This is the first code chunk, which extracts MODIS data from the .hdf file format. The terra package must be installed, and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Download the MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9 October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three spatially different time series and merge them later. Note that the time series are temporally consistent.

2_MERGE_MODIS_tiles.R

In this code, we load and merge the three different stacks to produce large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in numerical order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, from which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
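The repository's scripts are written in R and rely on gtools for the file ordering. As a language-neutral illustration of why that ordering matters, the Python sketch below sorts numbered file names so that 2 comes before 10. The helper `natural_key` and the sample file names are hypothetical and only follow the NDVI_final_*consecutivenumber*.tif pattern named above.

```python
import re


def natural_key(name):
    """Split a file name into text and integer parts so numbers sort numerically."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


# Hypothetical file names following the naming pattern described in the text.
files = ["NDVI_final_10.tif", "NDVI_final_2.tif", "NDVI_final_1.tif"]

print(sorted(files))                   # plain string sort places 10 before 2
print(sorted(files, key=natural_key))  # numeric-aware sort keeps 1, 2, 10
```

Without a numeric-aware sort, the images would be stacked out of temporal order, which would distort any later trend analysis.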

3_CROP_MODIS_merged_tiles.R

Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced single cropped NDVI time series data from MODIS.
The repository provides the already clipped and merged NDVI datasets.

4_TREND_analysis_NDVI.R

Now, we want to perform trend analysis on the derived data. The data we load is tricky, as it contains images at a 16-day return period across each year for a period of 22 years. Growing season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and mark all values that are significant at the 0.05 level. Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3.
To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
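The two calculations described above, a trend slope over annual values and z-scores as a normalization, can be sketched per pixel or per series in Python as follows. This is a minimal illustration, not the repository's R code: the function names and the sample values are hypothetical, an ordinary least-squares slope is assumed, and the p-value extraction is not covered.

```python
from statistics import mean, stdev


def trend_slope(years, values):
    """Ordinary least-squares slope of values against years (assumed trend estimator)."""
    mean_x, mean_y = mean(years), mean(values)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    denominator = sum((x - mean_x) ** 2 for x in years)
    return numerator / denominator


def z_scores(values):
    """Deviation of each value from the mean, in units of the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical growing-season sums for five consecutive years.
years = [2001, 2002, 2003, 2004, 2005]
season_sums = [2.0, 4.0, 6.0, 8.0, 10.0]

print(trend_slope(years, season_sums))
print([round(z, 3) for z in z_scores(season_sums)])
```

Because z-scores are unitless, the NDVI series and the GLDAS climate variables can be plotted on a shared axis.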

5_BUILT_UP_change_raster.R

Let us look at the landcover changes now. We are working with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3 March 2023, 100 m resolution, global coverage). Here, one can download the desired temporal coverage and reclassify it using the code after cropping to the individual study area. Here, I summed up different rasters to characterize the built-up change in continuous values between 1975 and 2022.

6_POPULATION_numbers_plot.R

For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.

7_YIELD_plot.R

In this section, we are using the country productivity from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single country yield datasets is plotted in a ggplot and combined using the patchwork package in R.

8_GLDAS_read_extract_trend

The last code provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9 October 2023). The raw data comes in .nc file format, and various variables can be extracted using the [“^a variable name”] command from the spatraster collection. Each time you run the code, this variable name must be adjusted to meet the requirements for the variables. For the abbreviations, see https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary (last accessed 9 October 2023) or the respective code chunk when reading a .nc file with the ncdf4 package in R; alternatively, run print(nc) from the code or use names(the spatraster collection).
Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. Regarding, e.g., rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for growing seasons, e.g. March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year).
From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and 95 % confidence level values are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe due to the availability of the GLDAS variables.
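The distinction between annual means and annual sums (the r.sum=r.sum/12 step in the R code) can be illustrated with the Python sketch below. The function name `annual_aggregate`, the `as_sum` flag, and the monthly values are hypothetical; the sketch assumes a series of complete calendar years with 12 monthly values each.

```python
def annual_aggregate(monthly_values, as_sum=False):
    """Collapse a monthly series into one value per year.

    as_sum=False divides each yearly total by 12 (annual mean);
    as_sum=True keeps the yearly total, as needed for variables such as rainfall.
    """
    if len(monthly_values) % 12 != 0:
        raise ValueError("expected complete years of 12 monthly values")
    result = []
    for start in range(0, len(monthly_values), 12):
        year_total = sum(monthly_values[start:start + 12])
        result.append(year_total if as_sum else year_total / 12)
    return result


# Hypothetical monthly values for two years.
monthly = [10.0] * 12 + [20.0] * 12

print(annual_aggregate(monthly))               # annual means
print(annual_aggregate(monthly, as_sum=True))  # annual sums
```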
National Energy Efficiency Data-Framework (NEED) report: summary of analysis...
gov.uk
s3.amazonaws.com
Updated Aug 11, 2023
Department for Business, Energy & Industrial Strategy (2023). National Energy Efficiency Data-Framework (NEED) report: summary of analysis 2021 [Dataset]. https://www.gov.uk/government/statistics/national-energy-efficiency-data-framework-need-report-summary-of-analysis-2021
Dataset updated
Aug 11, 2023
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Department for Business, Energy & Industrial Strategy
Description
The National Energy Efficiency Data-Framework (NEED) was set up to provide a better understanding of energy use and energy efficiency in domestic and non-domestic buildings in Great Britain. The data framework matches data about a property together - including energy consumption and energy efficiency measures installed - at household level.

11 August 2023 Error notice: revisions to the June 2021 Domestic NEED annual report

We identified 2 processing errors in this edition of the Domestic NEED Annual report and corrected them. The changes are small and do not affect the overall findings of the report, only the domestic energy consumption estimates. The revisions are summarised here:

Error 1: Local authority consumption estimates

Extent of the error: Tables LA13 and LA14 were revised to correct a processing error that was identified after accessible versions of the tables were published on 10 June 2022. The update did not include the ‘Unknown’ category, which meant columns were misplaced. Table LA7 (2019 only) has also been updated to include data for counties; these rows previously appeared as [no data].

Years affected: 2017-2019

Countries affected: England and Wales

Data tables affected:

Local authority table, England and Wales, 2019 (Tables LA7, LA13 and LA14)

Local authority table, England and Wales, 2018 (Tables LA13 and LA14)

Local authority table, England and Wales, 2017 (Tables LA13 and LA14)

Error 2: Some properties incorrectly excluded from the Scotland multiple attributes tables

Extent of the error: These corrections primarily affect the number in sample column for all years as some properties were incorrectly excluded from the consumption estimates. There have also been revisions to the mean, median, upper and lower quartiles. Using 2019 as an example, around 80% of the updated mean and median values are within 300 kWh of what was previously published.

Years affected: 2017-2019

Countries affected: Scotland

Data tables affected: Multiple attributes tables: Scotland, 2019 (all tables)

4 August 2021 Error notice: revisions to the June 2021 Domestic NEED annual report

We identified 2 processing errors in this edition of the Domestic NEED Annual report and corrected them. The changes are small and do not affect the overall findings of the report, only the domestic energy consumption estimates. The impact of energy efficiency measures analysis remains unchanged. The revisions are summarised here:

Error 1: Some properties incorrectly excluded from the 2019 gas consumption estimates

Extent of the error: The properties that were incorrectly excluded made up around 1% of all properties that should have been included

Years affected: 2019

Countries affected: England and Wales, Scotland

Data table and documents affected:

the 2019 gas estimates in all Consumption tables

the 2019 gas estimates in the NEED data explorer

Poor statistical reporting, inadequate data presentation and spin persist...
plos.figshare.com
zip
Updated Jun 1, 2023
Joanna Diong; Annie A. Butler; Simon C. Gandevia; Martin E. Héroux (2023). Poor statistical reporting, inadequate data presentation and spin persist despite editorial advice [Dataset]. http://doi.org/10.1371/journal.pone.0202121
Available download formats
zip
Unique identifier
https://doi.org/10.1371/journal.pone.0202121
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Joanna Diong; Annie A. Butler; Simon C. Gandevia; Martin E. Héroux
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Journal of Physiology and British Journal of Pharmacology jointly published an editorial series in 2011 to improve standards in statistical reporting and data analysis. It is not known whether reporting practices changed in response to the editorial advice. We conducted a cross-sectional analysis of reporting practices in a random sample of research papers published in these journals before (n = 202) and after (n = 199) publication of the editorial advice. Descriptive data are presented. There was no evidence that reporting practices improved following publication of the editorial advice. Overall, 76-84% of papers with written measures that summarized data variability used standard errors of the mean, and 90-96% of papers did not report exact p-values for primary analyses and post-hoc tests. 76-84% of papers that plotted measures to summarize data variability used standard errors of the mean, and only 2-4% of papers plotted raw data used to calculate variability. Of papers that reported p-values between 0.05 and 0.1, 56-63% interpreted these as trends or statistically significant. Implied or gross spin was noted incidentally in papers before (n = 10) and after (n = 9) the editorial advice was published. Overall, poor statistical reporting, inadequate data presentation and spin were present before and after the editorial advice was published. While the scientific community continues to implement strategies for improving reporting practices, our results indicate stronger incentives or enforcements are needed.
N Summary from the United States Air Force
data.cnra.ca.gov
ncei.noaa.gov
Updated May 9, 2019
Ocean Data Partners (2019). N Summary from the United States Air Force [Dataset]. https://data.cnra.ca.gov/dataset/n-summary-from-the-united-states-air-force
Dataset updated
May 9, 2019
Dataset authored and provided by
Ocean Data Partners
Area covered
United States
Description
N Summaries are means and percentage frequency tabulations for stations around the world. There are approximately 2674 worldwide stations with five or more years of data for which the N Summary was prepared. The summaries were calculated using surface weather observations collated by the United States Air Force. The parameters in an N Summary are one or more of the following monthly, seasonal, and annual tabulations:
1. Percentage frequency of surface winds (kts) by day, hour and month, to 16 points of the compass
2. Percentage frequency of surface winds (kts) (seasonal and annual) to 16 points of the compass
3. Precipitation amounts (in.)
4. Mean frequency of daily maximum temperature (Deg. F), mean maximum, and extreme maximum temperature (Deg. F)
5. Mean frequency of daily minimum temperature (Deg. F), mean minimum and extreme minimum temperature (Deg. F), and mean daily temperature range (Deg. F)
6. Mean number of days favorable for indicated military operations
7. Miscellaneous data: mean number of days of occurrence of various weather phenomena
8. Mean number of days with indicated total and low cloud amounts (oktas)
9. Percentage frequency of observations with low clouds (amount in 8ths, height in feet) and visibility (miles) reported
10. Relative humidity means
11. Percentage frequency distribution of wind speed (kts) and temperature (Deg. F)
12. Percentage frequency of visibility (miles) and various atmospheric phenomena
13. Mean number of days with specified phenomena
14. Mean cloudiness (%)
15. Snow depth (in.)
16. Percentage frequency of surface winds (kts) to 8 points of the compass (monthly)
17. Percentage frequency of surface winds (kts) to 8 points of the compass (seasonal)
18. Sea level pressure (mb), means, and standard deviations
Some of the tabulations are for all hours of the day, while others may be for each 3-hourly, 6-hourly, or 12-hourly segment of the day and, in some cases, for only one observation a day.
CAFE (Corporate Average Fuel Economy) - Summary of CAFE Civil Penalties...
catalog.data.gov
data.transportation.gov
Updated May 1, 2024
National Highway Traffic Safety Administration (2024). CAFE (Corporate Average Fuel Economy) - Summary of CAFE Civil Penalties Collected [Dataset]. https://catalog.data.gov/dataset/cafe-corporate-average-fuel-economy-summary-of-cafe-civil-penalties-collected
Dataset updated
May 1, 2024
Dataset provided by
U.S. Department of Transportation, National Highway Traffic Safety Administration (http://nhtsa.gov/)
Description
NHTSA's Corporate Average Fuel Economy (CAFE) program requires manufacturers of passenger cars and light trucks, produced for sale in the U.S., to meet CAFE standards, expressed in miles per gallon (mpg). The purpose of the CAFE program is to reduce the nation's energy consumption by increasing the fuel economy of cars and light trucks. The CAFE Public Information Center (PIC) is the authoritative source for Corporate Average Fuel Economy (CAFE) program data. This site allows fuel economy data to be viewed in report and/or graph format. The data can be sorted and filtered to produce custom reports which can also be downloaded as Excel or pdf files. NHTSA periodically updates the CAFE data in the PIC and, therefore, each report and graph is date stamped to indicate the last time NHTSA made updates.
Global Summary of the Year, version 1.0
data.cnra.ca.gov
data.wu.ac.at
csv, kmz, pdf
Updated Mar 1, 2023
National Oceanic and Atmospheric Administration (2023). Global Summary of the Year, version 1.0 [Dataset]. https://data.cnra.ca.gov/dataset/global-summary-of-the-year-version-1-0
Available download formats
pdf, csv, kmz
Dataset updated
Mar 1, 2023
Dataset authored and provided by
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
Description
The global summaries data set contains a yearly (GSOY) resolution of meteorological elements (max temp, snow, etc.) from 1763 to present, with weekly updates. The major parameters are: monthly mean maximum, mean minimum and mean temperatures; monthly total precipitation and snowfall; departure from normal of the mean temperature and total precipitation; monthly heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; and extreme daily temperature and precipitation amounts. The primary source data set is the Global Historical Climatology Network (GHCN)-Daily data set. The global summaries data set also contains a monthly (GSOM) resolution of meteorological elements. See associated resources for more information. This data set is not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias-corrected temperature data in GHCN-Monthly, which will not be available in GSOM and GSOY. The GSOM and GSOY data set is going to replace the legacy DSI-3220 and expand to include non-U.S. (a.k.a. global) stations. DSI-3220 only included National Weather Service (NWS) COOP Published, or "Published in CD", sites.
NCEP Re-analysis Monthly Mean Data 2001-2004 for SBI Domain (Matlab) [NCEP]
data.ucar.edu
arcticdata.io
matlab
Updated Oct 7, 2025
Kent Moore (2025). NCEP Re-analysis Monthly Mean Data 2001-2004 for SBI Domain (Matlab) [NCEP] [Dataset]. http://doi.org/10.5065/D6ZK5DR6
Available download formats
matlab
Unique identifier
https://doi.org/10.5065/D6ZK5DR6
Dataset updated
Oct 7, 2025
Authors
Kent Moore
Time period covered
Jan 1, 2001 - Oct 31, 2004
Description
This data set contains National Centers for Environmental Prediction (NCEP) re-analysis monthly mean data 2001-2004 for the SBI domain in Matlab format.
Data summary and estimated mean numbers of infections.
plos.figshare.com
xls
Updated Jun 10, 2023
Amanda Ross; Cristian Koepfli; Xiaohong Li; Sonja Schoepflin; Peter Siba; Ivo Mueller; Ingrid Felger; Thomas Smith (2023). Data summary and estimated mean numbers of infections. [Dataset]. http://doi.org/10.1371/journal.pone.0042496.t002
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0042496.t002
Dataset updated
Jun 10, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Amanda Ross; Cristian Koepfli; Xiaohong Li; Sonja Schoepflin; Peter Siba; Ivo Mueller; Ingrid Felger; Thomas Smith
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
‡The expected heterozygosity is the probability that 2 clones taken at random from the population carry different alleles.
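As a worked illustration of that definition, the sketch below computes expected heterozygosity from allele frequencies as one minus the sum of squared frequencies, which is the probability that two independent random draws carry different alleles. The function name and the example frequencies are hypothetical, and the source table may well use a sample-size-corrected estimator instead.

```python
def expected_heterozygosity(freqs):
    """Probability that two independently drawn alleles differ: 1 - sum(p_i^2)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Made-up frequencies for three alleles at one locus.
he = expected_heterozygosity([0.5, 0.3, 0.2])
print(round(he, 2))
```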
GLO climate data stats summary
researchdata.edu.au
data.wu.ac.at
Updated May 6, 2016
Bioregional Assessment Program (2016). GLO climate data stats summary [Dataset]. https://researchdata.edu.au/glo-climate-stats-summary/2992384
Dataset updated
May 6, 2016
Dataset provided by
Data.gov (https://data.gov/)
Authors
Bioregional Assessment Program
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

Various climate variables summary for all 15 subregions based on Bureau of Meteorology Australian Water Availability Project (BAWAP) climate grids. Including

Time series mean annual BAWAP rainfall from 1900 - 2012.

Long-term average BAWAP rainfall and Penman Potential Evapotranspiration (PET) from Jan 1981 - Dec 2012 for each month.

Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P (precipitation); (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (Vapour Pressure Deficit); (vii) Rn (net radiation); and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009).

As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).

There are 4 csv files here:

BAWAP_P_annual_BA_SYB_GLO.csv

Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.

Source data: annual BILO rainfall

P_PET_monthly_BA_SYB_GLO.csv

long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month

Climatology_Trend_BA_SYB_GLO.csv

Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv

Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
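
The seven per-period statistics listed for Climatology_Trend_BA_SYB_GLO.csv can be sketched as follows. This is a minimal illustration, not the Programme's code: the function name `summarise` is hypothetical, the standard deviation is taken to be the sample standard deviation, and the trend is assumed to be an ordinary least-squares slope per year, since the description does not say how the trend was estimated.

```python
from statistics import mean, stdev


def summarise(years, values):
    """Return the seven statistics (a)-(g) for one variable and one time period.

    Assumptions: stddev is the sample standard deviation, and trend is the
    ordinary least-squares slope in units per year.
    """
    avg = mean(values)
    sd = stdev(values)
    year_mean = mean(years)
    sxx = sum((y - year_mean) ** 2 for y in years)
    sxy = sum((y - year_mean) * (v - avg) for y, v in zip(years, values))
    return {
        "average": avg,
        "maximum": max(values),
        "minimum": min(values),
        "average_plus_stddev": avg + sd,
        "average_minus_stddev": avg - sd,
        "stddev": sd,
        "trend": sxy / sxx,
    }


# Made-up values for four years, purely for illustration.
years = [1981, 1982, 1983, 1984]
values = [10.0, 12.0, 14.0, 16.0]
stats = summarise(years, values)
print(stats["average"], stats["trend"])
```

Calling `summarise` once per variable and per time period (annual, each season, each month) would give the 17 × 8 grid of results that the file description implies.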

Dataset History

Dataset was created from various BAWAP source data, including Monthly BAWAP rainfall, Tmax, Tmin, VPD, etc, and other source data including monthly Penman PET, Correlation coefficient data. Data were extracted from national datasets for the GLO subregion.

Dataset Citation

Bioregional Assessment Programme (2014) GLO climate data stats summary. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/afed85e0-7819-493d-a847-ec00a318e657.

Dataset Ancestors

Derived From Natural Resource Management (NRM) Regions 2010

Derived From Bioregional Assessment areas v03

Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012

Derived From Bioregional Assessment areas v01

Derived From Bioregional Assessment areas v02

Derived From GEODATA TOPO 250K Series 3

Derived From NSW Catchment Management Authority Boundaries 20130917

Derived From Geological Provinces - Full Extent

Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)

🌆 City Lifestyle Segmentation Dataset

kaggle.com

zip

Updated Nov 15, 2025

UmutUygurr (2025). 🌆 City Lifestyle Segmentation Dataset [Dataset]. https://www.kaggle.com/datasets/umuttuygurr/city-lifestyle-segmentation-dataset

Available download formats

zip (11274 bytes)

Dataset updated

Nov 15, 2025

Authors

UmutUygurr

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

🌆 About This Dataset

This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.

🎯 Perfect For:

📊 K-Means, DBSCAN, Agglomerative Clustering
🔬 PCA & t-SNE Dimensionality Reduction
🗺️ Geospatial Visualization (Plotly, Folium)
📈 Correlation Analysis & Feature Engineering
🎓 Educational Projects (Beginner to Intermediate)

📦 What's Inside?

Feature	Description	Range
10 Features	Economic, environmental & social indicators	Realistically scaled
300 Cities	Europe, Asia, Americas, Africa, Oceania	Diverse distributions
Strong Correlations	Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6)	ML-ready
No Missing Values	Clean, preprocessed data	Ready for analysis
4-5 Natural Clusters	Metropolitan hubs, eco-towns, developing centers	Pre-validated

🔥 Key Features

✅ Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
✅ Regional Diversity: Each region has distinct economic and environmental characteristics
✅ Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
✅ Beginner-Friendly: No data cleaning required, includes example code
✅ Documented: Comprehensive README with methodology and use cases

🚀 Quick Start Example

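import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the data and keep only the numeric features
df = pd.read_csv('city_lifestyle_dataset.csv')
X = df.drop(['city_name', 'country'], axis=1)
# Standardize so that no single feature dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Cluster into 5 groups (n_init set explicitly for consistent behavior across scikit-learn versions)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df['cluster'] = kmeans.fit_predict(X_scaled)

# Analyze: per-cluster means of the numeric columns only
print(df.groupby('cluster').mean(numeric_only=True))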

🎓 Learning Outcomes

After working with this dataset, you will be able to:
1. Apply K-Means, DBSCAN, and Hierarchical Clustering
2. Use PCA for dimensionality reduction and visualization
3. Interpret correlation matrices and feature relationships
4. Create geographic visualizations with cluster assignments
5. Profile and name discovered clusters based on characteristics

📚 Ideal For These Projects

🏆 Kaggle Competitions: Practice clustering techniques
📝 Academic Projects: Urban planning, sociology, environmental science
💼 Portfolio Work: Showcase ML skills to employers
🎓 Learning: Hands-on practice with unsupervised learning
🔬 Research: Urban lifestyle segmentation studies

🌍 Expected Clusters

Cluster	Characteristics	Example Cities
Metropolitan Tech Hubs	High income, density, rent	Silicon Valley, Singapore
Eco-Friendly Towns	Low density, clean air, high happiness	Nordic cities
Developing Centers	Mid income, high density, poor air	Emerging markets
Low-Income Suburban	Low infrastructure, income	Rural areas
Industrial Mega-Cities	Very high density, pollution	Manufacturing hubs

🛠️ Technical Details

Format: CSV (UTF-8)
Size: ~300 rows × 10 columns
Missing Values: 0%
Data Types: 2 categorical, 8 numerical
Target Variable: None (unsupervised)
Correlation Strength: Pre-validated (r: 0.4 to 0.8)

📖 What Makes This Dataset Special?

Unlike random synthetic data, this dataset was carefully engineered with:
✨ Realistic correlation structures based on urban research
🌍 Regional characteristics matching real-world patterns
🎯 Optimal cluster separability (validated via silhouette scores)
📚 Comprehensive documentation and starter code

🏅 Use This Dataset If You Want To:

✓ Learn clustering without data cleaning hassles
✓ Practice PCA and dimensionality reduction
✓ Create beautiful geographic visualizations
✓ Understand feature correlation in real-world contexts
✓ Build a portfolio project with clear business insights

📊 Acknowledgments

This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.

Happy Clustering! 🎉
