100+ datasets found
  1. Global Summary of the Month, version 1.0

    • data.cnra.ca.gov
    • data.wu.ac.at
    csv, kmz, pdf
    Updated Mar 1, 2023
    Cite
    National Oceanic and Atmospheric Administration (2023). Global Summary of the Month, version 1.0 [Dataset]. https://data.cnra.ca.gov/dataset/global-summary-of-the-month-version-1-0
    Explore at:
    pdf, csv, kmz (available download formats)
    Dataset updated
    Mar 1, 2023
    Dataset authored and provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    The global summaries data set contains a monthly (GSOM) resolution of meteorological elements (max temp, snow, etc.) from 1763 to present, with weekly updates. The major parameters are: monthly mean maximum, mean minimum and mean temperatures; monthly total precipitation and snowfall; departure from normal of the mean temperature and total precipitation; monthly heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; and extreme daily temperature and precipitation amounts. The primary source data set is the Global Historical Climatology Network (GHCN)-Daily data set. The global summaries data set also contains a yearly (GSOY) resolution of meteorological elements. See associated resources for more information. These data are not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias-corrected temperature data in GHCN-Monthly, which will not be available in GSOM and GSOY. The GSOM and GSOY data set is going to replace the legacy DSI-3220 and expand to include non-U.S. (a.k.a. global) stations. DSI-3220 only included National Weather Service (NWS) COOP Published, or "Published in CD", sites.
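
    As a rough illustration of two of the parameters listed above (monthly heating and cooling degree days, and counts of days past a threshold), the following Python sketch derives them from daily mean temperatures. The 65 °F base temperature and the function names are assumptions made for this example; the description does not state the base or the exact procedure used for GSOM.

```python
from typing import Iterable, Tuple

# Assumed base temperature in degrees Fahrenheit. The dataset description
# does not state the base, so treat this value as illustrative only.
BASE_TEMP_F = 65.0


def monthly_degree_days(daily_mean_temps_f: Iterable[float],
                        base: float = BASE_TEMP_F) -> Tuple[float, float]:
    """Return (heating, cooling) degree days summed over one month."""
    heating = 0.0
    cooling = 0.0
    for temp in daily_mean_temps_f:
        # A day colder than the base adds heating degree days,
        # a day warmer than the base adds cooling degree days.
        heating += max(0.0, base - temp)
        cooling += max(0.0, temp - base)
    return heating, cooling


def days_at_or_above(daily_values: Iterable[float], threshold: float) -> int:
    """Count the days whose value reaches or exceeds the threshold."""
    return sum(1 for value in daily_values if value >= threshold)


if __name__ == "__main__":
    temps = [60.0, 70.0, 65.0, 50.0]  # made-up daily means, not real data
    print(monthly_degree_days(temps))
    print(days_at_or_above(temps, 65.0))
```

    The same pattern extends to the "below threshold" counts by flipping the comparison.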

  2. Summary data file containing play mode, median and mean of speech length in...

    • datasetcatalog.nlm.nih.gov
    Updated Apr 21, 2023
    Cite
    Craig, Hugh; Colyvas, Kim; Egan, Gabriel (2023). Summary data file containing play mode, median and mean of speech length in words, with play metadata. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001001019
    Dataset updated
    Apr 21, 2023
    Authors
    Craig, Hugh; Colyvas, Kim; Egan, Gabriel
    Description

    Summary data file containing play mode, median and mean of speech length in words, with play metadata.

  3. Guidelines for Computing Summary Statistics for Data-Sets Containing...

    • datahub.bvcentre.ca
    Updated Jun 3, 2024
    Cite
    (2024). Guidelines for Computing Summary Statistics for Data-Sets Containing Non-Detects - Dataset - BVRC DataHub [Dataset]. https://datahub.bvcentre.ca/dataset/guidelines-for-computing-summary-statistics-for-data-sets-containing-non-detects
    Dataset updated
    Jun 3, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    INTRODUCTION As part of its responsibilities, the BC Ministry of Environment monitors water quality in the province’s streams, rivers, and lakes. Often, it is necessary to compile statistics involving concentrations of contaminants or other compounds. Quite often the instruments used cannot measure concentrations below certain values. These observations are called non-detects or less-thans. Non-detects pose a difficulty when it is necessary to compute statistical measurements such as the mean, the median, and the standard deviation for a data set. The way non-detects are handled can affect the quality of any statistics generated. Non-detects, or censored data, are found in many fields such as medicine, engineering, biology, and environmetrics. In such fields, it is often the case that the measurements of interest are below some threshold. Dealing with non-detects is a significant issue, and statistical tools using survival or reliability methods have been developed. Basically, there are three approaches for treating data containing censored values: 1. substitution, which gives poor results and therefore is not recommended in the literature; 2. maximum likelihood estimation, which requires an assumption of some distributional form; and 3. nonparametric methods, which assess the shape of the data based on observed percentiles rather than a strict distributional form. This document provides guidance on how to record censored data, and on when and how to use certain analysis methods when the percentage of censored observations is less than 50%. The methods presented in this document are: 1. substitution; 2. Kaplan-Meier, as part of nonparametric methods; 3. lognormal model based on maximum likelihood estimation; and 4. robust regression on order statistics, which is a semiparametric method. Statistical software suitable for survival or reliability analysis is available for dealing with censored data. This software has been widely used in medical and engineering environments. In this document, methods are illustrated with both R and JMP software packages, when possible. JMP often requires some intermediate steps to obtain summary statistics with most of the methods described in this document. R, with the NADA package, is usually straightforward. The package NADA was developed specifically for computing statistics with non-detects in environmental data based on Helsel (2005b). The data used to illustrate the methods described for computing summary statistics for non-detects are either simulated or based on information acquired from the B.C. Ministry of Environment. This document is strongly based on the book Nondetects And Data Analysis written by Dennis R. Helsel in 2005 (Helsel, 2005b).
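
    To make the concern about substitution concrete, here is a minimal Python sketch (the guidelines themselves use R and JMP). It replaces each non-detect with a chosen fraction of its detection limit and shows that the resulting mean depends on that arbitrary choice. The data layout, function names, and sample values are hypothetical; this is not the NADA package's method, and it does not implement Kaplan-Meier, maximum likelihood, or regression on order statistics.

```python
from statistics import mean
from typing import List, Tuple

# Each observation is (value, is_censored). For a censored observation the
# value holds the detection limit: the true concentration lies below it.
# This layout is an assumption made for the example.
Observation = Tuple[float, bool]


def substituted_mean(observations: List[Observation], fraction: float) -> float:
    """Mean after replacing each non-detect with fraction * detection limit."""
    return mean(value * fraction if censored else value
                for value, censored in observations)


def censored_percentage(observations: List[Observation]) -> float:
    """Percentage of observations that are non-detects."""
    censored_count = sum(1 for _, censored in observations if censored)
    return 100.0 * censored_count / len(observations)


if __name__ == "__main__":
    # Made-up concentrations with a detection limit of 1.0.
    sample = [(1.0, True), (1.0, True), (2.0, False), (4.0, False)]
    print("censored %:", censored_percentage(sample))
    for fraction in (0.0, 0.5, 1.0):
        # The estimate moves with the substitution rule alone.
        print(fraction, substituted_mean(sample, fraction))
```

    Because the three substitution rules give three different means for the same sample, the choice of rule, not the data, drives part of the estimate.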

  4. Global Summary of the Month (GSOM), Version 1

    • catalog.data.gov
    • s.cnmilf.com
    Updated Sep 19, 2023
    Cite
    NOAA National Centers for Environmental Information (Point of Contact) (2023). Global Summary of the Month (GSOM), Version 1 [Dataset]. https://catalog.data.gov/dataset/global-summary-of-the-month-gsom-version-12
    Dataset updated
    Sep 19, 2023
    Dataset provided by
    National Centers for Environmental Information (https://www.ncei.noaa.gov/)
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    This Global Summaries dataset, known as GSOM for Monthly, contains a monthly resolution of meteorological elements from 1763 to present with updates applied weekly. The major parameters are: monthly mean maximum, mean minimum and mean temperatures; monthly total precipitation and snowfall; departure from normal of the mean temperature and total precipitation; monthly heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; extreme daily temperature and precipitation amounts; number of days with fog; and number of days with thunderstorms. The primary input data source is the Global Historical Climatology Network - Daily (GHCN-Daily) dataset. The Global Summaries datasets also include a yearly resolution of meteorological elements in the GSOY (for Yearly) dataset. See associated resources for more information. These datasets are not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias corrected temperature data in GHCN-Monthly, which are not available in GSOM and GSOY. The GSOM and GSOY datasets replace the legacy U.S. COOP Summaries (DSI-3220), and have been expanded to include non-U.S. (global) stations. U.S. COOP Summaries (DSI-3220) only includes National Weather Service (NWS) COOP Published, or "Published in CD", sites.

  5. Summary statistics by region

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 19, 2025
    Cite
    U.S. Geological Survey (2025). Summary statistics by region [Dataset]. https://catalog.data.gov/dataset/summary-statistics-by-region
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This dataset tabulates summary statistics for estimates of amounts of copper in undiscovered porphyry copper deposits by world region and for the globe. Data are reported by region, by aggregation method (randomized and sorted), and by selected statistics. These include quantiles, means, standard deviations, standard error of the mean, and upper and lower 95% limits of the mean.
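
    For readers unfamiliar with the statistics named above, the following Python sketch computes them for a list of values. It is only an illustration: the function name is hypothetical, the 95% limits use a normal approximation (factor 1.96), and nothing here is taken from the U.S. Geological Survey's own procedure.

```python
import math
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Quartiles, mean, standard deviation, standard error, and 95% limits."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)            # sample standard deviation
    sem = sd / math.sqrt(n)                  # standard error of the mean
    q1, median, q3 = statistics.quantiles(values, n=4)
    half_width = 1.96 * sem                  # normal approximation (assumed)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": mean,
        "sd": sd,
        "sem": sem,
        "lower_95_mean": mean - half_width,
        "upper_95_mean": mean + half_width,
    }


if __name__ == "__main__":
    # Made-up values, not taken from the dataset.
    for name, value in summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]).items():
        print(f"{name}: {value:.3f}")
```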

  6. Human Values, Purpose & Meaning: data summary (2025)

    • humanclarityinstitute.com
    Updated Nov 25, 2025
    Cite
    Human Clarity Institute (2025). Human Values, Purpose & Meaning: data summary (2025) [Dataset]. https://humanclarityinstitute.com/data/human-values-purpose-meaning-data-2025/
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    Human Clarity Institute
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This page summarises how people understand their core values, how meaningful life feels, and how digital noise and AI shape decisions, focus and long-term direction.

  7. Summary for Policymakers of the Working Group I Contribution to the IPCC...

    • catalogue.ceda.ac.uk
    • data-search.nerc.ac.uk
    Updated Mar 9, 2024
    Cite
    Erich Fischer; Mathias Hauser (2024). Summary for Policymakers of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure SPM.5 (v20210809) [Dataset]. https://catalogue.ceda.ac.uk/uuid/2787230b963942009e452255a3880609
    Dataset updated
    Mar 9, 2024
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    Erich Fischer; Mathias Hauser
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1850 - Dec 31, 2100
    Area covered
    Earth
    Variables measured
    latitude, longitude
    Description

    Data for Figure SPM.5 from the Summary for Policymakers (SPM) of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).

    Figure SPM.5 shows changes in annual mean surface temperatures, precipitation, and total column soil moisture.

    How to cite this dataset

    When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates:

    IPCC, 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 3−32, doi:10.1017/9781009157896.001.

    Figure subpanels

    The figure has four panels with 11 maps. All data is provided, except for panel a1.

    List of data provided

    This dataset contains:

    • Annual mean temperature change (°C) (relative to 1850-1900)
    • Annual mean precipitation change (%) (relative to 1850-1900)
    • Annual mean soil moisture change (standard deviation of interannual variability) (relative to 1850-1900)

    The data is given for global warming levels (GWLs), namely +1.0°C (temperature only), +1.5°C, +2.0°C, and +4.0°C.

    Data provided in relation to figure

    Panel a:
    • Data file: Panel_a2_Simulated_temperature_change_at_1C.nc, simulated annual mean temperature change (°C) at 1°C global warming relative to 1850-1900 (right).

    Panel b:
    • Data file: Panel_b1_Simulated_temperature_change_at_1_5C.nc, simulated annual mean temperature change (°C) at 1.5°C global warming relative to 1850-1900 (left).
    • Data file: Panel_b2_Simulated_temperature_change_at_2C.nc, simulated annual mean temperature change (°C) at 2.0°C global warming relative to 1850-1900 (center).
    • Data file: Panel_b3_Simulated_temperature_change_at_4C.nc, simulated annual mean temperature change (°C) at 4.0°C global warming relative to 1850-1900 (right).

    Panel c:
    • Data file: Panel_c1_Simulated_precipitation_change_at_1_5C.nc, simulated annual mean precipitation change (%) at 1.5°C global warming relative to 1850-1900 (left).
    • Data file: Panel_c2_Simulated_precipitation_change_at_2C.nc, simulated annual mean precipitation change (%) at 2.0°C global warming relative to 1850-1900 (center).
    • Data file: Panel_c3_Simulated_precipitation_change_at_4C.nc, simulated annual mean precipitation change (%) at 4.0°C global warming relative to 1850-1900 (right).

    Panel d:
    • Data file: Figure_SPM5_d1_cmip6_SM_tot_change_at_1_5C.nc, simulated annual mean total column soil moisture change (standard deviation) at 1.5°C global warming relative to 1850-1900 (left).
    • Data file: Figure_SPM5_d2_cmip6_SM_tot_change_at_2C.nc, simulated annual mean total column soil moisture change (standard deviation) at 2.0°C global warming relative to 1850-1900 (center).
    • Data file: Figure_SPM5_d3_cmip6_SM_tot_change_at_4C.nc, simulated annual mean total column soil moisture change (standard deviation) at 4.0°C global warming relative to 1850-1900 (right).

    Sources of additional information

    The following weblink is provided in the Related Documents section of this catalogue record:

    • Link to the report webpage, which includes the component containing the figure (Summary for Policymakers), the Technical Summary (Figures TS.3 and TS.5) and the Supplementary Material for Chapters 1, 4 and 11, which contains details on the input data used in Tables 1.SM.1 (Figure 1.14), 4.SM.1 (Figures 4.31 and 4.32) and 11.SM.9 (Figure 11.19).
  8. Data Set of Extracted Summary Statistics from Equipment Sensor Data

    • data.europa.eu
    • data.niaid.nih.gov
    unknown
    Updated Jan 24, 2021
    Cite
    Zenodo (2021). Data Set of Extracted Summary Statistics from Equipment Sensor Data [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4462777?locale=en
    Explore at:
    unknown (149706) (available download formats)
    Dataset updated
    Jan 24, 2021
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set was generated in accordance with the semiconductor industry and contains values of summary statistics from sensor recordings of the high-precision and high-tech production equipment. Basically, the semiconductor production consists of hundreds of process steps performing physical and chemical operations on so-called wafers, i.e. slices based on semiconductor material. In the production chain, each process equipment is equipped with several sensors recording physical parameters like gas flow, temperature, voltage, etc., resulting in so-called sensor data. Out of the sensor data, values of summary statistics are extracted. These are values like mean, standard deviation and gradients. To keep the entire production as stable as possible, these values are used to monitor the whole production in order to intervene in case of deviations. After the production, each device on the wafer is tested in the most careful way resulting in so-called wafer test data. In some cases, suspicious patterns occur in the wafer test data potentially leading to failure. In this case the root cause must be found in the production chain. For this purpose, the given data is provided. The aim is to find correlations between the wafer test data and the values of summary statistics in order to identify the root cause. The given data is divided into four data sets: "XTrain.csv", "YTrain.csv", "XTest.csv" and "YTest.csv". "XTrain.csv" and "XTest.csv" represent the values of summary statistics originating in the production chain separated for the purpose of training and validating a statistical model. Included are 114 observations of 77 parameters (values of summary statistics). The "YTrain.csv" and "YTest.csv" contain the corresponding wafer test data (144 observations of one parameter).
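
    The stated aim, finding correlations between the wafer test data and the summary-statistic parameters, could be approached in many ways. The Python sketch below ranks parameters by the absolute Pearson correlation with a single test outcome. Pearson correlation is a choice made for this example, not a method named by the dataset authors, and the column names and values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_parameters(columns: Dict[str, List[float]],
                    target: List[float]) -> List[Tuple[str, float]]:
    """Sort parameters by |correlation| with the target, strongest first."""
    scores = [(name, pearson(values, target)) for name, values in columns.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical parameters; the real files hold 77 of them.
    features = {
        "gas_flow_mean": [1.0, 2.0, 3.0, 4.0],
        "voltage_std": [1.0, 1.0, 1.0, 1.0],
        "temp_gradient": [4.0, 3.0, 2.0, 1.5],
    }
    wafer_test = [2.0, 4.0, 6.0, 8.0]
    for name, r in rank_parameters(features, wafer_test):
        print(f"{name}: r = {r:+.3f}")
```

    A strong correlation only flags a candidate for the root cause; it does not establish one.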

  9. NOAA Global Surface Summary of Day

    • registry.opendata.aws
    Updated Apr 20, 2018
    Cite
    NOAA (2018). NOAA Global Surface Summary of Day [Dataset]. https://registry.opendata.aws/noaa-gsod/
    Dataset updated
    Apr 20, 2018
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    Global Surface Summary of the Day is derived from The Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries. The online data files begin with 1929 and are at the time of this writing at the Version 8 software level. Over 9000 stations' data are typically available. The daily elements included in the dataset (as available from each station) are:
    Mean temperature (.1 Fahrenheit)
    Mean dew point (.1 Fahrenheit)
    Mean sea level pressure (.1 mb)
    Mean station pressure (.1 mb)
    Mean visibility (.1 miles)
    Mean wind speed (.1 knots)
    Maximum sustained wind speed (.1 knots)
    Maximum wind gust (.1 knots)
    Maximum temperature (.1 Fahrenheit)
    Minimum temperature (.1 Fahrenheit)
    Precipitation amount (.01 inches)
    Snow depth (.1 inches)
    Indicator for occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel Cloud.

    Global summary of day data for 18 surface meteorological elements are derived from the synoptic/hourly observations contained in USAF DATSAV3 Surface data and Federal Climate Complex Integrated Surface Hourly (ISH). Historical data are generally available for 1929 to the present, with data from 1973 to the present being the most complete. For some periods, one or more countries' data may not be available due to data restrictions or communications problems. In deriving the summary of day data, a minimum of 4 observations for the day must be present (allows for stations which report 4 synoptic observations/day). Since the data are converted to constant units (e.g., knots), slight rounding error from the originally reported values may occur (e.g., 9.9 instead of 10.0). The mean daily values described below are based on the hours of operation for the station. For some stations/countries, the visibility will sometimes 'cluster' around a value (such as 10 miles) due to the practice of not reporting visibilities greater than certain distances. The daily extremes and totals--maximum wind gust, precipitation amount, and snow depth--will only appear if the station reports the data sufficiently to provide a valid value. Therefore, these three elements will appear less frequently than other values. Also, these elements are derived from the stations' reports during the day, and may comprise a 24-hour period which includes a portion of the previous day. The data are reported and summarized based on Greenwich Mean Time (GMT, 0000Z - 2359Z) since the original synoptic/hourly data are reported and based on GMT.
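
    Two rules in the paragraph above lend themselves to a short Python sketch: a daily mean is only produced when at least 4 observations exist, and values are converted to constant units and rounded to tenths. The function names are hypothetical and the code is an illustration, not NOAA's processing software; the conversion factor follows from 1 knot = 1852 m per hour.

```python
from typing import List, Optional

MIN_OBSERVATIONS = 4           # minimum per day, as stated in the description
METRES_PER_NAUTICAL_MILE = 1852.0
SECONDS_PER_HOUR = 3600.0


def ms_to_knots(speed_ms: float) -> float:
    """Convert metres per second to knots, rounded to tenths."""
    knots = speed_ms * SECONDS_PER_HOUR / METRES_PER_NAUTICAL_MILE
    return round(knots, 1)


def daily_mean(observations: List[float]) -> Optional[float]:
    """Mean of the day's observations, or None if fewer than 4 are present."""
    if len(observations) < MIN_OBSERVATIONS:
        return None
    return round(sum(observations) / len(observations), 1)


if __name__ == "__main__":
    # Made-up wind speeds in m/s for one station-day.
    reported = [5.1, 4.0, 6.2, 5.5]
    in_knots = [ms_to_knots(speed) for speed in reported]
    print(in_knots)
    print(daily_mean(in_knots))
    print(daily_mean(in_knots[:3]))  # too few observations -> None
```

    Rounding after the unit conversion is what produces the small discrepancies from the originally reported values that the description mentions.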

  10. Data from: Global Summary of the Year (GSOY), Version 1

    • catalog.data.gov
    • s.cnmilf.com
    Updated Sep 19, 2023
    Cite
    NOAA National Centers for Environmental Information (Point of Contact) (2023). Global Summary of the Year (GSOY), Version 1 [Dataset]. https://catalog.data.gov/dataset/global-summary-of-the-year-gsoy-version-12
    Dataset updated
    Sep 19, 2023
    Dataset provided by
    National Centers for Environmental Information (https://www.ncei.noaa.gov/)
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    This Global Summaries dataset, known as GSOY for Yearly, contains a yearly resolution of meteorological elements from 1763 to present with updates applied weekly. The major parameters are: average annual temperature, average annual minimum and maximum temperatures; total annual precipitation and snowfall; departure from normal of the mean temperature and total precipitation; heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; extreme annual minimum and maximum temperatures; number of days with fog; and number of days with thunderstorms. The primary input data source is the Global Historical Climatology Network - Daily (GHCN-Daily) dataset. The Global Summaries datasets also include a monthly resolution of meteorological elements in the GSOM (for Monthly) dataset. See associated resources for more information. These datasets are not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias corrected temperature data in GHCN-Monthly, which are not available in GSOM and GSOY. The GSOM and GSOY datasets replace the legacy U.S. COOP Summaries (DSI-3220), and have been expanded to include non-U.S. (global) stations. U.S. COOP Summaries (DSI-3220) only includes National Weather Service (NWS) COOP Published, or "Published in CD", sites.

  11. Data from: A dataset to model Levantine landcover and land-use change...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Dec 16, 2023
    Cite
    Michael Kempf (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.10396148
    Explore at:
    zip (available download formats)
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Kempf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 16, 2023
    Area covered
    Levant
    Description

    Overview

    This dataset is the repository for the following paper submitted to Data in Brief:

    Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

    The Data in Brief article contains the supplement information and is the related data paper to:

    Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

    Description/abstract

    The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which has strained neighbouring countries like Jordan through the influx of Syrian refugees and has increased population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

    Folder structure

    The main folder after download contains all data, in which the following subfolders are stored as zipped files:

    “code” stores the above described 9 code chunks to read, extract, process, analyse, and visualize the data.

    “MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

    “mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

    “yield_productivity” contains .csv files of yield information for all countries listed above.

    “population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

    “GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.

    “built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders, which contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.

    Code structure

    1_MODIS_NDVI_hdf_file_extraction.R


    This is the first code chunk, which extracts MODIS data from the .hdf file format. The following packages must be installed and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed, 09th of October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif-file with the indication “NDVI”. Because the study area is quite large, we have to load three spatially different time series and merge them later. Note that the time series are temporally consistent.


    2_MERGE_MODIS_tiles.R


    In this code, we load and merge the three different stacks to produce large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in numerical order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks from which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
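
    The file ordering matters because a plain alphabetical sort places "NDVI_final_10.tif" before "NDVI_final_2.tif", which would scramble the time series. The repository code is written in R; the Python sketch below only illustrates the idea of a number-aware sort. The helper name is hypothetical and the filenames follow the naming pattern given above.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a filename into text and integer parts for number-aware sorting."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so equal positions always hold comparable types.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


if __name__ == "__main__":
    files = ["NDVI_final_10.tif", "NDVI_final_2.tif", "NDVI_final_1.tif"]
    print(sorted(files))                   # alphabetical: 1, 10, 2
    print(sorted(files, key=natural_key))  # numerical: 1, 2, 10
```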


    3_CROP_MODIS_merged_tiles.R


    Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced single cropped NDVI time series data from MODIS.
    The repository provides the already clipped and merged NDVI datasets.


    4_TREND_analysis_NDVI.R


    Now, we want to perform trend analysis on the derived data. The data we load is tricky as it contains a 16-day return period across a year for a period of 22 years. Growing season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and mark all values that are significant at the 0.05 level. Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3.
    To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
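
    The two quantities used in this step, the trend slope and the z-scores, can be sketched for a single pixel's series of annual seasonal sums. The repository code is in R and operates on raster stacks; this Python version is a simplified illustration. It assumes an ordinary least-squares slope against the year index and the sample standard deviation, neither of which is confirmed by the description, and the function names are hypothetical.

```python
import statistics
from typing import List


def z_scores(values: List[float]) -> List[float]:
    """Deviation of each value from the mean, in standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumed)
    return [(value - mean) / sd for value in values]


def ols_slope(values: List[float]) -> float:
    """Least-squares slope of the values against their index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = statistics.fmean(values)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


if __name__ == "__main__":
    # Made-up annual growing-season NDVI sums for one pixel.
    seasonal_sums = [3.1, 3.4, 3.3, 3.8, 4.0]
    print("slope per year:", ols_slope(seasonal_sums))
    print("z-scores:", [round(z, 2) for z in z_scores(seasonal_sums)])
```

    Because z-scores are unitless, NDVI and the GLDAS climate variables can be plotted on a common scale.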


    5_BUILT_UP_change_raster.R


    Let us look at the landcover changes now. We are working with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 03. March 2023, 100 m resolution, global coverage). Here, one can download the temporal coverage that is aimed for and reclassify it using the code after cropping to the individual study area. Here, I summed up different rasters to characterize the built-up change in continuous values between 1975 and 2022.


    6_POPULATION_numbers_plot.R


    For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.


    7_YIELD_plot.R


    In this section, we are using the country productivity from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single country yield datasets is plotted in a ggplot and combined using the patchwork package in R.


    8_GLDAS_read_extract_trend


    The last code provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9th of October 2023). The raw data comes in .nc file format and various variables can be extracted using the [“^a variable name”] command from the spatraster collection. Each time you run the code, this variable name must be adjusted to the variable of interest. For the abbreviations, see https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary (last accessed 09th of October 2023), the respective code chunk when reading a .nc file with the ncdf4 package in R, the output of print(nc) in the code, or names(the spatraster collection).
    Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
    From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. For variables such as rainfall, which are measured as annual sums rather than means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for growing seasons, e.g. March-May (MAM), June-August (JJA), September-November (SON), and December-February (DJF, including Jan/Feb of the consecutive year).
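
The difference between annual sums and annual means, and the way DJF borrows January and February from the following year, can be sketched as follows. This is an illustration in Python rather than the repository's R code; the function names and the sample rainfall values are hypothetical, and the series is assumed to start in January and to contain complete years.

```python
def annual_aggregate(monthly, as_sum=False):
    """Collapse a monthly series (12 values per year) to one value per year.

    as_sum=True keeps annual totals (e.g. rainfall); otherwise the total
    is divided by 12 to give the annual mean.
    """
    if len(monthly) % 12 != 0:
        raise ValueError("expected complete years of 12 monthly values")
    years = [monthly[i:i + 12] for i in range(0, len(monthly), 12)]
    divisor = 1 if as_sum else 12
    return [sum(year) / divisor for year in years]

def djf_means(monthly):
    """Mean of each December with January and February of the following year."""
    n_years = len(monthly) // 12
    out = []
    for y in range(n_years - 1):  # the last December has no following Jan/Feb
        dec = monthly[y * 12 + 11]
        jan = monthly[(y + 1) * 12]
        feb = monthly[(y + 1) * 12 + 1]
        out.append((dec + jan + feb) / 3)
    return out

# Two hypothetical years of monthly rainfall (mm), January first
rain = [10.0] * 12 + [20.0] * 12
annual_totals = annual_aggregate(rain, as_sum=True)
annual_means = annual_aggregate(rain)
winter = djf_means(rain)
```

Note that a series of n years yields only n - 1 complete DJF seasons, because the final December has no matching January and February.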
    From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted, and values significant at the 95 % confidence level are marked with dots on the raster plot. Because the GLDAS variables are available globally, this analysis can be performed with a much longer time series, other variables, and different spatial extents.
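
The description does not say which trend test the scripts use. As one common choice for climate time series, the sketch below applies a Mann-Kendall test (normal approximation, no correction for ties) to a single series and flags significance at the 95 % level. The function name `mann_kendall` and the sample series are hypothetical; in a raster workflow the same test would be applied to each pixel's time series.

```python
import math

def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test (normal approximation, no tie correction)."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three values")
    # S counts increasing pairs minus decreasing pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction towards zero
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p, p < alpha

# Hypothetical steadily increasing annual means, for illustration only
rising = [float(v) for v in range(10)]
s_stat, z_stat, p_value, significant = mann_kendall(rising)
```

Pixels for which the last returned value is true are the ones that would be marked with a dot on the raster plot.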

  12. National Energy Efficiency Data-Framework (NEED) report: summary of analysis...

    • gov.uk
    • s3.amazonaws.com
    Updated Aug 11, 2023
    + more versions
    Cite
    Department for Business, Energy & Industrial Strategy (2023). National Energy Efficiency Data-Framework (NEED) report: summary of analysis 2021 [Dataset]. https://www.gov.uk/government/statistics/national-energy-efficiency-data-framework-need-report-summary-of-analysis-2021
    Explore at:
    Dataset updated
    Aug 11, 2023
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Business, Energy & Industrial Strategy
    Description

    The National Energy Efficiency Data-Framework (NEED) was set up to provide a better understanding of energy use and energy efficiency in domestic and non-domestic buildings in Great Britain. The data framework matches data about a property together - including energy consumption and energy efficiency measures installed - at household level.

    11 August 2023 Error notice: revisions to the June 2021 Domestic NEED annual report

    We identified 2 processing errors in this edition of the Domestic NEED Annual report and corrected them. The changes are small and do not affect the overall findings of the report, only the domestic energy consumption estimates. The revisions are summarised here:

    Error 1: Local authority consumption estimates

    Error 2: Some properties incorrectly excluded from the Scotland multiple attributes tables

    • Extent of the error: These corrections primarily affect the number in sample column for all years as some properties were incorrectly excluded from the consumption estimates. There have also been revisions to the mean, median, upper and lower quartiles. Using 2019 as an example, around 80% of the updated mean and median values are within 300 kWh of what was previously published.
    • Years affected: 2017-2019
    • Countries affected: Scotland
    • Data tables affected: Multiple attributes tables: Scotland, 2019 (all tables)

    4 August 2021 Error notice: revisions to the June 2021 Domestic NEED annual report

    We identified 2 processing errors in this edition of the Domestic NEED Annual report and corrected them. The changes are small and do not affect the overall findings of the report, only the domestic energy consumption estimates. The impact of energy efficiency measures analysis remains unchanged. The revisions are summarised here:

    Error 1: Some properties incorrectly excluded from the 2019 gas consumption estimates

    • Extent of the error: The properties that were incorrectly excluded made up around 1% of all properties that should have been included
    • Years affected: 2019
    • Countries affected: England and Wales, Scotland
    • Data table and documents affected:
  13. Poor statistical reporting, inadequate data presentation and spin persist...

    • plos.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Joanna Diong; Annie A. Butler; Simon C. Gandevia; Martin E. Héroux (2023). Poor statistical reporting, inadequate data presentation and spin persist despite editorial advice [Dataset]. http://doi.org/10.1371/journal.pone.0202121
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joanna Diong; Annie A. Butler; Simon C. Gandevia; Martin E. Héroux
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Journal of Physiology and British Journal of Pharmacology jointly published an editorial series in 2011 to improve standards in statistical reporting and data analysis. It is not known whether reporting practices changed in response to the editorial advice. We conducted a cross-sectional analysis of reporting practices in a random sample of research papers published in these journals before (n = 202) and after (n = 199) publication of the editorial advice. Descriptive data are presented. There was no evidence that reporting practices improved following publication of the editorial advice. Overall, 76-84% of papers with written measures that summarized data variability used standard errors of the mean, and 90-96% of papers did not report exact p-values for primary analyses and post-hoc tests. 76-84% of papers that plotted measures to summarize data variability used standard errors of the mean, and only 2-4% of papers plotted raw data used to calculate variability. Of papers that reported p-values between 0.05 and 0.1, 56-63% interpreted these as trends or statistically significant. Implied or gross spin was noted incidentally in papers before (n = 10) and after (n = 9) the editorial advice was published. Overall, poor statistical reporting, inadequate data presentation and spin were present before and after the editorial advice was published. While the scientific community continues to implement strategies for improving reporting practices, our results indicate stronger incentives or enforcements are needed.

  14. N Summary from the United States Air Force

    • data.cnra.ca.gov
    • ncei.noaa.gov
    • +3more
    Updated May 9, 2019
    Cite
    Ocean Data Partners (2019). N Summary from the United States Air Force [Dataset]. https://data.cnra.ca.gov/dataset/n-summary-from-the-united-states-air-force
    Explore at:
    Dataset updated
    May 9, 2019
    Dataset authored and provided by
    Ocean Data Partners
    Area covered
    United States
    Description

    N Summaries are means and percentage frequency tabulations for stations around the world. There are approximately 2674 worldwide stations with five or more years of data for which the N Summary was prepared. The summaries were calculated using surface weather observations collated by the United States Air Force. The parameters in an N Summary are one or more of the following monthly, seasonal, and annual tabulations:

    1. Percentage frequency of surface winds (kts) by day, hour and month, to 16 points of the compass
    2. Percentage frequency of surface winds (kts) (seasonal and annual) to 16 points of the compass
    3. Precipitation amounts (in.)
    4. Mean frequency of daily maximum temperature (Deg. F), mean maximum, and extreme maximum temperature (Deg. F)
    5. Mean frequency of daily minimum temperature (Deg. F), mean minimum and extreme minimum temperature (Deg. F), and mean daily temperature range (Deg. F)
    6. Mean number of days favorable for indicated military operations
    7. Miscellaneous data; mean number of days of occurrence of various weather phenomena
    8. Mean number of days with indicated total and low cloud amounts (oktas)
    9. Percentage frequency of observations with low clouds (amount in 8ths, height in feet) and visibility (miles) reported
    10. Relative humidity means
    11. Percentage frequency distribution of wind speed (kts) and temperature (Deg. F)
    12. Percentage frequency of visibility (miles) and various atmospheric phenomena
    13. Mean number of days with specified phenomena
    14. Mean cloudiness (%)
    15. Snow depth (in.)
    16. Percentage frequency of surface winds (kts) to 8 points of the compass (monthly)
    17. Percentage frequency of surface winds (kts) to 8 points of the compass (seasonal)
    18. Sea level pressure (mb), means, and standard deviations

    Some of the tabulations are for all hours of the day while others may be for each 3-hourly, 6-hourly, or 12-hourly segment of the day, and in some cases, for only one observation a day.

  15. CAFE (Corporate Average Fuel Economy) - Summary of CAFE Civil Penalties...

    • catalog.data.gov
    • data.transportation.gov
    • +1more
    Updated May 1, 2024
    Cite
    National Highway Traffic Safety Administration (2024). CAFE (Corporate Average Fuel Economy) - Summary of CAFE Civil Penalties Collected [Dataset]. https://catalog.data.gov/dataset/cafe-corporate-average-fuel-economy-summary-of-cafe-civil-penalties-collected
    Explore at:
    Dataset updated
    May 1, 2024
    Description

    NHTSA's Corporate Average Fuel Economy (CAFE) program requires manufacturers of passenger cars and light trucks, produced for sale in the U.S., to meet CAFE standards, expressed in miles per gallon (mpg). The purpose of the CAFE program is to reduce the nation's energy consumption by increasing the fuel economy of cars and light trucks. The CAFE Public Information Center (PIC) is the authoritative source for Corporate Average Fuel Economy (CAFE) program data. This site allows fuel economy data to be viewed in report and/or graph format. The data can be sorted and filtered to produce custom reports which can also be downloaded as Excel or pdf files. NHTSA periodically updates the CAFE data in the PIC and, therefore, each report and graph is date stamped to indicate the last time NHTSA made updates.

  16. Global Summary of the Year, version 1.0

    • data.cnra.ca.gov
    • data.wu.ac.at
    csv, kmz, pdf
    Updated Mar 1, 2023
    + more versions
    Cite
    National Oceanic and Atmospheric Administration (2023). Global Summary of the Year, version 1.0 [Dataset]. https://data.cnra.ca.gov/dataset/global-summary-of-the-year-version-1-0
    Explore at:
    Available download formats: pdf, csv, kmz
    Dataset updated
    Mar 1, 2023
    Dataset authored and provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    The global summaries data set contains meteorological elements (max temp, snow, etc.) at yearly (GSOY) resolution from 1763 to present, with weekly updates. The major parameters are: monthly mean maximum, mean minimum and mean temperatures; monthly total precipitation and snowfall; departure from normal of the mean temperature and total precipitation; monthly heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; and extreme daily temperature and precipitation amounts. The primary source data set is the Global Historical Climatology Network (GHCN)-Daily data set. The global summaries data set also contains meteorological elements at monthly (GSOM) resolution. See associated resources for more information. This data is not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias-corrected temperature data in GHCN-Monthly, which will not be available in GSOM and GSOY. The GSOM and GSOY data set is going to replace the legacy DSI-3220 and expand to include non-U.S. (a.k.a. global) stations. DSI-3220 only included National Weather Service (NWS) COOP Published, or "Published in CD", sites.
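
To make the "heating and cooling degree days" parameter concrete, here is a minimal sketch of how degree days are conventionally accumulated from daily mean temperatures. The function name and sample values are hypothetical, and the 65 °F base is the usual U.S. convention, assumed here rather than taken from this description; check the dataset documentation for the exact definition used in GSOM and GSOY.

```python
def degree_days(daily_mean_temps_f, base_f=65.0):
    """Return (heating, cooling) degree days summed over the given days.

    base_f is the reference temperature in degrees Fahrenheit
    (65 is an assumed default, see the note above).
    """
    heating = sum(max(base_f - t, 0.0) for t in daily_mean_temps_f)
    cooling = sum(max(t - base_f, 0.0) for t in daily_mean_temps_f)
    return heating, cooling

# Three hypothetical daily mean temperatures (deg F)
temps = [60.0, 65.0, 72.0]
hdd, cdd = degree_days(temps)
```

A day exactly at the base temperature contributes to neither total, so a month's heating and cooling degree days can both be non-zero only when it contains days on both sides of the base.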

  17. NCEP Re-analysis Monthly Mean Data 2001-2004 for SBI Domain (Matlab) [NCEP]

    • data.ucar.edu
    • arcticdata.io
    • +1more
    matlab
    Updated Oct 7, 2025
    Cite
    Kent Moore (2025). NCEP Re-analysis Monthly Mean Data 2001-2004 for SBI Domain (Matlab) [NCEP] [Dataset]. http://doi.org/10.5065/D6ZK5DR6
    Explore at:
    Available download formats: matlab
    Dataset updated
    Oct 7, 2025
    Authors
    Kent Moore
    Time period covered
    Jan 1, 2001 - Oct 31, 2004
    Area covered
    Description

    This data set contains National Centers for Environmental Prediction (NCEP) re-analysis monthly mean data 2001-2004 for the SBI domain in Matlab format.

  18. Data summary and estimated mean numbers of infections.

    • plos.figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Amanda Ross; Cristian Koepfli; Xiaohong Li; Sonja Schoepflin; Peter Siba; Ivo Mueller; Ingrid Felger; Thomas Smith (2023). Data summary and estimated mean numbers of infections. [Dataset]. http://doi.org/10.1371/journal.pone.0042496.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Amanda Ross; Cristian Koepfli; Xiaohong Li; Sonja Schoepflin; Peter Siba; Ivo Mueller; Ingrid Felger; Thomas Smith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ‡The expected heterozygosity is the probability that 2 clones taken at random from the population carry different alleles.

  19. GLO climate data stats summary

    • researchdata.edu.au
    • data.wu.ac.at
    Updated May 6, 2016
    Cite
    Bioregional Assessment Program (2016). GLO climate data stats summary [Dataset]. https://researchdata.edu.au/glo-climate-stats-summary/2992384
    Explore at:
    Dataset updated
    May 6, 2016
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    Summary of various climate variables for all 15 subregions, based on Bureau of Meteorology Australian Water Availability Project (BAWAP) climate grids, including:

    1. Time series mean annual BAWAP rainfall from 1900 - 2012.

    2. Long term average BAWAP rainfall and Penman Potential Evapotranspiration (PET) from Jan 1981 - Dec 2012 for each month

    3. Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P (precipitation); (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (Vapour Pressure Deficit); (vii) Rn (net radiation); and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

    4. Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009).

    As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
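
The seven per-variable statistics listed in item 3 can be sketched for a single series as below. The function name, dictionary keys, and sample values are hypothetical. Two choices are assumptions, since the description does not state them: the sample standard deviation is used, and the trend is taken as the ordinary least-squares slope per year.

```python
from statistics import mean, stdev

def summarise(years, values):
    """Return average, maximum, minimum, average +/- stddev, stddev and trend."""
    avg = mean(values)
    sd = stdev(values)  # sample SD: an assumption
    # Ordinary least-squares slope (units per year): an assumption
    year_mean = mean(years)
    sxx = sum((y - year_mean) ** 2 for y in years)
    sxy = sum((y - year_mean) * (v - avg) for y, v in zip(years, values))
    return {
        "average": avg,
        "maximum": max(values),
        "minimum": min(values),
        "average_plus_stddev": avg + sd,
        "average_minus_stddev": avg - sd,
        "stddev": sd,
        "trend": sxy / sxx,
    }

# Hypothetical annual Tavg values (deg C), for illustration only
years = [1981, 1982, 1983, 1984]
tavg = [20.0, 20.5, 21.0, 21.5]
stats = summarise(years, tavg)
```

Running the same function once per time period (annual, each season, each month) and per variable would reproduce the 17 × 8 layout described above.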

    There are 4 csv files here:

    BAWAP_P_annual_BA_SYB_GLO.csv

    Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.

    Source data: annual BILO rainfall

    P_PET_monthly_BA_SYB_GLO.csv

    long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month

    Climatology_Trend_BA_SYB_GLO.csv

    Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend

    Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv

    Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).

    Dataset History

    The dataset was created from various BAWAP source data, including monthly BAWAP rainfall, Tmax, Tmin, VPD, etc., and other source data, including monthly Penman PET and correlation coefficient data. Data were extracted from national datasets for the GLO subregion.

    Dataset Citation

    Bioregional Assessment Programme (2014) GLO climate data stats summary. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/afed85e0-7819-493d-a847-ec00a318e657.

    Dataset Ancestors

  20. 🌆 City Lifestyle Segmentation Dataset

    • kaggle.com
    zip
    Updated Nov 15, 2025
    Cite
    UmutUygurr (2025). 🌆 City Lifestyle Segmentation Dataset [Dataset]. https://www.kaggle.com/datasets/umuttuygurr/city-lifestyle-segmentation-dataset
    Explore at:
    Available download formats: zip (11274 bytes)
    Dataset updated
    Nov 15, 2025
    Authors
    UmutUygurr
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🌆 About This Dataset

    This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.

    🎯 Perfect For:

    • 📊 K-Means, DBSCAN, Agglomerative Clustering
    • 🔬 PCA & t-SNE Dimensionality Reduction
    • 🗺️ Geospatial Visualization (Plotly, Folium)
    • 📈 Correlation Analysis & Feature Engineering
    • 🎓 Educational Projects (Beginner to Intermediate)

    📦 What's Inside?

    Feature | Description | Range
    10 Features | Economic, environmental & social indicators | Realistically scaled
    300 Cities | Europe, Asia, Americas, Africa, Oceania | Diverse distributions
    Strong Correlations | Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6) | ML-ready
    No Missing Values | Clean, preprocessed data | Ready for analysis
    4-5 Natural Clusters | Metropolitan hubs, eco-towns, developing centers | Pre-validated

    🔥 Key Features

    Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
    Regional Diversity: Each region has distinct economic and environmental characteristics
    Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
    Beginner-Friendly: No data cleaning required, includes example code
    Documented: Comprehensive README with methodology and use cases
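
The correlation figures quoted above (for example, income versus rent) are Pearson coefficients. As a rough, standard-library sketch of how such a value is computed: the function name and the sample numbers below are hypothetical and are not taken from the dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical income and rent values, for illustration only
income = [2000.0, 3500.0, 5000.0, 6500.0]
rent = [700.0, 1100.0, 1500.0, 1900.0]
r = pearson_r(income, rent)
```

With the real CSV loaded into pandas, `df.corr(numeric_only=True)` would give the full matrix in one call.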

    🚀 Quick Start Example

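    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler
    
    # Load the data and keep only the numeric features
    df = pd.read_csv('city_lifestyle_dataset.csv')
    X = df.drop(['city_name', 'country'], axis=1)
    # Standardize so that no single feature dominates the distance computation
    X_scaled = StandardScaler().fit_transform(X)
    
    # Cluster into 5 groups; n_init is set explicitly so behavior matches across scikit-learn versions
    kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
    df['cluster'] = kmeans.fit_predict(X_scaled)
    
    # Analyze: per-cluster means of the numeric columns only
    # (city_name and country are text, so a plain .mean() fails in recent pandas)
    print(df.groupby('cluster').mean(numeric_only=True))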
    

    🎓 Learning Outcomes

    After working with this dataset, you will be able to:
    1. Apply K-Means, DBSCAN, and Hierarchical Clustering
    2. Use PCA for dimensionality reduction and visualization
    3. Interpret correlation matrices and feature relationships
    4. Create geographic visualizations with cluster assignments
    5. Profile and name discovered clusters based on characteristics

    📚 Ideal For These Projects

    • 🏆 Kaggle Competitions: Practice clustering techniques
    • 📝 Academic Projects: Urban planning, sociology, environmental science
    • 💼 Portfolio Work: Showcase ML skills to employers
    • 🎓 Learning: Hands-on practice with unsupervised learning
    • 🔬 Research: Urban lifestyle segmentation studies

    🌍 Expected Clusters

    Cluster | Characteristics | Example Cities
    Metropolitan Tech Hubs | High income, density, rent | Silicon Valley, Singapore
    Eco-Friendly Towns | Low density, clean air, high happiness | Nordic cities
    Developing Centers | Mid income, high density, poor air | Emerging markets
    Low-Income Suburban | Low infrastructure, income | Rural areas
    Industrial Mega-Cities | Very high density, pollution | Manufacturing hubs

    🛠️ Technical Details

    • Format: CSV (UTF-8)
    • Size: ~300 rows × 10 columns
    • Missing Values: 0%
    • Data Types: 2 categorical, 8 numerical
    • Target Variable: None (unsupervised)
    • Correlation Strength: Pre-validated (r: 0.4 to 0.8)

    📖 What Makes This Dataset Special?

    Unlike random synthetic data, this dataset was carefully engineered with:
    • ✨ Realistic correlation structures based on urban research
    • 🌍 Regional characteristics matching real-world patterns
    • 🎯 Optimal cluster separability (validated via silhouette scores)
    • 📚 Comprehensive documentation and starter code

    🏅 Use This Dataset If You Want To:

    ✓ Learn clustering without data cleaning hassles
    ✓ Practice PCA and dimensionality reduction
    ✓ Create beautiful geographic visualizations
    ✓ Understand feature correlation in real-world contexts
    ✓ Build a portfolio project with clear business insights

    📊 Acknowledgments

    This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.

    Happy Clustering! 🎉

Cite
National Oceanic and Atmospheric Administration (2023). Global Summary of the Month, version 1.0 [Dataset]. https://data.cnra.ca.gov/dataset/global-summary-of-the-month-version-1-0

Global Summary of the Month, version 1.0

Explore at:
10 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: pdf, csv, kmz
Dataset updated
Mar 1, 2023
Dataset authored and provided by
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
Description

The global summaries data set contains meteorological elements (max temp, snow, etc.) at monthly (GSOM) resolution from 1763 to present, with weekly updates. The major parameters are: monthly mean maximum, mean minimum and mean temperatures; monthly total precipitation and snowfall; departure from normal of the mean temperature and total precipitation; monthly heating and cooling degree days; number of days that temperatures and precipitation are above or below certain thresholds; and extreme daily temperature and precipitation amounts. The primary source data set is the Global Historical Climatology Network (GHCN)-Daily data set. The global summaries data set also contains meteorological elements at yearly (GSOY) resolution. See associated resources for more information. This data is not to be confused with "GHCN-Monthly", "Annual Summaries" or "NCDC Summary of the Month". There are unique elements that are produced globally within the GSOM and GSOY data files. There are also bias-corrected temperature data in GHCN-Monthly, which will not be available in GSOM and GSOY. The GSOM and GSOY data set is going to replace the legacy DSI-3220 and expand to include non-U.S. (a.k.a. global) stations. DSI-3220 only included National Weather Service (NWS) COOP Published, or "Published in CD", sites.
