82 datasets found
  1. Dataset for Stock Market Index of 7 Economies

    • kaggle.com
    zip
    Updated Jul 4, 2023
    Cite
    Saad Aziz (2023). Dataset for Stock Market Index of 7 Economies [Dataset]. https://www.kaggle.com/datasets/saadaziz1985/dataset-for-stock-market-index-of-7-countries
    Explore at:
    Available download formats: zip (1917326 bytes)
    Dataset updated
    Jul 4, 2023
    Authors
    Saad Aziz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context:

    This dataset was extracted from Yahoo Finance using pandas and the Yahoo Finance library in Python. It covers stock market indices of the world's top economies. The code generated data from Jan 01, 2003 to Jun 30, 2023, a span of more than 20 years. There are 18 CSV files: 16 hold data for 16 different stock market indices across 7 countries, and the remaining two hold the annualized return and compound annual growth rate (CAGR) computed from the extracted data. Below is the list of countries along with the number of indices extracted for each through the Yahoo Finance library.

    Number of Countries & Index:

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F90ce8a986761636e3edbb49464b304d8%2FNumber%20of%20Index.JPG?generation=1688490342207096&alt=media

    Content:

    Unit of analysis: Stock Market Index Analysis

    This dataset is useful for research purposes, particularly for conducting comparative analyses involving capital market performance and could be used along with other economic indicators.

    There are 18 distinct CSV files associated with this dataset. The first 16 CSV files each cover one index, and the last two cover the annualized return for each year and the CAGR of each index. A blank value in any column indicates that the index was launched in a later year. For instance, Bse500 (India) was launched in 2007, so its earlier values are blank; similarly, the China_Top300 index was launched in 2021, so its earlier fields are blank too.

    The extraction process applies different criteria: in the 16 index CSV files all columns are included, and Adj Close is used to calculate the annualized return. The algorithm extracts data by index name (the code assigned by Yahoo Finance) for the given start and end dates.

    Annualized return and CAGR have been calculated and are illustrated in the images below; a machine-readable file (CSV) is attached for each.

    To extract the data provided in the attachment, various criteria were applied:

    1. Content Filtering: The data was filtered on several attributes, including the index name and the start and end dates. This filtering ensured that only relevant data meeting the specified criteria was included.

    2. Collaborative Filtering: Another filtering technique used was collaborative filtering using Yahoo Finance, which relies on index similarity. This approach involves finding indices that are similar to other indices, or extending the dataset's scope to other countries or economies. By leveraging this method, the algorithm identifies and extracts data based on similarities between indices.

    Of the last two CSV files, one holds the annualized return, which was calculated from the Adj Close column and stored in a new DataFrame. Below is the image of annualized returns for all indices (if it is unreadable, a machine-readable CSV file is attached to the dataset).
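    The description does not spell out how the yearly return was derived from Adj Close. One common convention, sketched here as an assumption rather than the author's actual code, compares the last adjusted close of each year with the last adjusted close of the previous year. The function name and the sample rows are hypothetical.

```python
def yearly_returns(rows):
    """Compute year-over-year returns from (iso_date, adj_close) pairs.

    Assumption: the return for a year is the last adjusted close of that
    year divided by the last adjusted close of the previous year, minus 1.
    """
    last_close = {}
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    for date, adj_close in sorted(rows):
        last_close[int(date[:4])] = adj_close  # later dates overwrite earlier ones
    years = sorted(last_close)
    return {
        year: last_close[year] / last_close[prev] - 1.0
        for prev, year in zip(years, years[1:])
    }


# Hypothetical sample data, not taken from the dataset.
sample = [
    ("2020-12-31", 100.0),
    ("2021-06-30", 110.0),
    ("2021-12-31", 120.0),
    ("2022-12-30", 108.0),
]
for year, value in yearly_returns(sample).items():
    print(year, f"{value:.2%}")
```

    The first year in the input has no return because there is no earlier close to compare against, which mirrors the blank early values described for indices launched later.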

    Annualized Return:

    In terms of annualised rate of return, Indian stock market indices lead most of the time, followed by the stock market indices of the USA, Canada and Japan.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F37645bd90623ea79f3708a958013c098%2FAnnualized%20Return.JPG?generation=1688525901452892&alt=media

    Compound Annual Growth Rate (CAGR):

    The best performing index based on compound growth is Sensex (India), which comprises the top 30 companies, at 15.60%, followed by Nifty500 (India) at 11.34% and Nasdaq (USA) at 10.60%.

    The worst performing index is China top300; however, it was launched in 2021 (post-pandemic), so it cannot be meaningfully assessed at this stage because too little data is available. Furthermore, UK and Russia indices are also among the five worst performers.
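    For reference, CAGR is conventionally defined as (end value / start value) raised to the power 1/years, minus 1. The sketch below applies that standard formula; the function name and the input values are hypothetical and are not figures from the dataset.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate over a holding period measured in years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical example: an index level that doubles over 5 years.
print(f"{cagr(100.0, 200.0, 5):.4f}")  # 0.1487
```

    Because the exponent is 1/years, a very short history (such as an index launched in 2021) makes the result highly sensitive to a single year's movement, which is consistent with the caution about China top300 above.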

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F58ae33f60a8800749f802b46ec1e07e7%2FCAGR.JPG?generation=1688490409606631&alt=media

    Geography: Stock Market Index of the World Top Economies

    Time period: Jan 01, 2003 – June 30, 2023

    Variables: Stock Market Index Title, Open, High, Low, Close, Adj Close, Volume, Year, Month, Day, Yearly_Return and CAGR

    File Type: CSV file

    Inspiration:

    • Time series prediction model
    • Investment opportunities in world best economies
    • Comparative Analysis of past data with other stock market indices or other indices

    Disclaimer:

    This is not financial advice; due diligence is required for every investment decision.

  2. Consumer Price Index (CPI)

    • catalog.data.gov
    • datasets.ai
    Updated May 16, 2022
    Cite
    Bureau of Labor Statistics (2022). Consumer Price Index (CPI) [Dataset]. https://catalog.data.gov/dataset/consumer-price-index-cpi-ee18b
    Explore at:
    Dataset updated
    May 16, 2022
    Dataset provided by
    Bureau of Labor Statistics (http://www.bls.gov/)
    Description

    The Consumer Price Index (CPI) is a measure of the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services. Indexes are available for the U.S. and various geographic areas. Average price data for select utility, automotive fuel, and food items are also available. Prices for the goods and services used to calculate the CPI are collected in 75 urban areas throughout the country and from about 23,000 retail and service establishments. Data on rents are collected from about 43,000 landlords or tenants. More information and details about the data provided can be found at http://www.bls.gov/cpi

  3. Data from: U-Index, a dataset and an impact metric for informatics tools and...

    • datasetcatalog.nlm.nih.gov
    • data.niaid.nih.gov
    • +2more
    Updated Feb 22, 2019
    Cite
    Winnenburg, Rainer; Shah, Nigam H.; Callahan, Alison (2019). U-Index, a dataset and an impact metric for informatics tools and databases [Dataset]. http://doi.org/10.5061/dryad.gj651
    Explore at:
    Dataset updated
    Feb 22, 2019
    Authors
    Winnenburg, Rainer; Shah, Nigam H.; Callahan, Alison
    Description

    Measuring the usage of informatics resources such as software tools and databases is essential to quantifying their impact, value and return on investment. We have developed a publicly available dataset of informatics resource publications and their citation network, along with an associated metric (u-Index) to measure informatics resources’ impact over time. Our dataset differentiates the context in which citations occur to distinguish between ‘awareness’ and ‘usage’, and uses a citing universe of open access publications to derive citation counts for quantifying impact. Resources with a high ratio of usage citations to awareness citations are likely to be widely used by others and have a high u-Index score. We have pre-calculated the u-Index for nearly 100,000 informatics resources. We demonstrate how the u-Index can be used to track informatics resource impact over time. The method of calculating the u-Index metric, the pre-computed u-Index values, and the dataset we compiled to calculate the u-Index are publicly available.

  4. Case Mix Index

    • data.chhs.ca.gov
    • data.ca.gov
    • +2more
    docx, pdf, xlsx, zip
    Updated Nov 6, 2025
    + more versions
    Cite
    Department of Health Care Access and Information (2025). Case Mix Index [Dataset]. https://data.chhs.ca.gov/dataset/case-mix-index
    Explore at:
    Available download formats: docx, pdf, xlsx (192727), zip
    Dataset updated
    Nov 6, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    The Case Mix Index (CMI) is the average relative DRG weight of a hospital’s inpatient discharges, calculated by summing the Medicare Severity-Diagnosis Related Group (MS-DRG) weight for each discharge and dividing the total by the number of discharges. The CMI reflects the diversity, clinical complexity, and resource needs of all the patients in the hospital. A higher CMI indicates a more complex and resource-intensive case load. Although the MS-DRG weights, provided by the Centers for Medicare & Medicaid Services (CMS), were designed for the Medicare population, they are applied here to all discharges regardless of payer. Note: It is not meaningful to add the CMI values together.
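    The calculation described above (sum of MS-DRG weights divided by the number of discharges) can be sketched as follows. The function name and the weights are hypothetical and are not actual CMS values.

```python
def case_mix_index(drg_weights):
    """Average MS-DRG weight across a hospital's inpatient discharges."""
    weights = list(drg_weights)
    if not weights:
        raise ValueError("at least one discharge is required")
    return sum(weights) / len(weights)


# Hypothetical MS-DRG weights, one per discharge.
weights = [0.75, 1.25, 2.0, 1.0]
print(case_mix_index(weights))  # 1.25
```

    Since the CMI is an average, two hospitals' values cannot be added together; combining them would require pooling the underlying discharges and recomputing.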

  5. India Consumer Price Index (CPI)

    • tradingeconomics.com
    • es.tradingeconomics.com
    • +13more
    csv, excel, json, xml
    Updated May 15, 2018
    Cite
    TRADING ECONOMICS (2018). India Consumer Price Index (CPI) [Dataset]. https://tradingeconomics.com/india/consumer-price-index-cpi
    Explore at:
    Available download formats: excel, xml, json, csv
    Dataset updated
    May 15, 2018
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 31, 2011 - Oct 31, 2025
    Area covered
    India
    Description

    Consumer Price Index CPI in India increased to 197.30 points in October from 197 points in September of 2025. This dataset provides - India Consumer Price Index (CPI) - actual values, historical data, forecast, chart, statistics, economic calendar and news.

  6. Claude.ai Usage Data

    • kaggle.com
    zip
    Updated Sep 16, 2025
    Cite
    Yash Dogra (2025). Claude.ai Usage Data [Dataset]. https://www.kaggle.com/datasets/yashdogra/anthropic
    Explore at:
    Available download formats: zip (2746225 bytes)
    Dataset updated
    Sep 16, 2025
    Authors
    Yash Dogra
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Data Documentation

    This document describes the data sources and variables used in the third Anthropic Economic Index (AEI) report.

    Claude.ai Usage Data

    Overview

    The core dataset contains Claude AI usage metrics aggregated by geography and analysis dimensions (facets).

    Source files:

    • aei_raw_claude_ai_2025-08-04_to_2025-08-11.csv (pre-enrichment data in data/intermediate/)
    • aei_enriched_claude_ai_2025-08-04_to_2025-08-11.csv (enriched data in data/output/)

    Note on data sources: The AEI raw file contains raw counts and percentages. Derived metrics (indices, tiers, per capita calculations, automation/augmentation percentages) are calculated during the enrichment process in aei_report_v3_preprocessing_claude_ai.ipynb.

    Data Schema

    Each row represents one metric value for a specific geography and facet combination:

    Column | Type | Description
    geo_id | string | Geographic identifier (ISO-2 country code for countries, US state code, or "GLOBAL", ISO-3 country codes in enriched data)
    geography | string | Geographic level: "country", "state_us", or "global"
    date_start | date | Start of data collection period
    date_end | date | End of data collection period
    platform_and_product | string | "Claude AI (Free and Pro)"
    facet | string | Analysis dimension (see Facets below)
    level | integer | Sub-level within facet (0-2)
    variable | string | Metric name (see Variables below)
    cluster_name | string | Specific entity within facet (task, pattern, etc.). For intersections, format is "base::category"
    value | float | Numeric metric value

    Facets

    • country: Country-level aggregations
    • state_us: US state-level aggregations
    • onet_task: O*NET occupational tasks
    • collaboration: Human-AI collaboration patterns
    • request: Request complexity levels (0=highest granularity, 1=middle granularity, 2=lowest granularity)
    • onet_task::collaboration: Intersection of tasks and collaboration patterns
    • request::collaboration: Intersection of request categories and collaboration patterns

    Core Variables

    Variables follow the pattern {prefix}_{suffix} with specific meanings:

    From AEI processing: *_count, *_pct
    From enrichment: *_per_capita, *_per_capita_index, *_pct_index, *_tier, automation_pct, augmentation_pct, soc_pct

    Usage Metrics

    • usage_count: Total number of conversations/interactions in a geography
    • usage_pct: Percentage of total usage (relative to parent geography - global for countries, US for states)
    • usage_per_capita: Usage count divided by working age population
    • usage_per_capita_index: Concentration index showing if a geography has more/less usage than expected based on population share (1.0 = proportional, >1.0 = over-representation, <1.0 = under-representation)
    • usage_tier: Usage adoption tier (0 = no/little adoption, 1-4 = quartiles of adoption among geographies with sufficient usage)
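    The documentation describes usage_per_capita_index only qualitatively (1.0 = proportional to population share). One reading consistent with that description, offered here as an assumption and not as the actual AEI enrichment code, is the ratio of a geography's usage share to its working-age population share. All names and numbers below are hypothetical.

```python
def per_capita_index(usage_count, population, total_usage, total_population):
    """Ratio of usage share to population share.

    Assumed interpretation: 1.0 means usage is proportional to population,
    above 1.0 means over-representation, below 1.0 under-representation.
    """
    if min(population, total_usage, total_population) <= 0:
        raise ValueError("population and totals must be positive")
    usage_share = usage_count / total_usage
    population_share = population / total_population
    return usage_share / population_share


# Hypothetical geography with 20% of usage but 10% of working-age population.
print(round(per_capita_index(200, 10, 1000, 100), 6))
```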

    Content Facet Metrics

    O*NET Task Metrics:

    • onet_task_count: Number of conversations using this specific O*NET task
    • onet_task_pct: Percentage of geographic total using this task
    • onet_task_pct_index: Specialization index comparing task usage to baseline (global for countries, US for states)
    • onet_task_collaboration_count: Number of conversations with both this task and collaboration pattern (intersection)
    • onet_task_collaboration_pct: Percentage of the base task's total that has this collaboration pattern (sums to 100% within each task)

    Occupation Metrics

    • soc_pct: Percentage of classified O*NET tasks associated with this SOC major occupation group (e.g., Management, Computer and Mathematical)

    Request Metrics:

    • request_count: Number of conversations in this request category level
    • request_pct: Percentage of geographic total in this category
    • request_pct_index: Specialization index comparing request usage to baseline
    • request_collaboration_count: Number of conversations with both this request category and collaboration pattern (intersection)
    • request_collaboration_pct: Percentage of the base request's total that has this collaboration pattern (sums to 100% within each request)

    Collaboration Pattern Metrics:

    • collaboration_count: Number of conversations with this collaboration pattern
    • collaboration_pct: Percentage of geographic total with this pattern
    • collaboration_pct_index: Specialization index comparing pattern to baseline
    • automation_pct: Percentage of classifiable collaboration that is automation-focused (directive, feedback loop patterns)
    • augmentation_pct: Percentage of classifiable collaboration that is augmentation-focused (validation, task iteration, learning patterns)

    Demographic & Economic Metrics

    • ...
  7. The Superdiversity Index - Dataset - B2FIND

    • demo-b2find.dkrz.de
    Updated Oct 15, 2015
    Cite
    (2015). The Superdiversity Index - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/54c23d32-8f8a-5f7d-9488-b4b6a9aa049a
    Explore at:
    Dataset updated
    Oct 15, 2015
    Description

    The Superdiversity dataset includes the Superdiversity Index (SI) calculated on the diversity of the emotional content expressed in texts of different communities. The emotional valences of words used by a community are extracted from Twitter data produced by that specific community. The Superdiversity dataset includes the SI built on Twitter data and lexicon-based Sentiment Analysis. In addition, the dataset comprises other possible diversity measures calculated from the same data from which the SI is calculated, such as the number of tweets in the community language, the Type-Token Ratio, and the number of languages in a community. The SI ranges in [0, 1]: a value of 0 means that the computed valences are very close to a standard emotional lexicon; a value of 0.5 indicates no correlation between the emotional content of words used by the community on Twitter and the standard emotional content; a value of 1 would correspond to the use of terms with the opposite emotional content compared to the standard. Data is computed at three different geographical scales based on the Classification of Territorial Units for Statistics (NUTS), i.e., NUTS1, NUTS2, and NUTS3, for two nations, Italy and the United Kingdom. The untagged Twitter dataset is composed of just under 73,175,500 geolocalised tweets gathered over 3 months, from the 1st August to the 31st October of 2015.

  8. Historical changes of annual temperature and precipitation indices at...

    • kilthub.cmu.edu
    txt
    Updated Aug 22, 2024
    Cite
    Yuchuan Lai; David Dzombak (2024). Historical changes of annual temperature and precipitation indices at selected 210 U.S. cities [Dataset]. http://doi.org/10.1184/R1/7961012.v6
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 22, 2024
    Dataset provided by
    Carnegie Mellon University
    Authors
    Yuchuan Lai; David Dzombak
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    United States
    Description

    Historical changes of annual temperature and precipitation indices at selected 210 U.S. cities

    This dataset provides:

    Annual average temperature, total precipitation, and temperature and precipitation extremes calculations for 210 U.S. cities.

    Historical rates of changes in annual temperature, precipitation, and the selected temperature and precipitation extreme indices in the 210 U.S. cities.

    Estimated thresholds (reference levels) for the calculations of annual extreme indices including warm and cold days, warm and cold nights, and precipitation amount from very wet days in the 210 cities.

    Annual average of daily mean temperature, Tmax, and Tmin are included for annual average temperature calculations. Calculations were based on the compiled daily temperature and precipitation records at individual cities.

    Temperature and precipitation extreme indices include: warmest daily Tmax and Tmin, coldest daily Tmax and Tmin, warm days and nights, cold days and nights, maximum 1-day precipitation, maximum consecutive 5-day precipitation, and precipitation amounts from very wet days.

    Number of missing daily Tmax, Tmin, and precipitation values are included for each city.

    Rates of change were calculated using linear regression, with some climate indices applied with the Box-Cox transformation prior to the linear regression.
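    As an illustration of the linear-regression step, the sketch below fits an ordinary least squares slope to an annual series. It is a minimal example under stated assumptions: the Box-Cox transformation mentioned above is omitted, and the function name and temperature values are hypothetical rather than taken from the dataset.

```python
def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical annual average temperatures (degrees C) for five years.
years = [2000, 2001, 2002, 2003, 2004]
temps = [10.0, 10.1, 10.2, 10.3, 10.4]
rate_per_year = ols_slope(years, temps)
print(f"{rate_per_year * 10:.2f} degrees C per decade")
```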

    The historical observations from ACIS belong to Global Historical Climatological Network - daily (GHCN-D) datasets. The included stations were based on NRCC’s “ThreadEx” project, which combined daily temperature and precipitation extremes at 255 NOAA Local Climatological Locations, representing all large and medium size cities in U.S. (See Owen et al. (2006) Accessing NOAA Daily Temperature and Precipitation Extremes Based on Combined/Threaded Station Records).

    Resources:

    See included README file for more information.

    Additional technical details and analyses can be found in: Lai, Y., & Dzombak, D. A. (2019). Use of historical data to assess regional climate change. Journal of climate, 32(14), 4299-4320. https://doi.org/10.1175/JCLI-D-18-0630.1

    Other datasets from the same project can be accessed at: https://kilthub.cmu.edu/projects/Use_of_historical_data_to_assess_regional_climate_change/61538

    ACIS database for historical observations: http://scacis.rcc-acis.org/

    GHCN-D datasets can also be accessed at: https://www.ncei.noaa.gov/data/global-historical-climatology-network-daily/

    Station information for each city can be accessed at: http://threadex.rcc-acis.org/

    • 2024 August updated -

      Annual calculations for 2022 and 2023 were added.

      Linear regression results and thresholds for extremes were updated because of the addition of 2022 and 2023 data.

      Note that future updates may be infrequent.

    • 2022 January updated -

      Annual calculations for 2021 were added.

      Linear regression results and thresholds for extremes were updated because of the addition of 2021 data.

    • 2021 January updated -

      Annual calculations for 2020 were added.

      Linear regression results and thresholds for extremes were updated because of the addition of 2020 data.

    • 2020 January updated -

      Annual calculations for 2019 were added.

      Linear regression results and thresholds for extremes were updated because of the addition of 2019 data.

      Thresholds for all 210 cities were combined into one single file – Thresholds.csv.

    • 2019 June updated -

      Baltimore was updated with the 2018 data (the previous version showed NA for 2018) and a new ID to reflect the GHCN ID of Baltimore-Washington International AP. The city_info file was updated accordingly.

      README file was updated to reflect the use of "wet days" index in this study. The 95% thresholds for calculation of wet days utilized all daily precipitation data from the reference period and can be different from the same index from some other studies, where only days with at least 1 mm of precipitation were utilized to calculate the thresholds. Thus the thresholds in this study can be lower than the ones that would have been calculated from the 95th percentiles of wet days (i.e., days with at least 1 mm of precipitation).
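    The effect described in the June 2019 note can be illustrated with a small sketch: a 95th percentile taken over all days (including dry ones) versus one taken only over days with at least 1 mm. The nearest-rank percentile used here is an assumption, since the source does not state which percentile method was applied, and the precipitation values are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical reference period: 80 dry days and 20 wet days of 1..20 mm.
daily_precip_mm = [0.0] * 80 + [float(v) for v in range(1, 21)]

all_days = percentile_nearest_rank(daily_precip_mm, 95)
wet_only = percentile_nearest_rank([p for p in daily_precip_mm if p >= 1.0], 95)

print(all_days, wet_only)  # 15.0 19.0
```

    Including the dry days pulls the threshold down, which is why the thresholds in this study can be lower than those reported elsewhere.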

  9. Data from: Data and code from: A high throughput approach for measuring soil...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Sep 2, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Agricultural Research Service (2025). Data and code from: A high throughput approach for measuring soil slaking index [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-a-high-throughput-approach-for-measuring-soil-slaking-index
    Explore at:
    Dataset updated
    Sep 2, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    This dataset includes soil wet aggregate stability measurements from the Upper Mississippi River Basin LTAR site in Ames, Iowa. Samples were collected in 2021 from this long-term tillage and cover crop trial in a corn-based agroecosystem. We measured wet aggregate stability using digital photography to quantify disintegration (slaking) of submerged aggregates over time, similar to the technique described by Fajardo et al. (2016) and Rieke et al. (2021). However, we adapted the technique to larger sample numbers by using a multi-well tray to submerge 20-36 aggregates simultaneously. We used this approach to measure slaking index of 160 soil samples (2120 aggregates). This dataset includes slaking index calculated for each aggregate, and also summarized by sample. There were usually 10-12 aggregates measured per sample. We focused primarily on methodological issues, assessing the statistical power of slaking index, needed replication, sensitivity to cultural practices, and sensitivity to sample collection date. We found that small numbers of highly unstable aggregates lead to skewed distributions for slaking index. We concluded at least 20 aggregates per sample were preferred to provide confidence in measurement precision. However, the experiment had high statistical power with only 10-12 replicates per sample. Slaking index was not sensitive to the initial size of dry aggregates (3 to 10 mm diameter); therefore, pre-sieving soils was not necessary. The field trial showed greater aggregate stability under no-till than chisel plow practice, and changing stability over a growing season. These results will be useful to researchers and agricultural practitioners who want a simple, fast, low-cost method for measuring wet aggregate stability on many samples.

  10. Yost Index with 90% confidence intervals (with all contributing source files...

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Francis P. Boscoe; Bian Liu; Furrina F. Lee; Li Niu; jordana lafantasie (2023). Yost Index with 90% confidence intervals (with all contributing source files - LARGE) [Dataset]. http://doi.org/10.6084/m9.figshare.16649773.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Francis P. Boscoe; Bian Liu; Furrina F. Lee; Li Niu; jordana lafantasie
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We extend our previous work with the Yost Index by adding 90% confidence intervals to the index values. These were calculated using the variance replicate estimates published in association with the American Community Survey of the United States Census Bureau.

    In the file yost-tract-2015-2019.csv, the data fields consist of an 11-digit geographic ID built from FIPS codes (2-digit state, 3-digit county, 6-digit census tract); the Yost Index; the lower bound of the 90% confidence interval; and the upper bound of the 90% confidence interval. Data is provided for 72,793 census tracts for which sufficient data were available. The Yost Index ranges from 1 (lowest socioeconomic position) to 100 (highest socioeconomic position).
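    The 11-digit geographic ID can be split into its FIPS components as sketched below. The function name and the sample ID are illustrative assumptions; the column layout of the CSV itself is not shown here.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract ID into state (2), county (3) and tract (6) codes.

    The parts are kept as strings so that leading zeros are preserved.
    """
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError("expected an 11-digit string")
    return {"state": geoid[:2], "county": geoid[2:5], "tract": geoid[5:]}


# Illustrative ID only.
print(split_tract_geoid("36001000100"))
# {'state': '36', 'county': '001', 'tract': '000100'}
```

    Reading the ID column as text rather than as a number matters in practice, because states with FIPS codes below 10 would otherwise lose their leading zero and yield 10-digit values.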

    For those only interested in using the index as we have calculated it, the file yost-tract-2015-2019 is the only file you need. The other 368 files here are provided for anyone who wishes to replicate our results using the R program yost-conf-intervals.R. The program presumes the user is running a Windows machine and that all files reside in a folder called C:/yostindex. The R program requires a number of packages, all of which are specified in lines 10-22 of the program.

    Details of this project were published in Boscoe FP, Liu B, LaFantasie J, Niu L, Lee FF. Estimating uncertainty in a socioeconomic index derived from the American Community Survey. SSM-Population Health 2022; 18: 101078.

    Additional years of data following this format are planned to be added to this repository in time.

  11. Data from: Long-term, gridded standardized precipitation index for Hawai‘i

    • hydroshare.org
    • dataone.org
    • +1more
    zip
    Updated Sep 22, 2020
    Cite
    Matthew Lucas; Clay Trauernicht; Abby Frazier; Tomoaki Miura (2020). Long-term, gridded standardized precipitation index for Hawai‘i [Dataset]. http://doi.org/10.4211/hs.822553ead1d04869b5b3e1e3a3817ec6
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 22, 2020
    Dataset provided by
    HydroShare
    Authors
    Matthew Lucas; Clay Trauernicht; Abby Frazier; Tomoaki Miura
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1920 - Dec 31, 2011
    Area covered
    Description

    This dataset contains gridded monthly Standardized Precipitation Index (SPI) at 10 timescales: 1-, 3-, 6-, 9-, 12-, 18-, 24-, 36-, 48-, and 60-month intervals from 1920 to 2012 at 250 m resolution for seven of the eight main Hawaiian Islands (18.849°N, 154.668°W to 22.269°N, 159.816°W; the island of Ni‘ihau is excluded due to lack of data). The gridded data use a World Geographic Coordinate System 1984 (WGS84) and are stored as individual GeoTIFF files for each month-year, organized by SPI interval, as indicated by the GeoTIFF file name. Thus, for example, the file “spi3_1999_11.tif” would contain the gridded 3-month SPI values calculated for the month of November in the year 1999. Currently, the data are available from 1920 to 2012, but the datasets will be updated as new gridded monthly rainfall data become available.

    SPI is a normalized drought index that converts monthly rainfall totals into the number of standard deviations (z-score) by which the observed, cumulative rainfall diverges from the long-term mean. The conversion of raw rainfall to a z-score is done by fitting a designated probability distribution function to the observed precipitation data for a site. In doing so, anomalous rainfall quantities take the form of positive and negative SPI z-scores. Additionally, because distribution fitting is based on long-term (>30 years) precipitation data at that location, SPI score is relative, making comparisons across different climates possible.

    The creation of a statewide Hawai‘i SPI dataset relied on a 93-year (1920-2012) high resolution (250 m) spatially interpolated monthly gridded rainfall dataset [1]. This dataset is recognized as the highest quality precipitation data available [2] for the main Hawaiian Islands. After performing extensive quality control on the monthly rainfall station data (including homogeneity testing of over 1,100 stations [1,3]) and a geostatistical method comparison, ordinary kriging was used to generate a time series of gridded monthly rainfall from January 1920 to December 2012 at 250 m resolution [3]. This dataset was then used to calculate monthly SPI for 10 timescales (1-, 3-, 6-, 9-, 12-, 18-, 24-, 36-, 48-, and 60-month) at each grid cell. A 3-month SPI in May 2001, for example, represents the March-April-May (MAM) total rainfall in 2001 compared to the MAM rainfall in the entire time series. The resolution of the gridded rainfall dataset provides a more precise representation of drought (and pluvial) events compared to the other available drought products.

    [1] Frazier, A.G.; Giambelluca, T.W.; Diaz, H.F.; Needham, H.L. Comparison of geostatistical approaches to spatially interpolate month-year rainfall for the Hawaiian Islands. Int. J. Climatol. 2016, 36, 1459–1470, doi:10.1002/joc.4437.
    [2] Giambelluca, T.W.; Chen, Q.; Frazier, A.G.; Price, J.P.; Chen, Y.-L.; Chu, P.-S.; Eischeid, J.K.; Delparte, D.M. Online Rainfall Atlas of Hawai‘i. B. Am. Meteorol. Soc. 2013, 94, 313–316, doi:10.1175/BAMS-D-11-00228.1.
    [3] Frazier, A.G.; Giambelluca, T.W. Spatial trend analysis of Hawaiian rainfall from 1920 to 2012. Int. J. Climatol. 2017, 37, 2522–2531, doi:10.1002/joc.4862.
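    The file naming convention can be decoded programmatically. The sketch below is inferred from the single example “spi3_1999_11.tif” and assumes the pattern spi{timescale}_{year}_{month}.tif holds for every file; the function name is hypothetical, and whether single-digit months are zero-padded is not stated, so both forms are accepted.

```python
import re

# Pattern inferred from the one documented example; treat it as an assumption.
_SPI_NAME = re.compile(r"^spi(\d+)_(\d{4})_(\d{1,2})\.tif$")
SPI_TIMESCALES = {1, 3, 6, 9, 12, 18, 24, 36, 48, 60}


def parse_spi_filename(name):
    """Return (timescale_months, year, month) for a file name like spi3_1999_11.tif."""
    match = _SPI_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized SPI file name: {name!r}")
    timescale, year, month = (int(group) for group in match.groups())
    if timescale not in SPI_TIMESCALES or not 1 <= month <= 12:
        raise ValueError(f"unexpected timescale or month in {name!r}")
    return timescale, year, month


print(parse_spi_filename("spi3_1999_11.tif"))  # (3, 1999, 11)
```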

  12. HadEX3: Global land-surface climate extremes indices v3.0.4 (1901-2018)

    • catalogue.ceda.ac.uk
    Updated Mar 7, 2024
    + more versions
    Cite
    Robert J. H. Dunn; Lisa Alexander; Markus Donat; Xuebin Zhang; Margot Bador; Nicholas Herold; Tanya Lippmann; Robert J. Allan; Enric Aguilar; Abdoul Aziz; Manola Brunet; John Caesar; Guillaume Chagnaud; Vincent Cheng; Thelma Cinco; Imke Durre; Rosaline de Guzman; Tin Mar Htay; Wan Maisarah Wan Ibadullah; Muhammad Khairul Izzat Bin Ibrahim; Mahbobeh Khoshkam; Andries Kruge; Hisayuki Kubota; Tan Wee Leng; Gerald Lim; Lim Li-Sha; Jose Marengo; Sifiso Mbatha; Simon McGree; Matthew Menne; Maria de los Milagros Skansi; Sandile Ngwenya; Francis Nkrumah; Chalump Oonariya; Jose Daniel Pabon-Caicedo; Geremy Panthou; Cham Pham; Fatemeh Rahimzadeh; Andrea Ramos; Ernesto Salgado; Jim Salinger; Youssouph Sane; Ardhasena Sopaheluwakan; Arvind Srivastava; Ying Sun; Bertrand Trimbal; Nichanun Trachow; Blair Trewin; Gerard van der Schrier; Jorge Vazquez-Aguirre; Ricardo Vasquez; Claudia Villarroel; Lucie Vincent; Theo Vischel; Russ Vose; Mohd Noor' Arifin Bin Hj Yussof (2024). HadEX3: Global land-surface climate extremes indices v3.0.4 (1901-2018) [Dataset]. https://catalogue.ceda.ac.uk/uuid/115d5e4ebf7148ec941423ec86fa9f26
    Explore at:
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    Robert J. H. Dunn; Lisa Alexander; Markus Donat; Xuebin Zhang; Margot Bador; Nicholas Herold; Tanya Lippmann; Robert J. Allan; Enric Aguilar; Abdoul Aziz; Manola Brunet; John Caesar; Guillaume Chagnaud; Vincent Cheng; Thelma Cinco; Imke Durre; Rosaline de Guzman; Tin Mar Htay; Wan Maisarah Wan Ibadullah; Muhammad Khairul Izzat Bin Ibrahim; Mahbobeh Khoshkam; Andries Kruge; Hisayuki Kubota; Tan Wee Leng; Gerald Lim; Lim Li-Sha; Jose Marengo; Sifiso Mbatha; Simon McGree; Matthew Menne; Maria de los Milagros Skansi; Sandile Ngwenya; Francis Nkrumah; Chalump Oonariya; Jose Daniel Pabon-Caicedo; Geremy Panthou; Cham Pham; Fatemeh Rahimzadeh; Andrea Ramos; Ernesto Salgado; Jim Salinger; Youssouph Sane; Ardhasena Sopaheluwakan; Arvind Srivastava; Ying Sun; Bertrand Trimbal; Nichanun Trachow; Blair Trewin; Gerard van der Schrier; Jorge Vazquez-Aguirre; Ricardo Vasquez; Claudia Villarroel; Lucie Vincent; Theo Vischel; Russ Vose; Mohd Noor' Arifin Bin Hj Yussof
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Time period covered
    Jan 1, 1901 - Dec 31, 2018
    Area covered
    Earth
    Variables measured
    time, Max TN, Max TX, Min TN, Min TX, Ice Days, latitude, longitude, Frost Days, Summer days, and 24 more
    Description

    HadEX3 is a land-surface dataset of climate extremes indices available on a 1.875 x 1.25 longitude-latitude grid. These 29 indices have been developed by the World Meteorological Organization (WMO) Expert Team on Climate Change Detection and Indices (ETCCDI). Daily precipitation, as well as maximum and minimum temperature observations, are used to calculate these indices at each station. The daily data, as well as indices, have been supplied, quality controlled and combined to make a gridded set of NetCDF files covering 1901-2018 (inclusive).

    Spatial coverage is determined by the number of stations present at each time point as well as the spatial correlation structure between the stations for each index. The spatial coverage is lowest at the beginning of the dataset, rising until around 1960 where it plateaus, and then declines slightly after 2010.

    All indices are available as annual quantities, with a subset also available on a monthly basis. A number of the indices use a reference period to determine thresholds. For these, we provide two versions, one set using 1961-1990 and another using the more recent 1981-2010 (these reference periods have been indicated in the file name as either 'ref-6190' or 'ref-8110').

    Version 3.0.4 was added due to an error in how the Rx1day and Rx5day data were being handled for one of the West African data sources. More details can be found in the HadEX3 blog under 'Details/Docs' tab.

    Additionally, an extension to HadEX3, comprising additional indices recommended by the WMO Expert Team on Sector-specific Climate Indices (ET-SCI), has been produced. These data are available in a separate dataset connected to this record, marked as supplemental to this dataset.

  13. Report on Evaluation of the Interaction-based Hazard Index Formula with Data...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Aug 3, 2024
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). Report on Evaluation of the Interaction-based Hazard Index Formula with Data on Trihalomethanes [Dataset]. https://catalog.data.gov/dataset/report-on-evaluation-of-the-interaction-based-hazard-index-formula-with-data-on-trihalomet
    Explore at:
    Dataset updated
    Aug 3, 2024
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The endpoints selected for evaluation of the HIINT formula were percent relative liver weight of mice (PcLiv) and the logarithm of ALT [Log(ALT)], where the log transformation was used to help stabilize the increases in variance with dose found in the ALT dataset.

  14. Agrobiodiversity Index gridded datasets

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jul 12, 2022
    Cite
    Sarah Jones; Natalia Estrada-Carmona; Roseline Remans (2022). Agrobiodiversity Index gridded datasets [Dataset]. http://doi.org/10.7910/DVN/2PEPLH
    Explore at:
    Dataset updated
    Jul 12, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Sarah Jones; Natalia Estrada-Carmona; Roseline Remans
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Gridded datasets used in Jones et al. (2021) paper 'Agrobiodiversity Index scores show agrobiodiversity is underutilized in national food systems'. Details of how datasets were made and underlying sources are provided in Jones et al. (2021) Supplementary Information.

    Datasets included:
    - H_2010_spam_V2r0_42c: crop commodity diversity (Shannon's diversity index) at 10x10km resolution, based on SPAM 2010 V2 physical area maps
    - sr_2010_spam_v2r0_42c: crop commodity richness at 10x10km resolution, based on SPAM 2010 V2 physical area maps
    - sr_2010_spam_v2r0_42c_maj22: locations of cropland with at least 22 crop commodities (1) versus cropland with <22 crop commodities at 10x10km resolution, based on SPAM 2010 V2 physical area maps
    - Livestock_8_shannons_LSU: livestock diversity (Shannon's diversity index) calculated from population numbers converted to standard livestock units at 1x1km resolution, based on Global Livestock of the World v3
    - Fish_srichness raster: freshwater fish species richness per major river basin, based on Tedesco et al (2017)
    - CropPasture_2000_bool: locations where cropland and pasture co-exist (1) versus locations where either cropland OR pasture exist (0), at 10x10km resolution, based on cropland and pasture maps for the year 2000 available from EarthStat
    - esa2015_natag_1km_pc: percentage of natural or semi-natural vegetation within a 1x1km window around cropped pixels, based on European Space Agency Climate Change Initiative (ESA-CCI) land cover maps for 2015

    Not uploaded (no post-processing so data can be accessed at source):
    - potential soil biodiversity index (see https://esdac.jrc.ec.europa.eu/content/global-soil-biodiversity-atlas)
    - tree cover on agricultural land (see Zomer et al. 2016 and https://apps.worldagroforestry.org/global-tree-cover/index.html)
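Several of the layers above report Shannon's diversity index or richness. As a rough sketch of what those two quantities mean for a single grid cell, the standard definitions are H = -Σ p_i ln p_i over the non-zero shares p_i, and richness as the count of non-zero categories. The function names and input values here are hypothetical, and the dataset's own processing is described in the Supplementary Information rather than reproduced here.

```python
import math

def richness(amounts):
    """Number of categories (e.g. crop commodities) present in the cell."""
    return sum(1 for a in amounts if a > 0)

def shannon_diversity(amounts):
    """Shannon's diversity index H = -sum(p * ln p) over non-zero shares."""
    total = sum(a for a in amounts if a > 0)
    if total == 0:
        return 0.0  # empty cell: no diversity to measure
    shares = [a / total for a in amounts if a > 0]
    return -sum(p * math.log(p) for p in shares)
```

A cell with two commodities on equal areas gives H = ln 2, while a cell with a single commodity gives H = 0.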

  15. Amenity Index

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Jan 6, 2017
    Cite
    Matthew Wheelwright (2017). Amenity Index [Dataset]. https://www.hydroshare.org/resource/373527434e7544fc8d68ee78cd657da9
    Explore at:
    zip(1.2 MB)Available download formats
    Dataset updated
    Jan 6, 2017
    Dataset provided by
    HydroShare
    Authors
    Matthew Wheelwright
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2008 - Dec 31, 2012
    Area covered
    Description

    This dataset contains an amenity index for all Utah Census Places. Using Census data and other available datasets, an index for 'amenityness' was created. Following Ganning and Flint's previous research note, the index is calculated using physical amenities as well as socio-economic indicators (See: Ganning, Joanna Paulson, and Courtney G. Flint. “Constructing a Community-Level Amenity Index.” Society & Natural Resources 23, no. 12 (November 2, 2010): 1253–58. doi:10.1080/08941920903030132).

    This index adds context to any study where an understanding of local, natural amenities is valued. It contains the following physical amenity variables:
    1. Open Land
    2. Open Water
    3. Kapos Classification
    4. Number of Recreation sites

    It also includes key indicators which come from Census Data:
    1. Employment Diversity
    2. Median Household Income
    3. Seasonal Housing
    4. Population Growth
    5. In-migration from out of state
    6. In-migration from out of county
    7. College education
    8. New Housing Built
    9. Housing Valued over $175,000
    10. Median Rent
    11. Median Housing Value
    12. Employment in arts, entertainment, recreation, accommodation, and food services.

    For more information on the calculations and variables, please read Ganning & Flint's research note and the ReadMe file linked to this dataset or the data dictionary within the excel spreadsheet as Tab 2.

  16. Location Affordability Index v.3

    • hudgis-hud.opendata.arcgis.com
    • data.lojic.org
    • +3more
    Updated Jan 24, 2025
    Cite
    Department of Housing and Urban Development (2025). Location Affordability Index v.3 [Dataset]. https://hudgis-hud.opendata.arcgis.com/datasets/location-affordability-index-v-3
    Explore at:
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    United States Department of Housing and Urban Development (http://www.hud.gov/)
    Authors
    Department of Housing and Urban Development
    Area covered
    Description

    First launched by the U.S. Department of Housing and Urban Development (HUD) and Department of Transportation (DOT) in November 2013, the Location Affordability Index (LAI) provides ubiquitous, standardized household housing and transportation cost estimates for all 50 states and the District of Columbia. Because what is affordable is different for everyone, users can choose among eight household profiles—which vary by household income, size, and number of commuters—and see the impact of the built environment on affordability in a given location while holding household demographics constant.

    Version 3 updates the constituent data sets with 2012-2016 American Community Survey data and makes several methodological tweaks, most notably moving to modeling at the Census tract level rather than at the block group level. As with Version 2, the inputs to the simultaneous equation model (SEM) include six endogenous variables (housing costs, car ownership, and transit usage for both owners and renters) and 18 exogenous variables, with vehicle miles traveled still modeled separately due to data limitations.

    To learn more about the Location Affordability Index (v.3) visit: https://www.hudexchange.info/programs/location-affordability-index/. For questions about the spatial attribution of this dataset, please reach out to us at GISHelpdesk@hud.gov.

    Date of Coverage: 2012-2016
    Data Dictionary: DD_Location Affordability Indev v.3.0
    LAI Version 3 Data and Methodology
    LAI Version 3 Technical Documentation

  17. Health index

    • data.europa.eu
    excel xls
    Cite
    North Gate II & III - INS (STATBEL - Statistics Belgium), Health index [Dataset]. https://data.europa.eu/data/datasets/2d799e04338f86f4dd29b43f5a13f41ee9c2899e?locale=en
    Explore at:
    excel xlsAvailable download formats
    Dataset authored and provided by
    North Gate II & III - INS (STATBEL - Statistics Belgium)
    Description

    From 1994 onwards

    The consumer price index, which takes into account price trends of all goods and services, forms the basis of another index: the health index. This health index has been calculated since January 1994 (introduced by the Royal Decree of 24 December 1993). The value of this index is determined by removing a number of products from the consumer price index product basket, in particular alcoholic beverages (bought in a shop or consumed in a bar), tobacco products and motor fuels except for LPG.

    What is the purpose of the health index?

    The health index is used for the indexation of housing rents. The health index is defined in the Law of 23 April 2015 on employment promotion (Belgian Official Journal of 27 April 2015). The smoothed health index, also called smoothed index (the average value of the health indexes of the last 4 months) is used as a basis for the indexation of retirement pensions, social security benefits and some salaries and wages. Public wages and social benefits are indexed as soon as the smoothed index reaches a given value, called the central index. The smoothed index is also called moving average. In order to perform a 2% index jump (laid down in the Law of 23 April 2015 on employment promotion), the smoothed health index has been temporarily blocked at its value of March 2015 (100.66). The smoothed health index was then reduced by 2% from April 2015. When the reduced smoothed health index (also called the reference index) had increased again by 2% or in other words when it had exceeded the value of 100.66, the index was no longer blocked. It occurred in April 2016. Since April 2016 the smoothed health index is calculated in the same manner as the reference index and therefore corresponds to the arithmetical mean of the health indexes of the last 4 months multiplied by a factor of 0.98.

    More information

    All health indices from 1994 onward can be found through our index-search. Simply enter the desired year and month to view the consumer price index and the health index on all available bases. Health index: list of excluded products (PDF, 83.29 Kb). More information on the index link to public services wages and pensions can be found on the website of the Wages Service of the FPS Finances: https://persopoint.be/fr/services/administration-des-salaires/principes-generaux-de-l-index. The table below shows the health index of the 13 most recent available months. be.STAT allows you to search from 1994 onward.

    Purpose and brief description

    The health index was introduced in January 1994 (by Royal Decree of 24 December 1993 implementing the law of 6 January 1989 on the safeguarding of the country's competitiveness) and is derived from the consumer price index. The value of the health index is obtained by excluding a number of products from the product and service basket from the consumer price index, in particular alcoholic beverages (bought in the shop or consumed in a café), tobacco products and motor fuels (with the exception of LPG).

    Population: Belgian private households
    Frequency: Monthly
    Timing of publication: The results are available on the penultimate working day of the reference period

    Remarks

    Confidentiality consumer price indices - Although the headings are published in the index and are therefore generally known, the exact definition of the goods and services is kept secret. This confidentiality is required to prevent attempts to manipulate the index, by resolute actions on certain goods and services. The confidentiality of the definitions guarantees the index objectiveness.

    Metadata: Consumer price index - Health index.pdf
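The smoothed health index rule stated above (since April 2016: the arithmetic mean of the last 4 monthly health indexes, multiplied by 0.98) can be written out as a small sketch. The function name and the sample values are hypothetical; official figures should be taken from the published series.

```python
def smoothed_health_index(monthly_health_index, factor=0.98, window=4):
    """Mean of the most recent `window` monthly values, scaled by `factor`.

    `monthly_health_index` is ordered oldest to newest. The 0.98 factor reflects
    the 2% index jump described in the text; pass factor=1.0 for the plain
    4-month moving average used before the jump.
    """
    if len(monthly_health_index) < window:
        raise ValueError("need at least `window` monthly values")
    recent = monthly_health_index[-window:]
    return sum(recent) / window * factor
```

With illustrative monthly values of 100, 101, 102 and 103, the plain moving average is 101.5 and the reduced smoothed index is 101.5 × 0.98.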

  18. Data from: Data and code from: Topographic wetness index as a proxy for soil...

    • agdatacommons.nal.usda.gov
    • s.cnmilf.com
    • +1more
    zip
    Updated Nov 21, 2025
    Cite
    H. Edwin Winzeler; Quentin D. Read (2025). Data and code from: Topographic wetness index as a proxy for soil moisture in a hillslope catena: flow algorithms and map generalization [Dataset]. http://doi.org/10.15482/USDA.ADC/1528088
    Explore at:
    zipAvailable download formats
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    H. Edwin Winzeler; Quentin D. Read
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    This dataset contains all data and code necessary to reproduce the analysis presented in the manuscript: Winzeler, H.E., Owens, P.R., Read, Q.D., Libohova, Z., Ashworth, A., Sauer, T. 2022. Topographic wetness index as a proxy for soil moisture in a hillslope catena: flow algorithms and map generalization. Land 11:2018. DOI: 10.3390/land11112018.

    There are several steps to this analysis. The relevant scripts for each are listed below. The first step is to use the raw digital elevation data (DEM) to produce different versions of the topographic wetness index (TWI) for the study region (Calculating TWI). Then, these TWI output files are processed, along with soil moisture (volumetric water content or VWC) time series data from a number of sensors located within the study region, to create analysis-ready data objects (Processing TWI and VWC). Next, models are fit relating TWI to soil moisture (Model fitting) and results are plotted (Visualizing main results). A number of additional analyses were also done (Additional analyses).

    Input data

    The DEM of the study region is archived in this dataset as SourceDem.zip. This contains the DEM of the study region (DEM1.sgrd) and associated auxiliary files all called DEM1.* with different extensions. In addition, the DEM is provided as a .tif file called USGS_one_meter_x39y400_AR_R6_WashingtonCO_2015.tif. The remaining data and code files are archived in the repository created with a GitHub release on 2022-10-11, twi-moisture-0.1.zip. The data are found in a subfolder called data.

    - 2017_LoggerData_HEW.csv through 2021_HEW.csv: Soil moisture (VWC) logger data for each year 2017-2021 (5 files total).
    - 2882174.csv: weather data from a nearby station.
    - DryPeriods2017-2021.csv: starting and ending days for dry periods 2017-2021.
    - LoggerLocations.csv: Geographic locations and metadata for each VWC logger.
    - Logger_Locations_TWI_2017-2021.xlsx: 546 topographic wetness indexes calculated at each VWC logger location. Note: This is intermediate input created in the first step of the pipeline.

    Code pipeline

    To reproduce the analysis in the manuscript, run these scripts in the following order. The scripts are all found in the root directory of the repository. See the manuscript for more details on the methods.

    Calculating TWI

    - TerrainAnalysis.R: Taking the DEM file as input, calculates 546 different topographic wetness indexes using a variety of different algorithms. Each algorithm is run multiple times with different input parameters, as described in more detail in the manuscript. After performing this step, it is necessary to use the SAGA-GIS GUI to extract the TWI values for each of the sensor locations. The output generated in this way is included in this repository as Logger_Locations_TWI_2017-2021.xlsx. Therefore it is not necessary to rerun this step of the analysis, but the code is provided for completeness.
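For orientation, the most widely used definition of the topographic wetness index is TWI = ln(a / tan β), where a is the specific catchment area (upslope area per unit contour width) and β is the local slope angle. The sketch below shows only that basic formula; it is not the repository's TerrainAnalysis.R, whose 546 variants differ in flow algorithm, DEM filtering and other parameters. The function name and sample inputs are hypothetical.

```python
import math

def topographic_wetness_index(specific_catchment_area, slope_radians):
    """Basic TWI = ln(a / tan(beta)) for a single grid cell.

    specific_catchment_area: upslope contributing area per unit contour width (m).
    slope_radians: local slope angle in radians; must be > 0 for the formula to be defined.
    """
    if specific_catchment_area <= 0:
        raise ValueError("specific catchment area must be positive")
    if slope_radians <= 0:
        raise ValueError("slope must be positive (flat cells need special handling)")
    return math.log(specific_catchment_area / math.tan(slope_radians))
```

Cells with a large contributing area and a gentle slope get high values, which is why the index is used as a proxy for where water tends to accumulate.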

    Processing TWI and VWC

    - read_process_data.R: Takes raw TWI and moisture data files and processes them into analysis-ready format, saving the results as CSV.
    - qc_avg_moisture.R: Does additional quality control on the moisture data and averages it across different time periods.

    Model fitting

    Models were fit regressing soil moisture (average VWC for a certain time period) against a TWI index, with and without soil depth as a covariate. In each case, for both the model without depth and the model with depth, prediction performance was calculated with and without spatially-blocked cross-validation. Where cross validation wasn't used, we simply used the predictions from the model fit to all the data.

    - fit_combos.R: Models were fit to each combination of soil moisture averaged over 57 months (all months from April 2017-December 2021) and 546 TWI indexes. In addition models were fit to soil moisture averaged over years, and to the grand mean across the full study period.
    - fit_dryperiods.R: Models were fit to soil moisture averaged over previously identified dry periods within the study period (each 1 or 2 weeks in length), again for each of the 546 indexes.
    - fit_summer.R: Models were fit to the soil moisture average for the months of June-September for each of the five years, again for each of the 546 indexes.

    Visualizing main results

    Preliminary visualization of results was done in a series of RMarkdown notebooks. All the notebooks follow the same general format, plotting model performance (observed-predicted correlation) across different combinations of time period and characteristics of the TWI indexes being compared. The indexes are grouped by SWI versus TWI, DEM filter used, flow algorithm, and any other parameters that varied. The notebooks show the model performance metrics with and without the soil depth covariate, and with and without spatially-blocked cross-validation. Crossing those two factors, there are four values for model performance for each combination of time period and TWI index presented.

    - performance_plots_bymonth.Rmd: Using the results from the models fit to each month of data separately, prediction performance was averaged by month across the five years of data to show within-year trends.
    - performance_plots_byyear.Rmd: Using the results from the models fit to each month of data separately, prediction performance was averaged by year to show trends across multiple years.
    - performance_plots_dry_periods.Rmd: Prediction performance was presented for the models fit to the previously identified dry periods.
    - performance_plots_summer.Rmd: Prediction performance was presented for the models fit to the June-September moisture averages.

    Additional analyses

    Some additional analyses were done that may not be published in the final manuscript but which are included here for completeness.

    - 2019dryperiod.Rmd: analysis, done separately for each day, of a specific dry period in 2019.
    - alldryperiodsbyday.Rmd: analysis, done separately for each day, of the same dry periods discussed above.
    - best_indices.R: after fitting models, this script was used to quickly identify some of the best-performing indexes for closer scrutiny.
    - wateryearfigs.R: exploratory figures showing median and quantile interval of VWC for sensors in low and high TWI locations for each water year.

    Resources in this dataset:
    - Resource Title: Digital elevation model of study region. File Name: SourceDEM.zip. Resource Description: .zip archive containing digital elevation model files for the study region. See dataset description for more details.
    - Resource Title: twi-moisture-0.1: Archived git repository containing all other necessary data and code. File Name: twi-moisture-0.1.zip. Resource Description: .zip archive containing all data and code, other than the digital elevation model archived as a separate file. This file was generated by a GitHub release made on 2022-10-11 of the git repository hosted at https://github.com/qdread/twi-moisture (private repository). See dataset description and README file contained within this archive for more details.

  19. Ground-based magnetic local time dependent Pc5 ULF index dataset

    • zenodo.org
    zip
    Updated Jul 10, 2025
    Cite
    Veera Lipsanen; Veera Lipsanen (2025). Ground-based magnetic local time dependent Pc5 ULF index dataset [Dataset]. http://doi.org/10.5281/zenodo.15372450
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Veera Lipsanen; Veera Lipsanen
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This ULF index is constructed from 1-minute ground-based magnetic field observations provided by SuperMAG. The index is derived for the north (N) and east (E) magnetic field components from magnetometers in the northern hemisphere between 65 and 70 degrees magnetic latitude and divided into four MLT sectors. The index is available from January 1995 to December 2023. The index is given in monthly files with the following naming convention: Pc5_MLTd_index_LAT_65_70_YYYYMM_COMPONENT.dat

    date: YYYY-MM-DD
    time: hh:mm:ss

    P: Pc5 index in a given sector (nT^2)
    N: number of SuperMAG stations used to calculate the index

    Definition of MLT sectors:
    Day: 6-18 MLT
    Night: 18-6 MLT
    Dawn: 3-9 MLT
    Noon: 9-15 MLT
    Dusk: 15-21 MLT
    Midnight: 21-3 MLT
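The naming convention and sector definitions above can be turned into two small helpers. This is a sketch under stated assumptions: that COMPONENT takes the values "N" or "E" (the description names those components but does not show a literal file name), and that each sector is a half-open interval that includes its start hour and excludes its end hour. The function names are hypothetical.

```python
# Sector boundaries in MLT hours, copied from the definitions above.
SECTORS = {
    "Day": (6, 18),
    "Night": (18, 6),
    "Dawn": (3, 9),
    "Noon": (9, 15),
    "Dusk": (15, 21),
    "Midnight": (21, 3),
}

def index_filename(year, month, component):
    """Build a monthly file name following the stated convention (component assumed 'N' or 'E')."""
    return f"Pc5_MLTd_index_LAT_65_70_{year:04d}{month:02d}_{component}.dat"

def in_sector(mlt, start, end):
    """True if the MLT hour lies in [start, end), allowing sectors that wrap past 24 h."""
    if start < end:
        return start <= mlt < end
    return mlt >= start or mlt < end

def sectors_for(mlt):
    """Return every sector name whose interval contains the given MLT hour."""
    hour = mlt % 24
    return [name for name, (start, end) in SECTORS.items() if in_sector(hour, start, end)]
```

Because the Day/Night pair and the Dawn/Noon/Dusk/Midnight set each cover the full 24 hours, every MLT hour falls into exactly two of the listed sectors under this interval assumption.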

  20. Sport Activity Dataset - MTS-5

    • kaggle.com
    zip
    Updated Jul 13, 2023
    Cite
    Jarno Matarmaa (2023). Sport Activity Dataset - MTS-5 [Dataset]. https://www.kaggle.com/datasets/jarnomatarmaa/sportdata-mts-5
    Explore at:
    zip(498699 bytes)Available download formats
    Dataset updated
    Jul 13, 2023
    Authors
    Jarno Matarmaa
    License

    https://ec.europa.eu/info/legal-notice_en

    Description

    The dataset consists of data in five categories: walking, running, biking, skiing, and roller skiing. The sport activities were recorded by an individual active (non-competitive) athlete. The data is pre-processed, standardized and split into four parts (each dimension in its own file):
    * HR-DATA_std_1140x69 (heart rate signals)
    * SPD-DATA_std_1140x69 (speed signals)
    * ALT-DATA_std_1140x69 (altitude signals)
    * META-DATA_1140x4 (labels and details)

    NOTE: The signal order must be kept consistent across the separate files when processing the data. Signal order is critical: the first index in each file comes from the same activity, whose label corresponds to the first index in the target data file, and so on. The files should therefore be combined into the same table as they are read, ideally using a nested data structure, something like in the picture below:

    You may check the related TSC projects on GitHub: - Sport Activity Classification Using Classical Machine Learning and Time Series Methods (https://github.com/JABE22/MasterProject) - Symbolic Representation of Multivariate Time Series Signals in Sport Activity Classification - Kaggle Project

    Image: Nested data structure for multivariate time series classifiers (https://mediauploads.data.world/e1ccd4d36522e04c0061d12d05a87407bec80716f6fe7301991eaaccd577baa8_mts_data.png)

    The following picture shows five signal samples for each dimension (Heart Rate, Speed, Altitude) in standard feature value format. Each figure contains signals from five different random activities (which can be from the same or different categories). However, the signals with index number 1 in each of the three figures, for example, are from the same activity. The figures only visualize what kind of signals the dataset consists of. They do not have any particular meaning.

    Image: Signals from sport activities (Heart Rate, Speed, and Altitude) (https://mediauploads.data.world/162b7086448d8dbd202d282014bcf12bd95bd3174b41c770aa1044bab22ad655_signal_samples.png)

    Dataset size and construction procedure

    The original number of sport activities is 228. From each of them, starting from index 100 (seconds), five consecutive 69-second segments were picked, as expressed in the formula below:

    Image: Data segmentation and augmentation formula (https://mediauploads.data.world/68ce83092ec65f6fbaee90e5de6e12df40498e08fa6725c111f1205835c1a842_segment_equation.png)

    where D = original filtered data, N = number of activities, s = segment start index, l = segment length, and n = the number of segments from a single original sequence D_i, resulting in the new set of equal-length segments D_seg. In this particular case the equation takes the form:

    Image: Data segmentation and augmentation formula with values (https://mediauploads.data.world/63dd87bf3d0010923ad05a8286224526e241b17bbbce790133030d8e73f3d3a7_data_segmentation_formula.png)

    Thus, the dataset has dimensions of 1140 x 69 x 3.
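The segmentation described in this section can be sketched as follows, using the values stated in the text (start index s = 100, segment length l = 69, n = 5 segments per activity). The function name is hypothetical, and this is an illustration of the stated procedure rather than the author's original code.

```python
def segment_activity(signal, start=100, length=69, count=5):
    """Cut `count` consecutive, non-overlapping segments of `length` samples, beginning at `start`."""
    needed = start + length * count
    if len(signal) < needed:
        raise ValueError(f"signal has {len(signal)} samples, need at least {needed}")
    return [
        signal[start + k * length : start + (k + 1) * length]
        for k in range(count)
    ]

# 228 original activities x 5 segments each gives the 1140 rows in each signal file.
TOTAL_SEGMENTS = 228 * 5
```

Applying the same call to the heart rate, speed and altitude signals of one activity keeps the three dimensions aligned row by row, which is the ordering requirement the NOTE above insists on.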

    Additional information

    The data was recorded without knowing that it would be used in research, so it is a good representation of a real-world data source and can serve as an excellent tool for testing algorithms on real data.

    Recording devices

    The data was recorded using two types of Garmin devices: the Forerunner 920XT and the vivosport. The vivosport is an activity tracker that measures heart rate from the wrist using an optical sensor, whereas the 920XT requires an external sensor belt (heart rate + inertial) worn under the chest during exercise. Otherwise the devices are not essentially different: they use GPS location to measure speed and an inertial barometer to measure elevation changes.

    Device manuals - Garmin FR-920XT - Garmin Vivosport

    Person profile

    Age: 30-31, Weight: 82, Length: 181, Active athlete (non-competitive)

Cite
Saad Aziz (2023). Dataset for Stock Market Index of 7 Economies [Dataset]. https://www.kaggle.com/datasets/saadaziz1985/dataset-for-stock-market-index-of-7-countries

Dataset for Stock Market Index of 7 Economies

Time Series Dataset for Stock Market Indices of the 7 Top Economies of the World

Explore at:
zip(1917326 bytes)Available download formats
Dataset updated
Jul 4, 2023
Authors
Saad Aziz
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context:

The dataset was extracted from Yahoo Finance using pandas and the Yahoo Finance library in Python. It covers stock market indices of the world's top economies. The code generated data from Jan 01, 2003 to Jun 30, 2023, a span of more than 20 years. There are 18 CSV files: 16 hold data for 16 different stock market indices across 7 countries, and two hold the annualized return and compound annual growth rate (CAGR) computed from the extracted data. Below is the list of countries along with the number of indices extracted for each through the Yahoo Finance library.

Number of Countries & Index:

Image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F90ce8a986761636e3edbb49464b304d8%2FNumber%20of%20Index.JPG?generation=1688490342207096&alt=media

Content:

Unit of analysis: Stock Market Index Analysis

This dataset is useful for research purposes, particularly for conducting comparative analyses involving capital market performance and could be used along with other economic indicators.

There are 18 distinct CSV files associated with this dataset. The first 16 CSV files each cover one index, and the last two CSV files cover the annualized return for each year and the CAGR of each index. A blank value in any column indicates that the index was launched in a later year. For instance, Bse500 (India) was launched in 2007, so its earlier values are blank; similarly, China_Top300 was launched in 2021, so its earlier fields are blank too.

The extraction process applies different criteria: all columns are included in the 16 index CSV files, and Adj Close is used to calculate the annualized return. The algorithm extracts data by index name (the code assigned by Yahoo Finance) for the given start and end dates.

Annualized return and CAGR have been calculated and are illustrated in the images below; machine-readable CSV files are attached as well.

To extract the data provided in the attachment, various criteria were applied:

  1. Content Filtering: The data was filtered based on several attributes, including the index name and the start and end dates. This filtering process ensured that only relevant data meeting the specified criteria was retained.

  2. Collaborative Filtering: Another filtering technique used was collaborative filtering using yahoo finance, which relies on index similarity. This approach involves finding indices that are similar to other index or extended dataset scope to other countries or economies. By leveraging this method, the algorithm identifies and extracts data based on similarities between indices.
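The first criterion above (filtering by index name and by start and end date) can be sketched roughly as follows. This is not the author's extraction code: the row layout, the helper name `filter_rows`, and the sample values are all hypothetical and exist only to make the idea concrete.

```python
from datetime import date

# Hypothetical rows: (index_name, trading_day, adj_close). Values are invented.
ROWS = [
    ("Sensex", date(2002, 12, 31), 3377.0),
    ("Sensex", date(2003, 1, 2), 3390.0),
    ("Nasdaq", date(2003, 1, 2), 1384.0),
    ("Sensex", date(2023, 6, 30), 64718.0),
    ("Sensex", date(2023, 7, 3), 65205.0),
]

def filter_rows(rows, index_name, start, end):
    """Keep rows for one index whose date lies in [start, end], both inclusive."""
    return [r for r in rows if r[0] == index_name and start <= r[1] <= end]

# Same window as the dataset: Jan 01, 2003 to Jun 30, 2023.
selected = filter_rows(ROWS, "Sensex", date(2003, 1, 1), date(2023, 6, 30))
for name, day, adj_close in selected:
    print(name, day.isoformat(), adj_close)
```

Whether the end date is inclusive in the original extraction is not stated; the sketch treats both bounds as inclusive.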

Of the last two CSV files, one holds the annualized return, which was calculated from the Adj Close column and stored in a new DataFrame. Below is the image of annualized returns for all indices (if it is unreadable, a machine-readable CSV version is attached to the dataset).

Annualized Return:

In terms of annualized rate of return, Indian stock market indices lead most of the time, followed by indices from the USA, Canada and Japan.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F37645bd90623ea79f3708a958013c098%2FAnnualized%20Return.JPG?generation=1688525901452892&alt=media
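The description does not say exactly how each year's return was derived from Adj Close. One common convention, assumed here, is to compare the last adjusted close of a calendar year with the first adjusted close of that same year. The sketch below uses invented prices and a hypothetical helper `yearly_returns`; the dataset itself may use a different convention, such as year-end over prior year-end.

```python
from datetime import date

# Hypothetical (trading_day, adj_close) pairs; values are invented.
PRICES = [
    (date(2021, 1, 4), 100.0),
    (date(2021, 12, 31), 110.0),
    (date(2022, 1, 3), 110.0),
    (date(2022, 12, 30), 99.0),
]

def yearly_returns(prices):
    """Return {year: last_adj_close / first_adj_close - 1} per calendar year."""
    by_year = {}
    for day, adj_close in sorted(prices):
        # Keep the first close seen for the year, overwrite the last one.
        first, _ = by_year.get(day.year, (adj_close, adj_close))
        by_year[day.year] = (first, adj_close)
    return {year: last / first - 1 for year, (first, last) in by_year.items()}

returns = yearly_returns(PRICES)
for year, value in returns.items():
    print(year, f"{value:.2%}")
```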

Compound Annual Growth Rate (CAGR):

The best performing index by compound growth is Sensex (India), which comprises the top 30 companies, at 15.60%, followed by Nifty500 (India) at 11.34% and Nasdaq (USA) at 10.60%.

The worst performing index is China top300; however, it was launched in 2021 (post-pandemic), so it cannot be meaningfully assessed at this stage because too little data is available. Furthermore, UK and Russian indices are also among the five worst performers.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F58ae33f60a8800749f802b46ec1e07e7%2FCAGR.JPG?generation=1688490409606631&alt=media
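For reference, CAGR is conventionally defined as (ending value / starting value) raised to the power 1 / years, minus 1. The sketch below applies that standard formula to made-up numbers. The function name `cagr` and the inputs are illustrative, and it is an assumption that the dataset's CAGR column follows exactly this definition.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

# Invented example: an index level that doubles over 5 years.
growth = cagr(100.0, 200.0, 5)
print(f"{growth:.2%}")
```

For an index launched partway through the window, `years` would be the length of the period with data rather than the full 2003 to 2023 span, which is one reason a short-history index is hard to compare with the others.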

Geography: Stock Market Index of the World Top Economies

Time period: Jan 01, 2003 – June 30, 2023

Variables: Stock Market Index Title, Open, High, Low, Close, Adj Close, Volume, Year, Month, Day, Yearly_Return and CAGR

File Type: CSV file

Inspiration:

  • Time series prediction model
  • Investment opportunities in the world's best economies
  • Comparative Analysis of past data with other stock market indices or other indices

Disclaimer:

This is not financial advice; due diligence is required for each investment decision.
