100+ datasets found
  1. Dataset for Stock Market Index of 7 Economies

    • kaggle.com
    zip
    Updated Jul 4, 2023
    Cite
    Saad Aziz (2023). Dataset for Stock Market Index of 7 Economies [Dataset]. https://www.kaggle.com/datasets/saadaziz1985/dataset-for-stock-market-index-of-7-countries
    Explore at:
    Available download formats: zip (1917326 bytes)
    Dataset updated
    Jul 4, 2023
    Authors
    Saad Aziz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context:

    The dataset was extracted from Yahoo Finance using pandas and the Yahoo Finance library in Python. It covers stock market indices of the world's leading economies. The code generated data from Jan 01, 2003 to Jun 30, 2023, a span of more than 20 years. There are 18 CSV files: 16 hold the data for 16 different stock market indices across 7 countries, and the remaining two hold the annualized return and compound annual growth rate (CAGR) computed from the extracted data. The list below shows the countries and the number of indices extracted for each through the Yahoo Finance library.

    Number of Countries & Index:

    [Image: Number of Index] https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F90ce8a986761636e3edbb49464b304d8%2FNumber%20of%20Index.JPG?generation=1688490342207096&alt=media

    Content:

    Unit of analysis: Stock Market Index Analysis

    This dataset is useful for research purposes, particularly for conducting comparative analyses involving capital market performance and could be used along with other economic indicators.

    There are 18 distinct CSV files associated with this dataset. The first 16 CSV files each hold one index, and the last two hold the annualized return for each year and the CAGR of each index. A blank value in any column means the index was launched in a later year. For instance, Bse500 (India) was launched in 2007, so its earlier values are blank; similarly, the China_Top300 index was launched in 2021, so its earlier fields are blank too.

    The extraction process applies different criteria: the 16 index CSV files include all columns, and Adj Close is used to calculate the annualized return. The algorithm extracts data by index name (the code assigned by Yahoo Finance) for the given start and end dates.

    Annualized return and CAGR have been calculated and are illustrated in the images below, with machine-readable (CSV) files attached as well.

    To extract the data provided in the attachment, various criteria were applied:

    1. Content Filtering: The data was filtered on several attributes, including the index name and the start and end dates. This filtering ensured that only relevant data meeting the specified criteria was extracted.

    2. Collaborative Filtering: Another filtering technique used was collaborative filtering using Yahoo Finance, which relies on index similarity. This approach involves finding indices that are similar to another index, or extending the dataset scope to other countries or economies. By leveraging this method, the algorithm identifies and extracts data based on similarities between indices.

    Of the last two CSV files, one holds the annualized return, which was calculated from the Adj Close column and stored in a new DataFrame. Below is the image of annualized returns for all indices (if it is unreadable, a machine-readable CSV file is attached to the dataset).

    Annualized Return:

    In terms of annualised rate of return, Indian stock market indices lead in most years, followed by the stock market indices of the USA, Canada and Japan.
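
    A minimal sketch of how a per-year return could be derived from Adj Close values. The description does not state the exact formula, so this assumes the yearly return is the last Adj Close of a year divided by the last Adj Close of the previous available year, minus one. The function name and the sample prices are hypothetical, and blank values are represented as None.

```python
def yearly_returns(rows):
    """Compute year-over-year returns from (year, adj_close) pairs.

    rows must be in chronological order. Blank values (None) are skipped,
    which mirrors indices that were launched after the start date.
    """
    last_close = {}
    for year, adj_close in rows:
        if adj_close is None:
            continue
        # Later rows overwrite earlier ones, leaving the final close of each year.
        last_close[year] = adj_close

    years = sorted(last_close)
    returns = {}
    # Pair each year with the previous year that has data.
    for prev_year, year in zip(years, years[1:]):
        returns[year] = last_close[year] / last_close[prev_year] - 1.0
    return returns


# Hypothetical Adj Close values, not taken from the dataset.
sample = [(2020, 100.0), (2020, 110.0), (2021, None), (2021, 121.0), (2022, 108.9)]
result = yearly_returns(sample)
for year, value in sorted(result.items()):
    print(year, f"{value:.2%}")
```

    The first year has no return because there is no earlier close to compare against.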

    [Image: Annualized Return] https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F37645bd90623ea79f3708a958013c098%2FAnnualized%20Return.JPG?generation=1688525901452892&alt=media

    Compound Annual Growth Rate (CAGR):

    The best-performing index by compound growth is Sensex (India), which comprises the top 30 companies, at 15.60%, followed by Nifty500 (India) at 11.34% and Nasdaq (USA) at 10.60%.

    The worst-performing index is China Top300; however, it was launched in 2021 (post-pandemic), so there is too little data to assess it at this stage. UK and Russian indices are also among the five worst performers.
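
    For reference, the standard CAGR definition is (end value / start value) raised to 1/years, minus one. The sketch below shows that definition only; whether the dataset author measured the period in calendar years or in trading days is not stated, so the inputs here are hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate over a period of `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical example: an index level that doubles over 5 years.
example = cagr(100.0, 200.0, 5)
print(f"{example:.2%}")
```

    A short history, as with an index launched in 2021, makes the result very sensitive to the start and end values, which is why such an index is hard to compare with 20-year series.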

    [Image: CAGR] https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F58ae33f60a8800749f802b46ec1e07e7%2FCAGR.JPG?generation=1688490409606631&alt=media

    Geography: Stock Market Index of the World Top Economies

    Time period: Jan 01, 2003 – June 30, 2023

    Variables: Stock Market Index Title, Open, High, Low, Close, Adj Close, Volume, Year, Month, Day, Yearly_Return and CAGR

    File Type: CSV file

    Inspiration:

    • Time series prediction model
    • Investment opportunities in world best economies
    • Comparative Analysis of past data with other stock market indices or other indices

    Disclaimer:

    This is not financial advice; due diligence is required for every investment decision.

  2. Data from: U-Index, a dataset and an impact metric for informatics tools and...

    • datasetcatalog.nlm.nih.gov
    • data.niaid.nih.gov
    • +2more
    Updated Feb 22, 2019
    Cite
    Winnenburg, Rainer; Shah, Nigam H.; Callahan, Alison (2019). U-Index, a dataset and an impact metric for informatics tools and databases [Dataset]. http://doi.org/10.5061/dryad.gj651
    Explore at:
    Dataset updated
    Feb 22, 2019
    Authors
    Winnenburg, Rainer; Shah, Nigam H.; Callahan, Alison
    Description

    Measuring the usage of informatics resources such as software tools and databases is essential to quantifying their impact, value and return on investment. We have developed a publicly available dataset of informatics resource publications and their citation network, along with an associated metric (u-Index) to measure informatics resources’ impact over time. Our dataset differentiates the context in which citations occur to distinguish between ‘awareness’ and ‘usage’, and uses a citing universe of open access publications to derive citation counts for quantifying impact. Resources with a high ratio of usage citations to awareness citations are likely to be widely used by others and have a high u-Index score. We have pre-calculated the u-Index for nearly 100,000 informatics resources. We demonstrate how the u-Index can be used to track informatics resource impact over time. The method of calculating the u-Index metric, the pre-computed u-Index values, and the dataset we compiled to calculate the u-Index are publicly available.

  3. Mapping Uncertainty Due to Missing Data in the Global Ocean Health Index

    • plos.figshare.com
    tiff
    Updated Jun 2, 2023
    Cite
    Melanie Frazier; Catherine Longo; Benjamin S. Halpern (2023). Mapping Uncertainty Due to Missing Data in the Global Ocean Health Index [Dataset]. http://doi.org/10.1371/journal.pone.0160377
    Explore at:
    Available download formats: tiff
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Melanie Frazier; Catherine Longo; Benjamin S. Halpern
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Indicators are increasingly used to measure environmental systems; however, they are often criticized for failing to measure and describe uncertainty. Uncertainty is particularly difficult to evaluate and communicate in the case of composite indicators which aggregate many indicators of ecosystem condition. One of the ongoing goals of the Ocean Health Index (OHI) has been to improve our approach to dealing with missing data, which is a major source of uncertainty. Here we: (1) quantify the potential influence of gapfilled data on index scores from the 2015 global OHI assessment; (2) develop effective methods of tracking, quantifying, and communicating this information; and (3) provide general guidance for implementing gapfilling procedures for existing and emerging indicators, including regional OHI assessments. For the overall OHI global index score, the percent contribution of gapfilled data was relatively small (18.5%); however, it varied substantially among regions and goals. In general, smaller territorial jurisdictions and the food provision and tourism and recreation goals required the most gapfilling. We found the best approach for managing gapfilled data was to mirror the general framework used to organize, calculate, and communicate the Index data and scores. Quantifying gapfilling provides a measure of the reliability of the scores for different regions and components of an indicator. Importantly, this information highlights the importance of the underlying datasets used to calculate composite indicators and can inform and incentivize future data collection.

  4. Case Mix Index

    • data.chhs.ca.gov
    • data.ca.gov
    • +2more
    docx, pdf, xlsx, zip
    Updated Nov 6, 2025
    + more versions
    Cite
    Department of Health Care Access and Information (2025). Case Mix Index [Dataset]. https://data.chhs.ca.gov/dataset/case-mix-index
    Explore at:
    Available download formats: docx, pdf, xlsx (192727), zip
    Dataset updated
    Nov 6, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    The Case Mix Index (CMI) is the average relative DRG weight of a hospital’s inpatient discharges, calculated by summing the Medicare Severity-Diagnosis Related Group (MS-DRG) weight for each discharge and dividing the total by the number of discharges. The CMI reflects the diversity, clinical complexity, and resource needs of all the patients in the hospital. A higher CMI indicates a more complex and resource-intensive case load. Although the MS-DRG weights, provided by the Centers for Medicare & Medicaid Services (CMS), were designed for the Medicare population, they are applied here to all discharges regardless of payer. Note: It is not meaningful to add the CMI values together.
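
    The calculation described above can be sketched as follows. The MS-DRG weights in the example are hypothetical, not CMS values. The second part illustrates the note about not adding CMI values: a combined CMI has to be recomputed from all discharges.

```python
def case_mix_index(drg_weights):
    """Average MS-DRG weight across a hospital's inpatient discharges."""
    if not drg_weights:
        raise ValueError("at least one discharge is required")
    return sum(drg_weights) / len(drg_weights)


# Hypothetical MS-DRG weights, one per discharge.
hospital_a = [1.0, 2.0]
hospital_b = [1.0, 1.0, 1.0, 1.0]

cmi_a = case_mix_index(hospital_a)
cmi_b = case_mix_index(hospital_b)

# Recompute over the pooled discharges instead of adding cmi_a and cmi_b.
cmi_combined = case_mix_index(hospital_a + hospital_b)

print(cmi_a, cmi_b, round(cmi_combined, 4))
```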

  5. Report on Evaluation of the Interaction-based Hazard Index Formula with Data...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Aug 3, 2024
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). Report on Evaluation of the Interaction-based Hazard Index Formula with Data on Trihalomethanes [Dataset]. https://catalog.data.gov/dataset/report-on-evaluation-of-the-interaction-based-hazard-index-formula-with-data-on-trihalomet
    Explore at:
    Dataset updated
    Aug 3, 2024
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The endpoints selected for evaluation of the HIINT formula were percent relative liver weight of mice (PcLiv) and the logarithm of ALT [Log(ALT)], where the log transformation was used to help stabilize the increases in variance with dose found in the ALT dataset.

  6. AI Global Index

    • kaggle.com
    zip
    Updated Apr 26, 2023
    Cite
    Kateryna Meleshenko (2023). AI Global Index [Dataset]. https://www.kaggle.com/datasets/katerynameleshenko/ai-index/code
    Explore at:
    Available download formats: zip (2441 bytes)
    Dataset updated
    Apr 26, 2023
    Authors
    Kateryna Meleshenko
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The Dataset "AI Global index" includes The Global AI Index itself and seven indicators affecting the Index on 62 countries, as well as general information about the countries (region, cluster, income group and political regime).

    The Global AI Index is the first index to benchmark nations on their level of investment, innovation and implementation of artificial intelligence.

    Talent, Infrastructure and Operating Environment are the factors of the AI Implementation group of indicators, which represents the application of artificial intelligence by professionals in various sectors, such as businesses, governments, and communities.
    • Talent indicator focuses on the availability of skilled practitioners for the provision of artificial intelligence solutions.
    • Infrastructure indicator focuses on the reliability and scale of access infrastructure, from electricity and internet, to super computing capabilities.
    • Operating Environment indicator focuses on the regulatory context, and public opinion surrounding artificial intelligence.

    Research and Development are the factors of the Innovation group of indicators, which reflects the progress made in technology and methodology, which signify the potential for artificial intelligence to evolve and improve.
    • Research indicator focuses on the extent of specialist research and researchers; investigating the amount of publications and citations in credible academic journals.
    • Development indicator focuses on the development of fundamental platforms and algorithms upon which innovative artificial intelligence projects rely.

    Government Strategy and Commercial are the factors of the Investment group of indicators, which reflects financial and procedural commitments to artificial intelligence.
    • Government Strategy indicator focuses on the depth of commitment from national government to artificial intelligence; investigating spending commitments and national strategies.
    • Commercial indicator focuses on the level of startup activity, investment and business initiatives based on artificial intelligence.

    All these seven indicators were calculated by Tortoise Media via weighting and summarizing 143 other indicators.

    The dataset can be used to practice data cleaning, data visualization, finding correlations between the indexes, and machine learning (classification, regression, clustering).

    The data was used in the analytical article "Artificial Intelligence on the World Stage: Dominant Players and Aspiring Challengers".

  7. Historic Gridded Standardised Precipitation Index for the United Kingdom...

    • catalogue.ceh.ac.uk
    • hosted-metadata.bgs.ac.uk
    • +3more
    text/directory
    Updated Oct 11, 2017
    + more versions
    Cite
    M. Tanguy; M. Fry; C. Svensson; J. Hannaford (2017). Historic Gridded Standardised Precipitation Index for the United Kingdom 1862-2015 (generated using gamma distribution with standard period 1961-2010) v4 [Dataset]. http://doi.org/10.5285/233090b2-1d14-4eb9-9f9c-3923ea2350ff
    Explore at:
    Available download formats: text/directory
    Dataset updated
    Oct 11, 2017
    Dataset provided by
    NERC EDS Environmental Information Data Centre
    Authors
    M. Tanguy; M. Fry; C. Svensson; J. Hannaford
    License

    https://eidc.ceh.ac.uk/licences/historic-SPI/plain

    Time period covered
    Jan 1, 1862 - Dec 31, 2015
    Area covered
    Description

    5km gridded Standardised Precipitation Index (SPI) data for Great Britain. SPI is a drought index based on the probability of precipitation for a given accumulation period, as defined by McKee et al. [1]. There are seven accumulation periods (1, 3, 6, 9, 12, 18 and 24 months), and for each period SPI is calculated for each of the twelve calendar months. Note that values in monthly (and, for longer accumulation periods, also annual) time series of the data are therefore likely to be autocorrelated. The standard period used to fit the gamma distribution is 1961-2010. The dataset covers the period from 1862 to 2015. This version supersedes previous versions (versions 2 and 3) of the same dataset due to minor errors in the data files. NOTE: the difference between this dataset and the previously published dataset 'Gridded Standardized Precipitation Index (SPI) using gamma distribution with standard period 1961-2010 for Great Britain [SPIgamma61-10]' (Tanguy et al., 2015; https://doi.org/10.5285/94c9eaa3-a178-4de4-8905-dbfab03b69a0), apart from the temporal and spatial extent, is the underlying rainfall data from which SPI was calculated. In the previously published dataset, CEH-GEAR (Tanguy et al., 2014; https://doi.org/10.5285/5dc179dc-f692-49ba-9326-a6893a503f6e) was used, whereas in this new version, Met Office 5km rainfall grids were used (see supporting information for more details). The methodology to calculate SPI is the same in the two datasets. [1] McKee, T. B., Doesken, N. J., Kleist, J. (1993). The Relationship of Drought Frequency and Duration to Time Scales. Eighth Conference on Applied Climatology, 17-22 January 1993, Anaheim, California.

  8. Bangladesh BD: Net Barter Terms of Trade Index

    • ceicdata.com
    Cite
    CEICdata.com, Bangladesh BD: Net Barter Terms of Trade Index [Dataset]. https://www.ceicdata.com/en/bangladesh/trade-index/bd-net-barter-terms-of-trade-index
    Explore at:
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2009 - Dec 1, 2020
    Area covered
    Bangladesh
    Variables measured
    Merchandise Trade
    Description

    Bangladesh BD: Net Barter Terms of Trade Index data was reported at 68.332 (2000=100) in 2020. This records an increase from the previous figure of 65.803 (2000=100) for 2019. The data is updated yearly, with a median of 103.596 (2000=100) from Dec 1980 to 2020 across 41 observations. It reached an all-time high of 162.264 (2000=100) in 1985 and a record low of 57.575 (2000=100) in 2011. The series remains active in CEIC and is reported by the World Bank. The data is categorized under Global Database’s Bangladesh – Table BD.World Bank.WDI: Trade Index. The net barter terms of trade index is calculated as the percentage ratio of the export unit value indexes to the import unit value indexes, measured relative to the base year 2000. Unit value indexes are based on data reported by countries that demonstrate consistency under UNCTAD quality controls, supplemented by UNCTAD's estimates using the previous year’s trade values at the Standard International Trade Classification three-digit level as weights. To improve data coverage, especially for the latest periods, UNCTAD constructs a set of average price indexes at the three-digit product classification of the Standard International Trade Classification revision 3 using UNCTAD’s Commodity Price Statistics, international and national sources, and UNCTAD secretariat estimates, and calculates unit value indexes at the country level using the current year's trade values as weights. Source: United Nations Conference on Trade and Development, Handbook of Statistics and data files, and International Monetary Fund, International Financial Statistics.
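
    The ratio described above can be sketched in a few lines. The unit value index inputs are hypothetical and are assumed to share the same base year (2000=100); this is an illustration of the stated definition, not the World Bank's or UNCTAD's computation pipeline.

```python
def net_barter_terms_of_trade(export_unit_value_index, import_unit_value_index):
    """Percentage ratio of the export unit value index to the import unit value index."""
    if import_unit_value_index <= 0:
        raise ValueError("import unit value index must be positive")
    return export_unit_value_index / import_unit_value_index * 100.0


# Hypothetical index values, both on a 2000=100 base.
tot = net_barter_terms_of_trade(90.0, 120.0)
print(tot)
```

    A value below 100 means export prices have risen less than import prices since the base year.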

  9. Zillow Home Value Index (Updated Monthly)

    • kaggle.com
    zip
    Updated Oct 21, 2025
    Cite
    Rob Mulla (2025). Zillow Home Value Index (Updated Monthly) [Dataset]. https://www.kaggle.com/datasets/robikscube/zillow-home-value-index
    Explore at:
    Available download formats: zip (273663 bytes)
    Dataset updated
    Oct 21, 2025
    Authors
    Rob Mulla
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Reference: https://www.zillow.com/research/zhvi-methodology/

    Official Background

    In setting out to create a new home price index, a major problem Zillow sought to overcome in existing indices was their inability to deal with the changing composition of properties sold in one time period versus another time period. Both a median sale price index and a repeat sales index are vulnerable to such biases (see the analysis here for an example of how influential the bias can be). For example, if expensive homes sell at a disproportionately higher rate than less expensive homes in one time period, a median sale price index will characterize this market as experiencing price appreciation relative to the prior period of time even if the true value of homes is unchanged between the two periods.
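
    The composition bias described above can be illustrated with a toy example. All home values are hypothetical and are held constant across both periods, so any movement in the median sale price comes only from which homes happened to sell.

```python
from statistics import median

# Hypothetical true home values (in thousands), unchanged between periods.
true_values = {"a": 100, "b": 120, "c": 300, "d": 320}

# Different subsets of homes sell in each period.
sold_period_1 = ["a", "b", "c"]
sold_period_2 = ["b", "c", "d"]

median_sale_1 = median(true_values[h] for h in sold_period_1)
median_sale_2 = median(true_values[h] for h in sold_period_2)

# A median over all homes, sold or not, is the same in both periods.
median_all_homes = median(true_values.values())

print(median_sale_1, median_sale_2, median_all_homes)
```

    The median sale price rises between the two periods even though no home changed in value, while the median over the full set of homes stays fixed.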

    The ideal home price index would be based off sale prices for the same set of homes in each time period so there was never an issue of the sales mix being different across periods. This approach of using a constant basket of goods is widely used, common examples being a commodity price index and a consumer price index. Unfortunately, unlike commodities and consumer goods, for which we can observe prices in all time periods, we can’t observe prices on the same set of homes in all time periods because not all homes are sold in every time period.

    The innovation that Zillow developed in 2005 was a way of approximating this ideal home price index by leveraging the valuations Zillow creates on all homes (called Zestimates). Instead of actual sale prices on every home, the index is created from estimated sale prices on every home. While there is some estimation error associated with each estimated sale price (which we report here), this error is just as likely to be above the actual sale price of a home as below (in statistical terms, this is referred to as minimal systematic error). Because of this fact, the distribution of actual sale prices for homes sold in a given time period looks very similar to the distribution of estimated sale prices for this same set of homes. But, importantly, Zillow has estimated sale prices not just for the homes that sold, but for all homes even if they didn’t sell in that time period. From this data, a comprehensive and robust benchmark of home value trends can be computed which is immune to the changing mix of properties that sell in different periods of time (see Dorsey et al. (2010) for another recent discussion of this approach).

    For an in-depth comparison of the Zillow Home Value Index to the Case Shiller Home Price Index, please refer to the Zillow Home Value Index Comparison to Case-Shiller

    Each Zillow Home Value Index (ZHVI) is a time series tracking the monthly median home value in a particular geographical region. In general, each ZHVI time series begins in April 1996. We generate the ZHVI at eight geographic levels: neighborhood, ZIP code, city, congressional district, county, metropolitan area, state and the nation.

    Underlying Data

    Estimated sale prices (Zestimates) are computed based on proprietary statistical and machine learning models. These models begin the estimation process by subdividing all of the homes in United States into micro-regions, or subsets of homes either near one another or similar in physical attributes to one another. Within each micro-region, the models observe recent sale transactions and learn the relative contribution of various home attributes in predicting the sale price. These home attributes include physical facts about the home and land, prior sale transactions, tax assessment information and geographic location. Based on the patterns learned, these models can then estimate sale prices on homes that have not yet sold.

    The sale transactions from which the models learn patterns include all full-value, arms-length sales that are not foreclosure resales. The purpose of the Zestimate is to give consumers an indication of the fair value of a home under the assumption that it is sold as a conventional, non-foreclosure sale. Similarly, the purpose of the Zillow Home Value Index is to give consumers insight into the home value trends for homes that are not being sold out of foreclosure status. Zillow research indicates that homes sold as foreclosures have typical discounts relative to non-foreclosure sales of between 20 and 40 percent, depending on the foreclosure saturation of the market. This is not to say that the Zestimate is not influenced by foreclosure resales. Zestimates are, in fact, influenced by foreclosure sales, but the pathway of this influence is through the downward pressure foreclosure sales put on non-foreclosure sale prices. It is the price signal observed in the latter that we are attempting to measure and, in turn, predict with the Zestimate.

    Market Segments Within each region, we calculate the ZHVI for various subsets of homes (or mar...

  10. Air Quality Index Hourly_2.5 (2016-2023)

    • kaggle.com
    zip
    Updated Mar 21, 2023
    Cite
    Sheikh Rahmatulla Sakib (2023). Air Quality Index Hourly_2.5 (2016-2023) [Dataset]. https://www.kaggle.com/datasets/kishorsakib099413/air-quality-index-hourly-25-2016-2023
    Explore at:
    Available download formats: zip (480196 bytes)
    Dataset updated
    Mar 21, 2023
    Authors
    Sheikh Rahmatulla Sakib
    Description

    The dataset used in this study is the Dhaka City (2016-2023) hourly PM2.5 Air Quality Index dataset available at airnow.gov. AirNow.gov is a website administered by the United States Environmental Protection Agency (EPA), which provides information on air quality around the world. It provides real-time air quality data from thousands of monitoring stations across the city. Measurements of ozone, particulate matter, carbon monoxide, sulfur dioxide, and nitrogen dioxide are available on AirNow.gov. This dataset contains information on the AQI recorded in autonomous surveys from 2016 to 2023. The dataset consists of 58212 rows and 7 columns, including categorical and numerical data. Missing data are imputed using the mean for each variable.

    PM2.5 refers to atmospheric particles within a certain size range and has been listed as an important air pollutant by regulatory bodies around the world. However, the Air Quality Index (AQI) is calculated based on the concentrations of several air pollutants: ozone (O3), carbon monoxide (CO), sulfur dioxide (SO2), nitrogen dioxide (NO2), and particulate matter (PM2.5 and PM10). The NowCast concentrations in this dataset are intended to provide estimates of current air pollution levels, taking into account recent changes in air quality. The NowCast PM2.5 concentration is calculated as:

    NowCast Conc = (L1 x 0.5) + (L2 x 0.25) + (L3 x 0.125) + (L4 x 0.0625) + (L5 x 0.03125)

    where L1 is the current hourly PM2.5 concentration, L2 is the PM2.5 concentration one hour ago, L3 is the PM2.5 concentration two hours ago, L4 is the PM2.5 concentration three hours ago, and L5 is the PM2.5 concentration four hours ago.

    The Raw Concentration of a pollutant refers to the actually measured concentration of that pollutant in the air at a given place and time:

    Raw Concentration = (mass of pollutant) / (volume of air)

    To calculate the AQI, first calculate the concentration of each pollutant of interest (Raw or NowCast) over a period of time (e.g., 24 hours). Once you have a concentration value, you can use the AQI formula and the breakpoints specific to that pollutant to calculate the AQI value:

    AQI = (IHi - ILo) / (BHi - BLo) x (C - BLo) + ILo

    where IHi and ILo are the AQI values at the high and low ends of the breakpoint interval, BHi and BLo are the upper and lower concentration limits of that interval, and C is the PM2.5 raw or NowCast concentration.
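
    The two formulas above can be sketched as follows. The NowCast weights are the ones given in this description. The breakpoint interval used in the example (12.1 to 35.4 µg/m³ mapped to AQI 51 to 100) is an assumption for illustration; the applicable breakpoint table should be taken from the official source. The hourly readings are hypothetical.

```python
# Weights from the description, ordered from the current hour to four hours ago.
NOWCAST_WEIGHTS = (0.5, 0.25, 0.125, 0.0625, 0.03125)


def nowcast(levels):
    """Weighted sum of the five most recent hourly PM2.5 concentrations.

    levels[0] is the current hour (L1), levels[4] is four hours ago (L5).
    """
    if len(levels) != len(NOWCAST_WEIGHTS):
        raise ValueError("exactly five hourly values are required")
    return sum(w * c for w, c in zip(NOWCAST_WEIGHTS, levels))


def aqi(conc, b_lo, b_hi, i_lo, i_hi):
    """Linear interpolation of the AQI within one breakpoint interval."""
    if not b_lo <= conc <= b_hi:
        raise ValueError("concentration lies outside the given interval")
    return (i_hi - i_lo) / (b_hi - b_lo) * (conc - b_lo) + i_lo


# Hypothetical hourly PM2.5 readings in µg/m³, newest first.
conc = nowcast([20.0, 24.0, 28.0, 32.0, 16.0])

# Assumed breakpoint interval for illustration only.
index = aqi(conc, b_lo=12.1, b_hi=35.4, i_lo=51, i_hi=100)
print(conc, round(index, 1))
```

    Note that the weights as given sum to 0.96875 rather than 1, so a constant concentration yields a slightly lower NowCast value.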

  11. Data from: Data and code from: Topographic wetness index as a proxy for soil...

    • agdatacommons.nal.usda.gov
    • s.cnmilf.com
    • +1more
    zip
    Updated Nov 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    H. Edwin Winzeler; Quentin D. Read (2025). Data and code from: Topographic wetness index as a proxy for soil moisture in a hillslope catena: flow algorithms and map generalization [Dataset]. http://doi.org/10.15482/USDA.ADC/1528088
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    H. Edwin Winzeler; Quentin D. Read
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    This dataset contains all data and code necessary to reproduce the analysis presented in the manuscript: Winzeler, H.E., Owens, P.R., Read, Q.D., Libohova, Z., Ashworth, A., Sauer, T. 2022. Topographic wetness index as a proxy for soil moisture in a hillslope catena: flow algorithms and map generalization. Land 11:2018. DOI: 10.3390/land11112018. There are several steps to this analysis. The relevant scripts for each are listed below. The first step is to use the raw digital elevation data (DEM) to produce different versions of the topographic wetness index (TWI) for the study region (Calculating TWI). Then, these TWI output files are processed, along with soil moisture (volumetric water content or VWC) time series data from a number of sensors located within the study region, to create analysis-ready data objects (Processing TWI and VWC). Next, models are fit relating TWI to soil moisture (Model fitting) and results are plotted (Visualizing main results). A number of additional analyses were also done (Additional analyses).

    Input data

    The DEM of the study region is archived in this dataset as SourceDem.zip. This contains the DEM of the study region (DEM1.sgrd) and associated auxiliary files, all called DEM1.* with different extensions. In addition, the DEM is provided as a .tif file called USGS_one_meter_x39y400_AR_R6_WashingtonCO_2015.tif. The remaining data and code files are archived in the repository created with a GitHub release on 2022-10-11, twi-moisture-0.1.zip. The data are found in a subfolder called data.

    • 2017_LoggerData_HEW.csv through 2021_HEW.csv: Soil moisture (VWC) logger data for each year 2017-2021 (5 files total).
    • 2882174.csv: Weather data from a nearby station.
    • DryPeriods2017-2021.csv: Starting and ending days for dry periods 2017-2021.
    • LoggerLocations.csv: Geographic locations and metadata for each VWC logger.
    • Logger_Locations_TWI_2017-2021.xlsx: 546 topographic wetness indexes calculated at each VWC logger location. Note: this is intermediate input created in the first step of the pipeline.

    Code pipeline

    To reproduce the analysis in the manuscript, run these scripts in the following order. The scripts are all found in the root directory of the repository. See the manuscript for more details on the methods.

    Calculating TWI

    TerrainAnalysis.R: Taking the DEM file as input, calculates 546 different topographic wetness indexes using a variety of different algorithms. Each algorithm is run multiple times with different input parameters, as described in more detail in the manuscript. After performing this step, it is necessary to use the SAGA-GIS GUI to extract the TWI values for each of the sensor locations. The output generated in this way is included in this repository as Logger_Locations_TWI_2017-2021.xlsx. Therefore it is not necessary to rerun this step of the analysis, but the code is provided for completeness.

    Processing TWI and VWC

    read_process_data.R: Takes raw TWI and moisture data files and processes them into analysis-ready format, saving the results as CSV.
    qc_avg_moisture.R: Does additional quality control on the moisture data and averages it across different time periods.

    Model fitting

    Models were fit regressing soil moisture (average VWC for a certain time period) against a TWI index, with and without soil depth as a covariate. In each case, for both the model without depth and the model with depth, prediction performance was calculated with and without spatially-blocked cross-validation. Where cross-validation wasn't used, we simply used the predictions from the model fit to all the data.

    fit_combos.R: Models were fit to each combination of soil moisture averaged over 57 months (all months from April 2017-December 2021) and 546 TWI indexes. In addition, models were fit to soil moisture averaged over years, and to the grand mean across the full study period.
    fit_dryperiods.R: Models were fit to soil moisture averaged over previously identified dry periods within the study period (each 1 or 2 weeks in length), again for each of the 546 indexes.
    fit_summer.R: Models were fit to the soil moisture average for the months of June-September for each of the five years, again for each of the 546 indexes.
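
    The repository's scripts are written in R and are not reproduced here. As a rough illustration of the idea described above (regressing VWC on a TWI index, with and without soil depth, and scoring by observed-predicted correlation with and without spatially-blocked cross-validation), the following Python sketch may help. The data are synthetic, and the function names, block assignment and linear model form are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept; returns predictions for X_test."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def blocked_cv_correlation(X, y, blocks):
    """Observed-predicted correlation, holding out one spatial block at a time."""
    preds = np.empty_like(y, dtype=float)
    for b in np.unique(blocks):
        held_out = blocks == b
        preds[held_out] = fit_predict(X[~held_out], y[~held_out], X[held_out])
    return np.corrcoef(y, preds)[0, 1]

# Synthetic stand-in data (hypothetical values, not taken from the repository).
rng = np.random.default_rng(0)
n = 60
twi = rng.normal(size=n)                        # one TWI index at each sensor
depth = rng.choice([10.0, 30.0, 60.0], size=n)  # sensor depth (assumed units: cm)
blocks = np.repeat(np.arange(6), 10)            # assumed spatial block labels
vwc = 0.25 + 0.05 * twi + 0.001 * depth + rng.normal(scale=0.01, size=n)

X_twi = twi.reshape(-1, 1)
X_twi_depth = np.column_stack([twi, depth])

# Without cross-validation: predictions from the model fit to all the data.
r_nocv = np.corrcoef(vwc, fit_predict(X_twi, vwc, X_twi))[0, 1]
# With spatially-blocked cross-validation, without and with the depth covariate.
r_cv = blocked_cv_correlation(X_twi, vwc, blocks)
r_cv_depth = blocked_cv_correlation(X_twi_depth, vwc, blocks)
print(r_nocv, r_cv, r_cv_depth)
```

    Holding out whole blocks rather than random sensors keeps neighbouring sensors from appearing in both the training and test sets. The four combinations (with or without depth, with or without cross-validation) give four performance values per time period and index.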

    Visualizing main results

    Preliminary visualization of results was done in a series of RMarkdown notebooks. All the notebooks follow the same general format, plotting model performance (observed-predicted correlation) across different combinations of time period and characteristics of the TWI indexes being compared. The indexes are grouped by SWI versus TWI, DEM filter used, flow algorithm, and any other parameters that varied. The notebooks show the model performance metrics with and without the soil depth covariate, and with and without spatially-blocked cross-validation. Crossing those two factors, there are four values for model performance for each combination of time period and TWI index presented.

    performance_plots_bymonth.Rmd: Using the results from the models fit to each month of data separately, prediction performance was averaged by month across the five years of data to show within-year trends.
    performance_plots_byyear.Rmd: Using the results from the models fit to each month of data separately, prediction performance was averaged by year to show trends across multiple years.
    performance_plots_dry_periods.Rmd: Prediction performance was presented for the models fit to the previously identified dry periods.
    performance_plots_summer.Rmd: Prediction performance was presented for the models fit to the June-September moisture averages.

    Additional analyses

    Some additional analyses were done that may not be published in the final manuscript but which are included here for completeness.

    2019dryperiod.Rmd: analysis, done separately for each day, of a specific dry period in 2019.
    alldryperiodsbyday.Rmd: analysis, done separately for each day, of the same dry periods discussed above.
    best_indices.R: after fitting models, this script was used to quickly identify some of the best-performing indexes for closer scrutiny.
    wateryearfigs.R: exploratory figures showing median and quantile interval of VWC for sensors in low and high TWI locations for each water year.

    Resources in this dataset:

    Resource Title: Digital elevation model of study region.
    File Name: SourceDEM.zip
    Resource Description: .zip archive containing digital elevation model files for the study region. See dataset description for more details.

    Resource Title: twi-moisture-0.1: Archived git repository containing all other necessary data and code.
    File Name: twi-moisture-0.1.zip
    Resource Description: .zip archive containing all data and code, other than the digital elevation model archived as a separate file. This file was generated by a GitHub release made on 2022-10-11 of the git repository hosted at https://github.com/qdread/twi-moisture (private repository). See dataset description and README file contained within this archive for more details.

  12. Yost Index with 90% confidence intervals (with all contributing source files...

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Francis P. Boscoe; Bian Liu; Furrina F. Lee; Li Niu; jordana lafantasie (2023). Yost Index with 90% confidence intervals (with all contributing source files - LARGE) [Dataset]. http://doi.org/10.6084/m9.figshare.16649773.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Francis P. Boscoe; Bian Liu; Furrina F. Lee; Li Niu; jordana lafantasie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We extend our previous work with the Yost Index by adding 90% confidence intervals to the index values. These were calculated using the variance replicate estimates published in association with the American Community Survey of the United States Census Bureau.

    In the file yost-tract-2015-2019.csv, the data fields consist of an 11-digit geographic ID built from FIPS codes (2-digit state, 3-digit county, 6-digit census tract); the Yost index; the lower bound of the 90% confidence interval; and the upper bound of the 90% confidence interval. Data are provided for 72,793 census tracts for which sufficient data were available. The Yost Index ranges from 1 (lowest socioeconomic position) to 100 (highest socioeconomic position).
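
    As a small illustration of the geographic ID layout described above, the sketch below splits an 11-digit ID into its state, county and tract parts. The function name and the sample ID are hypothetical. The ID is handled as a string, since reading it as a number would drop leading zeros.

```python
def split_tract_geoid(geoid: str) -> dict:
    """Split an 11-digit tract ID into 2-digit state, 3-digit county, 6-digit tract."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return {
        "state": geoid[:2],    # 2-digit state FIPS code
        "county": geoid[2:5],  # 3-digit county FIPS code
        "tract": geoid[5:],    # 6-digit census tract code
    }

# Illustrative ID only; not taken from the dataset.
parts = split_tract_geoid("06001400100")
print(parts)
```

    When loading the CSV with a dataframe library, it is worth forcing the ID column to a string type for the same reason.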

    For those only interested in using the index as we have calculated it, the file yost-tract-2015-2019 is the only file you need. The other 368 files here are provided for anyone who wishes to replicate our results using the R program yost-conf-intervals.R. The program presumes the user is running a Windows machine and that all files reside in a folder called C:/yostindex. The R program requires a number of packages, all of which are specified in lines 10-22 of the program.

    Details of this project were published in Boscoe FP, Liu B, LaFantasie J, Niu L, Lee FF. Estimating uncertainty in a socioeconomic index derived from the American Community Survey. SSM-Population Health 2022; 18: 101078.

    Additional years of data following this format are planned to be added to this repository in time.

  13. Consumer Price Index (CPI)

    • catalog.data.gov
    • datasets.ai
    Updated May 16, 2022
    Cite
    Bureau of Labor Statistics (2022). Consumer Price Index (CPI) [Dataset]. https://catalog.data.gov/dataset/consumer-price-index-cpi-ee18b
    Explore at:
    Dataset updated
    May 16, 2022
    Dataset provided by
    Bureau of Labor Statistics (http://www.bls.gov/)
    Description

    The Consumer Price Index (CPI) is a measure of the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services. Indexes are available for the U.S. and various geographic areas. Average price data for select utility, automotive fuel, and food items are also available. Prices for the goods and services used to calculate the CPI are collected in 75 urban areas throughout the country and from about 23,000 retail and service establishments. Data on rents are collected from about 43,000 landlords or tenants. More information and details about the data provided can be found at http://www.bls.gov/cpi

  14. Data from: Global Aridity Index and Potential Evapotranspiration (ET0)...

    • figshare.com
    jpeg
    Updated Jul 17, 2025
    Cite
    Antonio Trabucco; Robert Zomer (2025). Global Aridity Index and Potential Evapotranspiration (ET0) Climate Database v2 [Dataset]. http://doi.org/10.6084/m9.figshare.7504448.v3
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jul 17, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Antonio Trabucco; Robert Zomer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Global Aridity Index (Global-Aridity_ET0) and Global Reference Evapotranspiration (Global-ET0) Version 2 dataset provides high-resolution (30 arc-seconds) global raster climate data for the 1970-2000 period, related to evapotranspiration processes and rainfall deficit for potential vegetative growth, based upon the implementation of a Penman-Monteith evapotranspiration equation for a reference crop. The dataset follows the development of, and is based upon, WorldClim 2.0: http://worldclim.org/version2

    The Aridity Index represents the ratio between precipitation and ET0, that is, rainfall over vegetation water demand (aggregated on an annual basis). Under this formulation, Aridity Index values increase for more humid conditions and decrease for more arid conditions. The Aridity Index values reported within the Global Aridity Index_ET0 geodataset have been multiplied by a factor of 10,000 to derive and distribute the data as integers (with 4-decimal accuracy). This multiplier has been used to increase the precision of the variable values without using decimals.

    The Global-Aridity_ET0 and Global-ET0 datasets are provided for non-commercial use in standard GeoTiff format, at 30 arc seconds or ~1 km at the equator.
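
    Because the raster stores the Aridity Index multiplied by 10,000, values read from the GeoTiff need to be divided by the same factor before use. A minimal sketch of that step follows; the array values are made up, and reading the raster itself (for example with a GeoTiff library) is left out.

```python
import numpy as np

SCALE = 10_000  # multiplier stated in the dataset description

def decode_aridity(stored):
    """Convert stored integer raster values back to Aridity Index ratios."""
    return np.asarray(stored, dtype=np.float64) / SCALE

# Hypothetical stored cell values, not taken from the dataset.
stored_values = np.array([312, 6500, 12000])
aridity = decode_aridity(stored_values)
print(aridity)
```

    Higher decoded values correspond to more humid conditions, as described above.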

  15. Drought and Moisture Surplus for the Conterminous United States, Annual Data...

    • catalog.data.gov
    • colorado-river-portal.usgs.gov
    • +11more
    Updated Nov 14, 2025
    + more versions
    Cite
    U.S. Forest Service (2025). Drought and Moisture Surplus for the Conterminous United States, Annual Data 1-Year Windows (Image Service) [Dataset]. https://catalog.data.gov/dataset/drought-and-moisture-surplus-for-the-conterminous-united-states-annual-data-1-year-windows-6243b
    Explore at:
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    U.S. Department of Agriculture Forest Service (http://fs.fed.us/)
    Area covered
    Contiguous United States, United States
    Description

    The Moisture Deficit and Surplus map uses moisture difference z-score (MDZ) datasets developed by scientists Frank Koch, John Coulston, and William Smith of the Forest Service Southern Research Station to represent drought and moisture surplus across the contiguous United States. A z-score is a statistical measure of how far a value departs from the mean. Mean moisture values over 1-year windows were derived from monthly historical precipitation and temperature data from PRISM, between 1900 and 2023, and compared against a 1900-2017 baseline. The greater the magnitude of the z-value, the larger the departure from average conditions, indicating larger moisture deficits (droughts) or surpluses. Thus, the dark orange areas on the map indicate a 1-year window with extreme drought, relative to the average conditions over the past century.

    Detailed technical methods for this analysis, and further reading on the methodology used to build these maps, are available here: https://www.fs.usda.gov/treesearch/pubs/43361. The analysis is derived from monthly PRISM temperature and precipitation data, located here: ftp://prism.nacse.org/monthly/. Monthly temperature data are used to calculate potential evapotranspiration (PET) using the Thornthwaite PET equation. Monthly precipitation and PET data are then used to calculate a moisture index (MI) for each month within a 1-year time window. The mean moisture index (MMI) across the months of the target window is compared to an appropriate long-term normal, in this case the average of the MMI for all windows between 1900 and 2017. Then, a moisture difference z-score (MDZ) is calculated from the MMI for the window of interest. This is done by subtracting the 1900-2017 normal MMI from the MMI for a given year, and then dividing by the standard deviation over the baseline period. Equations for calculating the modified moisture index are adopted from Willmott, C.J. and Feddema, J.J. 1992. A more rational climatic moisture index. Professional Geographer 44(1): 84-87.

    The z-score values were then reclassified using the classification scheme below:

    z-score less than -2 -- extremely dry compared to normal conditions
    z-score -2 to -1.5 -- severely dry compared to normal conditions
    z-score -1.5 to -1 -- moderately dry compared to normal conditions
    z-score -1 to -0.5 -- mildly dry compared to normal conditions
    z-score -0.5 to 0.5 -- near normal conditions
    z-score 0.5 to 1 -- mildly wet compared to normal conditions
    z-score 1 to 1.5 -- moderately wet compared to normal conditions
    z-score 1.5 to 2 -- severely wet compared to normal conditions
    z-score more than 2 -- extremely wet compared to normal conditions
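
    The z-score step and the reclassification above can be sketched in a few lines of Python. This is an illustration only: the baseline values are invented, the use of the sample standard deviation (ddof=1) is an assumption, and so is the handling of values that fall exactly on a class boundary, which the description does not specify.

```python
import numpy as np

def moisture_difference_z(mmi_window, baseline_mmi):
    """MDZ: (window MMI - baseline mean MMI) / baseline standard deviation."""
    baseline = np.asarray(baseline_mmi, dtype=float)
    # ddof=1 (sample standard deviation) is an assumption, not stated in the source.
    return (mmi_window - baseline.mean()) / baseline.std(ddof=1)

# Upper bound of each class, in ascending order; a value equal to a bound is
# placed in the class above it (assumed convention).
CLASS_BOUNDS = [
    (-2.0, "extremely dry"),
    (-1.5, "severely dry"),
    (-1.0, "moderately dry"),
    (-0.5, "mildly dry"),
    (0.5, "near normal"),
    (1.0, "mildly wet"),
    (1.5, "moderately wet"),
    (2.0, "severely wet"),
]

def classify_mdz(z):
    """Map a z-score to the class labels listed in the description."""
    for upper, label in CLASS_BOUNDS:
        if z < upper:
            return label
    return "extremely wet"

# Hypothetical baseline MMI values and a hypothetical dry window.
baseline = [0.1, 0.2, 0.3, 0.4, 0.5]
z = moisture_difference_z(0.0, baseline)
print(round(z, 3), classify_mdz(z))
```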

  16. HadEX3: Global land-surface climate extremes indices v3.0.4 (1901-2018)

    • catalogue.ceda.ac.uk
    Updated Mar 7, 2024
    + more versions
    Cite
    Robert J. H. Dunn; Lisa Alexander; Markus Donat; Xuebin Zhang; Margot Bador; Nicholas Herold; Tanya Lippmann; Robert J. Allan; Enric Aguilar; Abdoul Aziz; Manola Brunet; John Caesar; Guillaume Chagnaud; Vincent Cheng; Thelma Cinco; Imke Durre; Rosaline de Guzman; Tin Mar Htay; Wan Maisarah Wan Ibadullah; Muhammad Khairul Izzat Bin Ibrahim; Mahbobeh Khoshkam; Andries Kruge; Hisayuki Kubota; Tan Wee Leng; Gerald Lim; Lim Li-Sha; Jose Marengo; Sifiso Mbatha; Simon McGree; Matthew Menne; Maria de los Milagros Skansi; Sandile Ngwenya; Francis Nkrumah; Chalump Oonariya; Jose Daniel Pabon-Caicedo; Geremy Panthou; Cham Pham; Fatemeh Rahimzadeh; Andrea Ramos; Ernesto Salgado; Jim Salinger; Youssouph Sane; Ardhasena Sopaheluwakan; Arvind Srivastava; Ying Sun; Bertrand Trimbal; Nichanun Trachow; Blair Trewin; Gerard van der Schrier; Jorge Vazquez-Aguirre; Ricardo Vasquez; Claudia Villarroel; Lucie Vincent; Theo Vischel; Russ Vose; Mohd Noor' Arifin Bin Hj Yussof (2024). HadEX3: Global land-surface climate extremes indices v3.0.4 (1901-2018) [Dataset]. https://catalogue.ceda.ac.uk/uuid/115d5e4ebf7148ec941423ec86fa9f26
    Explore at:
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    Robert J. H. Dunn; Lisa Alexander; Markus Donat; Xuebin Zhang; Margot Bador; Nicholas Herold; Tanya Lippmann; Robert J. Allan; Enric Aguilar; Abdoul Aziz; Manola Brunet; John Caesar; Guillaume Chagnaud; Vincent Cheng; Thelma Cinco; Imke Durre; Rosaline de Guzman; Tin Mar Htay; Wan Maisarah Wan Ibadullah; Muhammad Khairul Izzat Bin Ibrahim; Mahbobeh Khoshkam; Andries Kruge; Hisayuki Kubota; Tan Wee Leng; Gerald Lim; Lim Li-Sha; Jose Marengo; Sifiso Mbatha; Simon McGree; Matthew Menne; Maria de los Milagros Skansi; Sandile Ngwenya; Francis Nkrumah; Chalump Oonariya; Jose Daniel Pabon-Caicedo; Geremy Panthou; Cham Pham; Fatemeh Rahimzadeh; Andrea Ramos; Ernesto Salgado; Jim Salinger; Youssouph Sane; Ardhasena Sopaheluwakan; Arvind Srivastava; Ying Sun; Bertrand Trimbal; Nichanun Trachow; Blair Trewin; Gerard van der Schrier; Jorge Vazquez-Aguirre; Ricardo Vasquez; Claudia Villarroel; Lucie Vincent; Theo Vischel; Russ Vose; Mohd Noor' Arifin Bin Hj Yussof
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Time period covered
    Jan 1, 1901 - Dec 31, 2018
    Area covered
    Earth
    Variables measured
    time, Max TN, Max TX, Min TN, Min TX, Ice Days, latitude, longitude, Frost Days, Summer days, and 24 more
    Description

    HadEX3 is a land-surface dataset of climate extremes indices available on a 1.875 x 1.25 longitude-latitude grid. These 29 indices have been developed by the World Meteorological Organization (WMO) Expert Team on Climate Change Detection and Indices (ETCCDI). Daily precipitation, as well as maximum and minimum temperature observations, are used to calculate these indices at each station. The daily data, as well as indices, have been supplied, quality controlled and combined to make a gridded set of NetCDF files covering 1901-2018 (inclusive).

    Spatial coverage is determined by the number of stations present at each time point as well as the spatial correlation structure between the stations for each index. The spatial coverage is lowest at the beginning of the dataset, rising until around 1960 where it plateaus, and then declines slightly after 2010.

    All indices are available as annual quantities, with a subset also available on a monthly basis. A number of the indices use a reference period to determine thresholds. For these, we provide two versions, one set using 1961-1990 and another using the more recent 1981-2010 (these reference periods have been indicated in the file name as either 'ref-6190' or 'ref-8110').

    Version 3.0.4 was added due to an error in how the Rx1day and Rx5day data were being handled for one of the West African data sources. More details can be found in the HadEX3 blog under 'Details/Docs' tab.

    Additionally, an extension to HadEX3, comprising additional indices recommended by the WMO Expert Team on Sector-specific Climate Indices (ET-SCI), has been produced. These data are available in a separate dataset connected to this record, marked as supplemental to this dataset.

  17. Human Development Index (HDI)

    • data.gov.tw
    csv
    Cite
    Directorate General of Budget, Accounting and Statistics, Executive Yuan, R.O.C., Human Development Index (HDI) [Dataset]. https://data.gov.tw/en/datasets/25711
    Explore at:
    Available download formats: csv
    Dataset authored and provided by
    Directorate General of Budget, Accounting and Statistics, Executive Yuan, R.O.C.
    License

    https://data.gov.tw/license

    Description

    (1) The Human Development Index (HDI) is compiled by the United Nations Development Programme (UNDP) to measure a country's comprehensive development in the areas of health, education, and economy according to the UNDP's calculation formula.

    (2) Explanation:
        (1) The HDI value ranges from 0 to 1, with higher values being better.
        (2) Due to our country's non-membership in the United Nations and its special international situation, the index is calculated by our department according to the UNDP formula using our country's data. The calculation of the comprehensive index for each year is mainly based on the data of various indicators adopted by the UNDP.
        (3) In order to have the same baseline for international comparison, the comprehensive index and rankings are not retroactively adjusted after being published.

    (3) Notes:
        (1) The old indicators included life expectancy at birth, adult literacy rate, gross enrollment ratio, and average annual income per person calculated by purchasing power parity.
        (2) The indicators were updated to include life expectancy at birth, mean years of schooling, expected years of schooling, and nominal gross national income (GNI) calculated by purchasing power parity. Starting in 2011, the GNI per capita was adjusted from nominal value to real value to exclude the impact of price changes. Additionally, the HDI calculation method has changed from arithmetic mean to geometric mean.
        (3) The calculation method for indicators in the education domain changed from geometric mean to simple average due to retrospective adjustments in the 2014 Human Development Report for the years 2005, 2008, and 2010-2012. Since 2016, the education domain has adopted data compiled by the Ministry of Education according to definitions from the United Nations Educational, Scientific and Cultural Organization (UNESCO) and the Organization for Economic Co-operation and Development (OECD).
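
    The notes mention a switch from an arithmetic mean to a geometric mean when combining the health, education and economy dimensions. The sketch below shows how the two aggregations differ on the same inputs. The three dimension values are invented for illustration, and the function names are hypothetical. How each dimension index is itself normalized is not covered here.

```python
def hdi_arithmetic(health: float, education: float, income: float) -> float:
    """Older aggregation: simple average of the three dimension indices."""
    return (health + education + income) / 3

def hdi_geometric(health: float, education: float, income: float) -> float:
    """Newer aggregation: geometric mean of the three dimension indices."""
    return (health * education * income) ** (1 / 3)

# Hypothetical dimension indices on a 0-1 scale.
dims = (0.9, 0.8, 0.7)
arith = hdi_arithmetic(*dims)
geom = hdi_geometric(*dims)
print(round(arith, 4), round(geom, 4))
```

    With unequal dimension values the geometric mean is lower than the arithmetic mean, so a weak dimension is compensated less by strong ones.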

  18. Data from: A Tidy Framework and Infrastructure to Systematically Assemble...

    • tandf.figshare.com
    bin
    Updated Jul 8, 2024
    Cite
    H. Sherry Zhang; Dianne Cook; Ursula Laa; Nicolas Langrené; Patricia Menéndez (2024). A Tidy Framework and Infrastructure to Systematically Assemble Spatio-temporal Indexes from Multivariate Data [Dataset]. http://doi.org/10.6084/m9.figshare.26207483.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    H. Sherry Zhang; Dianne Cook; Ursula Laa; Nicolas Langrené; Patricia Menéndez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Indexes are useful for summarizing multivariate information into single metrics for monitoring, communicating, and decision-making. While most work has focused on defining new indexes for specific purposes, more attention needs to be directed towards making it possible to understand index behavior in different data conditions, and to determine how their structure affects their values and the variability therein. Here we discuss a modular data pipeline recommendation to assemble indexes. It is universally applicable to index computation and allows investigation of index behavior as part of the development procedure. One can compute indexes with different parameter choices, adjust steps in the index definition by adding, removing, and swapping them to experiment with various index designs, calculate uncertainty measures, and assess indexes’ robustness. The paper presents three examples to illustrate the usage of the pipeline framework: comparison of two different indexes designed to monitor the spatio-temporal distribution of drought in Queensland, Australia; the effect of dimension reduction choices on the Global Gender Gap Index (GGGI) on countries’ ranking; and how to calculate bootstrap confidence intervals for the Standardized Precipitation Index (SPI). The methods are supported by a new R package, called tidyindex. Supplemental materials for the article are available online.

  19. Data from: Long-term, gridded standardized precipitation index for Hawai‘i

    • hydroshare.org
    • dataone.org
    • +1more
    zip
    Updated Sep 22, 2020
    Cite
    Matthew Lucas; Clay Trauernicht; Abby Frazier; Tomoaki Miura (2020). Long-term, gridded standardized precipitation index for Hawai‘i [Dataset]. http://doi.org/10.4211/hs.822553ead1d04869b5b3e1e3a3817ec6
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 22, 2020
    Dataset provided by
    HydroShare
    Authors
    Matthew Lucas; Clay Trauernicht; Abby Frazier; Tomoaki Miura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1920 - Dec 31, 2011
    Area covered
    Description

    This dataset contains gridded monthly Standardized Precipitation Index (SPI) at 10 timescales: 1-, 3-, 6-, 9-, 12-, 18-, 24-, 36-, 48-, and 60-month intervals from 1920 to 2012 at 250 m resolution for seven of the eight main Hawaiian Islands (18.849°N, 154.668°W to 22.269°N, 159.816°W; the island of Ni‘ihau is excluded due to lack of data). The gridded data use the World Geodetic System 1984 (WGS84) geographic coordinate system and are stored as individual GeoTIFF files for each month-year, organized by SPI interval, as indicated by the GeoTIFF file name. Thus, for example, the file “spi3_1999_11.tif” would contain the gridded 3-month SPI values calculated for the month of November in the year 1999. Currently, the data are available from 1920 to 2012, but the datasets will be updated as new gridded monthly rainfall data become available.

    SPI is a normalized drought index that converts monthly rainfall totals into the number of standard deviations (z-score) by which the observed, cumulative rainfall diverges from the long-term mean. The conversion of raw rainfall to a z-score is done by fitting a designated probability distribution function to the observed precipitation data for a site. In doing so, anomalous rainfall quantities take the form of positive and negative SPI z-scores. Additionally, because distribution fitting is based on long-term (>30 years) precipitation data at that location, the SPI score is relative, making comparisons across different climates possible.

    The creation of a statewide Hawai‘i SPI dataset relied on a 93-year (1920-2012) high resolution (250 m) spatially interpolated monthly gridded rainfall dataset [1]. This dataset is recognized as the highest quality precipitation data available [2] for the main Hawaiian Islands. After performing extensive quality control on the monthly rainfall station data (including homogeneity testing of over 1,100 stations [1,3]) and a geostatistical method comparison, ordinary kriging was used to generate a time series of gridded monthly rainfall from January 1920 to December 2012 at 250 m resolution [3]. This dataset was then used to calculate monthly SPI for 10 timescales (1-, 3-, 6-, 9-, 12-, 18-, 24-, 36-, 48-, and 60-month) at each grid cell. A 3-month SPI in May 2001, for example, represents the March-April-May (MAM) total rainfall in 2001 compared to the MAM rainfall in the entire time series. The resolution of the gridded rainfall dataset provides a more precise representation of drought (and pluvial) events compared to the other available drought products.

    [1] Frazier, A.G.; Giambelluca, T.W.; Diaz, H.F.; Needham, H.L. Comparison of geostatistical approaches to spatially interpolate month-year rainfall for the Hawaiian Islands. Int. J. Climatol. 2016, 36, 1459–1470, doi:10.1002/joc.4437.
    [2] Giambelluca, T.W.; Chen, Q.; Frazier, A.G.; Price, J.P.; Chen, Y.-L.; Chu, P.-S.; Eischeid, J.K.; Delparte, D.M. Online Rainfall Atlas of Hawai‘i. B. Am. Meteorol. Soc. 2013, 94, 313–316, doi:10.1175/BAMS-D-11-00228.1.
    [3] Frazier, A.G.; Giambelluca, T.W. Spatial trend analysis of Hawaiian rainfall from 1920 to 2012. Int. J. Climatol. 2017, 37, 2522–2531, doi:10.1002/joc.4862.
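
    The file naming and the accumulation windows described above can be illustrated with a short Python sketch. The regular expression generalizes from the single example name “spi3_1999_11.tif”, so the exact pattern (for instance, whether months are zero-padded) is an assumption, and the helper names are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple

class SpiFile(NamedTuple):
    timescale: int  # SPI interval in months
    year: int
    month: int

# Pattern inferred from the example name "spi3_1999_11.tif" (assumed to generalize).
SPI_NAME = re.compile(r"^spi(\d+)_(\d{4})_(\d{1,2})\.tif$")

def parse_spi_filename(name: str) -> SpiFile:
    """Extract the SPI timescale, year and month from a GeoTIFF file name."""
    match = SPI_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized SPI file name: {name!r}")
    return SpiFile(*(int(g) for g in match.groups()))

def accumulation_months(info: SpiFile) -> List[Tuple[int, int]]:
    """List the (year, month) pairs whose rainfall feeds this SPI value."""
    months = []
    year, month = info.year, info.month
    for _ in range(info.timescale):
        months.append((year, month))
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
    return months[::-1]

info = parse_spi_filename("spi3_1999_11.tif")
print(info, accumulation_months(info))
# A 3-month SPI for May 2001 covers March, April and May 2001.
print(accumulation_months(SpiFile(3, 2001, 5)))
```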

  20. Global Aridity Index and Potential Evapotranspiration (ET0) Climate Database...

    • figshare.com
    jpeg
    Updated Jul 17, 2025
    Cite
    Antonio Trabucco; Robert Zomer (2025). Global Aridity Index and Potential Evapotranspiration (ET0) Climate Database v3 [Dataset]. http://doi.org/10.6084/m9.figshare.7504448.v4
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jul 17, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Antonio Trabucco; Robert Zomer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Global Aridity Index (Global-AI) and Global Reference Evapo-Transpiration (Global-ET0) datasets provided in Version 3 of the Global Aridity Index and Potential Evapo-Transpiration (ET0) Database (Global-AI_PET_v3) provide high-resolution (30 arc-seconds) global raster data for the 1970-2000 period, related to evapotranspiration processes and rainfall deficit for potential vegetative growth, based upon implementation of the FAO-56 Penman-Monteith Reference Evapotranspiration (ET0) equation.

    The Aridity Index represents the ratio between precipitation and ET0, that is, rainfall over vegetation water demand (aggregated on an annual basis). Under this formulation, Aridity Index values increase for more humid conditions and decrease for more arid conditions. The Aridity Index values reported within the Global-AI geodataset have been multiplied by a factor of 10,000 to derive and distribute the data as integers (with 4-decimal accuracy). This multiplier has been used to increase the precision of the variable values without using decimals. A Readme file is provided with a detailed description of the dataset files, along with a reference to the article describing the methodology and technical validation.

    The Global-AI_PET_v3 datasets are provided for non-commercial use in standard GeoTiff format, at 30 arc seconds or ~1 km at the equator.
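
    To make the ratio and the integer encoding concrete, the sketch below computes an Aridity Index from annual precipitation and annual ET0 and then applies the 10,000 multiplier. The input values are hypothetical and the function names are illustrative. The sketch assumes ET0 is strictly positive and does not deal with no-data cells.

```python
import numpy as np

SCALE = 10_000  # multiplier stated in the dataset description

def aridity_index(annual_precip_mm, annual_et0_mm):
    """Aridity Index = annual precipitation / annual reference evapotranspiration."""
    precip = np.asarray(annual_precip_mm, dtype=np.float64)
    et0 = np.asarray(annual_et0_mm, dtype=np.float64)
    return precip / et0  # assumes et0 > 0 everywhere

def encode_as_integer(ai):
    """Scale by 10,000 and round, keeping four decimals of precision as integers."""
    return np.rint(np.asarray(ai) * SCALE).astype(np.int32)

# Hypothetical annual totals in mm for two locations.
ai = aridity_index([500.0, 1800.0], [1250.0, 900.0])
encoded = encode_as_integer(ai)
print(ai, encoded)
```

    The first location (more demand than rainfall) gets the lower index, consistent with lower values meaning more arid conditions.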

Cite
Saad Aziz (2023). Dataset for Stock Market Index of 7 Economies [Dataset]. https://www.kaggle.com/datasets/saadaziz1985/dataset-for-stock-market-index-of-7-countries

Dataset for Stock Market Index of 7 Economies

Time Series Dataset for Stock Market Indices of the 7 Top Economies of the World

Explore at:
Available download formats: zip (1917326 bytes)
Dataset updated
Jul 4, 2023
Authors
Saad Aziz
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context:

The dataset was extracted from Yahoo Finance using pandas and the Yahoo Finance library in Python. It covers stock market indices of the world's leading economies. The code generated data from Jan 01, 2003 to Jun 30, 2023, a span of more than 20 years. There are 18 CSV files: 16 hold data for 16 different stock market indices across 7 countries, and two hold the annualized return and compound annual growth rate (CAGR) computed from the extracted data. Below is the list of countries along with the number of indices extracted through the Yahoo Finance library.

Number of Countries & Index:

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F90ce8a986761636e3edbb49464b304d8%2FNumber%20of%20Index.JPG?generation=1688490342207096&alt=media

Content:

Unit of analysis: Stock Market Index Analysis

This dataset is useful for research purposes, particularly for conducting comparative analyses involving capital market performance and could be used along with other economic indicators.

There are 18 distinct CSV files associated with this dataset. The first 16 CSV files each hold one index, and the last two CSV files hold the annualized return for each year and the CAGR of each index. If data in any column is blank, it means the index was launched in a later year. For instance, Bse500 (India) was launched in 2007, so earlier values are blank; similarly, the China_Top300 index was launched in 2021, so its earlier fields are blank too.

The extraction process applies different criteria: in the 16 index CSV files all columns are included, and Adj Close is used to calculate the annualized return. The algorithm extracts data by index name (the code given by Yahoo Finance) for the given start and end dates.

Annualized return and CAGR have been calculated and are illustrated in the images below, with machine-readable (CSV) files attached as well.

To extract the data provided in the attachment, various criteria were applied:

  1. Content Filtering: The data was filtered on several attributes, including the index name and the start and end dates. This filtering ensured that only relevant data meeting the specified criteria was retained.

  2. Collaborative Filtering: Another filtering technique used was collaborative filtering using Yahoo Finance, which relies on index similarity. This approach involves finding indices that are similar to another index, or extending the dataset scope to other countries or economies. By leveraging this method, the algorithm identifies and extracts data based on similarities between indices.

Of the last two CSV files, one holds the annualized return, which was calculated from the Adj Close column and stored in a new DataFrame. Below is the image of annualized returns for all indices (if it is unreadable, a machine-readable CSV version is attached to the dataset).

Annualized Return:

In terms of annualized rate of return, the Indian stock market indices lead most of the time, followed by the USA, Canada and Japan stock market indices.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F37645bd90623ea79f3708a958013c098%2FAnnualized%20Return.JPG?generation=1688525901452892&alt=media

Compound Annual Growth Rate (CAGR):

The best-performing index by compound growth is Sensex (India), which comprises the top 30 companies, at 15.60%, followed by Nifty500 (India) at 11.34% and Nasdaq (USA) at 10.60%.

The worst-performing index is China top300; however, it was launched in 2021 (post-pandemic), so it cannot be meaningfully assessed at this stage given the limited data. Furthermore, UK and Russia indices are also among the five worst performers.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F15657145%2F58ae33f60a8800749f802b46ec1e07e7%2FCAGR.JPG?generation=1688490409606631&alt=media
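
The dataset's own code is not shown, so the following is only a sketch of how yearly returns and CAGR could be derived from an Adj Close series with pandas. The price series is made up. Taking the yearly return as the change between the last Adj Close of consecutive years, and measuring CAGR over the first and last non-blank observations, are assumptions rather than the author's confirmed method.

```python
import pandas as pd

def yearly_returns(adj_close: pd.Series) -> pd.Series:
    """Year-over-year change in the last available Adj Close of each year."""
    year_end = adj_close.groupby(adj_close.index.year).last()
    return year_end.pct_change().dropna()

def cagr(adj_close: pd.Series) -> float:
    """Compound annual growth rate between the first and last non-blank values."""
    clean = adj_close.dropna()  # skips blanks before an index was launched
    years = (clean.index[-1] - clean.index[0]).days / 365.25
    return (clean.iloc[-1] / clean.iloc[0]) ** (1 / years) - 1

# Hypothetical Adj Close values, not taken from the dataset.
prices = pd.Series(
    [100.0, 110.0, 121.0, 133.1],
    index=pd.to_datetime(["2003-01-01", "2003-12-31", "2004-12-31", "2005-12-31"]),
)
returns = yearly_returns(prices)
growth = cagr(prices)
print(returns)
print(round(growth, 4))
```

Dropping blank values before computing CAGR matters for indices launched after 2003, whose early rows are empty.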

Geography: Stock Market Index of the World Top Economies

Time period: Jan 01, 2003 – June 30, 2023

Variables: Stock Market Index Title, Open, High, Low, Close, Adj Close, Volume, Year, Month, Day, Yearly_Return and CAGR

File Type: CSV file

Inspiration:

  • Time series prediction model
  • Investment opportunities in world best economies
  • Comparative Analysis of past data with other stock market indices or other indices

Disclaimer:

This is not financial advice; due diligence is required for each investment decision.
