60 datasets found
  1. Medians (M) and inter-quartile ranges (IQR) of maximum likelihood parameter...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Jan Peters; Stephan Franz Miedl; Christian Büchel (2023). Medians (M) and inter-quartile ranges (IQR) of maximum likelihood parameter estimates for the five discounting models examined (see Table 1 for model equations, numbers and abbreviations). [Dataset]. http://doi.org/10.1371/journal.pone.0047225.t002
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jan Peters; Stephan Franz Miedl; Christian Büchel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parameters are shown separately for the three different datasets (1, 2, pathological gamblers [PG]).

  2. Meta data and supporting documentation

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We include a description of the data sets in the metadata, as well as sample code and results from a simulated data set.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: the R code is available online at https://github.com/warrenjl/SpGPCW.

    Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Description and permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary):
    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
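
    The weekly standardization described above (subtract each week's median exposure, then divide by that week's IQR) can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: it assumes the exposures are held as a list of rows (one per individual) with one column per week, i.e. the n × m layout suggested by the data dictionary, and it assumes quartiles are computed by linear interpolation (`statistics.quantiles` with `method="inclusive"`).

```python
from statistics import median, quantiles


def iqr(values):
    """Interquartile range Q3 - Q1, using linear interpolation between order statistics."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def standardize_by_week(exposures):
    """Return a copy of `exposures` (rows = individuals, columns = weeks) in which
    each week's column is centered on its median and scaled by its IQR."""
    n_weeks = len(exposures[0])
    result = [list(row) for row in exposures]
    for week in range(n_weeks):
        column = [row[week] for row in exposures]
        center = median(column)
        spread = iqr(column)
        if spread == 0:
            raise ValueError(f"week {week} has zero IQR; cannot standardize")
        for i, row in enumerate(exposures):
            result[i][week] = (row[week] - center) / spread
    return result


# Hypothetical toy input: 5 individuals, 2 weeks.
raw = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
print(standardize_by_week(raw))
# -> [[-1.0, -1.0], [-0.5, -0.5], [0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

    Because the weekly medians and IQRs are deliberately not distributed, this transformation cannot be inverted from the released file alone.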

  3. Descriptive statistics of the 2 datasets with mean, standard deviation (SD),...

    • plos.figshare.com
    xls
    Updated Jun 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Achim Langenbucher; Nóra Szentmáry; Alan Cayless; Jascha Wendelstein; Peter Hoffmann (2023). Descriptive statistics of the 2 datasets with mean, standard deviation (SD), median, the lower (quantile 2.5%) and upper (quantile 97.5%) boundary of the 95% confidence interval, and the interquartile range IQR (quartile 75%—quartile 25%). [Dataset]. http://doi.org/10.1371/journal.pone.0282213.t001
    Available download formats: xls
    Dataset updated
    Jun 18, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Achim Langenbucher; Nóra Szentmáry; Alan Cayless; Jascha Wendelstein; Peter Hoffmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AL refers to the axial length, CCT to the central corneal thickness, ACD to the external phakic anterior chamber depth measured from the corneal front apex to the front apex of the crystalline lens, LT to the central thickness of the crystalline lens, R1 and R2 to the corneal radii of curvature for the flat and steep meridians, Rmean to the average of R1 and R2, PIOL to the refractive power of the intraocular lens implant, and SEQ to the spherical equivalent power achieved 5 to 12 weeks after cataract surgery.

  4. U.S. Pandemic Mental Health Care

    • kaggle.com
    zip
    Updated Jan 21, 2023
    The Devastator (2023). U.S. Pandemic Mental Health Care [Dataset]. https://www.kaggle.com/datasets/thedevastator/u-s-pandemic-mental-health-care
    Available download formats: zip (75,773 bytes)
    Dataset updated
    Jan 21, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    U.S. Pandemic Mental Health Care

    Impact on Households in Previous 4 Weeks

    By US Open Data Portal, data.gov [source]

    About this dataset

    This U.S. Household Pandemic Impacts dataset assesses the mental health care that households in America received over the previous four weeks during the Covid-19 pandemic. The underlying survey, produced through a collaboration between the U.S. Census Bureau and five other federal agencies, was designed to measure both the social and economic impacts of Covid-19 on American households, such as employment status, consumer spending trends, food security levels and housing disruptions, among other factors. Data were collected through an internet questionnaire, with invitations sent by email and text message to randomly selected housing units across the U.S. that were linked to email addresses or cell phone numbers in the Census Bureau Master Address File; all estimates comply with the NCHS Data Presentation Standards for Proportions. See U.S. Government Works for further details.

    How to use the dataset

    This dataset can be useful to examine the impact of the Covid-19 pandemic on access to and utilization of mental health care by U.S. households in the last 4 weeks.

    By studying this dataset, you can gain insight into how people’s mental health has been affected by the pandemic and identify trends based on population subgroups, states, phases of the survey and more.

    Instructions for use:
    - To get started, open ‘csv-1’ in this dataset. This file contains information on access to and utilization of mental health care by U.S. households in the last 4 weeks, broken down into 14 columns (e.g., Indicator, Group, State).
    - Familiarize yourself with each column label (e.g., Time Period Start Date), data type (e

    Research Ideas

    • Analyzing the impact of pandemic-induced stress on different demographic groups, such as age and race/ethnicity.
    • Comparing the mental health care services received in different states over time.
    • Investigating the correlation between socio-economic status and access to mental health care services during the Covid-19 pandemic.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors (data source: US Open Data Portal, data.gov).

    License

    License: Dataset copyright by authors.
    You are free to:
    - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
    You must:
    - Give appropriate credit: provide a link to the license, and indicate if changes were made.
    - ShareAlike: distribute your contributions under the same license as the original.
    - Keep intact: all notices that refer to this license, including copyright notices.

    Columns

    File: csv-1.csv

    | Column name | Description |
    |:---|:---|
    | Indicator | The type of indicator being measured. (String) |
    | Group | The group (by age, gender or race) being measured. (String) |
    | State | The state where the data was collected. (String) |
    | Subgroup | A narrower level of categorization within Group. (String) |
    | Phase | Phase number reflecting the survey iteration. (Integer) |
    | Time Period | A label indicating the duration captured by the survey period. (String) |
    | Time Period Label | A label indicating the duration captured by the survey period. (String) |
    | Time Period Start Date | Beginning date of the surveyed period. (Date, format ‘YYYY-MM-DD’) |
    | Time Period End Date | End date of the surveyed period. (Date, format ‘YYYY-MM-DD’) |
    | Value | The value of the indicator being measured. (Float) |
    | LowCI | The lower bound of the confidence interval for the value. (Float) |
    | HighCI | The upper bound of the confidence interval for the value. (Float) |
    | Quartile Range | The quartile range of the value. (String) |
    | Suppression Flag | A f...
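
    A minimal Python sketch for loading `csv-1.csv` with the standard library and converting the date and numeric columns to the types listed above. It assumes the header row uses exactly the column names in the table, that dates really are in ‘YYYY-MM-DD’ form, and that suppressed or missing numeric values appear as empty strings; these are assumptions, since the Suppression Flag description is cut off. The two sample rows are made up purely to exercise the parser.

```python
import csv
import io
from datetime import date


def parse_float(text):
    """Convert a numeric cell to float; treat empty cells (e.g. suppressed values) as None."""
    text = text.strip()
    return float(text) if text else None


def load_rows(handle):
    """Yield one dict per CSV row with Value/LowCI/HighCI as floats and dates as date objects."""
    for row in csv.DictReader(handle):
        record = dict(row)
        for column in ("Value", "LowCI", "HighCI"):
            record[column] = parse_float(row[column])
        for column in ("Time Period Start Date", "Time Period End Date"):
            record[column] = date.fromisoformat(row[column])
        yield record


def ci_is_consistent(record):
    """True when Value lies within [LowCI, HighCI], or when any of the three is missing."""
    value, low, high = record["Value"], record["LowCI"], record["HighCI"]
    if None in (value, low, high):
        return True
    return low <= value <= high


# Made-up sample rows (not taken from the dataset), using a subset of the columns.
SAMPLE = """Indicator,Group,State,Value,LowCI,HighCI,Time Period Start Date,Time Period End Date
Example indicator,National Estimate,United States,10.5,9.8,11.2,2020-08-19,2020-08-31
Example indicator,By State,Alaska,,,,2020-08-19,2020-08-31
"""

rows = list(load_rows(io.StringIO(SAMPLE)))
print([ci_is_consistent(r) for r in rows])
```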

  5. Median (interquartile range; IQR) demographic and clinical data of...

    • figshare.com
    • plos.figshare.com
    xls
    Updated May 30, 2023
    Bethany E. Higgins; Deanna J. Taylor; Wei Bi; Alison M. Binns; David P. Crabb (2023). Median (interquartile range; IQR) demographic and clinical data of participants. [Dataset]. http://doi.org/10.1371/journal.pone.0243578.t001
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Bethany E. Higgins; Deanna J. Taylor; Wei Bi; Alison M. Binns; David P. Crabb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Median (interquartile range; IQR) demographic and clinical data of participants.

  6. 2016 Census of Canada - Housing Suitability and Shelter-cost-to-income Ratio...

    • borealisdata.ca
    Updated Apr 9, 2021
    Statistics Canada (2021). 2016 Census of Canada - Housing Suitability and Shelter-cost-to-income Ratio by Status of Primary Household Maintainer for BC CSDs [custom tabulation] [Dataset]. http://doi.org/10.5683/SP2/6OEKPA
    Available formats: Croissant
    Dataset updated
    Apr 9, 2021
    Dataset provided by
    Borealis
    Authors
    Statistics Canada
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Canada, British Columbia
    Description

    This dataset includes one table that was custom-ordered from Statistics Canada. The table includes information on housing suitability and shelter-cost-to-income ratio by number of bedrooms, housing tenure, status of primary household maintainer, household type, and income quartile ranges for census subdivisions (CSDs) in British Columbia.

    The dataset is in Beyond 20/20 (.ivt) format, and the Beyond 20/20 browser is required to open it. This software can be downloaded free of charge from the Statistics Canada website (Windows only): https://www.statcan.gc.ca/eng/public/beyond20-20. For information on how to use Beyond 20/20, see:
    • http://odesi2.scholarsportal.info/documentation/Beyond2020/beyond20-quickstart.pdf
    • https://wiki.ubc.ca/Library:Beyond_20/20_Guide

    The custom order from Statistics Canada includes the following dimensions and variables.

    Geography: Non-reserve CSDs in British Columbia (299 geographies)
    The global non-response rate (GNR) is an important measure of census data quality. It combines total non-response (households) and partial non-response (questions). A lower GNR indicates a lower risk of non-response bias and, as a result, a lower risk of inaccuracy. The counts and estimates for geographic areas with a GNR equal to or greater than 50% are not published in the standard products, because they have a high risk of non-response bias and, in most cases, should not be released. All the geographies requested for this tabulation have been cleared for the release of income data and have a GNR under 50%.

    Housing Tenure Including Presence of Mortgage (5)
    1. Total – Private non-band non-farm off-reserve households with an income greater than zero by housing tenure
    2. Households who own
    3. With a mortgage [1]
    4. Without a mortgage
    5. Households who rent
    Note: [1] Presence of mortgage refers to whether the owner households reported mortgage or loan payments for their dwelling.

    2015 Before-tax Household Income Quartile Ranges (5)
    1. Total – Private households by quartile ranges [1, 2, 3]
    2. Count of households under or at quartile 1
    3. Count of households between quartile 1 and quartile 2 (median), including at quartile 2
    4. Count of households between quartile 2 (median) and quartile 3, including at quartile 3
    5. Count of households over quartile 3
    Notes:
    [1] A private household is assigned to a quartile range depending on its CSD-level location and on its tenure (owned or rented). Quartile ranges for owned households in a specific CSD are delimited by the 2015 before-tax income quartiles of owned households with an income greater than zero residing in non-farm off-reserve dwellings in that CSD. Quartile ranges for rented households in a specific CSD are delimited by the 2015 before-tax income quartiles of rented households with an income greater than zero residing in non-farm off-reserve dwellings in that CSD.
    [2] For the dollar values of the income quartiles (the delimiters), refer to Table 1.
    [3] Quartiles 1 to 3 are suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 16. When the renters' quartiles or the owners' quartiles (figures from Table 1) of a CSD are suppressed, the CSD is assigned to a quartile range based on the provincial renters' or owners' quartile figures.

    Number of Bedrooms (Unit Size) (6)
    1. Total – Private households by number of bedrooms [1]
    2. 0 bedrooms (Bachelor/Studio)
    3. 1 bedroom
    4. 2 bedrooms
    5. 3 bedrooms
    6. 4 bedrooms
    Note: [1] Dwellings with 5 or more bedrooms are included in the total count only.

    Housing Suitability (6)
    1. Total - Housing suitability
    2. Suitable
    3. Not suitable
    4. One bedroom shortfall
    5. Two bedroom shortfall
    6. Three or more bedroom shortfall
    Note: 'Housing suitability' refers to whether a private household is living in suitable accommodations according to the National Occupancy Standard (NOS); that is, whether the dwelling has enough bedrooms for the size and composition of the household. A household is deemed to be living in suitable accommodations if its dwelling has enough bedrooms, as calculated using the NOS. 'Housing suitability' assesses the required number of bedrooms for a household based on the age, sex, and relationships among household members. An alternative variable, 'persons per room', considers all rooms in a private dwelling and the number of household members. Housing suitability and the National Occupancy Standard (NOS) on which it is based were developed by Canada Mortgage and Housing Corporation (CMHC) through consultations with provincial housing agencies.

    Shelter-cost-to-income-ratio (4)
    1. Total – Private non-band non-farm off-reserve households with an income greater than zero
    2. Spending less than 30% of household total income on shelter costs
    3. Spending 30% or more of household total income on shelter costs
    4. Spending 50% or more of household total income on shelter costs
    Note: 'Shelter-cost-to-income...
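
    The quartile-range rules in the notes above, together with the shelter-cost-to-income categories, can be sketched in Python as follows. The function names and the way delimiters are passed in are hypothetical; the actual dollar delimiters are in Table 1 of the order and are not reproduced here. Boundary handling follows the category wording (at or below Q1 is range 1; above Q1 up to and including Q2 is range 2; and so on), and the fallback to provincial quartiles applies when a CSD's quartiles are suppressed because fewer than 16 records were used. Comparing annual shelter costs with annual income is an assumption.

```python
MIN_RECORDS = 16  # quartiles are suppressed below this many unrounded, unweighted records


def pick_delimiters(csd_quartiles, csd_record_count, provincial_quartiles):
    """Use the CSD's own (Q1, Q2, Q3) unless they are suppressed; otherwise use provincial ones."""
    if csd_quartiles is None or csd_record_count < MIN_RECORDS:
        return provincial_quartiles
    return csd_quartiles


def quartile_range(income, delimiters):
    """Return 1-4 for income <= Q1, (Q1, Q2], (Q2, Q3], and > Q3 respectively."""
    q1, q2, q3 = delimiters
    if income <= q1:
        return 1
    if income <= q2:
        return 2
    if income <= q3:
        return 3
    return 4


def shelter_cost_flags(annual_shelter_cost, annual_income):
    """Shelter-cost-to-income categories. '50% or more' is a subset of '30% or more',
    so both flags can be True for the same household."""
    if annual_income <= 0:
        raise ValueError("the universe is limited to households with income greater than zero")
    ratio = annual_shelter_cost / annual_income
    return {
        "under_30": ratio < 0.30,
        "30_or_more": ratio >= 0.30,
        "50_or_more": ratio >= 0.50,
    }
```

    Because the 30% and 50% categories overlap, counts in categories 2 to 4 are not expected to add up to the total in category 1.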

  7. Simulation Data Set

    • s.cnmilf.com
    • catalog.data.gov
    Updated Nov 12, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/simulation-data-set
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means:

    File format: R workspace file, “Simulated_Dataset.RData”.

    Metadata (including data dictionary):
    • y: Vector of binary responses (1: adverse outcome, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    Code abstract: We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize and plot the estimated critical windows and posterior marginal inclusion probabilities.

    Code description:
    • “CWVS_LMC.txt”: This code is delivered as a .txt file containing R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the code in the file can be used to identify and estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
    • “Results_Summary.txt”: This code is also delivered as a .txt file containing R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

    Required R packages:
    • For running “CWVS_LMC.txt”:
      • msm: sampling from the truncated normal distribution
      • mnormt: sampling from the multivariate normal distribution
      • BayesLogit: sampling from the Polya-Gamma distribution
    • For running “Results_Summary.txt”:
      • plotrix: plotting the posterior means and credible intervals

    Reproducibility: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 of the simulation study presented in the manuscript.

    How to use the information:
    1. Load the “Simulated_Dataset.RData” workspace.
    2. Run the code contained in “CWVS_LMC.txt”.
    3. Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

    Replication procedure for the portion of the analyses that uses a simulated data set:

    Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
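
    Before running “CWVS_LMC.txt”, it can help to confirm that the inputs have the dimensions implied by the data dictionary. The sketch below is written in Python rather than R and is not part of the distributed code; `check_cwvs_inputs` is a hypothetical helper, and it assumes z has one row per individual and one column per exposure period (n × m), x is n × p, and alpha_true has one entry per exposure period. That layout is a reading of the dictionary, not something it states explicitly.

```python
def check_cwvs_inputs(y, x, z, n, m, p, alpha_true):
    """Return a list of problems found; an empty list means the shapes look consistent."""
    problems = []
    if len(y) != n:
        problems.append(f"y has length {len(y)}, expected n = {n}")
    if any(v not in (0, 1) for v in y):
        problems.append("y must contain only binary responses (0 or 1)")
    if len(x) != n or any(len(row) != p for row in x):
        problems.append(f"x should be {n} x {p}")
    if len(z) != n or any(len(row) != m for row in z):
        problems.append(f"z should be {n} x {m}")
    if len(alpha_true) != m:
        problems.append(f"alpha_true has length {len(alpha_true)}, expected m = {m}")
    return problems


# Tiny hypothetical example: 2 individuals, 3 exposure weeks, 1 covariate column.
print(check_cwvs_inputs([0, 1], [[1.0], [1.0]],
                        [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]],
                        2, 3, 1, [0.0, 0.5, 0.0]))
```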

  8. Data from: Taxonomic and numerical sufficiency in depth- and...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Nov 1, 2016
    Martin Zuschin; Rafal Nawrot; Mathias Harzhauser; Oleg Mandic; Adam Tomašových (2016). Taxonomic and numerical sufficiency in depth- and salinity-controlled marine paleocommunities [Dataset]. http://doi.org/10.5061/dryad.r7s92
    Available download formats: zip
    Dataset updated
    Nov 1, 2016
    Dataset provided by
    Dryad
    Authors
    Martin Zuschin; Rafal Nawrot; Mathias Harzhauser; Oleg Mandic; Adam Tomašových
    Time period covered
    Oct 31, 2016
    Description

    • Supplementary figure 1 (Suppl_fig_1.pdf): Rank abundance distributions for habitats at three taxonomic levels.
    • Supplementary figure 2 (Suppl_fig_2.pdf): Evenness and species richness of the four habitats at three taxonomic levels.
    • Supplementary figure 3 (Suppl_fig_3.pdf): Distribution of p-values from Mantel tests for Spearman correlation between dissimilarity matrices representing different taxonomic and numerical levels. A-C: correlation between taxonomic levels at different numerical resolutions. D-F: correlation between proportional abundance data and higher levels of numerical transformation. Filled points represent median p-values across 1000 subsampling iterations; empty points are outliers that lie more than 1.5 times the interquartile range above the upper quartile.
    • Supplementary figure 4: NMDS ordination of a double-standardized subsample of the total dataset comparing individual habitats along the depth- and salinity gradient for species and families using proportional abundances and presence/absence ...
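
    The outlier criterion described for Supplementary figure 3 (points lying more than 1.5 times the IQR above the upper quartile) can be written as a short Python function. This is an illustrative sketch only: the quartile definition (linear interpolation via `statistics.quantiles` with `method="inclusive"`) is an assumption, and the authors' plotting software may compute quartiles slightly differently.

```python
from statistics import quantiles


def upper_outliers(values, k=1.5):
    """Return the values lying more than k * IQR above the upper quartile (Q3)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return [v for v in values if v > fence]


print(upper_outliers([1, 2, 3, 4, 5, 100]))  # -> [100]
```

    The conventional boxplot (Tukey) rule also flags points more than 1.5 × IQR below the lower quartile; the figure description mentions only the upper side, so the sketch checks only that side.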

  9. Namoi standard Hydrological Response Variables (HRVs)

    • researchdata.edu.au
    • data.gov.au
    Updated Dec 10, 2018
    Bioregional Assessment Program (2018). Namoi standard Hydrological Response Variables (HRVs) [Dataset]. https://researchdata.edu.au/namoi-standard-hydrological-variables-hrvs/2987770
    Dataset updated
    Dec 10, 2018
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bioregional Assessment Program
    Area covered
    Namoi River
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple datasets. The source dataset is identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    Hydrological Response Variables (HRVs) are the hydrological characteristics of the system that potentially change due to coal resource development. These data refer to the HRVs related to the AWRA-R model for the Namoi subregion at the 54 simulation nodes. The nine hydrological response variables (AF, P99, FD, IQR, ZFD, P01, LFD, LFS, LLFS) were computed under the CRDP (coal resource development pathway) and Baseline conditions, respectively, and the ACRD (additional coal resource development) is the difference between CRDP and Baseline.

    Abbreviation meaning

    AF - the annual streamflow volume (GL/year)

    P01 - the daily streamflow rate at the first percentile (ML/day)

    IQR - the inter-quartile range in daily streamflow (ML/day). That is, the difference between the daily streamflow rate at the 75th percentile and at the 25th percentile.

    LFD - the number of low streamflow days per year. The threshold for low streamflow days is the 10th percentile from the simulated 90-year period (2013 to 2102)

    LFS - the number of low streamflow spells per year (perennial streams only). A spell is defined as a period of contiguous days of streamflow below the 10th percentile threshold

    LLFS - the length (days) of the longest low streamflow spell each year

    P99 - the daily streamflow rate at the 99th percentile (ML/day)

    FD - flood days, the number of days with streamflow greater than the 90th percentile from the simulated 90-year period (2013 to 2102)

    ZFD - Zero flow days
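
    Several of these definitions are straightforward to compute from one year of daily streamflow. The Python sketch below is not the Bioregional Assessment Programme's code. It assumes the low-flow and flood thresholds (the 10th and 90th percentiles of the full simulated 2013-2102 period) are computed beforehand and passed in, that "below" and "greater than" are strict comparisons, and that percentiles use linear interpolation (`statistics.quantiles` with `method="inclusive"`).

```python
from statistics import quantiles


def percentiles(values):
    """Return the 1st..99th percentiles (linear interpolation); index k-1 holds the k-th."""
    return quantiles(values, n=100, method="inclusive")


def annual_hrvs(daily_flow_ml, low_threshold, flood_threshold):
    """Compute per-year HRVs from one year of daily streamflow in ML/day."""
    pct = percentiles(daily_flow_ml)
    low_days = [q < low_threshold for q in daily_flow_ml]

    # Count low-flow spells (runs of consecutive low days) and the longest run.
    spells, longest, current = 0, 0, 0
    for is_low in low_days:
        if is_low:
            current += 1
            if current == 1:
                spells += 1
            longest = max(longest, current)
        else:
            current = 0

    return {
        "AF": sum(daily_flow_ml) / 1000.0,  # total ML over the year -> GL/year
        "P01": pct[0],
        "P99": pct[98],
        "IQR": pct[74] - pct[24],  # 75th minus 25th percentile
        "LFD": sum(low_days),
        "LFS": spells,  # defined for perennial streams only
        "LLFS": longest,
        "FD": sum(q > flood_threshold for q in daily_flow_ml),
        "ZFD": sum(q == 0 for q in daily_flow_ml),
    }


# Hypothetical 10-day series with made-up thresholds, just to exercise the logic.
print(annual_hrvs([0, 0, 5, 1, 1, 1, 10, 20, 0.5, 30], low_threshold=2, flood_threshold=15))
```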

    Purpose

    This is the dataset used for the Namoi 2.6.1 product to evaluate additional coal mine and coal resource development impacts on hydrological response variables at 54 simulation nodes.

    Dataset History

    The Namoi AWRA-R model outputs were used to determine the impacts on the HRVs to produce these data. Readme files within the folders in the dataset explain how the resource was created. The nine HRVs (AF, P99, FD, IQR, ZFD, P01, LFD, LFS, LLFS) were computed under CRDP and Baseline conditions, respectively. The difference between CRDP and Baseline is used for predicting ACRD impacts on hydrological response variables at 54 simulation nodes.

    Dataset Citation

    Bioregional Assessment Programme (2017) Namoi standard Hydrological Response Variables (HRVs). Bioregional Assessment Derived Dataset. Viewed 11 December 2018, http://data.bioregionalassessments.gov.au/dataset/189f4c7a-29e1-41f9-868d-b7f5184d829f.

    Dataset Ancestors

  10. Time Series Data of Carbon Monoxide Concentrations

    • kaggle.com
    Updated Aug 10, 2024
    REDNAM MANIKANTA SAI NEERAJ (2024). Time Series Data of Carbon Monoxide Concentrations [Dataset]. https://www.kaggle.com/datasets/manikantasai18/time-series-data-of-carbon-monoxide-concentrations
    Available formats: Croissant
    Dataset updated
    Aug 10, 2024
    Dataset provided by
    Kaggle
    Authors
    REDNAM MANIKANTA SAI NEERAJ
    Description

    The dataset provides the median, 25th percentile, and 75th percentile of carbon monoxide (CO) concentrations in Delhi, expressed as vertically integrated column amounts in moles per square meter and averaged over 9-day periods. These data offer insight into the distribution and variability of CO levels over time.

    The data, collected from July 10, 2018, to August 10, 2024, are sourced from the Tropomi Explorer.

    CO is a harmful gas that can significantly affect human health. High levels of CO can lead to respiratory issues and cardiovascular problems, and can be life-threatening in extreme cases. Forecasting CO levels helps in predicting and managing air quality to protect public health.

    CO is often emitted by combustion processes, such as those in vehicles and industrial activities. Forecasting CO levels can help in monitoring the impact of these sources and evaluating the effectiveness of emission control measures.

    Accurate CO forecasts can assist in urban planning and pollution control strategies, especially in densely populated areas where air quality issues are more pronounced.

    Columns and data description:
    • system:time_start: The date when the CO measurements were taken.
    • p25: This likely represents the 25th percentile value of CO levels for the given date, providing insight into the lower range of the distribution.
    • Median: The median CO level for the given date, i.e. the middle value of the distribution, which represents a typical value.
    • IQR: The interquartile range, which measures the spread of the middle 50% of the data. It is calculated as the difference between the 75th percentile (p75) and the 25th percentile (p25).
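
    The listed columns include p25, Median and IQR but not the 75th percentile itself; since IQR = p75 - p25, it can be recovered as p75 = p25 + IQR. A small Python sketch, assuming the header names are exactly `system:time_start`, `p25`, `Median` and `IQR` as listed above (the exact spelling in the file is unverified). `with_p75` is a hypothetical helper, and the sample row is made up for illustration.

```python
import csv
import io


def with_p75(rows):
    """Add a p75 column (p25 + IQR) and check p25 <= Median <= p75 for each row."""
    out = []
    for row in rows:
        p25, med, iqr = float(row["p25"]), float(row["Median"]), float(row["IQR"])
        p75 = p25 + iqr
        if not (p25 <= med <= p75):
            raise ValueError(f"inconsistent quantiles on {row['system:time_start']}")
        out.append({**row, "p75": p75})
    return out


# Made-up values for illustration only (not taken from the dataset).
sample = io.StringIO("system:time_start,p25,Median,IQR\n2018-07-10,0.030,0.032,0.004\n")
print(with_p75(csv.DictReader(sample))[0]["p75"])
```

    The consistency check is a cheap guard against misread columns: if the median falls outside [p25, p75], the column mapping is probably wrong.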

  11. Italy: Mobility COVID-19

    • kaggle.com
    Updated Mar 26, 2021
    Mr. Rahman (2021). Italy: Mobility COVID-19 [Dataset]. https://www.kaggle.com/motiurse/italy-mobility-covid19/code
    Dataset updated
    Mar 26, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mr. Rahman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Italy
    Description

    A live version of the data record, which will be kept up-to-date with new estimates, can be downloaded from the Humanitarian Data Exchange: https://data.humdata.org/dataset/covid-19-mobility-italy.

    If you find the data helpful or you use the data for your research, please cite our work:

    Pepe, E., Bajardi, P., Gauvin, L., Privitera, F., Lake, B., Cattuto, C., & Tizzoni, M. (2020). COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown. Scientific Data 7, 230 (2020).

    The data record is structured into 4 comma-separated value (CSV) files, as follows:

    id_provinces_IT.csv. Table of the administrative codes of the 107 Italian provinces. The fields of the table are:

    COD_PROV is an integer field that is used to identify a province in all other data records;

    SIGLA is a two-letter code that identifies the province according to the ISO 3166-2 standard (https://en.wikipedia.org/wiki/ISO_3166-2:IT);

    DEN_PCM is the full name of the province.

    OD_Matrix_daily_flows_norm_full_2020_01_18_2020_04_17.csv. The file contains the daily fraction of users moving between Italian provinces. Each line corresponds to an entry (i, j) of the origin-destination matrix. The fields of the table are:

    p1: COD_PROV of origin,

    p2: COD_PROV of destination,

    day: in the format yyyy-mm-dd.
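    The daily flows can be regrouped into one sparse origin-destination matrix per day. The field list above does not name the column holding the flow fraction, so the sketch below assumes a hypothetical column called "flow"; adjust value_field to the actual header. The rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def od_matrices(rows, value_field="flow"):
    """Group rows into {day: {(origin COD_PROV, destination COD_PROV): fraction}}.

    value_field is a hypothetical column name; only p1, p2 and day are documented.
    """
    by_day = defaultdict(dict)
    for row in rows:
        key = (int(row["p1"]), int(row["p2"]))
        by_day[row["day"]][key] = float(row[value_field])
    return dict(by_day)


# Made-up rows for illustration only.
sample = io.StringIO(
    "p1,p2,day,flow\n"
    "1,2,2020-03-01,0.010\n"
    "2,1,2020-03-01,0.008\n"
)
matrices = od_matrices(csv.DictReader(sample))
print(matrices["2020-03-01"][(1, 2)])
```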

    median_q1_q3_rog_2020_01_18_2020_04_17.csv. The file contains the weekly median and interquartile range (IQR) of users’ radius of gyration in each province. The fields of the table are:

    COD_PROV of the province;

    SIGLA of the province;

    DEN_PCM of the province;

    week: the median value of the radius of gyration during the given week; each such column is named after its week in the format dd/mm-DD/MM, where dd/mm and DD/MM are the first and the last day of the week, respectively;

    week Q1: the first quartile (Q1) of the distribution of the radius of gyration during that week;

    week Q3: the third quartile (Q3) of the distribution of the radius of gyration during that week.
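    Since the table stores Q1 and Q3 rather than the IQR itself, the weekly IQR is Q3 - Q1. The sketch below assumes the quartile columns are named as the week label followed by a single space and "Q1" or "Q3"; the exact separator in the CSV header is an assumption, and the example row is made up.

```python
def weekly_iqr(row):
    """Return {week_label: Q3 - Q1} for one province row of the wide table.

    Assumes quartile columns are named "<week label> Q1" and "<week label> Q3".
    """
    iqr = {}
    for column in row:
        if column.endswith(" Q1"):
            week = column[: -len(" Q1")]
            q3_column = week + " Q3"
            if q3_column in row:
                iqr[week] = float(row[q3_column]) - float(row[column])
    return iqr


# Made-up values for a single province and week.
row = {
    "COD_PROV": "1", "SIGLA": "TO", "DEN_PCM": "Torino",
    "18/01-24/01": "2.5", "18/01-24/01 Q1": "1.0", "18/01-24/01 Q3": "4.0",
}
print(weekly_iqr(row))  # → {'18/01-24/01': 3.0}
```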

    average_network_degree_2020_01_18_2020_04_17.csv. The file contains daily time-series of the average degree 〈k〉 of the proximity network. Each entry of the table is a value of 〈k〉 on a given day. The fields of the table are:

    COD_PROV of the province;

    SIGLA of the province;

    DEN_PCM of the province;

    day in the format yyyy-mm-dd.
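    For reference, the average degree of an undirected network with N nodes and |E| edges is 〈k〉 = 2|E| / N. How the nodes and edges of the proximity network are defined is not described here, so this sketch only illustrates the definition on a generic, hypothetical edge list.

```python
def average_degree(num_nodes, edges):
    """Mean degree <k> = 2|E| / N of an undirected simple graph.

    Duplicate edges (in either direction) and self-loops are ignored.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    unique_edges = {frozenset((a, b)) for a, b in edges if a != b}
    return 2 * len(unique_edges) / num_nodes


# (1, 0) duplicates (0, 1), so there are 3 distinct edges among 4 nodes.
print(average_degree(4, [(0, 1), (1, 2), (2, 0), (1, 0)]))  # → 1.5
```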

    ESRI shapefiles of the Italian provinces updated to the most recent definition are available from the website of the Italian National Office of Statistics (ISTAT): https://www.istat.it/it/archivio/222527.

  12. Tectonic uplift, soil production, soil depth, and rock strength at the...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    txt, zip
    Updated Jul 3, 2024
    Emily Geyman; Michael Lamb; David Paige (2024). Tectonic uplift, soil production, soil depth, and rock strength at the Dragon's Back Pressure Ridge, Carrizo Plain, California [Dataset]. http://doi.org/10.5281/zenodo.12637755
    Available download formats: zip, txt
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Emily Geyman; Michael Lamb; David Paige
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California
    Description
    Tectonic uplift, soil production, soil depth, and rock strength at the Dragon's Back Pressure Ridge, Carrizo Plain, California
    Supporting data for “Landscape transience reveals a bottom-up control on soil production”
    Emily C. Geyman*, David A. Paige, Michael P. Lamb
    *Corresponding author: Emily C. Geyman, egeyman@caltech.edu
    Last updated: July 3, 2024
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    Dataset overview.
    This dataset contains:
    1. Georeferenced TIFF files of the (i) LiDAR-derived surface elevation, (ii) geological map (based on the mapping from Dibblee (1973) and Arrowsmith (1995)), (iii) reconstructed cumulative uplift, and (iv) reconstructed uplift rate at the Dragon’s Back Pressure Ridge, Carrizo Plain, California.
    2. Raw and processed ground penetrating radar (GPR) observations of soil thickness.
    3. Geomorphic properties: (i) hilltop erosion rate, (ii) hilltop soil production rate, (iii) hilltop saprolite weakness (based on cone penetrometer observations), and (iv) hilltop soil thickness.
    4. Raw and processed observations from the cone penetrometer (used to compute the saprolite weakness).
    5. Soil pit observations.
    6. Matlab code used to perform the MCMC inversion to generate the uplift reconstructions (item (1) above).
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    Dataset details.
    See the publication: “Geyman, E.C., Paige, D.A., and Lamb, M.P. Landscape transience reveals a bottom-up control on soil production. In review. 2024.” for details on the field methodology and data analysis. Details about each data product also are provided below.
    1. Geotiffs.
    We provide georeferenced TIFF files of the (i) surface elevation, (ii) geological map, (iii) cumulative uplift, and (iv) uplift rate at Dragon’s Back Pressure Ridge, Carrizo Plain, California. The coordinate system for the geotiffs is WGS84 / UTM Zone 11 N (EPSG:32611). All geotiffs are provided at 0.5 m x 0.5 m spatial resolution. Details about each dataset are provided below.
    (i) Surface elevation. We use LiDAR data from the 2005 B4 Lidar Project, acquired and processed by the National Center for Airborne Laser Mapping (NCALM). The full LiDAR dataset is available for download from OpenTopography (https://portal.opentopography.org/datasetMetadata?otCollectionID=OT.032018.32611.1). We convert the LiDAR point cloud to a 0.5 m gridded bare earth digital elevation model (DEM).
    (ii) Geological map. The original geological map of Dibblee (1973, 1999) is available from the United States Geological Survey (USGS) at https://pubs.usgs.gov/of/1999/of99-014/. This mapping was refined by Arrowsmith (1995) and Hilley & Arrowsmith (2008). We modify the geological map using high-resolution satellite imagery (from Google, ESRI, and Bing mosaics), as well as high-resolution imagery from the National Agriculture Imagery Program (NAIP), in order to follow the contacts of the Pink, Tan, and Gray members of the Paso Robles Formation. The units on the geological map are coded as:
    1 - Pink Member, Paso Robles Formation
    2 - Tan Member, Paso Robles Formation
    3 - Gray Member, Paso Robles Formation
    4 - Undifferentiated Paso Robles Formation
    5 - Quaternary alluvium (older)
    6 - Quaternary alluvium (younger)
    7-8 - Quaternary landslides and terraces
    (iii) Cumulative uplift. We follow the general approach of Hilley & Arrowsmith (2008) to reconstruct the cumulative uplift at Dragon’s Back Pressure Ridge based on the observed positions and elevations of the stratigraphic contacts between the Pink, Tan, and Gray members of the Paso Robles Formation. Put simply, since the Pink, Tan, and Gray members of the Paso Robles Formation are initially flat-lying, the progressive increase in elevation of the contacts between these members from the start to the middle of the Dragon’s Back Pressure Ridge records the cumulative tectonic uplift. We perform a Markov Chain Monte Carlo (MCMC) inversion to reconstruct the uplift history that can best explain our geological observations (i.e., the positions and elevations of the Pink, Tan, and Gray members of the Paso Robles Formation). See section 6 for the Matlab code used to perform the MCMC inversion.
    Dataset A: “cumulative_uplift_mean.tif” -- the mean reconstructed cumulative uplift (units: meters).
    Dataset B: “cumulative_uplift_uncertainty_IQR.tif” -- the uncertainty of the reconstructed cumulative uplift (units: meters), documented as the inter-quartile range (IQR), the difference between the 75th percentile and the 25th percentile of the MCMC cumulative uplift estimates.
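    The mean and IQR products summarize, pixel by pixel, the ensemble of MCMC estimates. The authors' code is in Matlab; the Python sketch below only illustrates the summary for one pixel, using linear interpolation between order statistics for the quartiles. That interpolation convention is an assumption, and other tools may give slightly different quartiles.

```python
import statistics


def summarize_samples(samples):
    """Return (mean, IQR) of the MCMC estimates for one grid cell.

    IQR = 75th percentile - 25th percentile, with linear interpolation
    between order statistics (statistics.quantiles, method="inclusive").
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return statistics.fmean(samples), q3 - q1


mean, iqr = summarize_samples([1.0, 2.0, 3.0, 4.0, 5.0])
print(mean, iqr)  # → 3.0 2.0
```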
    (iv) Uplift rate. The uplift rate dataset is constructed by taking the spatial derivative of the cumulative uplift dataset (iii) in the along-strike direction of the San Andreas Fault, and then converting from space to time using the long-term slip rate on the San Andreas Fault of approximately 33 mm/yr. This is the same approach as used in Hilley & Arrowsmith (2008).
    Dataset A: “uplift_rate_mean.tif” -- the mean reconstructed uplift rate (units: mm/yr).
    Dataset B: “uplift_rate_uncertainty_IQR.tif” -- the uncertainty of the reconstructed uplift rate (units: mm/yr), documented as the inter-quartile range (IQR), the difference between the 75th percentile and the 25th percentile of the MCMC estimates.
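    The space-to-time conversion in item (iv) follows from the chain rule: if material is carried along strike at the slip rate v, distance x maps to time t = x / v, so the uplift rate is dU/dt = (dU/dx) * v. Since dU/dx is dimensionless (m per m), multiplying by v in mm/yr gives mm/yr. The Python sketch below applies this along a single along-strike profile sampled every 0.5 m (the geotiff resolution); the finite-difference scheme and the sign convention (uplift increasing in the direction of transport) are assumptions, and this is not the authors' Matlab code.

```python
def uplift_rate_mm_per_yr(uplift_m, spacing_m=0.5, slip_rate_mm_per_yr=33.0):
    """Convert cumulative uplift (m) sampled along strike into uplift rate (mm/yr).

    Uses central differences in the interior and one-sided differences at the ends.
    """
    n = len(uplift_m)
    if n < 2:
        raise ValueError("need at least two samples")
    rates = []
    for i in range(n):
        if i == 0:
            grad = (uplift_m[1] - uplift_m[0]) / spacing_m
        elif i == n - 1:
            grad = (uplift_m[-1] - uplift_m[-2]) / spacing_m
        else:
            grad = (uplift_m[i + 1] - uplift_m[i - 1]) / (2 * spacing_m)
        rates.append(grad * slip_rate_mm_per_yr)  # (m/m) * (mm/yr) = mm/yr
    return rates


# A constant gradient of 1 m per m along strike gives 33 mm/yr everywhere.
print(uplift_rate_mm_per_yr([0.0, 0.5, 1.0, 1.5]))  # → [33.0, 33.0, 33.0, 33.0]
```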
    2. Ground penetrating radar (GPR).

    The GPR data were acquired with a MALA HDR GPR system with a 450 MHz shielded antenna. Data were acquired every 4 cm, tracked by a survey wheel for precise relative positioning. The GPR survey covered approximately 19 km of ridgeline and included 21 short (approximately 10 m) ridgetop profiles with cone penetrometer observations that serve as ground-truth for the depth of the soil-saprolite boundary inferred from the GPR data. The GPR data were processed using the open-source GPRPy software (Plattner, 2020). We constrained sub-surface velocities by fitting 364 diffraction hyperbolas in the GPR transects. The hyperbola fitting supports a spatially-uniform velocity of approximately 0.11 m/ns. The locations and fitted velocities of the individual hyperbolas used to construct this velocity model are included in the file “GPR_velocities.csv.”
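    With a uniform velocity, converting a reflection's two-way travel time t to depth uses d = v * t / 2, and a point diffractor at depth d below surface position x0 produces the hyperbola t(x) = 2 * sqrt((x - x0)^2 + d^2) / v, which is the standard model behind velocity fitting. The Python sketch below uses the reported 0.11 m/ns; it ignores antenna offset and does not reproduce GPRPy's own API.

```python
import math

VELOCITY_M_PER_NS = 0.11  # spatially uniform velocity reported above


def depth_from_twt(twt_ns, velocity=VELOCITY_M_PER_NS):
    """Reflector depth (m) from two-way travel time (ns): d = v * t / 2."""
    return velocity * twt_ns / 2.0


def diffraction_twt(x_m, x0_m, depth_m, velocity=VELOCITY_M_PER_NS):
    """Two-way travel time (ns) recorded at x over a point diffractor at (x0, depth)."""
    return 2.0 * math.hypot(x_m - x0_m, depth_m) / velocity


print(depth_from_twt(20.0))              # about 1.1 m
print(diffraction_twt(0.5, 0.0, 1.1))    # later arrival away from the apex
```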

    The folder “Radar450MHz_raw” includes the raw radar data. The folder “Radar450MHz_GPS” includes the GPS data associated with each radar dataset (saved as .cor files). The GPS observations are aggregated in the spreadsheet “GPS_all” in that folder. The shapefile folder includes the final processed GPR-derived soil thickness estimates (soil thickness reported in units of meters) as a .shp file. The coordinate system for the shapefile is WGS84 / UTM Zone 11 N.

    3. Geomorphic properties.
    These are the datasets plotted in Figures 3 and 4 of “Geyman, E.C., Paige, D.A., and Lamb, M.P. Landscape transience reveals a bottom-up control on soil production. In review. 2024.”
    4. Cone penetrometer observations.
    This folder contains 3 files:
    1. ConePenetrometerSummaryTable_Overview.csv: A summary of the 212 cone penetrometer stations. For each station, there is metadata about the location (GPS coordinates), the stratigraphic unit (Pink, Tan, or Gray Member of the Paso Robles Formation), the side of the ridge (southeast = SW, center = C, or northwest = NW), and the saprolite weakness, calculated as the cone penetrometer ease of penetration [cm/strike] at the position of the soil-saprolite boundary.
    2. ConePenetrometerSummaryTable_Data.csv: All of the raw observations from the cone penetrometer. The raw observations are the cumulative strike number vs. the cumulative depth of penetration into the ground.
    3. ConePenetrometerSummaryTableFinal.xlsx: An Excel spreadsheet with the same data from items (1) and (2) above as separate sheets ("Overview" and "Data").
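    The raw records pair cumulative strike number with cumulative penetration depth, so ease of penetration (cm/strike) is a local slope of depth against strikes. The Python sketch below takes the slope of the measurement interval that brackets a given boundary depth; this is one plausible procedure, assumed for illustration, not necessarily the one used by the authors. The example values are made up.

```python
def ease_of_penetration(strikes, depths_cm, boundary_depth_cm):
    """Penetration per strike (cm/strike) at boundary_depth_cm.

    strikes and depths_cm are paired cumulative observations in increasing order;
    the slope of the interval containing the boundary depth is returned.
    """
    for i in range(1, len(depths_cm)):
        if depths_cm[i - 1] <= boundary_depth_cm <= depths_cm[i]:
            d_strikes = strikes[i] - strikes[i - 1]
            if d_strikes == 0:
                raise ValueError("repeated cumulative strike count")
            return (depths_cm[i] - depths_cm[i - 1]) / d_strikes
    raise ValueError("boundary depth outside the observed range")


# Made-up profile: penetration slows from 4 cm/strike near the surface to 0.5 cm/strike.
print(ease_of_penetration([0, 5, 10, 20], [0, 20, 30, 35], 32))  # → 0.5
```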
    5. Soil pit observations.
    This folder contains 2 files:
    1. soil_pit_summary: A summary table containing the soil pit locations and the inferred depth to the soil-saprolite boundary.
    2. soil_pit_layers: Simplified stratigraphic columns providing grain sizes and classifications (soil vs. saprolite) of the layers identified in each soil pit.
    6. Matlab code.
    This folder contains 2 primary Matlab scripts, with supporting data files and helper functions.
    1. DBPR_uplift: code to reconstruct the tectonic uplift at DBPR based on the positions and elevations of the stratigraphic contacts.
    2. soil_depth_vs_strength: code to reproduce Fig. 4 of “Geyman, E.C., Paige, D.A., and Lamb, M.P. Landscape transience reveals a bottom-up control on soil production. In review. 2024.”
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    References

    Arrowsmith, J. R. Coupled tectonic deformation and geomorphic degradation along the San Andreas Fault System. Ph.D. thesis, Stanford

  13. Gender, Age, and Emotion Detection from Voice

    • kaggle.com
    zip
    Updated May 29, 2021
    Rohit Zaman (2021). Gender, Age, and Emotion Detection from Voice [Dataset]. https://www.kaggle.com/rohitzaman/gender-age-and-emotion-detection-from-voice
    Available download formats: zip (967,820 bytes)
    Dataset updated
    May 29, 2021
    Authors
    Rohit Zaman
    Description

    Context

    Our goal was to predict gender, age and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS, extracted 20 statistical features from them using the R programming language, and then added the labels to form these datasets. The audio files were collected from "Mozilla Common Voice" and the “Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)”.

    Content

    Each dataset contains 20 feature columns and 1 label column. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:

    1) meanfreq: the mean frequency (in kHz), a pitch measure that gives the center of the distribution of power across frequencies.
    2) sd: the standard deviation of frequency, which describes the dispersion of the frequencies relative to their mean; it is the square root of the variance.
    3) median: the median frequency (in kHz), the middle value of the sorted list of values.
    4) Q25: the first quartile (in kHz), also called Q1, the median of the lower half of the data. About 25 percent of the values lie below Q1 and about 75 percent above it.
    5) Q75: the third quartile (in kHz), also called Q3, the middle value (by rank) between the median and the highest value; about 75 percent of the values lie below Q3.
    6) IQR: the interquartile range (in kHz), a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, i.e., between the upper and lower quartiles.
    7) skew: the skewness, which measures the lack of symmetry of the distribution.
    8) kurt: the kurtosis, which describes how the tails of the distribution differ from those of a normal distribution; it is often read as an indicator of how outlier-prone the distribution is.
    9) sp.ent: the spectral entropy, a measure of signal irregularity computed from the normalized power spectrum.
    10) sfm: the spectral flatness, or tonality coefficient (also known as Wiener entropy), a measure used in digital signal processing to characterize an audio spectrum. It is usually expressed in decibels and quantifies how noise-like, as opposed to tone-like, a sound is.
    11) mode: the mode frequency, the most frequently observed value.
    12) centroid: the spectral centroid, which indicates where the center of mass of the spectrum is located.
    13) meanfun: the average fundamental frequency measured across the acoustic signal.
    14) minfun: the minimum fundamental frequency measured across the acoustic signal.
    15) maxfun: the maximum fundamental frequency measured across the acoustic signal.
    16) meandom: the average dominant frequency measured across the acoustic signal.
    17) mindom: the minimum dominant frequency measured across the acoustic signal.
    18) maxdom: the maximum dominant frequency measured across the acoustic signal.
    19) dfrange: the range of dominant frequency measured across the acoustic signal.
    20) modindx: the modulation index, which quantifies the degree of frequency modulation as the ratio of the frequency deviation to the frequency of the modulating signal, for pure-tone modulation.
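    The R code used to extract the features is not included here. Assuming the first six features are power-weighted statistics of the frequency spectrum, the Python sketch below shows how they could be computed from frequencies (in kHz, sorted ascending) and their spectral power; each quartile is taken as the first frequency at which the cumulative share of power reaches 25%, 50% or 75%. This is an illustration, not the original feature extractor.

```python
import math


def spectral_summary(freqs_khz, power):
    """Power-weighted meanfreq, sd, median, Q25, Q75 and IQR of a spectrum."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no power")
    weights = [p / total for p in power]  # normalized spectrum sums to 1
    meanfreq = sum(f * w for f, w in zip(freqs_khz, weights))
    sd = math.sqrt(sum(w * (f - meanfreq) ** 2 for f, w in zip(freqs_khz, weights)))

    def quantile(q):
        cumulative = 0.0
        for f, w in zip(freqs_khz, weights):
            cumulative += w
            if cumulative >= q:
                return f
        return freqs_khz[-1]

    q25, median, q75 = quantile(0.25), quantile(0.5), quantile(0.75)
    return {"meanfreq": meanfreq, "sd": sd, "median": median,
            "Q25": q25, "Q75": q75, "IQR": q75 - q25}


print(spectral_summary([1, 2, 3, 4], [1, 1, 1, 1]))
```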

    Acknowledgements

    Gender and age audio data source: https://commonvoice.mozilla.org/en
    Emotion audio data source: https://smartlaboratory.org/ravdess/

  14. COVID-19 Vaccine Progress Dashboard Data by ZIP Code

    • data.ca.gov
    • data.chhs.ca.gov
    • +1more
    csv, xlsx, zip
    Updated Nov 30, 2025
    California Department of Public Health (2025). COVID-19 Vaccine Progress Dashboard Data by ZIP Code [Dataset]. https://data.ca.gov/dataset/covid-19-vaccine-progress-dashboard-data-by-zip-code
    Available download formats: csv, zip, xlsx
    Dataset updated
    Nov 30, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

    Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 12+ and age 5+ denominators have been uploaded as archived tables.

    Starting June 30, 2021, the dataset has been reconfigured so that all updates are appended to a single dataset, making it easier to use with APIs and other interfaces. In addition, historical data have been extended back to January 5, 2021.

    This dataset shows full, partial, and at-least-one-dose coverage rates by ZIP Code Tabulation Area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey’s 2015-2019 5-Year data.

    This is the data table for the local health jurisdiction (LHJ) Vaccine Equity Performance dashboard; however, it also includes ZCTAs that do not have a Vaccine Equity Metric (VEM) score.

    This dataset also includes Vaccine Equity Metric score quartiles (when applicable), which combine the Public Health Alliance of Southern California’s Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that impact health, such as income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to healthier community conditions in Quartile 4.

    The Vaccine Equity Metric is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered as indicative of the HPI score for these zip codes. CDPH-derived quartiles were assigned to zip codes excluded from the HPI score produced by the Public Health Alliance of Southern California due to concerns with statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.

    These data do not include doses administered by the following federal agencies that received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

    For some ZCTAs, vaccination coverage may exceed 100%. This may occur because many people from outside the county come to that ZCTA to be vaccinated and providers report the county of administration as the county of residence, and/or because the California Department of Finance (DOF) population estimates for that ZCTA are too low. Please note that DOF population numbers are projections and may not be accurate, especially given the unprecedented population shifts during the pandemic.
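    Coverage is a ratio of people vaccinated to the population estimate used as the denominator, so an undercounted denominator or misattributed residence pushes it above 100%. The Python sketch below computes coverage without capping it and flags such ZCTAs; the dataset's actual column names are not given here, so the inputs are plain numbers and the ZCTAs and counts are hypothetical.

```python
def coverage_percent(vaccinated, population):
    """Percent of the population vaccinated; deliberately not capped at 100%."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * vaccinated / population


def zctas_over_100(counts):
    """Return ZCTAs whose computed coverage exceeds 100%, sorted for stable output.

    counts maps ZCTA -> (people vaccinated, population estimate).
    """
    return sorted(
        zcta for zcta, (vaccinated, population) in counts.items()
        if coverage_percent(vaccinated, population) > 100.0
    )


# Hypothetical counts for two made-up ZCTAs.
example = {"00001": (1200, 1000), "00002": (500, 1000)}
print(zctas_over_100(example))  # → ['00001']
```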

  15. Walmart Stocks Data 2025

    • kaggle.com
    zip
    Updated Feb 23, 2025
    Mehar Shan Ali (2025). Walmart Stocks Data 2025 [Dataset]. https://www.kaggle.com/meharshanali/walmart-stocks-data-2025
    Available download formats: zip (467,062 bytes)
    Dataset updated
    Feb 23, 2025
    Authors
    Mehar Shan Ali
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📊 Walmart Stock Price Dataset & Exploratory Data Analysis (EDA)

    🏢 About Walmart

    Walmart Inc. is a multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores. It is one of the world's largest companies by revenue and a key player in the retail sector. Walmart's stock is actively traded on major stock exchanges, making it an interesting subject for financial analysis.

    📌 Dataset Overview

    This dataset contains historical stock price data for Walmart, sourced directly from Yahoo Finance using the yfinance Python API. The data covers daily stock prices and includes multiple key financial indicators.

    📊 Features Included in the Dataset

    • Date 📅 – The trading day recorded.
    • Open Price 🟢 – Price at market open.
    • High Price 🔼 – Highest price of the day.
    • Low Price 🔽 – Lowest price of the day.
    • Close Price 🔴 – Price at market close.
    • Adjusted Close Price 📉 – Closing price adjusted for splits & dividends.
    • Trading Volume 📈 – Total shares traded.
    • Dividends 💰 – Cash payments to shareholders.
    • Stock Splits 🔄 – Records stock split events.

    🔍 Exploratory Data Analysis (EDA) Steps

    This notebook performs an extensive EDA to uncover insights into Walmart's stock price trends, volatility, and overall behavior in the stock market. The following analysis steps are included:

    1️⃣ Data Preprocessing & Cleaning

    • Load data using Pandas
    • Handle missing values (if any)
    • Check data types and format them properly
    • Convert date column into a datetime format

    2️⃣ Descriptive Statistics & Summary

    • Calculate key statistical measures like mean, median, standard deviation, and interquartile range (IQR)
    • Identify stock price trends over time
    • Check data distribution and skewness
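    The notebook uses Pandas; as a dependency-free illustration, the Python sketch below computes the same kinds of summary statistics for a list of prices. The skewness here is the unadjusted moment coefficient m3 / m2^1.5; Pandas' Series.skew applies a small-sample bias adjustment, so its values differ slightly. Quartiles use linear interpolation (method="inclusive").

```python
import statistics


def describe(prices):
    """Mean, median, sample standard deviation, IQR and moment skewness."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    q1, _, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    mean = statistics.fmean(prices)
    n = len(prices)
    m2 = sum((x - mean) ** 2 for x in prices) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in prices) / n  # third central moment
    return {
        "mean": mean,
        "median": statistics.median(prices),
        "std": statistics.stdev(prices),
        "IQR": q3 - q1,
        "skew": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
    }


print(describe([1.0, 2.0, 3.0, 4.0, 5.0]))
```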

    3️⃣ Data Visualizations

    • 📉 Line Plot – Analyze trends in closing prices over time.
    • 📦 Box Plot – Detect potential outliers in stock prices.
    • 📊 Histogram – Understand the distribution of closing prices.
    • 📈 Moving Averages – Use short-term and long-term moving averages to observe stock trends.
    • 🔥 Correlation Heatmap – Find relationships between stock market indicators.
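    Box plots conventionally flag points beyond Tukey's fences, Q1 - 1.5 * IQR and Q3 + 1.5 * IQR (matplotlib's default whisker length is also 1.5 IQR). The Python sketch below applies that rule directly; plotting libraries may compute quartiles with a slightly different interpolation convention, so borderline points can differ.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) for the Q1 - k*IQR / Q3 + k*IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]


print(tukey_outliers([1, 2, 3, 4, 5, 100]))  # → (-1.5, 8.5, [100])
```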

    4️⃣ Time Series Analysis

    • Identify trends and seasonality in the stock price data.
    • Calculate daily, weekly, and monthly returns.
    • Use rolling windows to analyze moving averages and volatility.
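    The return and rolling-window steps above can be sketched without Pandas as follows. Simple returns are r_t = C_t / C_(t-1) - 1; using the Adjusted Close avoids artificial jumps at splits and dividends. The annualization factor sqrt(252) assumes about 252 trading days per year, a common convention rather than anything specified by the dataset, and the prices are made up.

```python
import math
import statistics


def daily_returns(closes):
    """Simple daily returns r_t = C_t / C_(t-1) - 1."""
    return [current / previous - 1.0 for previous, current in zip(closes, closes[1:])]


def rolling(values, window, func):
    """Apply func to each full trailing window; positions before the first full window are None."""
    return [None if i + 1 < window else func(values[i + 1 - window : i + 1])
            for i in range(len(values))]


closes = [100.0, 101.0, 99.5, 102.0, 103.5, 102.5]  # made-up adjusted closes
returns = daily_returns(closes)
moving_average = rolling(closes, 3, statistics.fmean)
volatility = rolling(returns, 3, statistics.stdev)
annualized = [None if v is None else v * math.sqrt(252) for v in volatility]
print(moving_average)
print(annualized)
```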

    5️⃣ Insights & Conclusions

    • How volatile is Walmart’s stock over the given period?
    • Does the stock exhibit strong uptrends or downtrends?
    • Are there any strong correlations between features?
    • What insights can be drawn for investors and traders?

    🚀 Use Cases & Applications

    This dataset and analysis can be useful for:

    • 📡 Stock Market Analysis – Evaluating Walmart’s stock price trends and volatility.
    • 🏦 Investment Research – Assisting traders and investors in making informed decisions.
    • 🎓 Educational Purposes – Teaching data science and financial analysis using real-world stock data.
    • 📊 Algorithmic Trading – Developing trading strategies based on historical stock price trends.

    📥 Download the dataset and explore Walmart’s stock performance today! 🚀

  16. ML-CNPM2.5

    • scidb.cn
    Updated Jun 13, 2024
    Yulong Fan; Lin Sun; Xirong Liu (2024). ML-CNPM2.5 [Dataset]. http://doi.org/10.57760/sciencedb.08635
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Yulong Fan; Lin Sun; Xirong Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The features that may affect ground-based PM2.5 in China from 2014 to 2023 were collected to build the first version of ML-CNPM2.5. Thanks to our filling and calibration methods, over 5 million samples (5,076,608) were obtained, which, to our knowledge, is more PM2.5 samples than previous studies have covered. To support training and assessing both baseline and higher-accuracy ML-based models, a dataset with unfilled AOD (1,790,210 records) is also released, since filled AOD always shows lower accuracy than unfilled AOD. To distinguish the two datasets, the filled-AOD dataset is named ML-CNPM2.5-A and the unfilled one ML-CNPM2.5-B. ML-CNPM2.5-A contains twenty-four features, whereas ML-CNPM2.5-B contains twenty-three. Most of the features directly or indirectly affect the estimation of ground-based PM2.5 using remote sensing and ML technology, and they are therefore widely used as inputs to ML-based models. The distribution of each feature in ML-CNPM2.5-A (ML-CNPM2.5-B) is shown in Fig. 1 (Fig. 2). The figures show each feature’s range of values, including the median, quartiles, and outliers. For example, the distribution of Terra MAIAC AOD changes markedly after calibration, from a range of 0-8 to a more realistic range of 0-3. The discrete features, including year, month, day, DOY and LUC, are evenly distributed over their ranges, indicating that the sample dataset is balanced and comprehensive. Detailed information about these features is listed in Table 2 (Table S1) for ML-CNPM2.5-A (ML-CNPM2.5-B). Overall, our sample dataset includes common features widely used in estimating PM2.5, with a large volume of comprehensive records, which supports the training and validation of different models.

  17. Measures of Center and Measures of Spread -Lesson (Biology Application)

    • qubeshub.org
    Updated Sep 8, 2025
    Divya Ajinth; Sheela Vemu; Irene Corriette (2025). Measures of Center and Measures of Spread -Lesson (Biology Application) [Dataset]. http://doi.org/10.25334/KQ62-HV25
    Dataset updated
    Sep 8, 2025
    Dataset provided by
    QUBES
    Authors
    Divya Ajinth; Sheela Vemu; Irene Corriette
    Description

    This instructional activity introduces students to the application of statistical tools for analyzing biological data, with a focus on measures of center (mean, median, mode) and measures of spread (range, quartiles, standard deviation). Using real-world biological contexts, students learn how to summarize datasets, identify trends, and evaluate variability. The activity integrates MS Excel and TI-84 Plus graphing calculators to calculate descriptive statistics and interpret results. By engaging with authentic biological data, students develop quantitative reasoning skills that enhance their ability to detect patterns, recognize variability, and draw meaningful conclusions about biological systems.
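    One practical point when comparing the two tools is that they can report different quartiles for the same data. As commonly documented, the TI-84 "1-Var Stats" quartiles are medians of the lower and upper halves (excluding the overall median when n is odd), while Excel's QUARTILE.INC interpolates linearly between order statistics. The Python sketch below reproduces both conventions for comparison; it is an illustration, not part of the lesson materials.

```python
import statistics


def ti84_style_quartiles(data):
    """Q1, Q3 as medians of the lower and upper halves, excluding the median when n is odd."""
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return statistics.median(lower), statistics.median(upper)


def excel_inc_quartiles(data):
    """Q1, Q3 with linear interpolation, the convention of Excel's QUARTILE.INC."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q3


data = [1, 2, 3, 4, 5, 6, 7]
print(ti84_style_quartiles(data))  # → (2, 6)
print(excel_inc_quartiles(data))   # → (2.5, 5.5)
```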

  18. Data from: S1 Dataset -

    • plos.figshare.com
    xlsx
    Updated Feb 12, 2025
    Lukundo Siame; Gift C. Chama; Sepiso K. Masenga (2025). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0312570.s002
    Available download formats: xlsx
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lukundo Siame; Gift C. Chama; Sepiso K. Masenga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Tuberculosis (TB) remains a significant public health challenge, particularly among vulnerable populations such as children. This is especially true in Sub-Saharan Africa, where the burden of TB in children is substantial. Zambia ranks 21st among the top 30 high TB endemic countries globally. While studies have explored TB in adults in Zambia, the prevalence and associated factors in children are not well documented. This study aimed to determine the prevalence of active TB disease, and the sociodemographic and clinical factors associated with it, in hospitalized children under the age of 15 years at Livingstone University Teaching Hospital (LUTH), the largest referral center in Zambia’s Southern Province.

    Methods: This retrospective cross-sectional study of 700 pediatric patients under 15 years old used programmatic data from the Pediatrics Department at LUTH. A systematic sampling method was used to select participants from medical records. Data on demographics, medical conditions, anthropometric measurements, and blood tests were collected. Data analysis included descriptive statistics, chi-square tests, and multivariable logistic regression to identify factors associated with TB.

    Results: The median age was 24 months (interquartile range (IQR): 11, 60), and the majority were male (56.7%, n = 397/700). Most participants were from urban areas (59.9%, n = 419/700), and 9.2% (n = 62/675) were living with HIV. Malnutrition and comorbidities were present in a substantial portion of the participants (19.0% and 25.1%, respectively). The prevalence of active TB was 9.4% (n = 66/700) among hospitalized children. Participants living with HIV (adjusted odds ratio (AOR): 6.30; 95% confidence interval (CI): 2.85, 13.89; p < 0.001) and those who were malnourished (AOR: 10.38; 95% CI: 4.78, 22.55; p < 0.001) had significantly higher odds of active TB disease.

    Conclusion: This study found a prevalence of 9.4% for active TB among hospitalized children under 15 years at LUTH. HIV status and malnutrition emerged as significant factors associated with active TB disease. These findings emphasize the need for pediatric TB control strategies that prioritize addressing these associated factors to effectively reduce the burden of tuberculosis in Zambian children.
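    The abstract reports the prevalence (66 of 700) without a confidence interval. As an illustration only, not part of the study's analysis, the Python sketch below computes a Wilson score interval for a binomial proportion from those reported counts; the choice of the Wilson method and z = 1.96 (about 95%) are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half_width, center + half_width


p, low, high = wilson_interval(66, 700)
print(f"prevalence {p:.1%} (95% CI {low:.1%} to {high:.1%})")
```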

  19. Table 1_Estimated activity levels in dogs at population scale with linear...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    • +1 more
    Updated Jul 10, 2025
    German, Alexander J.; O’Flynn, Ciaran; Butterwick, Richard F.; O’Rourke, Abigail; Lyle, Scott; Haydock, Richard; Carson, Aletha (2025). Table 1_Estimated activity levels in dogs at population scale with linear and causal modeling.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002035976
    Dataset updated
    Jul 10, 2025
    Authors
    German, Alexander J.; O’Flynn, Ciaran; Butterwick, Richard F.; O’Rourke, Abigail; Lyle, Scott; Haydock, Richard; Carson, Aletha
    Description

    Introduction: The aim of this study was to determine patterns of physical activity in pet dogs using real-world data at a population scale, aided by accelerometers and electronic health records (EHRs).

    Methods: A directed acyclic graph (DAG) was created to capture background knowledge and causal assumptions related to dog activity. The DAG was used to identify relevant data sources, which included activity data from commercially available accelerometers and health and patient metadata from the EHRs. Linear mixed models (LMMs) were fitted to the log-transformed number of active minutes; the fixed effects tested were chosen based on the variables of interest and the adjustment sets indicated by the DAG.

    Results: Activity was recorded on 8,726,606 days for 28,562 dogs with 136,876 associated EHRs; the median number of activity records per dog was 162 [interquartile range (IQR) 60–390]. The average recorded activity of 51 min per day was much lower than previous estimates of physical activity, and activity levels varied widely, from less than 10 to over 600 min per day. Physical activity decreased with age, and this effect depended on breed size: the decline in activity with age was greater for larger breeds. Activity increased independently with breed size and with owner age. Activity also varied independently with sex, location, climate, season, and day of the week: males were more active than females, and dogs were more active in rural areas, in hot dry or marine climates, in spring, and on weekends.

    Conclusion: Accelerometer-derived activity data gathered from pet dogs living in North America were used to determine associations with both dog and environmental characteristics. Knowledge of these associations could be used to inform daily exercise and caloric requirements for dogs, and how these requirements should be adapted to individual circumstances.
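    Because the LMMs were fitted to log-transformed active minutes, coefficients act multiplicatively on the original scale, and an age × breed-size interaction is what lets the age effect steepen with breed size. The sketch below illustrates only that interpretation. All coefficients are hypothetical (not the study's estimates), the natural log is assumed, and random effects are set to zero; the abstract does not say which log base or zero-handling offset was used.

```python
import math

# HYPOTHETICAL fixed-effect coefficients on the log(active minutes) scale.
# Illustrative only; these are NOT estimates from the study.
B0 = 4.0            # intercept
B_AGE = -0.02       # change in log-minutes per year of age at breed-size step 0
B_SIZE = 0.10       # change in log-minutes per breed-size step
B_AGE_SIZE = -0.01  # age x breed-size interaction

def predicted_log_minutes(age_years: float, size_step: int) -> float:
    """Fixed-effects prediction of a log-linear mixed model (random effects = 0)."""
    return (B0 + B_AGE * age_years + B_SIZE * size_step
            + B_AGE_SIZE * age_years * size_step)

def age_slope(size_step: int) -> float:
    """d(log minutes)/d(age) at a given breed size; depends on size via the interaction."""
    return B_AGE + B_AGE_SIZE * size_step

def percent_change_per_year(size_step: int) -> float:
    """Back-transform a log-scale slope into a % change in (geometric-mean) minutes per year."""
    return (math.exp(age_slope(size_step)) - 1.0) * 100.0

for size in range(4):  # 0 = smallest ... 3 = largest (hypothetical coding)
    print(size, round(age_slope(size), 3), round(percent_change_per_year(size), 2),
          round(math.exp(predicted_log_minutes(5, size)), 1))
```

    With a negative interaction term, the yearly percentage decline grows with breed size, which is the pattern the abstract describes; the magnitudes here mean nothing beyond the illustration.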

  20. Study population characteristics and description of analytical sample.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 19, 2017
    Khaw, Kay-Tee; Sagi-Kiss, Virag; Jackson, Kim G.; Lister, Susan J.; Kuhnle, Gunter G. C.; Tasevska, Natasha; Campbell, Rachel; di Paolo, Nick; Mindell, Jennifer S. (2017). Study population characteristics and description of analytical sample. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001845755
    Dataset updated
    Jul 19, 2017
    Authors
    Khaw, Kay-Tee; Sagi-Kiss, Virag; Jackson, Kim G.; Lister, Susan J.; Kuhnle, Gunter G. C.; Tasevska, Natasha; Campbell, Rachel; di Paolo, Nick; Mindell, Jennifer S.
    Description

    Median and inter-quartile range or absolute number and proportion. See S1 Table for more details.
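    A table of this kind reports continuous variables as a median with its inter-quartile range and categorical variables as a count with a proportion. A minimal sketch of both summaries follows; the data are hypothetical, and quartiles use Python's `statistics.quantiles(..., method="inclusive")`. Other packages may define quartiles differently and give slightly different IQRs.

```python
import statistics
from collections import Counter

def median_iqr(values):
    """Return (median, Q1, Q3) using the 'inclusive' quartile definition."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3

def format_median_iqr(values):
    """Format a continuous variable as 'median (Q1, Q3)'."""
    med, q1, q3 = median_iqr(values)
    return f"{med:g} ({q1:g}, {q3:g})"

def count_percent(labels):
    """Return {category: (n, percent of total)} for a categorical variable."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

# Hypothetical example data (not taken from the dataset).
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sexes = ["M", "F", "M", "M"]
print(format_median_iqr(ages))
print(count_percent(sexes))
```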

Jan Peters; Stephan Franz Miedl; Christian Büchel (2023). Medians (M) and inter-quartile ranges (IQR) of maximum likelihood parameter estimates for the five discounting models examined (see Table 1 for model equations, numbers and abbreviations). [Dataset]. http://doi.org/10.1371/journal.pone.0047225.t002

Medians (M) and inter-quartile ranges (IQR) of maximum likelihood parameter estimates for the five discounting models examined (see Table 1 for model equations, numbers and abbreviations).

Available download formats: xls
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Jan Peters; Stephan Franz Miedl; Christian Büchel
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Parameters are shown separately for the three different datasets (1, 2, pathological gamblers [PG]).
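To show what a "maximum likelihood parameter estimate" for a discounting model involves, here is a sketch that fits a single discount rate k. It assumes one common form, hyperbolic discounting V = A / (1 + kD), with a logistic choice rule. This form is chosen only for illustration; the five models actually compared are defined in the source's Table 1 and are not reproduced here. The trial data and helper names are hypothetical, and beta is held fixed for simplicity. Fitting each participant this way and then taking the median and IQR of the per-participant estimates within each dataset group would yield a summary of the kind the caption describes. That is our reading of the caption, not the authors' code.

```python
import math

# ASSUMED illustrative model: hyperbolic discounting V = A / (1 + k*D) with a
# logistic choice rule (inverse temperature beta). Not necessarily one of the
# five models in the source's Table 1.

def hyperbolic_value(amount, delay, k):
    """Subjective value of `amount` received after `delay` under the assumed model."""
    return amount / (1.0 + k * delay)

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def neg_log_likelihood(k, beta, trials):
    """Negative log-likelihood of observed choices; each trial is
    (immediate_amount, delayed_amount, delay, chose_delayed)."""
    nll = 0.0
    for now_amt, later_amt, delay, chose_delayed in trials:
        dv = hyperbolic_value(later_amt, delay, k) - now_amt
        # P(delayed) = sigmoid(beta*dv); P(immediate) = sigmoid(-beta*dv)
        nll -= log_sigmoid(beta * dv if chose_delayed else -beta * dv)
    return nll

def fit_k_grid(trials, beta=1.0):
    """Maximum likelihood estimate of k by grid search over log10(k) in [-4, 0],
    holding beta fixed (a simplification; a full fit would also estimate beta)."""
    grid = [10 ** (-4 + 4 * i / 400) for i in range(401)]
    return min(grid, key=lambda k: neg_log_likelihood(k, beta, trials))

# Hypothetical choices: 20 now vs. 40 after d days; delayed option chosen when d < 33.
trials = [(20, 40, d, d < 33) for d in (1, 10, 20, 50, 100, 200, 365)]
k_hat = fit_k_grid(trials)
print(f"k_hat = {k_hat:.4f}")
```

A grid search keeps the sketch dependency-free. In practice a numerical optimizer over both k and beta, usually on a log scale for k, would be the more typical choice.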
