Median values, interquartile range (IQR), and number of outliers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Median, interquartile range (IQR), and significance level of the difference between discipline medians and distributions for all parameters, and percentage of DH for GS and SG. DH represents 100% for the relative measure. Differences between medians and distributions were significant between all disciplines when indicated with *; significantly different between GS and SG when marked with 1; between GS and DH when marked with 2; and between SG and DH when marked with 3. If no parameter was significantly different, the column is empty. Columns marked with '—' indicate that the measure was not calculated.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics, mean ± SD, range, median and interquartile range (IQR).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*n = 1041 (35 missing data). BMI = body mass index (kg/m²); SD = standard deviation; IQR = interquartile range; EI = energy intake (MJ/d); BMR = basal metabolic rate (MJ/d).
We include a description of the data sets in the metadata, as well as sample code and results from a simulated data set. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. The R code is available online at https://github.com/warrenjl/SpGPCW.
Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.
Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
Description: These are simulated data without any identifying information or informative birth-level covariates. We standardize the pollution exposures on each week by subtracting the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given; this further protects the identifiability of the spatial locations used in the analysis.
File format: R workspace file.
Metadata (including data dictionary):
• y: vector of binary responses (1: preterm birth, 0: control)
• x: matrix of covariates; one row for each simulated individual
• z: matrix of standardized pollution exposures
• n: number of simulated individuals
• m: number of exposure time periods (e.g., weeks of pregnancy)
• p: number of columns in the covariate design matrix
• alpha_true: vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
These are simulated data without any identifying information or informative birth-level covariates. We standardize the pollution exposures on each week by subtracting the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given; this further protects the identifiability of the spatial locations used in the analysis.
This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.
File format: R workspace file, "Simulated_Dataset.RData".
Metadata (including data dictionary):
• y: vector of binary responses (1: adverse outcome, 0: control)
• x: matrix of covariates; one row for each simulated individual
• z: matrix of standardized pollution exposures
• n: number of simulated individuals
• m: number of exposure time periods (e.g., weeks of pregnancy)
• p: number of columns in the covariate design matrix
• alpha_true: vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code: We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript, and R code ("Results_Summary.txt") to summarize and plot the estimated critical windows and posterior marginal inclusion probabilities.
"CWVS_LMC.txt": a .txt file containing R statistical software code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, this code can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
"Results_Summary.txt": also a .txt file containing R statistical software code. Once the "CWVS_LMC.txt" code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Required R packages:
• For "CWVS_LMC.txt": msm (sampling from the truncated normal distribution); mnormt (sampling from the multivariate normal distribution); BayesLogit (sampling from the Polya-Gamma distribution)
• For "Results_Summary.txt": plotrix (plotting the posterior means and credible intervals)
Reproducibility: The data and code can be used to identify/estimate critical windows from one of the simulated datasets generated under setting E4 of the presented simulation study.
How to use the information:
1. Load the "Simulated_Dataset.RData" workspace.
2. Run the code contained in "CWVS_LMC.txt".
3. Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt".
Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.
Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
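For readers who want to see the per-week standardization concretely, the sketch below reproduces the transformation described above (subtract the weekly median, divide by the weekly IQR). It is written in Python purely for illustration; the released analysis code is in R, and the array name and shape here are hypothetical. Note that the distributed dataset already contains standardized exposures, so this only shows what was done upstream.

```python
import numpy as np

def standardize_exposures(exposures):
    """Standardize a hypothetical (n individuals x m weeks) exposure matrix:
    for each week, subtract the median and divide by the IQR, as the
    dataset description specifies."""
    med = np.median(exposures, axis=0)                     # weekly medians
    q75, q25 = np.percentile(exposures, [75, 25], axis=0)  # weekly quartiles
    return (exposures - med) / (q75 - q25)
```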
The median, interquartile range (IQR), and range of the minimum (Factors I, II, V, VII, VIII, IX, X) or maximum (PT/INR, aPTT, D-Dimer) factor concentrations/clotting times measured for the 146 patients during their hospital admission.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Median (interquartile range) of percentage of adult respondents with need for and access to care in 53 countries.
License: https://cdla.io/permissive-1-0/
The dataset has been created specifically for practicing Python, NumPy, Pandas, and Matplotlib. It is designed to provide a hands-on learning experience in data manipulation, analysis, and visualization using these libraries.
Specifics of the Dataset:
The dataset consists of 5000 rows and 20 columns, representing various features with different data types and distributions. The features include numerical variables with continuous and discrete distributions, categorical variables with multiple categories, binary variables, and ordinal variables. Each feature has been generated using different probability distributions and parameters to introduce variations and simulate real-world data scenarios. The dataset is synthetic and does not represent any real-world data. It has been created solely for educational purposes.
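As an illustration of the kind of generation process described, a dataset with mixed numerical, categorical, binary, and ordinal columns can be built as below. The actual column names, distributions, and parameters of the practice dataset are not published, so everything in this sketch is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # hypothetical seed
n = 5000

df = pd.DataFrame({
    "num_normal": rng.normal(50, 10, n),                    # continuous, Gaussian
    "num_skewed": rng.exponential(2.0, n),                  # continuous, skewed
    "num_counts": rng.poisson(3, n),                        # discrete counts
    "cat_color":  rng.choice(["red", "green", "blue"], n),  # categorical
    "flag":       rng.integers(0, 2, n),                    # binary
    "size_level": rng.choice(["S", "M", "L", "XL"], n,
                             p=[0.3, 0.4, 0.2, 0.1]),       # ordinal
})
```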
One of the defining characteristics of this dataset is the intentional incorporation of various real-world data challenges (illustrated in the sketch following this list):
- Certain columns are randomly selected to be populated with NaN values, simulating the common challenge of missing data; the proportion of missing values in each column varies randomly between 1% and 70%.
- Statistical noise has been introduced: for numerical values in some features, the noise follows a distribution with mean 0 and standard deviation 0.1.
- Categorical noise is introduced in some features, with categories randomly altered in about 1% of the rows.
- Outliers have been embedded in the dataset and can be identified using the interquartile range (IQR) rule.
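Continuing the hypothetical `df` and `rng` from the sketch above, the same challenges could be injected, and the IQR rule applied, roughly like this (all names and choices are illustrative, not the actual generation code):

```python
import numpy as np  # df, rng, and n continue from the previous sketch

# Missing data: fill a random 1%-70% of each chosen column with NaN.
for col in ["num_normal", "num_skewed"]:
    frac = rng.uniform(0.01, 0.70)
    idx = rng.choice(n, size=int(frac * n), replace=False)
    df.loc[idx, col] = np.nan

# Numerical noise: additive Gaussian noise with mean 0 and sd 0.1.
df["num_counts"] = df["num_counts"] + rng.normal(0.0, 0.1, n)

# Categorical noise: alter the category in about 1% of the rows.
flip = rng.choice(n, size=int(0.01 * n), replace=False)
df.loc[flip, "cat_color"] = rng.choice(["red", "green", "blue"], flip.size)

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers.
q1, q3 = df["num_normal"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = (df["num_normal"] < q1 - 1.5 * iqr) | (df["num_normal"] > q3 + 1.5 * iqr)
print(outliers.sum(), "potential outliers flagged")
```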
Context of the Dataset:
The dataset aims to provide a comprehensive playground for practicing Python, NumPy, Pandas, and Matplotlib. It allows learners to explore data manipulation techniques, perform statistical analysis, and create visualizations using the provided features. By working with this dataset, learners can gain hands-on experience in data cleaning, preprocessing, feature engineering, and visualization.
Sources of the Dataset:
The dataset has been generated programmatically using Python's random number generation functions and probability distributions. No external sources or real-world data have been used in creating this dataset.
Proportion of positive results, interquartile range (IQR), minimum-maximum range, and median per diagnostic test at three different time points (baseline) of 24 S. haematobium-positive subjects.
Median response times in seconds (interquartile range in parentheses) as a function of response type and CRT problem.
The dataset provides the median, 25th percentile, and 75th percentile of carbon monoxide (CO) concentrations in Delhi, measured in moles per square meter and vertically integrated over a 9-day mean period. This data offers insights into the distribution and variability of CO levels over time.
The data, collected from July 10, 2018, to August 10, 2024, are sourced from the Tropomi Explorer.
CO is a harmful gas that can significantly impact human health. High levels of CO can lead to respiratory issues, cardiovascular problems, and even be life-threatening in extreme cases. Forecasting CO levels helps in predicting and managing air quality to protect public health.
CO is often emitted from combustion processes, such as those in vehicles and industrial activities. Forecasting CO levels can help in monitoring the impact of these sources and evaluating the effectiveness of emission control measures.
Accurate CO forecasts can assist in urban planning and pollution control strategies, especially in densely populated areas where air quality issues are more pronounced.
Columns and Data Description:
- system:time_start: the date when the CO measurements were taken.
- p25: likely the 25th percentile value of CO levels for the given date, providing insight into the lower range of the distribution.
- Median: the median CO level for the given date, i.e., the middle value of the distribution and a typical value.
- IQR: the interquartile range, which measures the spread of the middle 50% of the data; it is calculated as the difference between the 75th percentile (p75) and the 25th percentile (p25).
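Since the 75th percentile is not stored directly, it can be recovered from the published columns (p75 = p25 + IQR). A minimal pandas sketch, assuming the columns are named exactly as described and a hypothetical CSV export of the dataset:

```python
import pandas as pd

# Hypothetical file name; columns as described: system:time_start, p25, Median, IQR.
co = pd.read_csv("delhi_co_statistics.csv", parse_dates=["system:time_start"])

co["p75"] = co["p25"] + co["IQR"]  # IQR = p75 - p25, so p75 = p25 + IQR
print(co[["system:time_start", "p25", "Median", "p75"]].head())
```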
Geoscience Australia's GEOMACS model was utilised to produce hindcast hourly time series of continental shelf (~20 to 300 m depth) bed shear stress (unit of measure: Pascal, Pa) on a 0.1 degree grid covering the period March 1997 to February 2008 (inclusive). The hindcast data represent the combined contribution to the bed shear stress by waves, tides, wind and density-driven circulation. The parameters calculated to represent the magnitude of the bulk of the data include the quartiles of the distribution: Q25, Q50 and Q75 (i.e. the values below which 25, 50 and 75 percent of the observations fall). The interquartile range of the GEOMACS output takes the observations between Q25 and Q75 to provide an accurate representation of the spread of observations. The interquartile range was shown to provide a more robust representation of the observations than the standard deviation, which produced highly skewed estimates (Hughes and Harris 2008). This dataset is a contribution to the CERF Marine Biodiversity Hub and is hosted temporarily by CMAR on behalf of Geoscience Australia.
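A hedged sketch of how per-cell quartiles and the IQR could be derived from such an hourly hindcast grid; the array name, layout, and stand-in values are hypothetical, not the actual GEOMACS processing:

```python
import numpy as np

# stress: hypothetical (time, lat, lon) array of hourly bed shear stress in Pa.
rng = np.random.default_rng(0)
stress = rng.gamma(2.0, 0.05, size=(1000, 4, 4))  # stand-in data

q25, q50, q75 = np.nanpercentile(stress, [25, 50, 75], axis=0)
iqr = q75 - q25  # robust spread of the observations in each grid cell
```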
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is a cleaned and enhanced version of the popular Ames Housing Dataset, originally compiled by Dean De Cock. It is designed for regression tasks, specifically predicting house sale prices.
Key Transformations and Features:
- HouseAge: the age of the house, calculated from the year it was built and the year it was sold.
- Log_LotArea: a log transformation of 'Lot Area' to address skewness.
- TotalSF: the total square footage of the house, combining basement, first-floor, and second-floor areas.
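A minimal sketch of these three transformations, assuming the standard Ames column names (YrSold, YearBuilt, LotArea, TotalBsmtSF, 1stFlrSF, 2ndFlrSF) and a hypothetical file name; the exact transformations used here may differ in detail:

```python
import numpy as np
import pandas as pd

ames = pd.read_csv("ames_cleaned.csv")  # hypothetical file name

ames["HouseAge"] = ames["YrSold"] - ames["YearBuilt"]
ames["Log_LotArea"] = np.log1p(ames["LotArea"])  # log1p is one common choice
ames["TotalSF"] = ames["TotalBsmtSF"] + ames["1stFlrSF"] + ames["2ndFlrSF"]
```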
Potential Use Cases:
This dataset is suitable for a range of regression modeling tasks. The cleaned and engineered features provide a solid foundation for developing accurate and robust house price prediction models.
Median and interquartile range of R0 by serotype and by province.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Median (interquartile range, IQR) of air pollutants and adjusted odds ratios (95% confidence intervals) for the use of spectacles associated with a one-IQR increase in exposure to each pollutant.
The sample size (n), median, and interquartile range (IQR) of the 2-year measurements taken by Citizen Science Leaders (CSLs) compared with those taken by the Environmental Protection Department of Hong Kong (EPD), where the two locations were about 100 m apart.
The Precipitation Estimation from Remotely Sensed Information using an Artificial Neural Network-Climate Data Record (PERSIANN-CDR) is a satellite-based precipitation dataset for hydrological and climate studies, spanning from 1983 to the present. It is the longest satellite-based precipitation record available, with daily data at 0.25° resolution for the 60°S–60°N latitude band.
PERSIANN rain rate estimates are generated at 0.25° resolution and calibrated to a monthly merged in-situ and satellite product from the Global Precipitation Climatology Project (GPCP). The model uses Gridded Satellite (GridSat-B1) infrared data at 3-hourly time steps, with the raw output (PERSIANN-B1) bias-corrected and accumulated to produce the daily PERSIANN-CDR.
The maps show 31 years (1984–2014) of annual and seasonal median and interquartile range (IQR) data. The median represents the 50th percentile of precipitation, and the IQR reflects the range between the 75th and 25th percentiles, showing data variability. Median and IQR are preferred over mean and standard deviation because they are less influenced by extreme values and better represent non-normally distributed data such as precipitation, which is skewed and zero-limited.
Data and Metadata: NCEI. This is a component of the Gulf Data Atlas (V1.0) for the Physical topic area.
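The robustness claim is easy to demonstrate on synthetic skewed, zero-limited data (illustrative values only, not PERSIANN-CDR data):

```python
import numpy as np

rng = np.random.default_rng(0)
rain = rng.gamma(shape=0.5, scale=4.0, size=10_000)  # skewed, zero-limited sample

q25, q50, q75 = np.percentile(rain, [25, 50, 75])
print(f"mean={rain.mean():.2f}, sd={rain.std():.2f}")  # inflated by the long tail
print(f"median={q50:.2f}, IQR={q75 - q25:.2f}")        # robust to extremes
```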
The U.S. Climate Reference Network (USCRN) was designed to monitor the climate of the United States using research-quality instrumentation located within representative pristine environments. This Standardized Soil Moisture (SSM) and Soil Moisture Climatology (SMC) product set is derived using the soil moisture observations from the USCRN.
The hourly soil moisture anomaly (SMANOM) is derived by subtracting the MEDIAN from the soil moisture volumetric water content (SMVWC) and dividing the difference by the interquartile range (IQR = 75th percentile - 25th percentile) for that hour: SMANOM = (SMVWC - MEDIAN) / IQR. The soil moisture percentile (SMPERC) is derived by taking all the values that were used to create the empirical cumulative distribution function (ECDF) that yielded the hourly MEDIAN, adding the current observation to the set, recalculating the ECDF, and determining the percentile value of the current observation. Finally, the soil temperature for the individual layers is provided for the dataset user's convenience.
The SMC files contain the MEAN, MEDIAN, IQR, and decimal fraction of available data that are valid for each hour of the year at the 5, 10, 20, 50, and 100 cm depth soil layers, as well as for a top soil layer (TOP) and a column soil layer (COLUMN). The TOP layer consists of an average of the 5 and 10 cm depths, while the COLUMN layer includes all available depths at a location, either two layers or five layers depending on soil depth. The SSM files contain the mean VWC, SMANOM, SMPERC, and TEMPERATURE for each of the depth layers described above.
File names are structured as CRNSSM0101-STATIONNAME.csv and CRNSMC0101-STATIONNAME.csv, where SSM stands for Standardized Soil Moisture and SMC for Soil Moisture Climatology. The first two digits of the trailing integer indicate the major version and the second two digits the minor version of the product.
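A hedged Python sketch of the two derivations defined above; the function and variable names are hypothetical (the operational product is computed by NCEI), but the arithmetic follows the definitions given:

```python
import numpy as np

def soil_moisture_anomaly(vwc, clim_median, clim_iqr):
    # SMANOM = (SMVWC - MEDIAN) / IQR, per the product definition.
    return (vwc - clim_median) / clim_iqr

def soil_moisture_percentile(obs, climatology_values):
    # Add the current observation to the climatological sample, rebuild the
    # empirical CDF, and report the observation's percentile within it.
    sample = np.append(climatology_values, obs)
    return 100.0 * np.mean(sample <= obs)
```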
The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50 Hz. The experiments were video-recorded to label the data manually. The obtained dataset was randomly partitioned into two sets, with 70% of the volunteers selected for generating the training data and 30% for the test data.
The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low-frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
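The windowing step is straightforward to reproduce: 2.56 s at 50 Hz gives 128 samples per window, and 50% overlap gives a 64-sample step. A minimal sketch with illustrative names:

```python
import numpy as np

def sliding_windows(signal, width=128, overlap=0.5):
    """Split a 1-D signal into fixed-width windows with fractional overlap."""
    step = int(width * (1 - overlap))  # 64 samples for 50% overlap
    starts = range(0, len(signal) - width + 1, step)
    return np.stack([signal[s:s + width] for s in starts])
```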
The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time-domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into the body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.
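A sketch of the described filtering with SciPy; the median-filter kernel size and the order of the gravity filter are assumptions (the description does not state them), and everything else follows the stated parameters:

```python
from scipy.signal import butter, filtfilt, medfilt

FS = 50.0  # sampling rate in Hz

def denoise(signal):
    """Median filter followed by a 3rd-order low-pass Butterworth at 20 Hz."""
    smoothed = medfilt(signal, kernel_size=3)       # kernel size assumed
    b, a = butter(3, 20.0, btype="low", fs=FS)
    return filtfilt(b, a, smoothed)

def separate_gravity(acc):
    """Split acceleration into body and gravity parts with a 0.3 Hz low-pass."""
    b, a = butter(3, 0.3, btype="low", fs=FS)       # filter order assumed
    gravity = filtfilt(b, a, acc)
    return acc - gravity, gravity                   # (body, gravity)
```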
Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). The magnitudes of these three-dimensional signals were also calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).
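Both derivations are simple; a sketch under the same illustrative naming assumptions as above:

```python
import numpy as np

def jerk(signal, fs=50.0):
    # Time derivative of the body signal (e.g., tBodyAccJerk from tBodyAcc).
    return np.gradient(signal, 1.0 / fs, axis=0)

def magnitude(xyz):
    # Euclidean norm across the X, Y, Z columns (e.g., tBodyAccMag).
    return np.linalg.norm(xyz, axis=1)
```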
Finally, a Fast Fourier Transform (FFT) was applied to some of these signals, producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag (note the 'f' to indicate frequency-domain signals).
These signals were used to estimate variables of the feature vector for each pattern: '-XYZ' is used to denote 3-axial signals in the X, Y, and Z directions.
tBodyAcc-XYZ
tGravityAcc-XYZ
tBodyAccJerk-XYZ
tBodyGyro-XYZ
tBodyGyroJerk-XYZ
tBodyAccMag
tGravityAccMag
tBodyAccJerkMag
tBodyGyroMag
tBodyGyroJerkMag
fBodyAcc-XYZ
fBodyAccJerk-XYZ
fBodyGyro-XYZ
fBodyAccMag
fBodyAccJerkMag
fBodyGyroMag
fBodyGyroJerkMag
The set of variables that were estimated from these signals are:
mean(): mean value
std(): standard deviation
mad(): median absolute deviation
max(): largest value in array
min(): smallest value in array
sma(): signal magnitude area
energy(): energy measure (sum of the squares divided by the number of values)
iqr(): interquartile range
entropy(): signal entropy
arCoeff(): autoregression coefficients with Burg order equal to 4
correlation(): correlation coefficient between two signals
maxInds(): index of the frequency component with the largest magnitude
meanFreq(): weighted average of the frequency components to obtain a mean frequency
skewness(): skewness of the frequency domain signal
kurtosis(): kurtosis of the frequency domain signal
bandsEnergy(): energy of a frequency interval within the 64 bins of the FFT of each window
angle(): angle between two vectors
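A sketch computing a handful of these per-window, time-domain variables (the frequency-domain ones would follow the FFT step); names are illustrative:

```python
import numpy as np
from scipy.stats import iqr

def window_features(w):
    """A few of the listed variables for a 1-D window w (e.g., 128 samples)."""
    return {
        "mean":   np.mean(w),
        "std":    np.std(w),
        "mad":    np.median(np.abs(w - np.median(w))),  # median absolute deviation
        "max":    np.max(w),
        "min":    np.min(w),
        "energy": np.sum(w ** 2) / w.size,              # sum of squares / count
        "iqr":    iqr(w),                               # interquartile range
    }
```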
Additional vectors are obtained by averaging the signals in a signal window sample. These are used on the angle() variable:
gravityMean
tBodyAccMean
tBodyAccJerkMean
tBodyGyroMean
tBodyGyroJerkMean
This data set consists of the following columns:
1 tBodyAcc-mean()-X
2 tBodyAcc-mean()-Y
3 tBodyAcc-mean()-Z
4 tBodyAcc-std()-X
5 tBodyAcc-std()-Y
6 tBodyAcc-std()-Z
7 tBodyAcc-mad()-X
8 tBodyAcc-mad()-Y
9 tBodyAcc-mad()-Z
10 tBodyAcc-max()-X
11 tBodyAcc-max()-Y
12 tBodyAcc-max()-Z
13 tBodyAcc-min()-X
14 tBodyAcc-min()-Y
15 tBodyAcc-min()-Z
16 tBodyAcc-sma()
17 tBodyAcc-energy()-X
18 tBodyAcc-energy()-Y
19 tBodyAcc-energy()-Z
20 tBodyAcc-iqr()-X
21 tBodyAcc-iqr()-Y
22 tBodyAcc-iqr()-Z
23 tBodyAcc-entropy()-X
24 tBodyAcc-entropy()-Y
25 tBodyAcc-entropy()-Z
26 tBodyAcc-arCoeff()-X,1
27 tBodyAcc-arCoeff()-X,2
28 tBodyAcc-arCoeff()-X,3
29 tBodyAcc-arCoeff()-X,4
30 tBodyAcc-arCoeff()-Y,1
31 tBodyAcc-arCoeff()-Y,2
32 tBodyAcc-arCoeff()-Y,3
33 tBodyAcc-arCoeff()-Y,4
34 tBodyAcc-arCoeff()-Z,1
35 tBodyAcc-arCoeff()-Z,2
36 tBodyAcc-arCoeff()-Z,3
37 tBodyAcc-arCoeff()-Z,4
38 tBodyAcc-correlation()-X,Y
39 tBodyAcc-correlation()-X,Z
40 tBodyAcc-correlation()-Y,Z
41 tGravityAcc-mean()-X
42 tGravit...