85 datasets found
  1. m

    Panel dataset on Brazilian fuel demand

    • data.mendeley.com
    Updated Oct 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sergio Prolo (2024). Panel dataset on Brazilian fuel demand [Dataset]. http://doi.org/10.17632/hzpwbp7j22.1
    Explore at:
    Dataset updated
    Oct 7, 2024
    Authors
    Sergio Prolo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    Summary : Fuel demand is shown to be influenced by fuel prices, people's income and motorization rates. We explore the effects of electric vehicle's rates in gasoline demand using this panel dataset.

    Files : dataset.csv - Panel dimensions are the Brazilian state ( i ) and year ( t ). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e) and GDP per capita (ln_gdp_pc). All variables are all under the natural log function, since we use this to calculate demand elasticities in a regression model.

    adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.

    regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported to work, see comment section.

    dataset_predictions.xlsx - Based on the estimations from Stata, we use this excel file to make average predictions by year and by state. Also, by including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but can be used to further understand how the forecasting scenarios are set up.

    Sources: Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp) State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB) State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)

  2. q

    Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry...

    • qubeshub.org
    Updated Jan 11, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu (2022). Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry and Mathematics [Dataset]. http://doi.org/10.25334/5PX5-H796
    Explore at:
    Dataset updated
    Jan 11, 2022
    Dataset provided by
    QUBES
    Authors
    Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu
    Description

    Students typically find linear regression analysis of data sets in a biology classroom challenging. These activities could be used in a Biology, Chemistry, Mathematics, or Statistics course. The collection provides student activity files with Excel instructions and Instructor Activity files with Excel instructions and solutions to problems.

    Students will be able to perform linear regression analysis, find correlation coefficient, create a scatter plot and find the r-square using MS Excel 365. Students will be able to interpret data sets, describe the relationship between biological variables, and predict the value of an output variable based on the input of an predictor variable.

  3. polynomial regression

    • kaggle.com
    Updated Jul 5, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Miraj Deep Bhandari (2023). polynomial regression [Dataset]. http://doi.org/10.34740/kaggle/ds/3482232
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Kaggle
    Authors
    Miraj Deep Bhandari
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Ice Cream Selling dataset is a simple and well-suited dataset for beginners in machine learning who are looking to practice polynomial regression. It consists of two columns: temperature and the corresponding number of units of ice cream sold.

    The dataset captures the relationship between temperature and ice cream sales. It serves as a practical example for understanding and implementing polynomial regression, a powerful technique for modeling nonlinear relationships in data.

    The dataset is designed to be straightforward and easy to work with, making it ideal for beginners. The simplicity of the data allows beginners to focus on the fundamental concepts and steps involved in polynomial regression without overwhelming complexity.

    By using this dataset, beginners can gain hands-on experience in preprocessing the data, splitting it into training and testing sets, selecting an appropriate degree for the polynomial regression model, training the model, and evaluating its performance. They can also explore techniques to address potential challenges such as overfitting.

    With this dataset, beginners can practice making predictions of ice cream sales based on temperature inputs and visualize the polynomial regression curve that represents the relationship between temperature and ice cream sales.

    Overall, the Ice Cream Selling dataset provides an accessible and practical learning resource for beginners to grasp the concepts and techniques of polynomial regression in the context of analyzing ice cream sales data.

  4. Linear Regression

    • kaggle.com
    zip
    Updated Jan 29, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fareed Khan (2022). Linear Regression [Dataset]. https://www.kaggle.com/datasets/fareedkhan557/linear-regression
    Explore at:
    zip(14257572 bytes)Available download formats
    Dataset updated
    Jan 29, 2022
    Authors
    Fareed Khan
    Description

    Context

    This dataset is created using the sources from this dataset.

    Content

    Single variable regression model:

    \[ y = mx + c \]

    Both training dataset and testing dataset contain 1 Million rows. 1) x-values are numbers between 1 and 100. 2) y-values are created using this excel function: NORMINV(RAND(), x, 3).

    License: feel free to use

  5. m

    Global Burden of Disease analysis dataset of noncommunicable disease...

    • data.mendeley.com
    Updated Apr 6, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    David Cundiff (2023). Global Burden of Disease analysis dataset of noncommunicable disease outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.10
    Explore at:
    Dataset updated
    Apr 6, 2023
    Authors
    David Cundiff
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This formatted dataset (AnalysisDatabaseGBD) originates from raw data files from the Institute of Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017) affiliated with the University of Washington. We are volunteer collaborators with IHME and not employed by IHME or the University of Washington.

    The population weighted GBD2017 data are on male and female cohorts ages 15-69 years including noncommunicable diseases (NCDs), body mass index (BMI), cardiovascular disease (CVD), and other health outcomes and associated dietary, metabolic, and other risk factors. The purpose of creating this population-weighted, formatted database is to explore the univariate and multiple regression correlations of health outcomes with risk factors. Our research hypothesis is that we can successfully model NCDs, BMI, CVD, and other health outcomes with their attributable risks.

    These Global Burden of disease data relate to the preprint: The EAT-Lancet Commission Planetary Health Diet compared with Institute of Health Metrics and Evaluation Global Burden of Disease Ecological Data Analysis. The data include the following: 1. Analysis database of population weighted GBD2017 data that includes over 40 health risk factors, noncommunicable disease deaths/100k/year of male and female cohorts ages 15-69 years from 195 countries (the primary outcome variable that includes over 100 types of noncommunicable diseases) and over 20 individual noncommunicable diseases (e.g., ischemic heart disease, colon cancer, etc). 2. A text file to import the analysis database into SAS 3. The SAS code to format the analysis database to be used for analytics 4. SAS code for deriving Tables 1, 2, 3 and Supplementary Tables 5 and 6 5. SAS code for deriving the multiple regression formula in Table 4. 6. SAS code for deriving the multiple regression formula in Table 5 7. SAS code for deriving the multiple regression formula in Supplementary Table 7
    8. SAS code for deriving the multiple regression formula in Supplementary Table 8 9. The Excel files that accompanied the above SAS code to produce the tables

    For questions, please email davidkcundiff@gmail.com. Thanks.

  6. m

    Ultimate_Analysis

    • data.mendeley.com
    Updated Jan 28, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Akara Kijkarncharoensin (2022). Ultimate_Analysis [Dataset]. http://doi.org/10.17632/t8x96g88p3.2
    Explore at:
    Dataset updated
    Jan 28, 2022
    Authors
    Akara Kijkarncharoensin
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This database studies the performance inconsistency on the biomass HHV ultimate analysis. The research null hypothesis is the consistency in the rank of a biomass HHV model. Fifteen biomass models are trained and tested in four datasets. In each dataset, the rank invariability of these 15 models indicates the performance consistency.

    The database includes the datasets and source codes to analyze the performance consistency of the biomass HHV. These datasets are stored in tabular on an excel workbook. The source codes are the biomass HHV machine learning model through the MATLAB Objected Orient Program (OOP). These machine learning models consist of eight regressions, four supervised learnings, and three neural networks.

    An excel workbook, "BiomassDataSetUltimate.xlsx," collects the research datasets in six worksheets. The first worksheet, "Ultimate," contains 908 HHV data from 20 pieces of literature. The names of the worksheet column indicate the elements of the ultimate analysis on a % dry basis. The HHV column refers to the higher heating value in MJ/kg. The following worksheet, "Full Residuals," backups the model testing's residuals based on the 20-fold cross-validations. The article (Kijkarncharoensin & Innet, 2021) verifies the performance consistency through these residuals. The other worksheets present the literature datasets implemented to train and test the model performance in many pieces of literature.

    A file named "SourceCodeUltimate.rar" collects the MATLAB machine learning models implemented in the article. The list of the folders in this file is the class structure of the machine learning models. These classes extend the features of the original MATLAB's Statistics and Machine Learning Toolbox to support, e.g., the k-fold cross-validation. The MATLAB script, name "runStudyUltimate.m," is the article's main program to analyze the performance consistency of the biomass HHV model through the ultimate analysis. The script instantly loads the datasets from the excel workbook and automatically fits the biomass model through the OOP classes.

    The first section of the MATLAB script generates the most accurate model by optimizing the model's higher parameters. It takes a few hours for the first run to train the machine learning model via the trial and error process. The trained models can be saved in MATLAB .mat file and loaded back to the MATLAB workspace. The remaining script, separated by the script section break, performs the residual analysis to inspect the performance consistency. Furthermore, the figure of the biomass data in the 3D scatter plot, and the box plots of the prediction residuals are exhibited. Finally, the interpretations of these results are examined in the author's article.

    Reference : Kijkarncharoensin, A., & Innet, S. (2022). Performance inconsistency of the Biomass Higher Heating Value (HHV) Models derived from Ultimate Analysis [Manuscript in preparation]. University of the Thai Chamber of Commerce.

  7. File3. Fama-MacBeth first-stage time-series regression data

    • figshare.com
    xlsx
    Updated Apr 6, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Youcong Chao; Xiaoqun Liu; Shijun Guo (2017). File3. Fama-MacBeth first-stage time-series regression data [Dataset]. http://doi.org/10.6084/m9.figshare.4658806.v2
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Apr 6, 2017
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Youcong Chao; Xiaoqun Liu; Shijun Guo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file includes 3 excels on Fama-MacBeth first-stage time-series regression data. The first excel includes portfolios “s1b1”-“s1b5” and “s2b1”-“s2b5”; the second excel includes portfolios “s3b1”-“s3b5” and “s4b1”-“s4b5”, and the third one includes the portfolios “s5b1”-“s5b5”

  8. Independent predictors of adverse pregnancy outcome in mothers attending...

    • plos.figshare.com
    xls
    Updated Jun 17, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mesfin Tadese; Kefyalew Dagne; Abate Dargie Wubetu; Shiferaw Abeway; Alemayehu Bekele; Worku Misganaw Kebede; Getaneh Baye Mulu (2023). Independent predictors of adverse pregnancy outcome in mothers attending obstetric care at Debre Berhan Comprehensive Specialized Hospital, Northeast Ethiopia. [Dataset]. http://doi.org/10.1371/journal.pone.0271287.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Mesfin Tadese; Kefyalew Dagne; Abate Dargie Wubetu; Shiferaw Abeway; Alemayehu Bekele; Worku Misganaw Kebede; Getaneh Baye Mulu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ethiopia, Debre Birhan
    Description

    Independent predictors of adverse pregnancy outcome in mothers attending obstetric care at Debre Berhan Comprehensive Specialized Hospital, Northeast Ethiopia.

  9. m

    Data for: A systematic review showed no performance benefit of machine...

    • data.mendeley.com
    • search.datacite.org
    Updated Mar 14, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ben Van Calster (2019). Data for: A systematic review showed no performance benefit of machine learning over logistic regression for clinical prediction models [Dataset]. http://doi.org/10.17632/sypyt6c2mc.1
    Explore at:
    Dataset updated
    Mar 14, 2019
    Authors
    Ben Van Calster
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The uploaded files are:

    1) Excel file containing 6 sheets in respective Order: "Data Extraction" (summarized final data extractions from the three reviewers involved), "Comparison Data" (data related to the comparisons investigated), "Paper level data" (summaries at paper level), "Outcome Event Data" (information with respect to number of events for every outcome investigated within a paper), "Tuning Classification" (data related to the manner of hyperparameter tuning of Machine Learning Algorithms).

    2) R script used for the Analysis (In order to read the data, please: Save "Comparison Data", "Paper level data", "Outcome Event Data" Excel sheets as txt files. In the R script srpap: Refers to the "Paper level data" sheet, srevents: Refers to the "Outcome Event Data" sheet and srcompx: Refers to " Comparison data Sheet".

    3) Supplementary Material: Including Search String, Tables of data, Figures

    4) PRISMA checklist items

  10. Housing Prices Dataset

    • kaggle.com
    zip
    Updated Jan 12, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    M Yasser H (2022). Housing Prices Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/housing-prices-dataset
    Explore at:
    zip(4740 bytes)Available download formats
    Dataset updated
    Jan 12, 2022
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    https://raw.githubusercontent.com/Masterx-AI/Project_Housing_Price_Prediction_/main/hs.jpg" alt="">

    Description:

    A simple yet challenging project, to predict the housing price based on certain factors like house area, bedrooms, furnished, nearness to mainroad, etc. The dataset is small yet, it's complexity arises due to the fact that it has strong multicollinearity. Can you overcome these obstacles & build a decent predictive model?

    Acknowledgement:

    Harrison, D. and Rubinfeld, D.L. (1978) Hedonic prices and the demand for clean air. J. Environ. Economics and Management 5, 81–102. Belsley D.A., Kuh, E. and Welsch, R.E. (1980) Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build Regression models to predict the sales w.r.t a single & multiple feature.
    • Also evaluate the models & compare thier respective scores like R2, RMSE, etc.
  11. f

    Final Fama-MacBeth regression data

    • figshare.com
    xlsx
    Updated Feb 16, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Youcong Chao; Xiaoqun Liu; Shijun Guo (2017). Final Fama-MacBeth regression data [Dataset]. http://doi.org/10.6084/m9.figshare.4658812.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Feb 16, 2017
    Dataset provided by
    figshare
    Authors
    Youcong Chao; Xiaoqun Liu; Shijun Guo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The excel is Fama-MacBeth second-stage cross-sectional regression data. There are three sheets corresponding to data for three cross-sectional regression models the results of which are shown in table 5 in the context. The last sheet presents the regression results for the three cross-sectional regressions.

  12. Data from: Responses to environmental variability by herbivorous insects and...

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Jun 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Agricultural Research Service (2025). Data from: Responses to environmental variability by herbivorous insects and their natural enemies within a bioenergy crop, Miscanthus x giganteus [Dataset]. https://catalog.data.gov/dataset/data-from-responses-to-environmental-variability-by-herbivorous-insects-and-their-natural--e1e1d
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Servicehttps://www.ars.usda.gov/
    Description

    Description: This dataset consists of field data (arthropods, nematodes and NDVI) collected over the course of 6 field excursions in 2015 and 2016 near TyTy, GA, in a field used for growing Miscanthus x giganteus. It also includes interpolated values of soil measurements collected in 2015 and meteorological data collected on an adjacent farm. Point-in-time measurements include all meteorological, NDVI, arthropod and nematode measurements and their derivatives. Fixed values were measurements that were held constant across all sampling dates, including location, terrain and soils measurements and their derivatives. Dawn Olson and Jason Schmidt collected and processed arthropod count data. Jason Schmidt collected and processed spider count data and computed spider diversity. Richard Davis collected and processed nematode count data. Alisa Coffin collected and processed NDVI data and positional locations. Tim Strickland collected and processed soils data and Alisa Coffin interpolated soils values using kriging to derive values at arthropod sample locations. David Bosch collected and processed meteorological data. Lynne Seymour provided statistical expertise in deriving any estimated values (phloem feeders, parasitoids, spiders, and natural enemies). Alisa Coffin derived terrain data (elevation, slope, aspect, and distances) from publicly available datasets, transformed values (SI, WI, etc), carried out the geographically weighted regression analysis and calculated C:SE values, harmonized the full dataset, and compiled it using Esri's ArcGIS Pro 2.5. Methods for most data are published in the accompanying paper and associated supplements. Questions about dataset development and management should be directed to Alisa Coffin (alisa.coffin@usda.gov). This work was accomplished as a joint USDA and University of Georgia project funded by a cooperative agreement (#6048-13000-026-21S). This research was a contribution from the Long-Term Agroecosystem Research (LTAR) network. LTAR is supported by the United States Department of Agriculture. At request of the author, the data resources are under embargo. The embargo will expire on Fri, Jan 01, 2021. Resources in this dataset:Resource Title: Spreadsheet of data. File Name: GibbsMisFarm_Arthrop_Env_DepVar_201516_final.xlsxResource Description: This workbook contains all of the data used in this analysis. The first worksheet contains data dictionary information.Resource Software Recommended: Microsoft Excel, Office 365,url: https://www.microsoft.com/en-us/microsoft-365/excel Resource Title: GeoJSON. File Name: MiscanthusXGiganteusGeoJSON.json

  13. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Explore at:
    txtAvailable download formats
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data and have considerable potential to study the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 135,310 participants and 5,078 variables. The variables conveydemographics (281 variables),dietary consumption (324 variables),physiological functions (1,040 variables),occupation (61 variables),questionnaires (1444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood),medications (29 variables),mortality information linked from the National Death Index (15 variables),survey weights (857 variables),environmental exposure biomarker measurements (598 variables), andchemical comments indicating which measurements are below or above the lower limit of detection (505 variables).csv Data Record: The curated NHANES datasets and the data dictionaries includes 23 .csv files and 1 excel file.The curated NHANES datasets involves 20 .csv formatted files, two for each module with one as the uncleaned version and the other as the cleaned version. The modules are labeled as the following: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments."dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES."dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.“dictionary_drug_codes.csv” contains the dictionary for descriptors on the drugs codes.“nhanes_inconsistencies_documentation.xlsx” is an excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.R Data Record: For researchers who want to conduct their analysis in the R programming language, only cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file which include an .RData file and an .R file.“w - nhanes_1988_2018.RData” contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data.“m - nhanes_1988_2018.R” shows how we used the customized functions (i.e. our pipeline) to curate the original NHANES data.Example starter codes: The set of starter code to help users conduct exposome analysis consists of four R markdown files (.Rmd). We recommend going through the tutorials in order.“example_0 - merge_datasets_together.Rmd” demonstrates how to merge the curated NHANES datasets together.“example_1 - account_for_nhanes_design.Rmd” demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.“example_2 - calculate_summary_statistics.Rmd” demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.“example_3 - run_multiple_regressions.Rmd” demonstrates how run multiple regression models with and without adjusting for the sampling design.

  14. Excel Project(MA,Returns,Regression)

    • kaggle.com
    zip
    Updated Oct 14, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dishani Mishra (2022). Excel Project(MA,Returns,Regression) [Dataset]. https://www.kaggle.com/datasets/dishanimishra/excel-projectmareturnsregression
    Explore at:
    zip(46410 bytes)Available download formats
    Dataset updated
    Oct 14, 2022
    Authors
    Dishani Mishra
    Description

    Dataset

    This dataset was created by Dishani Mishra

    Contents

  15. File2. Data for Time-series regression in 2007 to 2013

    • figshare.com
    xlsx
    Updated Feb 16, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Youcong Chao; Xiaoqun Liu; Shijun Guo (2017). File2. Data for Time-series regression in 2007 to 2013 [Dataset]. http://doi.org/10.6084/m9.figshare.4658797.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Feb 16, 2017
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Youcong Chao; Xiaoqun Liu; Shijun Guo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    File 2 includes seven excels, named Data for Time-series regression in 2007 to Data for Time-series regression in 2013 respectively. In specific, we average all the 25 portfolios’ realized jump measures in 2007 as just one series from File 1 in the excel of “Data for Horse running regression in 2007”. The excels constitute jump variables and the other variables, among which value-weighted monthly returns of 25 portfolios are directly from the RESSET database.

  16. Determining the utility of regression-based CO:NOx ratios for emissions...

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Jan 5, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Determining the utility of regression-based CO:NOx ratios for emissions validation in a near-road environment [Dataset]. https://catalog.data.gov/dataset/determining-the-utility-of-regression-based-co-nox-ratios-for-emissions-validation-in-a-ne
    Explore at:
    Dataset updated
    Jan 5, 2021
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    Datasets used in the analysis presented in the manuscript “Determining the utility of regression-based CO:NOx ratios for emissions validation in a near-road environment”. The datasets include hourly CO and NOx emissions estimated by the MOVES model (CONOx_MOVES.xlsx) and 5 minute NO, NO2, NOx and CO observations from the Las Vegas near-road field study described in the paper (LV_5min_S1S2S3S4.xlsx). Each file is an excel spread sheet (.xlsx) and metadata on the variables within the files is provided under the Readme tab of each spreadsheet. This dataset is associated with the following publication: Simon, H., B. Henderson, R. Owen, K. Foley, M. Snyder, and E. Kimbrough. Variability in observation-based onroad emission constraints from a near-road environment. ATMOSPHERE. MDPI AG, Basel, SWITZERLAND, 11(11): 1243, (2020).

  17. Ghana Cryptocurrency Awareness and Adoption Survey Data

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    bin
    Updated Jun 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sheena Lovia Boateng; Desmond Kwadjo Kumi (2024). Ghana Cryptocurrency Awareness and Adoption Survey Data [Dataset]. http://doi.org/10.6084/m9.figshare.25981909.v1
    Explore at:
    binAvailable download formats
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Sheena Lovia Boateng; Desmond Kwadjo Kumi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ghana
    Description

    The dataset presents responses from a survey assessing the level of awareness and adoption of cryptocurrency among college students and staff in a developing economy, specifically Ghana. The data was collected using a structured questionnaire administered through a snowball sampling technique, involving 989 students and staff from six Ghanaian universities. The survey data collection lasted three months between April and May 2022. The dataset comprises raw data available in SPSS (.sav), and Excel (.xlsx) formats with a codebook. It captures self-reported levels of awareness of blockchain technology and cryptocurrency, as well as various determinants of cryptocurrency adoption based on the Utility Theory and the Unified Theory of Acceptance and Use of Technology (UTAUT). The questionnaire was divided into three sections: awareness of blockchain and cryptocurrency, empirical data on drivers of cryptocurrency adoption, and demographic information of respondents. The dataset provides valuable insights into the factors influencing cryptocurrency adoption in a developing economy context. It is suitable for further multivariate analysis, including Structural Equation Modeling (SEM), multigroup analysis (MGA), and logistic regression analysis. The data allows for reuse in future research to explore trends and drivers of cryptocurrency adoption in similar contexts.

  18. H

    Relaxed Naïve Bayes Data

    • dataverse.harvard.edu
    Updated Aug 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Relaxed Naïve Bayes Team (2023). Relaxed Naïve Bayes Data [Dataset]. http://doi.org/10.7910/DVN/7KNKLL
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 7, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Relaxed Naïve Bayes Team
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    NaiveBayes_R.xlsx: This Excel file includes information as to how probabilities of observed features are calculated given recidivism (P(x_ij│R)) in the training data. Each cell is embedded with an Excel function to render appropriate figures. P(Xi|R): This tab contains probabilities of feature attributes among recidivated offenders. NIJ_Recoded: This tab contains re-coded NIJ recidivism challenge data following our coding schema described in Table 1. Recidivated_Train: This tab contains re-coded features of recidivated offenders. Tabs from [Gender] through [Condition_Other]: Each tab contains probabilities of feature attributes given recidivism. We use these conditional probabilities to replace the raw values of each feature in P(Xi|R) tab. NaiveBayes_NR.xlsx: This Excel file includes information as to how probabilities of observed features are calculated given non-recidivism (P(x_ij│N)) in the training data. Each cell is embedded with an Excel function to render appropriate figures. P(Xi|N): This tab contains probabilities of feature attributes among non-recidivated offenders. NIJ_Recoded: This tab contains re-coded NIJ recidivism challenge data following our coding schema described in Table 1. NonRecidivated_Train: This tab contains re-coded features of non-recidivated offenders. Tabs from [Gender] through [Condition_Other]: Each tab contains probabilities of feature attributes given non-recidivism. We use these conditional probabilities to replace the raw values of each feature in P(Xi|N) tab. Training_LnTransformed.xlsx: Figures in each cell are log-transformed ratios of probabilities in NaiveBayes_R.xlsx (P(Xi|R)) to the probabilities in NaiveBayes_NR.xlsx (P(Xi|N)). TestData.xlsx: This Excel file includes the following tabs based on the test data: P(Xi|R), P(Xi|N), NIJ_Recoded, and Test_LnTransformed (log-transformed P(Xi|R)/ P(Xi|N)). Training_LnTransformed.dta: We transform Training_LnTransformed.xlsx to Stata data set. We use Stat/Transfer 13 software package to transfer the file format. StataLog.smcl: This file includes the results of the logistic regression analysis. Both estimated intercept and coefficient estimates in this Stata log correspond to the raw weights and standardized weights in Figure 1. Brier Score_Re-Check.xlsx: This Excel file recalculates Brier scores of Relaxed Naïve Bayes Classifier in Table 3, showing evidence that results displayed in Table 3 are correct. *****Full List***** NaiveBayes_R.xlsx NaiveBayes_NR.xlsx Training_LnTransformed.xlsx TestData.xlsx Training_LnTransformed.dta StataLog.smcl Brier Score_Re-Check.xlsx Data for Weka (Training Set): Bayes_2022_NoID Data for Weka (Test Set): BayesTest_2022_NoID Weka output for machine learning models (Conventional naïve Bayes, AdaBoost, Multilayer Perceptron, Logistic Regression, and Random Forest)

  19. d

    Data from: Multivariate regression model for predicting oxygen reduction...

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 8, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2025). Multivariate regression model for predicting oxygen reduction rates in groundwater for the State of Wisconsin [Dataset]. https://catalog.data.gov/dataset/multivariate-regression-model-for-predicting-oxygen-reduction-rates-in-groundwater-for-the
    Explore at:
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Area covered
    Wisconsin
    Description

    A multivariate regression model was developed to predict zero-order oxygen reduction rates (mg/L/yr) in aquifers across the State of Wisconsin. The model used a combination of dissolved oxygen concentrations and mean groundwater ages estimated with sampled age tracers from wells in the U.S. Geological Survey National Water Information System and previously published project reports from state agencies and universities. The multivariate regression model was solved using the Microsoft Excel solver, with 461 wells used for training and 46 wells held-out for validation. A total of 31 predictor variables were used for model development (56 were tested), including basic well characteristics, soil properties, aquifer properties, hydrologic position on the landscape, recharge and evapotranspiration rates, and land use characteristics. Model results indicate that the mean oxygen reduction rate for the training wells is 0.15 mg/L/yr (ranges from 0.07 to 0.59 mg/L/yr), with a root mean weighted square error of 3.13 mg/L/yr and Coefficient of Correlation (r^2) of 0.49 for the holdout validation data. This data release includes the Microsoft Excel file that represents the final solved regression model, as well as an Excel file that describes all of the predictor variables that were tested with the model.

  20. m

    Global Burden of Disease analysis dataset of BMI and CVD outcomes, risk...

    • data.mendeley.com
    Updated Aug 17, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    David Cundiff (2021). Global Burden of Disease analysis dataset of BMI and CVD outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.6
    Explore at:
    Dataset updated
    Aug 17, 2021
    Authors
    David Cundiff
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This formatted dataset originates from raw data files from the Institute of Health Metrics and Evaluation Global Burden of Disease (GBD2017). It is population weighted worldwide data on male and female cohorts ages 15-69 years including body mass index (BMI) and cardiovascular disease (CVD) and associated dietary, metabolic and other risk factors. The purpose of creating this formatted database is to explore the univariate and multiple regression correlations of BMI and CVD and other health outcomes with risk factors. Our research hypothesis is that we can successfully apply artificial intelligence to model BMI and CVD risk factors and health outcomes. We derived a BMI multiple regression risk factor formula that satisfied all nine Bradford Hill causality criteria for epidemiology research. We found that animal products and added fats are negatively correlated with CVD early deaths worldwide but positively correlated with CVD early deaths in high quantities. We interpret this as showing that optimal cardiovascular outcomes come with moderate (not low and not high) intakes of animal foods and added fats.

    For questions, please email davidkcundiff@gmail.com. Thanks.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Sergio Prolo (2024). Panel dataset on Brazilian fuel demand [Dataset]. http://doi.org/10.17632/hzpwbp7j22.1

Panel dataset on Brazilian fuel demand

Explore at:
Dataset updated
Oct 7, 2024
Authors
Sergio Prolo
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
Brazil
Description

Summary : Fuel demand is shown to be influenced by fuel prices, people's income and motorization rates. We explore the effects of electric vehicle's rates in gasoline demand using this panel dataset.

Files : dataset.csv - Panel dimensions are the Brazilian state ( i ) and year ( t ). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e) and GDP per capita (ln_gdp_pc). All variables are all under the natural log function, since we use this to calculate demand elasticities in a regression model.

adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.

regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported to work, see comment section.

dataset_predictions.xlsx - Based on the estimations from Stata, we use this excel file to make average predictions by year and by state. Also, by including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but can be used to further understand how the forecasting scenarios are set up.

Sources: Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp) State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB) State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)

Search
Clear search
Close search
Google apps
Main menu