50 datasets found
  1. Panel dataset on Brazilian fuel demand

    • data.mendeley.com
    Updated Oct 7, 2024
    Cite
    Sergio Prolo (2024). Panel dataset on Brazilian fuel demand [Dataset]. http://doi.org/10.17632/hzpwbp7j22.1
    Explore at:
    Dataset updated
    Oct 7, 2024
    Authors
    Sergio Prolo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    Summary: Fuel demand is shown to be influenced by fuel prices, income, and motorization rates. We use this panel dataset to explore the effect of electric vehicle motorization rates on gasoline demand.

    Files: dataset.csv - The panel dimensions are the Brazilian state (i) and year (t). The other columns are gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e), and GDP per capita (ln_gdp_pc). All variables are in natural logs, since the regression model uses them to estimate demand elasticities.

    adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.
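    The row normalization described for adjacency.csv can be sketched as follows. This is a minimal illustration, not the authors' code: the three-state matrix, the function name row_normalize, and the EV motorization values are all made up for the example.

```python
import numpy as np

def row_normalize(adjacency):
    """Scale each row of a binary adjacency matrix so that it sums to one."""
    a = np.asarray(adjacency, dtype=float)
    row_sums = a.sum(axis=1, keepdims=True)
    # Rows with no neighbours stay all-zero instead of dividing by zero.
    return np.divide(a, row_sums, out=np.zeros_like(a), where=row_sums > 0)

# Hypothetical three-state example: state 0 borders states 1 and 2,
# while states 1 and 2 do not border each other.
binary = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [1, 0, 0]])
weights = row_normalize(binary)

# Interacting the weights with a (made-up) EV motorization vector gives,
# for each state, the average value among its neighbours.
ln_mi_e = np.array([0.2, 0.4, 0.8])
spatial_lag = weights @ ln_mi_e
```

    With row-normalized weights, each entry of spatial_lag is a neighbour average, which is one common way such a matrix enters a spatial regression term.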

    regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported before the commands will run; see the comment section.

    dataset_predictions.xlsx - Based on the Stata estimates, we use this Excel file to make average predictions by year and by state. By including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but it can also help in understanding how the forecasting scenarios are set up.

    Sources:
    Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp)
    State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB)
    State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)

  2. Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry...

    • qubeshub.org
    Updated Jan 11, 2022
    Cite
    Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu (2022). Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry and Mathematics [Dataset]. http://doi.org/10.25334/5PX5-H796
    Explore at:
    Dataset updated
    Jan 11, 2022
    Dataset provided by
    QUBES
    Authors
    Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu
    Description

    Students typically find linear regression analysis of data sets in a biology classroom challenging. These activities could be used in a Biology, Chemistry, Mathematics, or Statistics course. The collection provides student activity files with Excel instructions and Instructor Activity files with Excel instructions and solutions to problems.

    Students will be able to perform linear regression analysis, find the correlation coefficient, create a scatter plot, and find the R-squared value using MS Excel 365. Students will be able to interpret data sets, describe the relationship between biological variables, and predict the value of an output variable based on the input of a predictor variable.

  3. polynomial regression

    • kaggle.com
    Updated Jul 5, 2023
    + more versions
    Cite
    Miraj Deep Bhandari (2023). polynomial regression [Dataset]. http://doi.org/10.34740/kaggle/ds/3482232
    Explore at:
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Kaggle
    Authors
    Miraj Deep Bhandari
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Ice Cream Selling dataset is a simple and well-suited dataset for beginners in machine learning who are looking to practice polynomial regression. It consists of two columns: temperature and the corresponding number of units of ice cream sold.

    The dataset captures the relationship between temperature and ice cream sales. It serves as a practical example for understanding and implementing polynomial regression, a powerful technique for modeling nonlinear relationships in data.

    The dataset is designed to be straightforward and easy to work with, making it ideal for beginners. The simplicity of the data allows beginners to focus on the fundamental concepts and steps involved in polynomial regression without overwhelming complexity.

    By using this dataset, beginners can gain hands-on experience in preprocessing the data, splitting it into training and testing sets, selecting an appropriate degree for the polynomial regression model, training the model, and evaluating its performance. They can also explore techniques to address potential challenges such as overfitting.

    With this dataset, beginners can practice making predictions of ice cream sales based on temperature inputs and visualize the polynomial regression curve that represents the relationship between temperature and ice cream sales.

    Overall, the Ice Cream Selling dataset provides an accessible and practical learning resource for beginners to grasp the concepts and techniques of polynomial regression in the context of analyzing ice cream sales data.
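    The workflow described above (split the data, try several polynomial degrees, compare held-out error) might look like the sketch below. It uses synthetic temperature and sales values as a stand-in for the real file, and the helper name holdout_rmse is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the two columns (temperature, units sold).
# These values are generated here and are NOT taken from the dataset.
temperature = rng.uniform(-5, 5, size=60)
units_sold = 3.0 * temperature**2 + 2.0 + rng.normal(0, 1.0, size=60)

# Simple split: first 45 points for training, last 15 for testing.
x_train, x_test = temperature[:45], temperature[45:]
y_train, y_test = units_sold[:45], units_sold[45:]

def holdout_rmse(degree):
    """Fit a polynomial of the given degree and return its RMSE on the test split."""
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    predictions = np.polyval(coeffs, x_test)
    return float(np.sqrt(np.mean((y_test - predictions) ** 2)))

# Compare a few candidate degrees on held-out data to guard against overfitting.
errors = {degree: holdout_rmse(degree) for degree in (1, 2, 3)}
best_degree = min(errors, key=errors.get)
```

    Choosing the degree by held-out error rather than training error is what keeps a high-degree polynomial from being selected just because it fits the noise.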

  4. Global Burden of Disease analysis dataset of noncommunicable disease...

    • data.mendeley.com
    Updated Apr 6, 2023
    + more versions
    Cite
    David Cundiff (2023). Global Burden of Disease analysis dataset of noncommunicable disease outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.10
    Explore at:
    Dataset updated
    Apr 6, 2023
    Authors
    David Cundiff
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This formatted dataset (AnalysisDatabaseGBD) originates from raw data files from the Institute of Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017) affiliated with the University of Washington. We are volunteer collaborators with IHME and not employed by IHME or the University of Washington.

    The population weighted GBD2017 data are on male and female cohorts ages 15-69 years including noncommunicable diseases (NCDs), body mass index (BMI), cardiovascular disease (CVD), and other health outcomes and associated dietary, metabolic, and other risk factors. The purpose of creating this population-weighted, formatted database is to explore the univariate and multiple regression correlations of health outcomes with risk factors. Our research hypothesis is that we can successfully model NCDs, BMI, CVD, and other health outcomes with their attributable risks.

    These Global Burden of Disease data relate to the preprint: The EAT-Lancet Commission Planetary Health Diet compared with Institute of Health Metrics and Evaluation Global Burden of Disease Ecological Data Analysis. The data include the following:
    1. Analysis database of population weighted GBD2017 data that includes over 40 health risk factors, noncommunicable disease deaths/100k/year of male and female cohorts ages 15-69 years from 195 countries (the primary outcome variable that includes over 100 types of noncommunicable diseases) and over 20 individual noncommunicable diseases (e.g., ischemic heart disease, colon cancer, etc.).
    2. A text file to import the analysis database into SAS.
    3. The SAS code to format the analysis database to be used for analytics.
    4. SAS code for deriving Tables 1, 2, 3 and Supplementary Tables 5 and 6.
    5. SAS code for deriving the multiple regression formula in Table 4.
    6. SAS code for deriving the multiple regression formula in Table 5.
    7. SAS code for deriving the multiple regression formula in Supplementary Table 7.
    8. SAS code for deriving the multiple regression formula in Supplementary Table 8.
    9. The Excel files that accompanied the above SAS code to produce the tables.

    For questions, please email davidkcundiff@gmail.com. Thanks.

  5. Linear Regression

    • kaggle.com
    zip
    Updated Jan 29, 2022
    Cite
    Fareed Khan (2022). Linear Regression [Dataset]. https://www.kaggle.com/datasets/fareedkhan557/linear-regression
    Explore at:
    Available download formats: zip (14257572 bytes)
    Dataset updated
    Jan 29, 2022
    Authors
    Fareed Khan
    Description

    Context

    This dataset is created using the sources from this dataset.

    Content

    Single variable regression model:

    \[ y = mx + c \]

    Both the training and the testing datasets contain 1 million rows. 1) x-values are numbers between 1 and 100. 2) y-values are generated with this Excel function: NORMINV(RAND(), x, 3), i.e. a normal draw with mean x and standard deviation 3.
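    As a rough sketch of how such data can be reproduced and fitted, the snippet below draws y from a normal distribution with mean x and standard deviation 3 and estimates m and c by least squares. The uniform sampling of x and the reduced sample size are assumptions; the listing does not say how the x-values were drawn.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # far fewer than the dataset's 1 million rows, to keep the sketch fast

# Assumption: x is drawn uniformly from [1, 100].
x = rng.uniform(1, 100, size=n)
# Same distribution as Excel's NORMINV(RAND(), x, 3): normal, mean x, sd 3.
y = rng.normal(loc=x, scale=3.0)

# Closed-form ordinary least squares for y = m*x + c.
m = np.cov(x, y, bias=True)[0, 1] / np.var(x)
c = y.mean() - m * x.mean()
```

    Because the noise is centred on x itself, the fitted slope should land close to 1 and the intercept close to 0.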

    License: feel free to use

  6. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

    csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets involve 20 .csv formatted files, two for each module, with one as the uncleaned version and the other as the cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments.
    "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES.
    "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.
    "dictionary_drug_codes.csv" contains the dictionary for descriptors on the drug codes.
    "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded, as a .zip file which includes an .RData file and an .R file.
    "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data.
    "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e. our pipeline) to curate the original NHANES data.

    Example starter codes: The set of starter code to help users conduct exposome analysis consists of four R markdown files (.Rmd). We recommend going through the tutorials in order.
    "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
    "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.
    "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.
    "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.

  7. Data from: Multivariate regression model for predicting oxygen reduction...

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 8, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Multivariate regression model for predicting oxygen reduction rates in groundwater for the State of Wisconsin [Dataset]. https://catalog.data.gov/dataset/multivariate-regression-model-for-predicting-oxygen-reduction-rates-in-groundwater-for-the
    Explore at:
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    United States Geological Survey http://www.usgs.gov/
    Area covered
    Wisconsin
    Description

    A multivariate regression model was developed to predict zero-order oxygen reduction rates (mg/L/yr) in aquifers across the State of Wisconsin. The model used a combination of dissolved oxygen concentrations and mean groundwater ages estimated with sampled age tracers from wells in the U.S. Geological Survey National Water Information System and previously published project reports from state agencies and universities. The multivariate regression model was solved using the Microsoft Excel solver, with 461 wells used for training and 46 wells held-out for validation. A total of 31 predictor variables were used for model development (56 were tested), including basic well characteristics, soil properties, aquifer properties, hydrologic position on the landscape, recharge and evapotranspiration rates, and land use characteristics. Model results indicate that the mean oxygen reduction rate for the training wells is 0.15 mg/L/yr (ranges from 0.07 to 0.59 mg/L/yr), with a root mean weighted square error of 3.13 mg/L/yr and Coefficient of Correlation (r^2) of 0.49 for the holdout validation data. This data release includes the Microsoft Excel file that represents the final solved regression model, as well as an Excel file that describes all of the predictor variables that were tested with the model.

  8. Data for: A systematic review showed no performance benefit of machine...

    • data.mendeley.com
    • search.datacite.org
    Updated Mar 14, 2019
    Cite
    Ben Van Calster (2019). Data for: A systematic review showed no performance benefit of machine learning over logistic regression for clinical prediction models [Dataset]. http://doi.org/10.17632/sypyt6c2mc.1
    Explore at:
    Dataset updated
    Mar 14, 2019
    Authors
    Ben Van Calster
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The uploaded files are:

    1) Excel file containing 6 sheets in respective Order: "Data Extraction" (summarized final data extractions from the three reviewers involved), "Comparison Data" (data related to the comparisons investigated), "Paper level data" (summaries at paper level), "Outcome Event Data" (information with respect to number of events for every outcome investigated within a paper), "Tuning Classification" (data related to the manner of hyperparameter tuning of Machine Learning Algorithms).

    2) R script used for the analysis. To read the data, save the "Comparison Data", "Paper level data", and "Outcome Event Data" Excel sheets as txt files. In the R script, srpap refers to the "Paper level data" sheet, srevents refers to the "Outcome Event Data" sheet, and srcompx refers to the "Comparison Data" sheet.

    3) Supplementary Material: Including Search String, Tables of data, Figures

    4) PRISMA checklist items

  9. Global Burden of Disease analysis dataset of BMI and CVD outcomes, risk...

    • data.mendeley.com
    Updated Aug 17, 2021
    + more versions
    Cite
    David Cundiff (2021). Global Burden of Disease analysis dataset of BMI and CVD outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.6
    Explore at:
    Dataset updated
    Aug 17, 2021
    Authors
    David Cundiff
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This formatted dataset originates from raw data files from the Institute of Health Metrics and Evaluation Global Burden of Disease (GBD2017). It is population weighted worldwide data on male and female cohorts ages 15-69 years including body mass index (BMI) and cardiovascular disease (CVD) and associated dietary, metabolic and other risk factors. The purpose of creating this formatted database is to explore the univariate and multiple regression correlations of BMI and CVD and other health outcomes with risk factors. Our research hypothesis is that we can successfully apply artificial intelligence to model BMI and CVD risk factors and health outcomes. We derived a BMI multiple regression risk factor formula that satisfied all nine Bradford Hill causality criteria for epidemiology research. We found that animal products and added fats are negatively correlated with CVD early deaths worldwide but positively correlated with CVD early deaths in high quantities. We interpret this as showing that optimal cardiovascular outcomes come with moderate (not low and not high) intakes of animal foods and added fats.

    For questions, please email davidkcundiff@gmail.com. Thanks.

  10. Data from: S1 Dataset -

    • figshare.com
    bin
    Updated Jun 2, 2023
    Cite
    Tadesse Guangul Alemayehu; Getahun Gebre Bogale; Getaw Walle Bazie (2023). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0286400.s001
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Tadesse Guangul Alemayehu; Getahun Gebre Bogale; Getaw Walle Bazie
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Ionizing radiation is being used more frequently in medicine, which has been linked to recognized biological effects such as cancer and mortality. Radiology services are becoming more widely available in Ethiopian health facilities, but there is no compiled record of workers' radiation dose. Assessing the magnitude of occupational radiation exposure dose among radiology personnel, and identifying its associated factors, therefore helps to design strategies for radiation protection.

    Objective: The study was designed to assess the occupational radiation exposure dose and associated factors among radiology personnel in eastern Amhara, northeast Ethiopia, 2021.

    Methods: A cross-sectional study was conducted from March 25 to April 30, 2021, in 57 health institutions among 198 radiology personnel. The study comprised all eligible radiology personnel. The data were collected using an electronic (Google Form) self-administered questionnaire and document review. The data were entered into an Excel spreadsheet and then exported to Stata 14 software. A linear regression model was used to analyse the data after checking its assumptions. Variables with a p-value < 0.25 were entered into a multiple linear regression analysis, and those with a p-value < 0.05 were judged significant. VIF was used to check for multi-collinearity. The coefficient of determination was used to check the model fitness.

    Results: The mean (± SD) annual shallow and deep dose equivalents of radiology personnel were 1.20 (± 0.75) and 1.02 (± 0.70) mSv, respectively. Body mass index (β = 0.104, 95% CI: 0.07, 0.14), practice of timing (β = -0.43, 95% CI: -0.73, -0.13), working experience (β = -0.04, 95% CI: -0.048, -0.032), and practice of distancing (β = -0.26, 95% CI: -0.49, -0.17) were found to be statistically significant factors of annual deep dose equivalent. In addition, body mass index (β = 0.113, 95% CI: 0.08, 0.15), practice of timing (β = -0.62, 95% CI: -0.93, -0.31), and working experience (β = -0.044, 95% CI: -0.053, -0.036) had statistically significant associations with annual shallow dose equivalent.

    Conclusion: The annual dose equivalents were two times higher than the global average of annual per caput effective dose due to medical exposure. Body mass index, practice of timing, working experience, and practice of distancing were factors of occupational radiation exposure dose. Strategies focusing on increasing the skill, experience, and lifestyle of radiology personnel would be a highly important means to reduce occupational radiation exposure dose.

  11. Mathematica software for modeling long-term tree growth curves indirectly,...

    • data.mendeley.com
    Updated Nov 24, 2025
    Cite
    Martin Ricker (2025). Mathematica software for modeling long-term tree growth curves indirectly, when tree ages are unknown [Dataset]. http://doi.org/10.17632/xrw59yvvfp.6
    Explore at:
    Dataset updated
    Nov 24, 2025
    Authors
    Martin Ricker
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Mathematica notebook "Long-term growth curves of trees with unknown ages (Ricker Nov 2025).nb" carries out all calculations that are developed in the article "Modeling long-term tree growth curves indirectly with piecewise linear regression and explaining factors, when tree ages are unknown", by Ricker et al. (2025) in "Environmental and Ecological Statistics". The two Excel files are two different input files. Please see the instruction file for details of how to use the notebook. For just inspecting the code, without the Mathematica software, the notebook and output files are provided as PDF files.

  12. Data_Sheet_1_Comparative analysis of machine learning algorithms for...

    • frontiersin.figshare.com
    docx
    Updated Feb 13, 2024
    Cite
    Abdulsalam Elnaeem Balila; Ani Bin Shabri (2024). Data_Sheet_1_Comparative analysis of machine learning algorithms for predicting Dubai property prices.docx [Dataset]. http://doi.org/10.3389/fams.2024.1327376.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    Frontiers
    Authors
    Abdulsalam Elnaeem Balila; Ani Bin Shabri
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Dubai
    Description

    Introduction: Predicting property prices is a crucial task in the real estate market, and machine learning algorithms offer valuable tools for accurate predictions. In this study, we introduce a comprehensive comparison of eight well-known machine learning algorithms, namely, ensemble empirical mode decomposition (EEMD)–stochastic (S) + deterministic (D)–support vector machine (EEMD-SD-SVM), support vector machine (SVM), gradient boosting, random forest, K-nearest neighbors (KNN), linear regression, artificial neural networks (ANN), and decision trees. The focus is on predicting property prices in Dubai, with the primary objective of assessing the predictive performance of these algorithms within this specific market context.

    Methods: The evaluation is based on four key performance metrics: R-squared (R2), mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). These metrics provide insights into prediction errors, accuracy in percentage terms, and the proportion of variance in property prices explained by independent variables. The study compares the strengths and limitations of each algorithm for predicting property prices in Dubai, highlighting scenarios where certain algorithms excel based on the nature of decision boundaries, handling complex data, capturing localized patterns, and offering interpretability.

    Results: Findings from the comparative analysis shed light on the performance of each algorithm in predicting property prices in Dubai. EEMD-SD-SVM and SVM excel in scenarios requiring precise decision boundaries, while gradient boosting and random forests demonstrate robust performance with complex and noisy property price data. KNN captures localized patterns effectively, linear regression is suitable for straightforward regression tasks, ANN excels with extensive datasets, and decision trees offer interpretability in understanding factors influencing property prices.

    Discussion: The study emphasizes the significance of model tuning, feature selection, and data pre-processing to enhance predictive power. Additionally, practical aspects such as computational efficiency, model interpretability, and scalability in real-world applications are discussed. The comparative analysis provides valuable guidance for stakeholders, including real estate professionals, data scientists, and others interested in selecting the most suitable machine learning algorithm for predicting property prices in Dubai, with a focus on the essential evaluation metrics of MSE, RMSE, MAPE, and R2. This study offers insights into the applicability and performance of different machine learning algorithms for predicting property prices in Dubai. Stakeholders such as real estate agents, buyers, sellers, or investors can leverage these insights to make informed decisions in the Dubai real estate market.
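    For readers unfamiliar with the four metrics named above, the sketch below computes them from their standard definitions. The function name regression_metrics and the three example prices are made up for illustration; this is not code from the study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, MSE, RMSE, and MAPE (in percent) for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": mse ** 0.5,
        # MAPE is undefined when a true value is zero; prices are assumed positive.
        "MAPE": float(np.mean(np.abs(residuals / y_true))) * 100.0,
    }

# Made-up prices, for illustration only.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

    MSE and RMSE are in the units of the target (squared and unsquared), whereas MAPE and R2 are scale-free, which is why studies often report them together.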

  13. S1 File - Analysis of spatial heterogeneity in Xi'an's urban heat island...

    • figshare.com
    zip
    Updated Oct 17, 2025
    Cite
    Yuan Meng; Qian Luo; Boyu Bai; Yonghao Li; Jialin Lu; Juan Ren (2025). S1 File - Analysis of spatial heterogeneity in Xi'an's urban heat island effect using multi-source data fusion [Dataset]. http://doi.org/10.1371/journal.pone.0332885.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 17, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Yuan Meng; Qian Luo; Boyu Bai; Yonghao Li; Jialin Lu; Juan Ren
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Xi'An
    Description

    S1 Data. fin_data.csv. Original dataset of independent and dependent variables used in the analysis.
    S2 Data. fin_data.xlsx. Same dataset as S1, provided in Excel format for reproducibility.
    S3 Data. fin_data_2.csv and fin_data_2.xls. Extended dataset 2, including supplementary indicators for model validation.
    S4 Data. fin_data_3.csv. Extended dataset 3, providing additional data points for robustness checks.
    S5 Data. segmentation_result.csv. Semantic segmentation outputs of street view images, including derived indices such as Green View Index (GVI) and Sky View Index (SVI).
    S6 Text. OLS.pdf. Supplementary results of the Ordinary Least Squares (OLS) regression analysis, including diagnostic plots and summary tables.
    S7 Data. GWR folder. Geographically Weighted Regression (GWR) model outputs and related spatial results.
    S8 Data. MGWR folder. Multiscale Geographically Weighted Regression (MGWR) model outputs and coefficient estimation results.
    (ZIP)

  14. Data and Genetic Programming Expression for Marshall parameters of base and...

    • data.mendeley.com
    Updated Oct 23, 2025
    Cite
    Adnan Khan (2025). Data and Genetic Programming Expression for Marshall parameters of base and wearing course [Dataset]. http://doi.org/10.17632/f93jchtzy3.4
    Explore at:
    Dataset updated
    Oct 23, 2025
    Authors
    Adnan Khan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was collected from five different road projects in Pakistan. It contains genetic programming expressions for Marshall parameters, including Stability, Flow, and Air Voids, for both the base and wearing courses. Additionally, the Excel sheets provide comparisons of Support Vector Regression, Genetic Programming, Multiple Linear Regression, and Non-Linear Regression models. A Python file is also provided for the base and wearing course parameters, using GP expressions and SVR, along with 10-fold cross-validation and SHAP analysis.

  15. Legionella pneumophila inactivation results using germicidal UV-LEDs

    • catalog.data.gov
    • s.cnmilf.com
    Updated Sep 22, 2024
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). Legionella pneumophila inactivation results using germicidal UV-LEDs [Dataset]. https://catalog.data.gov/dataset/legionella-pneumophila-inactivation-results-using-germicidal-uv-leds
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Description

    These data include all the raw data in three Excel files. The first Excel file, entitled "UV-LED Data", contains several datasheets, each prepared separately for a test run with one of three different strains of Legionella pneumophila. The second and third files, "Linear Regression Data" and "ANCOVA Data", were prepared for each strain of L. pneumophila to calculate regression analysis and analysis of covariance statistics, respectively. This dataset is associated with the following publication: Oh, Y., J. Sangsanont, H. Woo, L. Boczek, K. Linden, and H. Ryu. Inactivation efficacy and mechanisms of wavelength-specific UV sources for various strains of Legionella pneumophila serogroup 1. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 907: 167781, (2024).

  16. Air temperature and near-surface meteorology datasets on three Swiss...

    • data.europa.eu
    unknown
    Updated Nov 11, 2023
    Zenodo (2023). Air temperature and near-surface meteorology datasets on three Swiss glaciers - Extreme 2022 Summer [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8277285?locale=ga
    Available download formats: unknown (2595)
    Dataset authored and provided by
    Zenodo http://zenodo.org/
    Description

    GLACIER METEOROLOGICAL DATA SWISS ALPS - 2022

    Data gathered and structured by Thomas Shaw (WSL, Switzerland (until Oct 2022)). On- and off-glacier meteorological data were gathered and analysed as part of a Marie-Curie project 'TEMPEST' (tempestglacier.com).

    The dataset consists of hourly low-cost AWS (Davis Vantage Pro2) and simple temperature ('T-') logger (Onset TidBitv2) sensor records on three glaciers in the Swiss Alps (Canton Valais). The glaciers are:
    Haut Glacier d'Arolla (45.967°N, 7.526°E)
    Glacier d'Otemma (45.956°N, 7.454°E)
    Glacier du Corbassière (45.975°N, 7.303°E)

    Data are provided in individual Excel files per glacier that contain all hourly data for the sub-period of comparison (11 August-18 September, 2022). Data are quality controlled and checked for obvious errors. Any uncertain values are set to NaN.

    Air temperature data at 'T-Logger' stations were corrected for heating errors using the comparison of measurements in artificially (AWS) and naturally ventilated (T-Logger) radiation shields on Arolla and Corbassiere glaciers. A multiple linear regression model was applied to estimate these differences at all T-Loggers on all glaciers as a function of incoming shortwave radiation (MeteoSwiss station-derived) and wind speed (measured at AWS).

    Each Excel file contains a 'META' tab for simple metadata related to station locations (latitude 'LAT' (°), longitude 'LON' (°), elevation 'ELE' (m a.s.l.) and flowpath length 'FPL' (m)) and a 'DATA' tab for the hourly data.

    Suffixes to the station names in each column provide the variable measured at that site:
    'TA' - 2m air temperature (°C)
    'TA_Hi' - Maximum air temperature for timestep (°C)
    'TA_Lo' - Minimum air temperature for timestep (°C)
    'RH' - 2m relative humidity (%)
    'FF' - Wind speed (m s^-1)
    'FF_Hi' - Maximum wind speed for timestep (m s^-1)
    'FF_Lo' - Minimum wind speed for timestep (m s^-1)
    'DIR' - Wind direction (°)
    'DEW' - Dewpoint temperature (°C)
    'PRESS' - Air pressure (mbar)
    'CHILL' - Calculated wind chill temperature (°C)
    'Heat_idx' - Calculated heat index (°C)
    'THSW' - A calculated index that uses humidity and temperature like for the Heat Index, but also includes the heating effects of sunshine and the cooling effects of wind (like Wind Chill) to calculate an apparent temperature of what it "feels" like out in the shade

    Wind speeds and direction measured at off-glacier sites 'OG' are for the lower off-glacier station ('OG_Low').

    -------------------------
    This work was funded by the EU Horizon 2020 Marie Skłodowska-Curie Actions Grant 101026058.
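The heating-error correction described above, a multiple linear regression on incoming shortwave radiation and wind speed, could look roughly like the sketch below. It is not the author's code: the model form (intercept plus two linear terms), the sign convention (T-Logger minus AWS), all function names and all numbers are assumptions chosen for illustration, and the sample values are synthetic.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_mlr(predictors, target):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic hourly samples: (incoming shortwave in W m^-2, wind speed in m s^-1).
predictors = [(0, 2.0), (200, 1.0), (400, 3.0), (600, 0.5), (800, 4.0), (1000, 1.5)]
# Synthetic heating error (T-Logger minus AWS, in °C) built from known coefficients.
heating_error = [0.1 + 0.002 * sw - 0.15 * ws for sw, ws in predictors]

b0, b_sw, b_ws = fit_mlr(predictors, heating_error)


def correct_temperature(t_logger, sw, ws):
    """Subtract the modelled heating error from a raw T-Logger reading."""
    return t_logger - (b0 + b_sw * sw + b_ws * ws)


print(round(b0, 4), round(b_sw, 4), round(b_ws, 4))
print(round(correct_temperature(5.0, 600, 0.5), 3))
```

In this synthetic setup the fit recovers the coefficients used to build the data, which is only a sanity check of the method, not a statement about the real coefficients.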

  17. appendix 1 Participants profiles.docx

    • figshare.com
    • resodate.org
    docx
    Updated Sep 5, 2022
    doaa tahoon (2022). appendix 1 Participants profiles.docx [Dataset]. http://doi.org/10.6084/m9.figshare.20935768.v2
    Available download formats: docx
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    doaa tahoon
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All participants' profiles, surveying forms, and all data on hazard rates and social vulnerability factors. Both Excel sheets and SPSS files are provided.

  18. Random Data Analysis and Linear Regression

    • kaggle.com
    zip
    Updated Nov 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tommaso Ruzza (2025). Random Data Analysis and Linear Regression [Dataset]. https://www.kaggle.com/datasets/tommasoruzza/random-data-analysis-and-linear-regression
    Available download formats: zip (82141 bytes)
    Authors
    Tommaso Ruzza
    Description

    Created a multi-tab Excel statistical project where I generated synthetic normally-distributed data, built random sample extraction logic, calculated descriptive and inferential statistics, analysed variable correlations and performed linear regression with visualisation.

  19. Data from: Comparing Machine Learning Classifiers And Linear/Logistic...

    • eprints.soton.ac.uk
    • data.niaid.nih.gov
    Updated Jun 6, 2024
    Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah; Neil, Gregory; Black, Sue (2024). Comparing Machine Learning Classifiers And Linear/Logistic Regression To Explore The Relationship Between Hand Dimensions And Demographic Characteristics [Dataset]. http://doi.org/10.5281/zenodo.17487
    Dataset provided by
    Zenodo
    Authors
    Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah; Neil, Gregory; Black, Sue
    Description

    Data for "Comparing Machine Learning Classifiers and Linear/Logistic Regression to Explore the Relationship between Hand Dimensions and Demographic Characteristics" (PLOS ONE)

    Oscar Miguel-Hurtado 1, Richard Guest 1, Sarah V. Stevenage 2, Greg J. Neil 2, Sue Black 3
    1 School of Engineering and Digital Arts, University of Kent, Canterbury, UK
    2 Department of Psychology, University of Southampton, Southampton, UK
    3 Centre for Anatomy and Human Identification, University of Dundee, Dundee, UK

    For more information please contact: O.Miguel-Hurtado-98@kent.ac.uk (Oscar Miguel)

    The zip contains right and left hand geometry images from 112 participants. The images were captured using a Nikon D200 SLR camera (format: jpg, size: 3504x2336 pixels), with both the palm of the hand and camera facing downwards. Participants placed each hand on an acetate sheet with a series of positioning pegs. The Excel file contains a series of length measurements (based on the underlying skeleton of the hand) manually extracted (see Figure 1 for details) along with demographic information from the participants: sex (male or female), height (in cm), weight (in kg) and foot size (in UK sizes).

  20. Ferroxidase assay data on Rru_EncFtn and Rru_Enc:EncFtn proteins

    • figshare.com
    xlsx
    Updated Sep 22, 2020
    Cecilia Piergentili (2020). Ferroxidase assay data on Rru_EncFtn and Rru_Enc:EncFtn proteins [Dataset]. http://doi.org/10.6084/m9.figshare.9885575.v2
    Available download formats: xlsx
    Dataset provided by
    figshare
    Authors
    Cecilia Piergentili
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1st and 2nd Excel tabs: Ferroxidase activity of Rru_A0973 (with or without Zn(II)) and A0974/A0973 encapsulated ferritins.
    Ferroxidase assays (average, standard deviation, n >= 3) performed on Rru_A0974/A0973 (wild-type and E31A, E34A, W38A variants) and Rru_A0973 (wild-type and E31A, E34A, W38A, E31A/E34A) proteins in the presence of 50 μM Fe(II). A0973 concentration was kept at 10 μM in both data sets. Protein sample was incubated with 50 μM FeSO4.7H2O (10 times molar equivalent Fe(II) per FOC) and progress curves of the oxidation of Fe(II) to Fe(III) were monitored at 315 nm at room temperature. The background oxidation of iron at 50 μM in enzyme-free control is shown for reference (dotted grey line). Solid lines represent the average (n = 3) of technical replicates, shaded areas represent standard deviation from the mean. Protein and iron samples were prepared anaerobically in Buffer H (10 mM HEPES pH 8.0, 150 mM NaCl), and 0.1 % (v/v) HCl, respectively.

    3rd Excel tab: Linear Regression on first 200 s of ferroxidase assays with Rru_A0973 and Rru_A0974/A0973.
    Data from the first 200 s of the assays described above were processed using the Linear Regression tool in GraphPad (Prism8) to determine v0, the initial enzymatic rate, from the slope of each of the linear curves. The initial enzymatic rate of each variant (v0variant), calculated from linear regression curves in the first 200 s of ferroxidase assay experiments, was divided by the slope calculated for the corresponding wild-type species (v0wild-type).

    4th Excel tab: Linear Regression on first 10 s of ferroxidase assays with A0973_E31A and A0974/A0973_E31A.
    Data from the first 10 s of the assays carried out with 10 μM Rru_A0973_E31A or Rru_A0974/A0973_E31A were processed using the Linear Regression tool in GraphPad (Prism8) to determine v0, the initial enzymatic rate, from the slope of each of the linear curves.
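The initial-rate calculation described for the 3rd and 4th tabs amounts to fitting a straight line to the early part of each progress curve and comparing slopes. The authors used GraphPad Prism for this; the sketch below only illustrates the arithmetic in plain Python, with hypothetical function names and synthetic absorbance values that are not taken from the Excel file.

```python
def fit_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def initial_rate(times, absorbance, window_s=200.0):
    """Estimate v0 as the slope over the first `window_s` seconds of a progress curve."""
    pairs = [(t, a) for t, a in zip(times, absorbance) if t <= window_s]
    ts, vals = zip(*pairs)
    return fit_slope(ts, vals)


# Synthetic progress curves (absorbance at 315 nm), sampled every 20 s.
times = [float(t) for t in range(0, 401, 20)]
wild_type = [0.05 + 0.0020 * t for t in times]
variant = [0.05 + 0.0005 * t for t in times]

v0_wt = initial_rate(times, wild_type)
v0_var = initial_rate(times, variant)
# Variant rate expressed relative to the corresponding wild-type rate.
relative_rate = v0_var / v0_wt
print(f"v0 wild-type={v0_wt:.4f}, v0 variant={v0_var:.4f}, ratio={relative_rate:.2f}")
```

Passing `window_s=10.0` would mirror the shorter window used for the E31A variants, provided the curve is sampled densely enough to have several points in that window.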

Sergio Prolo (2024). Panel dataset on Brazilian fuel demand [Dataset]. http://doi.org/10.17632/hzpwbp7j22.1

Panel dataset on Brazilian fuel demand

Dataset updated
Oct 7, 2024
Authors
Sergio Prolo
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
Brazil
Description

Summary: Fuel demand is shown to be influenced by fuel prices, people's income and motorization rates. We explore the effects of electric vehicle motorization rates on gasoline demand using this panel dataset.

Files: dataset.csv - Panel dimensions are the Brazilian state ( i ) and year ( t ). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e) and GDP per capita (ln_gdp_pc). All variables are in natural logs, since the regression coefficients can then be read as demand elasticities.
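The link between logged variables and elasticities can be made explicit with a short derivation. This is a sketch under a simplified specification in which only the own-price term is written out; the symbols α, β and ε are notation introduced here, and the actual model in regression.do may include lags, the other columns and further terms.

```latex
\ln S_{g,it} = \alpha + \beta \ln P_{g,it} + \dots + \varepsilon_{it}
\quad\Longrightarrow\quad
\beta = \frac{\partial \ln S_{g,it}}{\partial \ln P_{g,it}}
      = \frac{\partial S_{g,it} / S_{g,it}}{\partial P_{g,it} / P_{g,it}}
```

Here S_g stands for gasoline sales per capita (column ln_Sg_pc after taking logs) and P_g for the gasoline price (ln_Pg), so a 1% rise in price is associated with approximately a β% change in sales per capita.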

adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.
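The two construction steps for the adjacency matrix (binary adjacency, then row normalization) can be sketched as follows. The four states and border pairs are a small illustrative subset and do not reproduce adjacency.csv; the function names, the ln_Mi_e values and the final weighted average are assumptions about how such a matrix is typically combined with a variable, not the authors' code.

```python
def build_adjacency(states, borders):
    """Binary matrix: cell (i, j) is 1 if states i and j are adjacent, else 0."""
    index = {s: k for k, s in enumerate(states)}
    matrix = [[0] * len(states) for _ in states]
    for a, b in borders:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


def row_normalize(binary):
    """Divide each row by its sum so that every non-empty row sums to one."""
    normalized = []
    for row in binary:
        total = sum(row)
        # A state with no neighbours keeps an all-zero row (avoids dividing by zero).
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


# Illustrative subset of states; only borders among these four are listed.
states = ["SP", "RJ", "MG", "ES"]
borders = [("SP", "RJ"), ("SP", "MG"), ("RJ", "MG"), ("RJ", "ES"), ("MG", "ES")]
W = row_normalize(build_adjacency(states, borders))

# Hypothetical ln_Mi_e values for one year, in the same order as `states`.
ln_mi_e = [-6.0, -6.5, -7.0, -7.5]
# With row-normalized weights, each entry is the average over a state's neighbours.
neighbour_avg = [sum(w * x for w, x in zip(row, ln_mi_e)) for row in W]

for state, row, avg in zip(states, W, neighbour_avg):
    print(state, [round(w, 3) for w in row], round(avg, 3))
```

Row normalization is what makes the product interpretable as a neighbour average rather than a neighbour sum, so states with many neighbours are not mechanically given larger values.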

regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported first for the commands to work; see the comment section.

dataset_predictions.xlsx - Based on the estimations from Stata, we use this Excel file to make average predictions by year and by state. By including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but can also be used to further understand how the forecasting scenarios are set up.

Sources:
Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp)
State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB)
State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)
