100+ datasets found
  1. Variable Data Printing (VDP) Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jan 30, 2025
    Cite
    Data Insights Market (2025). Variable Data Printing (VDP) Software Report [Dataset]. https://www.datainsightsmarket.com/reports/variable-data-printing-vdp-software-1946900
    Available download formats: ppt, doc, pdf
    Dataset updated
    Jan 30, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Variable Data Printing (VDP) Software market was valued at USD XXX million in 2024 and is projected to reach USD XXX million by 2033, at an expected CAGR of XX% during the forecast period.

  2. Variable Data Shrink Sleeve Printing Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 3, 2025
    Cite
    Growth Market Reports (2025). Variable Data Shrink Sleeve Printing Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/variable-data-shrink-sleeve-printing-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Oct 3, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Variable Data Shrink Sleeve Printing Market Outlook



    According to our latest research, the global Variable Data Shrink Sleeve Printing market size reached USD 1.87 billion in 2024, demonstrating robust expansion driven by the increasing demand for personalized packaging solutions across various industries. The market is expected to grow at a CAGR of 7.1% from 2025 to 2033, projecting a market value of approximately USD 3.49 billion by 2033. This growth is primarily fueled by advancements in digital printing technologies, the rising trend of product customization, and stringent regulations regarding packaging authenticity and traceability.
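    As a rough arithmetic check on these figures (a sketch only; it assumes the 7.1% CAGR compounds annually on the 2024 base over the nine years to 2033, with small differences coming from rounding in the source):

```python
# Sanity-check the projected market value implied by the stated CAGR.
base_2024 = 1.87      # USD billion, 2024 market size quoted above
cagr = 0.071          # 7.1% compound annual growth rate
years = 2033 - 2024   # nine compounding periods (assumed annual compounding)

projection_2033 = base_2024 * (1 + cagr) ** years
print(f"Implied 2033 market size: USD {projection_2033:.2f} billion")
# ~USD 3.47 billion, in line with the roughly USD 3.49 billion quoted once input rounding is allowed for.
```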




    The surge in demand for unique and personalized packaging is one of the key growth factors propelling the Variable Data Shrink Sleeve Printing market. As brands and manufacturers strive to differentiate their products on crowded shelves, the ability to incorporate variable data such as barcodes, QR codes, serialized numbers, and customized graphics has become crucial. This trend is particularly prominent in the food and beverage sector, where consumer engagement and anti-counterfeiting measures are vital. The flexibility offered by variable data printing enables brands to launch limited edition products, regional campaigns, and promotional activities, thus enhancing consumer interaction and brand loyalty.




    Technological advancements in printing methods have significantly contributed to the market's upward trajectory. The integration of digital printing technology has revolutionized the shrink sleeve printing process, enabling high-speed, cost-effective, and high-quality production of short runs and complex designs. Flexographic and gravure printing also continue to evolve, offering improved color accuracy and substrate versatility. These innovations have made it easier for manufacturers to respond quickly to market trends and regulatory requirements, while reducing waste and operational costs. As a result, the adoption of variable data shrink sleeve printing is expanding across industries that require agility and precision in their packaging operations.




    Another major growth driver is the increasing emphasis on regulatory compliance and product security. Governments and industry bodies worldwide are implementing stricter regulations to combat counterfeiting and ensure product authenticity, especially in sensitive sectors such as pharmaceuticals and personal care. Variable data printing allows for the integration of tamper-evident features and traceability elements directly onto shrink sleeves, providing a robust solution to meet these compliance standards. Moreover, the rise of e-commerce and global supply chains has further heightened the need for secure and trackable packaging, reinforcing the role of variable data shrink sleeve printing in modern packaging strategies.




    Regionally, the Asia Pacific market stands out as a major contributor to global growth, supported by rapid industrialization, expanding retail sectors, and a burgeoning middle-class population. North America and Europe also exhibit strong demand, driven by advanced manufacturing infrastructure and a high focus on product innovation. Meanwhile, emerging markets in Latin America and the Middle East & Africa are witnessing increasing adoption, albeit at a relatively slower pace, as local brands recognize the value of sophisticated packaging in enhancing brand image and consumer trust.





    Printing Technology Analysis



    The printing technology segment of the Variable Data Shrink Sleeve Printing market encompasses digital printing, flexographic printing, gravure printing, offset printing, and other emerging technologies. Digital printing has emerged as the fastest-growing sub-segment, owing to its unparalleled ability to deliver high-quality, customizable prints with minimal setup time. The technology’s capacity for on-demand printing and short production runs makes it ideal for brands seeking to implement targeted marketing campaigns or comply with regulatory requirements...

  3. The simulation results of the setting.

    • figshare.com
    xls
    Updated Jun 3, 2025
    Cite
    Yahui Lu; Aiyi Liu; Tao Jiang (2025). The simulation results of the setting . [Dataset]. http://doi.org/10.1371/journal.pone.0322937.t001
    Available download formats: xls
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Yahui Lu; Aiyi Liu; Tao Jiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In many research fields, measurement data containing too many zeros are often called semicontinuous data. For semicontinuous data, the most common method is the two-part model, which establishes a corresponding regression model for both the zero-valued part and the nonzero-valued part. Considering that each part of the two-part regression model often encounters a large number of candidate variables, variable selection becomes an important problem in semicontinuous data analysis. However, there is little research literature on this topic. To bridge this gap, we propose a new type of variable selection method for the two-part regression model. In this paper, the Bernoulli-Normal two-part (BNT) regression model is presented, and a variable selection method based on the Lasso penalty function is proposed. Because the Lasso estimator does not have the oracle property, we then propose a variable selection method based on the adaptive Lasso penalty function. The simulation results show that both methods can select variables for the BNT regression model and are easy to implement, and that the adaptive Lasso method outperforms the Lasso method. We demonstrate the effectiveness of the proposed tools using dietary intake data to further analyze the important factors affecting dietary intake of patients.
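    A minimal sketch of the general two-part idea described above, not the authors' BNT estimator: fit an L1-penalised logistic regression for the zero/nonzero indicator and a Lasso for the nonzero magnitudes, so each part performs its own variable selection (assumes scikit-learn; the data here are simulated purely for illustration).

```python
# Two-part ("hurdle") regression with Lasso-type variable selection in each part.
import numpy as np
from sklearn.linear_model import LogisticRegression, LassoCV

rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.normal(size=(n, p))

# Simulate semicontinuous data: many exact zeros, positive values otherwise.
prob_nonzero = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.6 * X[:, 1])))
nonzero = rng.random(n) < prob_nonzero
y = np.where(nonzero, np.exp(1.0 + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=n)), 0.0)

# Part 1: which observations are nonzero (Bernoulli part); the L1 penalty selects variables.
part1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
part1.fit(X, (y > 0).astype(int))

# Part 2: magnitude of the nonzero responses (on the log scale here); the Lasso selects variables.
part2 = LassoCV(cv=5).fit(X[y > 0], np.log(y[y > 0]))

print("Part 1 selected predictors:", np.flatnonzero(part1.coef_[0] != 0))
print("Part 2 selected predictors:", np.flatnonzero(part2.coef_ != 0))
```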

  4. Data from: WiBB: An integrated method for quantifying the relative...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    zip
    Updated Aug 20, 2021
    Cite
    Qin Li; Xiaojun Kou (2021). WiBB: An integrated method for quantifying the relative importance of predictive variables [Dataset]. http://doi.org/10.5061/dryad.xsj3tx9g1
    Available download formats: zip
    Dataset updated
    Aug 20, 2021
    Dataset provided by
    Field Museum of Natural History
    Beijing Normal University
    Authors
    Qin Li; Xiaojun Kou
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    This dataset contains simulated datasets, empirical data, and R scripts described in the paper: “Li, Q. and Kou, X. (2021) WiBB: An integrated method for quantifying the relative importance of predictive variables. Ecography (DOI: 10.1111/ecog.05651)”.

    A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. Here we propose a new index, WiBB, which integrates the merits of several existing methods: a model-weighting method from information theory (Wi), a standardized regression coefficient method measured by ß* (B), and a bootstrap resampling technique (B). We applied WiBB to simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied two other methods, relative sum of weights (SWi) and standardized beta (ß*), to evaluate their performance in comparison with the WiBB method on ranking predictor importances under various scenarios. We also applied it to an empirical dataset of the plant genus Mimulus to select bioclimatic predictors of species’ presence across the landscape. Results in the simulated datasets showed that the WiBB method outperformed the ß* and SWi methods in scenarios with small and large sample sizes, respectively, and that the bootstrap resampling technique significantly improved the discriminant ability. When testing WiBB in the empirical dataset with GLM, it sensibly identified four important predictors with high credibility out of six candidates in modeling geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance and hence reducing the dimensionality of data, without losing interpretive power. The simplicity of calculating the new metric, compared with more sophisticated statistical procedures, makes it a handy method in the statistical toolbox.

    Methods: To simulate independent datasets (size = 1000), we adopted Galipaud et al.’s approach (2014) with custom modifications of the data.simulation function, which uses the multivariate normal distribution function rmvnorm in the R package mvtnorm (v1.0-5, Genz et al. 2016). Each dataset was simulated with a preset correlation structure between a response variable (y) and four predictors (x1, x2, x3, x4). The first three (genuine) predictors were set to be strongly, moderately, and weakly correlated with the response variable, respectively (denoted by large, medium, and small Pearson correlation coefficients, r), while the correlation between the response and the last (spurious) predictor was set to zero. We simulated datasets with three levels of difference between the correlation coefficients of consecutive predictors, ∆r = 0.1, 0.2, 0.3. These three levels of ∆r resulted in three correlation structures between the response and the four predictors: (0.3, 0.2, 0.1, 0.0), (0.6, 0.4, 0.2, 0.0), and (0.8, 0.6, 0.3, 0.0), respectively. We repeated the simulation procedure 200 times for each of the three preset correlation structures (600 datasets in total), for LM fitting later. For GLM fitting, we modified the simulation procedures with additional steps, in which we converted the continuous response into binary data O (e.g., occurrence data having 0 for absence and 1 for presence). We tested the WiBB method, along with two other methods, relative sum of weights (SWi) and standardized beta (ß*), to evaluate the ability to correctly rank predictor importances under various scenarios. The empirical dataset of 71 Mimulus species was compiled from their occurrence coordinates and corresponding values extracted from climatic layers of the WorldClim dataset (www.worldclim.org), and we applied the WiBB method to infer important predictors for their geographical distributions.
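    A rough Python analogue of the simulation step described above (the original uses rmvnorm from the R package mvtnorm; here NumPy's multivariate normal is used, and the predictors are assumed mutually uncorrelated, which is an assumption of this sketch rather than a detail stated in the abstract):

```python
# Simulate a dataset with a preset correlation structure between the response y
# and four predictors x1..x4, e.g. r = (0.6, 0.4, 0.2, 0.0) as in the Delta-r = 0.2 case.
# (The (0.8, 0.6, 0.3, 0.0) structure needs correlated predictors to stay positive definite.)
import numpy as np

def simulate_dataset(r=(0.6, 0.4, 0.2, 0.0), n=1000, seed=0):
    rng = np.random.default_rng(seed)
    k = len(r)
    # Correlation matrix: first row/column is y vs predictors; predictors left uncorrelated.
    corr = np.eye(k + 1)
    corr[0, 1:] = corr[1:, 0] = r
    data = rng.multivariate_normal(mean=np.zeros(k + 1), cov=corr, size=n)
    return data[:, 0], data[:, 1:]   # y, X

y, X = simulate_dataset()
print(np.round(np.corrcoef(np.column_stack([y, X]), rowvar=False)[0, 1:], 2))
# Empirical correlations should be close to (0.6, 0.4, 0.2, 0.0).
```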

  5. QoG Social Policy Dataset - The QoG Social Policy Cross-Section Data

    • demo.researchdata.se
    Updated Feb 13, 2020
    Cite
    Jan Teorell; Richard Svensson; Marcus Samanni; Staffan Kumlin; Stefan Dahlberg; Bo Rothstein; Sören Holmberg (2020). QoG Social Policy Dataset - The QoG Social Policy Cross-Section Data [Dataset]. https://demo.researchdata.se/en/catalogue/dataset/ext0004-1
    Dataset updated
    Feb 13, 2020
    Dataset provided by
    University of Gothenburg
    Authors
    Jan Teorell; Richard Svensson; Marcus Samanni; Staffan Kumlin; Stefan Dahlberg; Bo Rothstein; Sören Holmberg
    Time period covered
    2002
    Area covered
    Estonia, Mexico, Slovakia, United Kingdom, Iceland, Spain, Bulgaria, Romania, Malta, New Caledonia
    Description

    The QoG Institute is an independent research institute within the Department of Political Science at the University of Gothenburg. Overall 30 researchers conduct and promote research on the causes, consequences and nature of Good Governance and the Quality of Government - that is, trustworthy, reliable, impartial, uncorrupted and competent government institutions.

    The main objective of our research is to address the theoretical and empirical problem of how political institutions of high quality can be created and maintained. A second objective is to study the effects of Quality of Government on a number of policy areas, such as health, the environment, social policy, and poverty.

    The dataset was created as part of a research project titled “Quality of Government and the Conditions for Sustainable Social Policy”. The aim of the dataset is to promote cross-national comparative research on social policy output and its correlates, with a special focus on the connection between social policy and Quality of Government (QoG).

    The data comes in three versions: one cross-sectional dataset, and two cross-sectional time-series datasets for a selection of countries. The two combined datasets are called “long” (year 1946-2009) and “wide” (year 1970-2005).

    The data contains six types of variables, each provided under its own heading in the codebook: Social policy variables, Tax system variables, Social Conditions, Public opinion data, Political indicators, Quality of government variables.

    The QoG Social Policy Dataset can be downloaded from the Data Archive of the QoG Institute at http://qog.pol.gu.se/data/datadownloads/data-archive. Its variables are now included in QoG Standard.

    Purpose:

    The primary aim of QoG is to conduct and promote research on corruption. One aim of the QoG Institute is to make publicly available cross-national comparative data on QoG and its correlates. The aim of the QoG Social Policy Dataset is to promote cross-national comparative research on social policy output and its correlates, with a special focus on the connection between social policy and Quality of Government (QoG).

    A cross-section dataset based on data from and around 2002 from the QoG Social Policy dataset. If no data were available for 2002 on a variable, data from the closest available year were used, though not further back in time than 1995.

    Samanni, Marcus. Jan Teorell, Staffan Kumlin, Stefan Dahlberg, Bo Rothstein, Sören Holmberg & Richard Svensson. 2012. The QoG Social Policy Dataset, version 4Apr12. University of Gothenburg:The Quality of Government Institute. http://www.qog.pol.gu.se

  6. Data from: The use of a variable representing compliance improves accuracy...

    • tandf.figshare.com
    • figshare.com
    png
    Updated Jun 1, 2023
    Cite
    L. Guizzaro; F. Petavy; R. Ristl; C. Gallo (2023). The use of a variable representing compliance improves accuracy of estimation of the effect of treatment allocation regardless of discontinuation in trials with incomplete follow-up [Dataset]. http://doi.org/10.6084/m9.figshare.11914380.v1
    Available download formats: png
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    L. Guizzaro; F. Petavy; R. Ristl; C. Gallo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In clinical trials, not all randomised patients follow the course of treatment they are allocated to. The potential impact of such deviations is increasingly recognised, and it has been one of the reasons for a redefinition of the targets of estimation (“Estimands”) in the ICH E9 draft Addendum. Among others, the effect of treatment assignment, regardless of adherence, appears to be an estimand of practical interest, in line with the intention-to-treat principle. This study aims at evaluating the performance of different estimation techniques in trials with incomplete post-discontinuation follow-up when a 'treatment-policy' strategy is implemented. To achieve that, we have (i) modelled and visualised a reasonable data-generating model as a directed acyclic graph; (ii) investigated which set of variables allows identification and estimation of such an effect; (iii) simulated 10,000 trials in Major Depressive Disorder, with varying real treatment effects, proportions of patients discontinuing treatment, and incomplete follow-up. Our results suggest that, at least in a 'missing at random' (MAR) setting, all studied estimation methods perform better when a variable representing compliance is used. This effect is more pronounced the higher the proportion of post-discontinuation follow-up.
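    A toy illustration of the underlying point, not the authors' simulation design: when follow-up is missing at random given compliance, an estimate of the effect of treatment assignment that adjusts for a compliance variable is less biased than a naive complete-case comparison (assumes NumPy; all parameters below are made up for the sketch).

```python
# Compare a naive complete-case ITT estimate with a compliance-adjusted one under MAR dropout.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
treat = rng.integers(0, 2, n)                                            # randomised assignment
comply = (rng.random(n) < np.where(treat == 1, 0.7, 0.85)).astype(int)   # toy adherence indicator
y = 1.0 * treat * comply + rng.normal(size=n)                            # true ITT effect = 1.0 * 0.7 = 0.7

# Follow-up is observed with a probability that depends on compliance (MAR given compliance).
observed = rng.random(n) < np.where(comply == 1, 0.9, 0.4)

# Naive complete-case comparison (biased: dropout is related to compliance, hence to the outcome).
naive = y[observed & (treat == 1)].mean() - y[observed & (treat == 0)].mean()

def arm_mean(t):
    # Model E[y | arm, compliance] on complete cases, then standardise over the
    # full-sample compliance distribution within the arm.
    cc = observed & (treat == t)
    cell_means = {c: y[cc & (comply == c)].mean() for c in (0, 1)}
    weights = np.bincount(comply[treat == t], minlength=2) / (treat == t).sum()
    return cell_means[0] * weights[0] + cell_means[1] * weights[1]

adjusted = arm_mean(1) - arm_mean(0)
print(f"true ITT = 0.70, naive = {naive:.2f}, compliance-adjusted = {adjusted:.2f}")
```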

  7. Parameter definition for each country.

    • plos.figshare.com
    xls
    Updated Jan 7, 2025
    Cite
    Arles Rodríguez; Mercedes Gaitán-Angulo; Melva Inés Gómez-Caicedo; Paula Robayo-Acuña; Iván Ricardo Ruíz-Castro (2025). Parameter definition for each country. [Dataset]. http://doi.org/10.1371/journal.pone.0313756.t002
    Available download formats: xls
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Arles Rodríguez; Mercedes Gaitán-Angulo; Melva Inés Gómez-Caicedo; Paula Robayo-Acuña; Iván Ricardo Ruíz-Castro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article discusses the dynamics of innovation in America and Europe, focusing on variables such as access to technology, education, and life expectancy. To do this, the article proposes an agent-based model called the Innovameter. The dependent variable is the Global Innovation Index. The paper focuses on data analysis through correlation analysis and multiple hierarchical regressions to determine the contribution of specific variables related to the pillars of the Global Innovation Index and indicators of the Human Development Index. After analyzing the data, an agent-based model was built to parameterize these main variables by defining two levels of abstraction: at the global level, there is the country, where birth rates, life expectancy, ICT use, and research and development are defined. At the local level, we define the individuals who have an age, years of schooling, and income. A series of experiments were conducted by selecting data from 30 countries. From the results of the experiments, a nonparametric correlation analysis was performed, and correlation indices were obtained indicating a relationship between the predicted outcomes and the outcomes in the global index. The proposed model aims to provide suggestions on how the different variables can become the norm in most of the countries studied.
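    A minimal structural sketch of the two abstraction levels described above (illustrative only; the attribute names, values, and distributions are placeholders, not the Innovameter's actual parameterisation):

```python
# Two-level agent-based structure: country-level parameters and individual agents.
from dataclasses import dataclass, field
import random

@dataclass
class Individual:
    age: float
    years_of_schooling: float
    income: float

@dataclass
class Country:
    name: str
    birth_rate: float          # births per 1,000 inhabitants (placeholder unit)
    life_expectancy: float     # years
    ict_use: float             # share of population using ICT, 0-1
    rd_spending: float         # R&D expenditure as a share of GDP
    people: list = field(default_factory=list)

    def spawn_population(self, n: int, rng: random.Random) -> None:
        # Draw a toy population loosely consistent with the country-level parameters.
        for _ in range(n):
            self.people.append(Individual(
                age=rng.uniform(0, self.life_expectancy),
                years_of_schooling=rng.uniform(0, 20),
                income=rng.lognormvariate(9, 1),
            ))

rng = random.Random(42)
country = Country("Example", birth_rate=14.0, life_expectancy=78.0, ict_use=0.7, rd_spending=0.015)
country.spawn_population(1_000, rng)
print(len(country.people), "agents created for", country.name)
```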

  8. TigerRAY Moored Deployment Data

    • catalog.data.gov
    • mhkdr.openei.org
    Updated Jun 4, 2025
    Cite
    University of Washington Applied Physics Lab (2025). TigerRAY Moored Deployment Data [Dataset]. https://catalog.data.gov/dataset/tigerray-moored-deployment-data
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    University of Washington Applied Physics Lab
    Description

    This repository contains TigerRAY moored deployment data for each day on which data were collected between January 10, 2024 and March 3, 2024. For sensors on and inside TigerRAY, there is one .mat file for each day on which TigerRAY operated and collected data. These files are labeled "DDMMMYYYY_TigerRAYdata.mat", corresponding to the date collected. Each .mat file contains a single variable, data, comprising:
    - time stamps and load cell readings from the heave plate
    - time stamps and data from the two heave-plate-mounted pressure sensors
    - a structure containing data from the heave-plate-mounted IMU
    - data collected by the central data acquisition system in the nacelle
    - timestamps and data from encoder 1
    - timestamps and data from encoder 2
    - a structure containing data from the nacelle-mounted IMU
    - data from the satellite compass mounted to the mast of the nacelle
    For SWIFT data, there is one data file that contains all reprocessed SWIFT data for the entire deployment, held in three structures named SWIFT22_rp, SWIFT23_rp, and SWIFT24_rp. Reprocessing of the data was done to remove components in the wave spectra with frequencies < 0.2 Hz; the remaining energy is distributed between 0.2 Hz and 1 Hz. New significant wave height, peak period, energy period, and peak direction were then calculated from these trimmed energy spectra. See the attached data guide for a complete summary of the data included in this submission, a description of the data products (TigerRAY and SWIFT data), and deployment setup information and figures.
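    A small sketch of how one of the daily .mat files might be inspected in Python (assumes SciPy and an example file name following the stated DDMMMYYYY pattern; if the files were saved in MATLAB v7.3/HDF5 format, an HDF5 reader such as h5py would be needed instead):

```python
# Inspect one day's TigerRAY .mat file and list the fields of its single "data" variable.
from scipy.io import loadmat

mat = loadmat("10Jan2024_TigerRAYdata.mat",   # example file name following the stated pattern
              squeeze_me=True, struct_as_record=False)

data = mat["data"]
# Top-level fields should correspond to the items listed above (heave-plate load cell,
# pressure sensors, IMUs, encoders, satellite compass, ...).
print(data._fieldnames)
```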

  9. NCDS7

    • datacatalogue.ukdataservice.ac.uk
    Updated Apr 15, 2024
    Cite
    University of London, Institute of Education, Centre for Longitudinal Studies (2024). NCDS7 [Dataset]. http://doi.org/10.5255/UKDA-SN-5579-1
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    University of London, Institute of Education, Centre for Longitudinal Studies
    Area covered
    United Kingdom
    Description

    The National Child Development Study (NCDS) is a continuing longitudinal study that seeks to follow the lives of all those living in Great Britain who were born in one particular week in 1958. The aim of the study is to improve understanding of the factors affecting human development over the whole lifespan.

    The NCDS has its origins in the Perinatal Mortality Survey (PMS) (the original PMS study is held at the UK Data Archive under SN 2137). This study was sponsored by the National Birthday Trust Fund and designed to examine the social and obstetric factors associated with stillbirth and death in early infancy among the 17,000 children born in England, Scotland and Wales in that one week. Selected data from the PMS form NCDS sweep 0, held alongside NCDS sweeps 1-3, under SN 5565.

    Survey and Biomeasures Data (GN 33004):

    To date there have been ten attempts to trace all members of the birth cohort in order to monitor their physical, educational and social development. The first three sweeps were carried out by the National Children's Bureau, in 1965, when respondents were aged 7, in 1969, aged 11, and in 1974, aged 16 (these sweeps form NCDS1-3, held together with NCDS0 under SN 5565). The fourth sweep, also carried out by the National Children's Bureau, was conducted in 1981, when respondents were aged 23 (held under SN 5566). In 1985 the NCDS moved to the Social Statistics Research Unit (SSRU) - now known as the Centre for Longitudinal Studies (CLS). The fifth sweep was carried out in 1991, when respondents were aged 33 (held under SN 5567). For the sixth sweep, conducted in 1999-2000, when respondents were aged 42 (NCDS6, held under SN 5578), fieldwork was combined with the 1999-2000 wave of the 1970 Birth Cohort Study (BCS70), which was also conducted by CLS (and held under GN 33229). The seventh sweep was conducted in 2004-2005 when the respondents were aged 46 (held under SN 5579), the eighth sweep was conducted in 2008-2009 when respondents were aged 50 (held under SN 6137), the ninth sweep was conducted in 2013 when respondents were aged 55 (held under SN 7669), and the tenth sweep was conducted in 2020-24 when the respondents were aged 60-64 (held under SN 9412).

    A Secure Access version of the NCDS is available under SN 9413, containing detailed sensitive variables not available under Safeguarded access (currently only sweep 10 data). Variables include uncommon health conditions (including age at diagnosis), full employment codes and income/finance details, and specific life circumstances (e.g. pregnancy details, year/age of emigration from GB).

    Four separate datasets covering responses to NCDS over all sweeps are available. National Child Development Deaths Dataset: Special Licence Access (SN 7717) covers deaths; National Child Development Study Response and Outcomes Dataset (SN 5560) covers all other responses and outcomes; National Child Development Study: Partnership Histories (SN 6940) includes data on live-in relationships; and National Child Development Study: Activity Histories (SN 6942) covers work and non-work activities. Users are advised to order these studies alongside the other waves of NCDS.

    From 2002-2004, a Biomedical Survey was completed and is available under Safeguarded Licence (SN 8731) and Special Licence (SL) (SN 5594). Proteomics analyses of blood samples are available under SL SN 9254.

    Linked Geographical Data (GN 33497):
    A number of geographical variables are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies.

    Linked Administrative Data (GN 33396):
    A number of linked administrative datasets are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies. These include a Deaths dataset (SN 7717) available under SL and the Linked Health Administrative Datasets (SN 8697) available under Secure Access.

    Multi-omics Data and Risk Scores Data (GN 33592)
    Proteomics analyses were run on the blood samples collected from NCDS participants in 2002-2004 and are available under SL SN 9254. Metabolomics analyses were conducted on respondents of sweep 10 and are available under SL SN 9411. Polygenic indices are available under SL SN 9439. Derived summary scores have been created that combine the estimated effects of many different genes on a specific trait or characteristic, such as a person's risk of Alzheimer's disease, asthma, substance abuse, or mental health disorders, for example. These scores can be combined with existing survey data to offer a more nuanced understanding of how cohort members' outcomes may be shaped.

    Additional Sub-Studies (GN 33562):
    In addition to the main NCDS sweeps, further studies have also been conducted on a range of subjects such as parent migration, unemployment, behavioural studies and respondent essays. The full list of NCDS studies available from the UK Data Service can be found on the NCDS series access data webpage.

    How to access genetic and/or bio-medical sample data from a range of longitudinal surveys:
    For information on how to access biomedical data from NCDS that are not held at the UKDS, see the CLS Genetic data and biological samples webpage.

    Further information about the full NCDS series can be found on the Centre for Longitudinal Studies website.

    NCDS7:
    The seventh sweep of NCDS was conducted in 2004-2005, when respondents were aged 46-47 years. It was conducted by telephone, and aimed to update the information gathered at NCDS6 in 1999-2000.

    For the third edition (August 2008), the serial number has been replaced with a new one, variable Ncdsid. This change has been made for all datasets in the NCDS series. Further information may be found in the ‘CLS Confidentiality and Data Security Review’, included in the documentation.

  10. Recollection of repeated dental visits

    • dataverse.harvard.edu
    • researchdata.se
    Updated Sep 21, 2015
    Cite
    Rebecca Willén; Pär Anders Granhag (2015). Recollection of repeated dental visits [Dataset]. http://doi.org/10.7910/DVN/AGZW7E
    Available formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Sep 21, 2015
    Dataset provided by
    Harvard Dataverse
    Authors
    Rebecca Willén; Pär Anders Granhag
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    The Swedish Crime Victim and Support Authority
    Description

    Dental care patients (n=95) participated in a quasi-experiment during 2012 in Sweden. The respondents were interviewed twice about dental visits they had made between 2002 and 2012. For verification purposes, the narratives were compared to the dental records. The qualitative data has been quantified and is stored as .csv supplemented with a codebook in plain text. In addition, all study material is freely available online at https://osf.io/thwcb. For anonymity reasons, a few adjustments were made to the shared data set: three continuous variables were categorised, one variable, sex, was removed, and all respondents were randomly assigned new ID-numbers (to avoid potential self-identification). The data can be reused to further analyse memory for repeated events. It can be used as experimental data (including both interviews) or as single interview data (including data from only the first interview).

  11. Replication Data for: The Wikipedia Adventure: Field Evaluation of an...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jun 7, 2017
    Cite
    Sneha Narayan; Jake Orlowitz; Aaron D. Shaw; Benjamin Mako Hill (2017). Replication Data for: The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial for New Users [Dataset]. http://doi.org/10.7910/DVN/6HPRIG
    Available formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Jun 7, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Sneha Narayan; Jake Orlowitz; Aaron D. Shaw; Benjamin Mako Hill
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6HPRIG

    Dataset funded by
    National Science Foundation (NSF)
    Description

    This dataset contains the data and code necessary to replicate work in the following paper: Narayan, Sneha, Jake Orlowitz, Jonathan Morgan, Benjamin Mako Hill, and Aaron Shaw. 2017. “The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial for New Users.” In Proceedings of the 20th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW '17). New York, New York: ACM Press. http://dx.doi.org/10.1145/2998181.2998307

    The published paper contains two studies. Study 1 is a descriptive analysis of a survey of Wikipedia editors who played a gamified tutorial. Study 2 is a field experiment that evaluated the same tutorial. These data are the data used in the field experiment described in Study 2.

    Description of files. This dataset contains the following files beyond this README:
    - twa.RData — An RData file that includes all variables used in Study 2.
    - twa_analysis.R — A GNU R script that includes all the code used to generate the tables and plots related to Study 2 in the paper.

    The RData file contains one variable (d), an R dataframe (i.e., table) that includes the following columns:
    - userid (integer): The unique numerical ID representing each user in our sample. These are 8-digit integers and describe public accounts on Wikipedia.
    - sample.date (date string): The day the user was recruited to the study, formatted as “YYYY-MM-DD”. In the case of invitees, it is the date their invitation was sent. For users in the control group, it is the date they would have been invited to the study.
    - edits.all (integer): The total number of edits made by the user on Wikipedia in the 180 days after they joined the study. Edits to the user's user pages, user talk pages and subpages are ignored.
    - edits.ns0 (integer): The total number of edits made by the user to article pages on Wikipedia in the 180 days after they joined the study.
    - edits.talk (integer): The total number of edits made by the user to talk pages on Wikipedia in the 180 days after they joined the study. Edits to a user's user page, user talk page and subpages are ignored.
    - treat (logical): TRUE if the user was invited, FALSE if the user was in the control group.
    - play (logical): TRUE if the user played the game, FALSE if the user did not. All users in control are listed as FALSE because any user who had not been invited to the game but played was removed.
    - twa.level (integer): Takes a value of 0 if the user has not played the game; ranges from 1 to 7 for those who did, indicating the highest level they reached in the game.
    - quality.score (float): The average word persistence (over a 6-revision window) over all edits made by this userid. Our measure of word persistence (persistent word revision per word) is a measure of edit quality developed by Halfaker et al. that tracks how long words in an edit persist after subsequent revisions are made to the wiki-page. For more information on how word persistence is calculated, see the following paper: Halfaker, Aaron, Aniket Kittur, Robert Kraut, and John Riedl. 2009. “A Jury of Your Peers: Quality, Experience and Ownership in Wikipedia.” In Proceedings of the 5th International Symposium on Wikis and Open Collaboration (OpenSym '09), 1–10. New York, New York: ACM Press. doi:10.1145/1641309.1641332. Or this page: https://meta.wikimedia.org/wiki/Research:Content_persistence

    How we created twa.RData. The file twa.RData combines datasets drawn from three places:
    - A dataset created by Wikimedia Foundation staff that tracked the details of the experiment and how far people got in the game. The variables userid, sample.date, treat, play, and twa.level were all generated in a dataset created by WMF staff when The Wikipedia Adventure was deployed. All users in the sample created their accounts within 2 days before the date they were entered into the study. None of them had received a Teahouse invitation, a Level 4 user warning, or been blocked from editing at the time that they entered the study. Additionally, all users made at least one edit after the day they were invited. Users were sorted randomly into treatment and control groups, based on which they either received or did not receive an invite to play The Wikipedia Adventure.
    - Edit and text persistence data drawn from public XML dumps created on May 21st, 2015. We used publicly available XML dumps to generate the outcome variables, namely edits.all, edits.ns0, edits.talk and quality.score. We first extracted all edits made by users in our sample during the six-month period after they joined the study, excluding edits made to user pages or user talk pages. We parsed the XML dumps using the Python-based wikiq and MediaWikiUtilities software, available online at http://projects.mako.cc/source/?p=mediawiki_dump_tools and https://github.com/mediawiki-utilities/python-mediawiki-utilities We obtained the XML dumps from https://dumps.wikimedia.org/enwiki/
    - A list of edits made by users in our study that were subsequently deleted, created on...
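    For readers working in Python rather than R, a quick way to inspect the Study 2 dataframe might look like the sketch below (assumes the third-party pyreadr package; the authors' own analysis code is twa_analysis.R):

```python
# Load the Study 2 dataframe from twa.RData and compare edit counts by treatment arm.
import pyreadr

result = pyreadr.read_r("twa.RData")   # dict of pandas DataFrames keyed by R object name
d = result["d"]

# Mean 180-day edit counts for invited (treat == True) versus control (treat == False) users.
print(d[["edits.all", "edits.ns0", "edits.talk"]]
      .groupby(d["treat"])
      .mean())
```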

  12. Data for "Living in a variable visual environment: How stable versus...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Apr 1, 2025
    Cite
    Anderson, Hannah; Balshine, Sigal (2025). Data for "Living in a variable visual environment: How stable versus fluctuating suspended sediments affect fish behavior" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_15103589
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    McMaster University
    Authors
    Anderson, Hannah; Balshine, Sigal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data used for the manuscript "Living in a variable visual environment: How stable versus fluctuating suspended sediments affect fish behavior". The sister code repository is located at DOI: 10.5281/zenodo.15103558

  13. Data_Sheet_1_oFVSD: a Python package of optimized forward variable selection...

    • datasetcatalog.nlm.nih.gov
    Updated Sep 26, 2023
    Cite
    Fermin, Alan S. R.; Machizawa, Maro G.; Dang, Tung (2023). Data_Sheet_1_oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000942155
    Dataset updated
    Sep 26, 2023
    Authors
    Fermin, Alan S. R.; Machizawa, Maro G.; Dang, Tung
    Description

    The complexity and high dimensionality of neuroimaging data pose problems for decoding information with machine learning (ML) models because the number of features is often much larger than the number of observations. Feature selection is one of the crucial steps for determining meaningful target features in decoding; however, optimizing the feature selection from such high-dimensional neuroimaging data has been challenging using conventional ML models. Here, we introduce an efficient and high-performance decoding package incorporating a forward variable selection (FVS) algorithm and hyper-parameter optimization that automatically identifies the best feature pairs for both classification and regression models, where a total of 18 ML models are implemented by default. First, the FVS algorithm evaluates the goodness-of-fit across different models using the k-fold cross-validation step that identifies the best subset of features based on a predefined criterion for each model. Next, the hyperparameters of each ML model are optimized at each forward iteration. Final outputs highlight an optimized number of selected features (brain regions of interest) for each model with its accuracy. Furthermore, the toolbox can be executed in a parallel environment for efficient computation on a typical personal computer. With the optimized forward variable selection decoder (oFVSD) pipeline, we verified the effectiveness of decoding sex classification and age range regression on 1,113 structural magnetic resonance imaging (MRI) datasets. Compared to ML models without the FVS algorithm and with the Boruta algorithm as a variable selection counterpart, we demonstrate that the oFVSD significantly outperformed across all of the ML models over the counterpart models without FVS (approximately 0.20 increase in correlation coefficient, r, with regression models and 8% increase in classification models on average) and with Boruta variable selection algorithm (approximately 0.07 improvement in regression and 4% in classification models). Furthermore, we confirmed the use of parallel computation considerably reduced the computational burden for the high-dimensional MRI data. Altogether, the oFVSD toolbox efficiently and effectively improves the performance of both classification and regression ML models, providing a use case example on MRI datasets. With its flexibility, oFVSD has the potential for many other modalities in neuroimaging. This open-source and freely available Python package makes it a valuable toolbox for research communities seeking improved decoding accuracy.
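    The core idea, forward selection scored by k-fold cross-validation, can be sketched with scikit-learn's generic selector; this is not the oFVSD package itself, which adds per-iteration hyper-parameter optimisation, 18 default models, and parallel execution:

```python
# Forward variable selection with k-fold cross-validation on a classification task.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=50, n_informative=5, random_state=0)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,       # in oFVSD the stopping point is chosen automatically
    direction="forward",
    cv=5,
)
selector.fit(X, y)

X_selected = selector.transform(X)
score = cross_val_score(LogisticRegression(max_iter=1000), X_selected, y, cv=5).mean()
print("Selected feature indices:", selector.get_support(indices=True))
print(f"5-fold CV accuracy with selected features: {score:.2f}")
```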

  14. Employment and Unemployment Survey, EUS 2012 - Jordan

    • mail.erfdataportal.com
    • erfdataportal.com
    Updated Apr 11, 2017
    Cite
    Department of Statistics (2017). Employment and Unemployment Survey, EUS 2012 - Jordan [Dataset]. https://mail.erfdataportal.com/index.php/catalog/101
    Dataset updated
    Apr 11, 2017
    Dataset provided by
    Department of Statistics
    Economic Research Forum
    Time period covered
    2012
    Area covered
    Jordan
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN

    The Department of Statistics (DOS) carried out four rounds of the 2012 Employment and Unemployment Survey (EUS) during 2012. The survey rounds covered a total sample of about fifty-three thousand households nationwide (53.4 thousand). The sampled households were selected using a stratified cluster sampling design.

    It is worth mentioning that the DOS employed new technology in data collection and processing: data were collected using an electronic questionnaire on a handheld device (PDA) rather than a hard copy.

    The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate microdata from existing labor force surveys in several Arab countries.

    Geographic coverage

    Covering a representative sample on the national level (Kingdom), governorates, and the three Regions (Central, North and South).

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey covered a national sample of households and all individuals permanently residing in surveyed households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN

    Survey Frame

    The sample of this survey is based on the frame provided by the data of the Population and Housing Census, 2004. The Kingdom was divided into strata, where each city with a population of 100,000 persons or more was considered as a large city. The total number of these cities is 6. Each governorate (except for the 6 large cities) was divided into rural and urban areas. The rest of the urban areas in each governorate were considered as an independent stratum. The same was applied to rural areas where they were considered as an independent stratum. The total number of strata was 30.

    Because of the significant variation in social and economic characteristics in large cities in particular, and in urban areas in general, each stratum of the large cities and urban strata was divided into four sub-strata according to the socio-economic characteristics provided by the Population and Housing Census 2004, aiming to provide homogeneous strata.

    Sample Design

    The sample of this survey was designed using a stratified cluster sampling method. The sample is considered representative at the Kingdom, rural, urban, region and governorate levels; however, it does not represent non-Jordanians.

    Sampling notes

    The frame excludes the population living in remote areas (most of whom are nomads). In addition, the frame does not include collective dwellings, such as hotels, hospitals, work camps, prisons and the like. However, it is worth noting that the collective households identified in the harmonized data, through a variable indicating the household type, are those reported without heads in the raw data, and in which the relationship of all household members to the head was reported as "other".

    This sample is also not representative for the non-Jordanian population.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire was designed electronically on the PDA and revised by the DOS technical staff. It is divided into a number of main topics, each containing a clear and consistent group of questions, and designed in a way that facilitates the electronic data entry and verification. The questionnaire includes the characteristics of household members in addition to the identification information, which reflects the administrative as well as the statistical divisions of the Kingdom.

    Cleaning operations

    Raw Data

    A tabulation results plan has been set based on the previous Employment and Unemployment Surveys while the required programs were prepared and tested. When all prior data processing steps were completed, the actual survey results were tabulated using an ORACLE package. The tabulations were then thoroughly checked for consistency of data. The final report was then prepared, containing detailed tabulations as well as the methodology of the survey.

    Harmonized Data

    • The SPSS package is used to clean and harmonize the datasets.
    • The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    • All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables.
    • A post-harmonization cleaning process is then conducted on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.

    Response rate

    The results of the fieldwork indicated that the number of successfully completed interviews was 48880 (with around 91% response rate).

  15. S1 Fig -

    • plos.figshare.com
    zip
    Updated Mar 13, 2025
    Cite
    Anna Halpin-McCormick; Tai McClellan Maaz; Michael B. Kantar; Kasey E. Barton; Rishi R. Masalia; Nick Batora; Kerin Law; Eleanor J. Kuntz (2025). S1 Fig - [Dataset]. http://doi.org/10.1371/journal.pone.0306007.s001
    Available download formats: zip
    Dataset updated
    Mar 13, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Anna Halpin-McCormick; Tai McClellan Maaz; Michael B. Kantar; Kasey E. Barton; Rishi R. Masalia; Nick Batora; Kerin Law; Eleanor J. Kuntz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    137 observations used for SDM model construction with a longitude greater than zero.
    Fig S2. Individual species distributions for each set of environmental properties examined (A) WorldClim Bioclimatic variables (B) ISRIC soil data (C) Solar radiation (kJm2/day) (D) Wind speed (m/s) (E) Water vapor pressure (kPa) (F) Elevation suitability maps. These maps were generated with Maxent using Worldclim and ISRIC data.
    Fig S3. Variable contribution graphs for each set of environmental properties examined (A) WorldClim Bioclimatic variables (B) ISRIC soil data (C) Solar radiation (kJm2/day) (D) Wind speed (m/s) (E) Water vapor pressure (kPa) and (F) Elevation.
    Fig S4. Area under the curve graphs for each set of environmental properties examined (A) WorldClim Bioclimatic variables (B) ISRIC soil data (C) Solar radiation (kJm2/day) (D) Wind speed (m/s) (E) Water vapor pressure (kPa) and (F) Elevation.
    Fig S5. Overlay of all six environmental datasets (A) Worldwide plot (B) standard deviation for the overlay of all six environmental variables. These maps were generated with Maxent using Worldclim and ISRIC data.
    Fig S6. Overlay of all six environmental datasets (A) Variable contribution graph (B) Area under the curve graphs for each set of environmental properties examined.
    Fig S7. Species distribution with temperature and precipitation data in Asia and Russia for (A) present day (B) SSP45 2050 (C) SSP45 2070 (D) SSP85 2050 (E) SSP85 2070. These maps were generated with Maxent using Worldclim data.
    Fig S8. Species distribution with temperature and precipitation data in Europe for (A) present day (B) SSP45 2050 (C) SSP45 2070 (D) SSP85 2050 (E) SSP85 2070. These maps were generated with Maxent using Worldclim data.
    Fig S9. Species distribution with temperature and precipitation data in the United States for (A) present day (B) SSP45 2050 (C) SSP45 2070 (D) SSP85 2050 (E) SSP85 2070. These maps were generated with Maxent using Worldclim data.
    Fig S10. Species distribution for a subset of the United States with data for all six environmental properties examined (A) California (B) Colorado (C) Maine (D) Oregon (E) Washington (F) Massachusetts (G) Michigan. These maps were generated with Maxent using Worldclim data.
    Fig S11. (A) Pleistocene: M2 (ca. 3.3 Ma) (B) Predicted Distribution for the Paleoclimate timepoint of the Mid Pliocene warm period (ca. 3.2 Ma) (C) Predicted Distribution for the Paleoclimate timepoint of the Pleistocene: MIS19 (ca. 787,000 years ago) (D) Predicted Distribution for the Paleoclimate timepoint of the Pleistocene: Last Interglacial (130,000 years ago) (E) Predicted Distribution for the Paleoclimate timepoint of the Pleistocene: Last Glacial Maximum (ca. 21,000 years ago) (F) Potential Distribution for the Paleoclimate timepoint of the Pleistocene: Heinrich Stadial (14,700 – 17,000 years ago) (G) Potential Distribution for the Paleoclimate timepoint of the Pleistocene: Bolling-Allerod (12,900 – 14,700 years ago) (H) Potential Distribution for the Paleoclimate timepoint of the Pleistocene: Younger Dryas Stadial (11,700 – 12,900 years ago) (I) Potential Distribution for the Paleoclimate timepoint of the Pleistocene: Early Holocene, Greenlandian (8,366 - 11,700 years ago) (J) Potential Distribution for the Paleoclimate timepoint of the Pleistocene: Mid Holocene, Northgrippian (4,200 – 8,326 years ago) (K) Potential Distribution for the Paleoclimate timepoint of the Pleistocene: Late Holocene, Meghalayan (300 – 4200 years ago). These maps were generated with Maxent using Worldclim data.
    Fig S12. AUC and variable contribution graphs for each timepoint from the Paleoclim dataset (A) Pleistocene: M2 (ca. 3.3 Ma) (B) Mid Pliocene warm period (ca. 3.2 Ma) (C) Pleistocene: MIS19 (ca. 787,000 years ago) (D) Pleistocene: Last Interglacial (130,000 years ago) (E) Last Glacial Maximum (ca. 21,000 years ago) (F) Heinrich Stadial (14,700 – 17,000 years ago) (G) Bolling-Allerod (12,900 – 14,700 years ago) (H) Younger Dryas Stadial (11,700 – 12,900 years ago) (I) Early Holocene, Greenlandian (11,700 - 8,326 years ago) (J) Mid Holocene, Northgrippian (4,200 – 8,326 years ago) (K) Late Holocene, Meghalayan (300 – 4200 years ago). (ZIP)

  16. Customer Churn (Telecom) - LR, DT, RF and AUC

    • kaggle.com
    zip
    Updated Jul 23, 2023
    Cite
    vikram amin (2023). Customer Churn (Telecom) - LR, DT, RF and AUC [Dataset]. https://www.kaggle.com/datasets/vikramamin/customer-churn-telecom-lr-dt-rf-and-auc/discussion?sort=undefined
    Available download formats: zip (46,575 bytes)
    Dataset updated
    Jul 23, 2023
    Authors
    vikram amin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Main Objective: To find out the best model to predict the churners. Models used : Logistic Regression, Decision Tree & Random Forest , ROC and AUC.

    Steps involved:

    - Read the data; the column "Churn" is the dependent variable.
    - Data cleaning: check for missing data and convert categorical vectors to factor vectors.
    - Run a logistic regression.
    - Apply the stepwise function, which starts with all independent variables and removes the insignificant ones one after the other (see the variable-selection sketch after this list).
    - Variables removed: "AccountWeek", "DayCalls", "DataUsage", "MonthlyCharge".
    - Check for multicollinearity with the VIF function; none was found, as all values are below 5.
    - Split the data into train and test sets (80/20) using the createDataPartition function.
    - Use the train data to build the model and the test data for prediction.
    - Store the predicted class in a new column called "class" in the "test" data frame.
    - Use a confusion matrix to obtain the sensitivity, specificity and accuracy of the model.
    - Accuracy = 85.89% (correct prediction of both churners and non-churners), Sensitivity = 11.45% (churners correctly predicted out of all churners), Specificity = 98.42% (non-churners correctly predicted out of all non-churners).
    - Sensitivity, the true positive rate, is the key metric here because the goal is to identify churners.
    - The dependent variable Churn has two levels (1 = churner, 0 = non-churner).
    - A low sensitivity (11.45% in this case) can be a symptom of imbalanced data.
    - The dataset contains far more non-churners (2280) than churners (387).
    - Use the ovun.sample function from the ROSE package to balance the data by oversampling, undersampling and both-sampling (see the balancing sketch after this list).
    - Resulting observations: 2280 * 2 = 4560 for oversampling, 387 * 2 = 774 for undersampling, and (4560 + 774) / 2 = 2667 for both-sampling.
    - The oversampled data is stored in over_data, the undersampled data in under_data, and the both-sampled data in both_data.
    - Use a confusion matrix to obtain the accuracy, sensitivity and specificity for over_data, under_data and both_data, with set.seed(1234) for all three.
    - Logistic Regression, oversampling: Accuracy 79.13%, Sensitivity 75%, Specificity 79.82%.
    - Logistic Regression, undersampling: Accuracy 79.13%, Sensitivity 72.92%, Specificity 80.18%.
    - Logistic Regression, both-sampling: Accuracy 79.13%, Sensitivity 75%, Specificity 79.82%.
    - Decision Tree for oversampling, undersampling and both-sampling, using the rpart library with set.seed(1234) for all three (see the tree-and-forest sketch after this list).
    - Decision Tree, oversampling: Accuracy 87.84%, Sensitivity 82.29%, Specificity 88.77%.
    - Decision Tree, undersampling: Accuracy 90.24%, Sensitivity 83.33%, Specificity 91.40%.
    - Decision Tree, both-sampling: Accuracy 87.84%, Sensitivity 84.38%, Specificity 88.42%.
    - Random Forest for oversampling, undersampling and both-sampling, using the randomForest library with set.seed(1234) for all three.
    - Random Forest, oversampling: Accuracy 93.09%, Sensitivity 66.67%, Specificity 97.54%.
    - Random Forest, undersampling: Accuracy 88.74%, Sensitivity 81.25%, Specificity 90%.
    - Random Forest: Accuracy 92.79%, ...
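
    A minimal R sketch of the variable-selection step, assuming a cleaned data frame named churn_df whose factor column Churn is the outcome; the object names are illustrative rather than taken from the original notebook, and base R's step() plus car::vif() are one common way to run the backward stepwise selection and VIF check described above.

        # Illustrative only: backward stepwise selection and multicollinearity check.
        library(car)    # vif()

        full_fit <- glm(Churn ~ ., data = churn_df, family = binomial)    # all predictors
        step_fit <- step(full_fit, direction = "backward", trace = FALSE)

        summary(step_fit)   # predictors such as AccountWeek, DayCalls, DataUsage and
                            # MonthlyCharge are the ones reported as dropping out
        vif(step_fit)       # values below 5 are read as no problematic multicollinearity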
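
    A minimal R sketch of the train/test split, ROSE balancing and logistic-regression evaluation; churn_df, train_data and test_data are assumed names, and the 0.5 probability cut-off is an assumption, not something stated in the original write-up.

        library(caret)   # createDataPartition(), confusionMatrix()
        library(ROSE)    # ovun.sample()

        set.seed(1234)

        # 80/20 split stratified on the outcome
        idx        <- createDataPartition(churn_df$Churn, p = 0.8, list = FALSE)
        train_data <- churn_df[idx, ]
        test_data  <- churn_df[-idx, ]

        # Balance the training data three ways
        over_data  <- ovun.sample(Churn ~ ., data = train_data, method = "over")$data
        under_data <- ovun.sample(Churn ~ ., data = train_data, method = "under")$data
        both_data  <- ovun.sample(Churn ~ ., data = train_data, method = "both")$data

        # Fit a logistic regression on a balanced set, evaluate on the untouched test set
        eval_logit <- function(balanced) {
          fit   <- glm(Churn ~ ., data = balanced, family = binomial)
          prob  <- predict(fit, newdata = test_data, type = "response")
          class <- factor(ifelse(prob > 0.5, 1, 0), levels = levels(test_data$Churn))
          confusionMatrix(class, test_data$Churn, positive = "1")
        }

        eval_logit(over_data)    # accuracy, sensitivity and specificity per sampling scheme
        eval_logit(under_data)
        eval_logit(both_data)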
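
    A minimal R sketch of the decision tree and random forest comparisons, reusing the test set and balanced data frames from the balancing sketch above; again an illustration under those naming assumptions, not the notebook's exact code.

        library(rpart)          # decision tree
        library(randomForest)   # random forest
        library(caret)          # confusionMatrix()

        set.seed(1234)

        eval_tree <- function(balanced) {
          fit  <- rpart(Churn ~ ., data = balanced, method = "class")
          pred <- predict(fit, newdata = test_data, type = "class")
          confusionMatrix(pred, test_data$Churn, positive = "1")
        }

        eval_forest <- function(balanced) {
          fit  <- randomForest(Churn ~ ., data = balanced)
          pred <- predict(fit, newdata = test_data)
          confusionMatrix(pred, test_data$Churn, positive = "1")
        }

        eval_tree(over_data);   eval_tree(under_data);   eval_tree(both_data)
        eval_forest(over_data); eval_forest(under_data); eval_forest(both_data)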

  17. o

    Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...

    • openicpsr.org
    Updated Mar 29, 2018
    + more versions
    Cite
    Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1974-2020 [Dataset]. http://doi.org/10.3886/E102263V14
    Explore at:
    Dataset updated
    Mar 29, 2018
    Dataset provided by
    Princeton University
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1974 - 2020
    Area covered
    United States
    Description

    For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

    Version 14 release notes: Adds 2020 data. Please note that the FBI has retired UCR data ending with 2020, so this will be the last Arrests by Age, Sex, and Race data they release.

    Version 13 release notes: Changes R files from .rda to .rds. Fixes a bug where the number_of_months_reported variable was incorrectly set to the largest number of months reported for any single crime variable. For example, if theft was reported Jan-June and robbery was reported July-December in an agency, in total there were 12 months reported. But since each crime was only reported for 6 months (assuming no other crime was reported more than 6 months of the year), the number_of_months_reported variable was incorrectly set to 6 months. Now it is the total number of months in which any crime was reported, so it would be set to 12 months in this example. Thank you to Nick Eubank for alerting me to this issue. Adds rows even when an agency reported zero arrests that month; all arrest values are set to zero for these rows.

    Version 12 release notes: Adds 2019 data.

    Version 11 release notes: Changes the release notes description; does not change the data.

    Version 10 release notes: The data now has the following age categories (which were previously aggregated into larger groups to reduce file size): under 10, 10-12, 13-14, 40-44, 45-49, 50-54, 55-59, 60-64, over 64. These categories are available for female, male, and total (female + male) arrests. The previous aggregated categories (under 15, 40-49, and over 49) have been removed from the data.

    Version 9 release notes: For each offense, adds a variable indicating the number of months that offense was reported; these variables are labeled "num_months_[crime]", where [crime] is the offense name. They are generated from the number of months in which one or more arrests were reported for that crime. For example, if there was at least one arrest for assault in January, February, March, and August (and no other months), there would be four months reported for assault. Please note that this does not differentiate between an agency not reporting that month and actually having zero arrests. The variable "number_of_months_reported" is still in the data and is the number of months in which any offense was reported. So if an agency reports murder arrests every month but no other crimes, the murder number-of-months variable and the "number_of_months_reported" variable will both be 12, while every other offense's number-of-months variable will be 0. Adds data for 2017 and 2018.

    Version 8 release notes: Adds annual data in R format. Changes the project name to avoid confusing this data with the versions done by NACJD. Fixes a bug where bookmaking was excluded as an arrest category. Changed the number of categories to include more offenses per category, so there are fewer total files. Added a "total_race" file for each category; this file has total arrests by race for each crime and a breakdown of juvenile/adult arrests by race.

    Version 7 release notes: Adds 1974-1979 data. Adds monthly data (only totals by sex and race, not by age category). All data is now from the FBI, not NACJD. Changes some column names so all columns are <= 32 characters and usable in Stata. Changes how the number of months reported is calculated: it is now the number of unique months with arrest data reported; months of data from the monthly header file (i.e., juvenile disposition data) are not considered in this calculation.

    Version 6 release notes: Fixes a bug where juvenile female columns had the same value as juvenile male columns.

    Version 5 release notes: Removes support for SPSS and Excel data. Changes the crimes that are stored in each file; there are more files now with fewer crimes per file. The files and their included crimes have been updated below. Adds in agencies that report 0 months of the year. Adds a column that indicates the number of months reported, generated by summing the number of unique months for which an agency reports data. Note that this indicates the number of months an agency reported arrests for ANY crime; they may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime. Removes data on runaways.

    Version 4 release notes: Changes column names from "poss_coke" and "sale_coke" to "poss_heroi
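
    The num_months_[crime] derivation described in the Version 9 notes can be sketched roughly as follows. This is an illustration only, not the author's code: monthly_arrests, ori, year, month, assault_arrests and robbery_arrests are assumed names standing in for whatever the monthly files actually use.

        library(dplyr)

        # Assumed layout: one row per agency-month, with arrest counts per offense.
        monthly_arrests %>%
          group_by(ori, year) %>%
          summarise(
            # months in which at least one assault arrest was reported
            # (cannot distinguish "did not report" from "zero arrests")
            num_months_assault        = sum(assault_arrests > 0, na.rm = TRUE),
            # months in which ANY offense had at least one reported arrest
            number_of_months_reported = n_distinct(month[which(assault_arrests > 0 |
                                                               robbery_arrests > 0)]),
            .groups = "drop"
          )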

  18. d

    Data and simulation model files for: Variable-stiffness morphing wheel...

    • datadryad.org
    • search.dataone.org
    • +1more
    zip
    Updated Jul 23, 2024
    Cite
    Jae-Young Lee; Seongji Han; Munyu Kim; Yong-Sin Seo; Jongwoo Park; Dongil Park; Chanhun Park; Hyunuk Seo; Joonho Lee; Hwi-Su Kim; Jeongae Bak; Hugo Rodrigue; Jin-Gyun Kim; Joono Cheong; Sung-Hyuk Song (2024). Data and simulation model files for: Variable-stiffness morphing wheel inspired by the surface tension of a liquid drop [Dataset]. http://doi.org/10.5061/dryad.kwh70rzd7
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 23, 2024
    Dataset provided by
    Dryad
    Authors
    Jae-Young Lee; Seongji Han; Munyu Kim; Yong-Sin Seo; Jongwoo Park; Dongil Park; Chanhun Park; Hyunuk Seo; Joonho Lee; Hwi-Su Kim; Jeongae Bak; Hugo Rodrigue; Jin-Gyun Kim; Joono Cheong; Sung-Hyuk Song
    Time period covered
    Jul 12, 2024
    Description

    Data and simulation model files for: Variable-stiffness morphing wheel inspired by the surface tension of a liquid drop

    https://doi.org/10.5061/dryad.kwh70rzd7

    Overview

    This dataset contains the data, CAD, and simulation model generated from the experiments using the variable-stiffness morphing wheel. The presented dataset is necessary for generating figures and results for the paper titled 'Variable-stiffness morphing wheel inspired by the surface tension of a liquid drop'.

    Data from the figures

    The measured data necessary to reproduce the figures in the paper titled 'Variable-stiffness morphing wheel inspired by the surface tension of a liquid drop'.

    CAD

    The shape and size information of the components and platform used in the paper

    • Smart chain structure: The size and detailed shape of the smart chain structure, which forms the outer structure of the stiffness-variable morphing wheel.
    • Stiffness-variable module in ...
  19. r

    MCCN Case Study 6 - Environmental Correlates for Productivity

    • researchdata.edu.au
    Updated Nov 13, 2025
    + more versions
    Cite
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja (2025). MCCN Case Study 6 - Environmental Correlates for Productivity [Dataset]. http://doi.org/10.25909/29176682.V1
    Explore at:
    Dataset updated
    Nov 13, 2025
    Dataset provided by
    The University of Adelaide
    Authors
    Rakesh David; Lili Andres Hernandez; Hoang Son Le; Donald Hobern; Alisha Aneja
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MCCN project delivers tools that help the agricultural sector understand crop-environment relationships, specifically by facilitating the generation of data cubes for spatiotemporal data. This repository contains Jupyter notebooks that demonstrate the functionality of the MCCN data cube components.

    The dataset contains input files for the case study (source_data), RO-Crate metadata (ro-crate-metadata.json), results from the case study (results), and Jupyter Notebook (MCCN-CASE 6.ipynb)

    Research Activity Identifier (RAiD)

    RAiD: https://doi.org/10.26292/8679d473

    Case Studies

    This repository contains code and sample data for the following case studies. Note that the analyses here are intended to demonstrate the software, and the results should not be considered scientifically or statistically meaningful. No effort has been made to address bias in samples, and sample data may not be available at sufficient density to warrant analysis. All case studies end with the generation of an RO-Crate data package that includes the source data, the notebook and the generated outputs, including NetCDF exports of the data cubes themselves.

    Case Study 6 - Environmental Correlates for Productivity

    Description

    Analyse the relationship between different environmental drivers and plant yield. This study demonstrates: 1) loading heterogeneous data sources into a cube, and 2) analysis and visualisation of drivers. It combines a suite of spatial variables at different scales across multiple sites to analyse the factors correlated with a variable of interest.

    Data Sources

    The dataset covers the Gilbert site in Queensland, which has multiple standard-sized plots for three years; we use data from 2022. The source files are part of the larger collection - Chapman, Scott and Smith, Daniel (2023). INVITA Core site UAV dataset. The University of Queensland. Data Collection. https://doi.org/10.48610/951f13c

    1. Boundary file - This is a shapefile defining the boundaries of all field plots at the Gilbert site. Each polygon represents a single plot and is associated with a unique Plot ID (e.g., 03_03_1). These plot IDs are essential for joining and aligning data across the orthomosaics and plot-level measurements.
    1. Orthomosaics - The site was imaged by UAV flights multiple times throughout the 2022 growing season, spanning from June to October. Each flight produced an orthorectified mosaic image using RGB and Multispectral (MS) sensors.
    1. Plot-level measurements - Multispectral traits: indices NDVI, NDRE and SAVI calculated from the MS sensor imagery (see the index sketch after this list); and biomass cuts: field-measured biomass sampled during different growth stages (used as a proxy for yield).
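
    A small illustrative sketch of the index definitions mentioned above. The MCCN case study itself ships as a Jupyter notebook, so this R version is for reference only; the band arguments are assumed to be reflectance values (numeric vectors or rasters), and L is the usual SAVI soil-adjustment factor.

        # Standard vegetation-index formulas (illustrative helper functions).
        ndvi <- function(nir, red)            (nir - red) / (nir + red)
        ndre <- function(nir, red_edge)       (nir - red_edge) / (nir + red_edge)
        savi <- function(nir, red, L = 0.5)   (1 + L) * (nir - red) / (nir + red + L)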


  20. d

    Data for and estimates of wet deposition and streamwater solute fluxes at...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Data for and estimates of wet deposition and streamwater solute fluxes at the Panola Mountain Research Watershed, Stockbridge, Ga., water years 1986-2016 [Dataset]. https://catalog.data.gov/dataset/data-for-and-estimates-of-wet-deposition-and-streamwater-solute-fluxes-at-the-panola-1986-
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey http://www.usgs.gov/
    Area covered
    Panola Mountain, Georgia, Stockbridge
    Description

    This dataset contains the data and results of an analysis estimating wet deposition and streamwater solute fluxes at Panola Mountain Research Watershed (PMRW), Panola Mountain State Park, Stockbridge, Georgia, for water years 1986-2016. The PMRW is a small (41 ha), relatively undisturbed, forested headwater catchment in the Piedmont Province of the Southeastern United States. The data provide the basis for a watershed mass-balance approach, in which inputs and outputs of water and solutes are quantified and compared to better understand hydrologic and biogeochemical processes at the watershed scale. The release contains 13 datasets consisting of a variety of data series and results, summarized here along with their purposes:

    (1) Precipitation amount (1-minute time-step), used to estimate wet deposition, to predict monthly soil moisture, and as a variable in most of the streamwater concentration regression models.
    (2) Precipitation water quality (mostly weekly composite samples), used to estimate wet deposition.
    (3) Streamwater stage and flow (5-minute time-step, 1-minute during stormflow), used to estimate streamwater solute fluxes, to predict monthly soil moisture, and as variables in the streamwater concentration regression models.
    (4) Stream water quality (discrete weekly and storm samples), used to estimate streamwater solute fluxes with a regression-based approach.
    (5) A 13-year soil moisture time series from a profile with 3 depths (15, 40, and 70 cm; 5-minute time-step), representing shallow watershed storage and used to classify climatic conditions for some solutes (separate concentration regression models were developed for each climate category).
    (6) Data, calibration variables, and predictions for a USGS monthly water-balance program used to model monthly watershed soil moisture when measured soil moisture data were unavailable.
    (7) Unit-value base flow (as determined from a hydrograph separation using the Eckhardt filter), used to calculate the base-flow ratio that was a variable in a few of the streamwater concentration regression models; the hydrograph separation also defined the base-flow and stormflow periods used to determine the base-flow and rising-limb indicator variables used by most of the concentration models.
    (8)-(10) Estimates of wet deposition for 9 solutes, summarized on daily, monthly, and annual time-steps.
    (11)-(12) Streamwater solute flux estimates for 10 solutes, summarized on daily and annual time-steps.
    (13) A dataset of edit-code descriptions used by the unit-value precipitation amount, streamwater stage and flow, and soil moisture time series.
