85 datasets found
  1. File Validation and Training Statistics

    • kaggle.com
    zip
    Updated Dec 1, 2023
    Cite
    The Devastator (2023). File Validation and Training Statistics [Dataset]. https://www.kaggle.com/datasets/thedevastator/file-validation-and-training-statistics
    Explore at:
    zip (16413235 bytes). Available download formats
    Dataset updated
    Dec 1, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    File Validation and Training Statistics

    Validation, Training, and Testing Statistics for tasksource/leandojo Files

    By tasksource (From Huggingface) [source]

    About this dataset

    The tasksource/leandojo: File Validation, Training, and Testing Statistics dataset is a comprehensive collection of information regarding the validation, training, and testing processes of files in the tasksource/leandojo repository. This dataset is essential for gaining insights into the file management practices within this specific repository.

    The dataset consists of three distinct files: validation.csv, train.csv, and test.csv. Each file serves a unique purpose in providing statistics and information about the different stages involved in managing files within the repository.

    In validation.csv, you will find detailed information about the validation process undergone by each file. This includes data such as file paths within the repository (file_path), full names of each file (full_name), associated commit IDs (commit), traced tactics implemented (traced_tactics), URLs pointing to each file (url), and respective start and end dates for validation.

    train.csv focuses on statistics from the training phase. Here you can access data such as file paths within the repository (file_path), full names of individual files (full_name), associated commit IDs (commit), traced tactics utilized during training (traced_tactics), and URLs linking to each file undergoing training (url).

    Lastly, test.csv covers statistics for the testing activities performed on files within the tasksource/leandojo repository. This includes file paths within the repository structure (file_path), the full name assigned to each tested file (full_name), the commit ID associated with the file version being tested (commit), the traced tactics incorporated during testing (traced_tactics), and the URL pointing to each tested file (url).

    By exploring this dataset of three CSV files - validation.csv, train.csv, and test.csv - researchers can gain insights into how validation, training, and testing tasks have been carried out to maintain high-quality standards within the tasksource/leandojo repository.

    How to use the dataset

    • Familiarize Yourself with the Dataset Structure:

      • The dataset consists of three separate files: validation.csv, train.csv, and test.csv.
      • Each file contains multiple columns providing different information about file validation, training, and testing.
    • Explore the Columns:

      • 'file_path': This column represents the path of the file within the repository.
      • 'full_name': This column displays the full name of each file.
      • 'commit': The commit ID associated with each file is provided in this column.
      • 'traced_tactics': The tactics traced in each file are listed in this column.
      • 'url': This column provides the URL of each file.
    • Understand Each File's Purpose:

    Validation.csv - This file contains information related to the validation process of files in the tasksource/leandojo repository.

    Train.csv - Utilize this file if you need statistics and information regarding the training phase of files in tasksource/leandojo repository.

    Test.csv - For insights into statistics and information about testing individual files within tasksource/leandojo repository, refer to this file.

    • Generate Insights & Analyze Data:
    • Once you have a clear understanding of each column's purpose, you can start generating insights from your analysis using various statistical techniques or machine learning algorithms.
    • Explore patterns or trends by examining specific columns such as 'traced_tactics' or analyzing multiple columns together.

    • Combine Multiple Files (if necessary):

    • If required, you can merge/correlate data across different csv files based on common fields such as 'file_path', 'full_name', or 'commit'.

    • Visualize the Data (Optional):

    • To enhance your analysis, consider creating visualizations such as plots, charts, or graphs. Visualization can offer a clear representation of patterns or relationships within the dataset.

    • Obtain Further Information:

    • If you need additional details about any specific file, make use of the provided 'url' column to access further information.

    Remember that this guide provides a general overview of how to utilize this dataset effectively. Feel ...
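
    A minimal pandas sketch of the workflow described in this guide, assuming the three CSV splits have been downloaded locally under the names given above; the column names come from the dataset description, and the overlap check at the end is just one example of combining the files on their common fields.

```python
# Load the three splits described above and inspect their shared columns.
# Local file paths are assumptions; adjust to wherever the CSVs were saved.
import pandas as pd

splits = {name: pd.read_csv(f"{name}.csv") for name in ("validation", "train", "test")}

# Columns described in the guide: file_path, full_name, commit, traced_tactics, url.
for name, df in splits.items():
    print(name, df.shape, list(df.columns))

# Example of combining the files on common fields: check whether any file/commit
# pair appears in more than one split.
combined = pd.concat(
    [df.assign(split=name) for name, df in splits.items()], ignore_index=True
)
overlap = combined.duplicated(subset=["file_path", "commit"], keep=False)
print(combined.loc[overlap, ["split", "file_path", "commit"]].head())
```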

  2. Table_1_Operational Challenges in the Use of Structured Secondary Data for...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated May 30, 2023
    + more versions
    Cite
    Kelsy N. Areco; Tulio Konstantyner; Paulo Bandiera-Paiva; Rita C. X. Balda; Daniela T. Costa-Nobre; Adriana Sanudo; Carlos Roberto V. Kiffer; Mandira D. Kawakami; Milton H. Miyoshi; Ana Sílvia Scavacini Marinonio; Rosa M. V. Freitas; Liliam C. C. Morais; Monica L. P. Teixeira; Bernadette Waldvogel; Maria Fernanda B. Almeida; Ruth Guinsburg (2023). Table_1_Operational Challenges in the Use of Structured Secondary Data for Health Research.DOCX [Dataset]. http://doi.org/10.3389/fpubh.2021.642163.s001
    Explore at:
    docx. Available download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Kelsy N. Areco; Tulio Konstantyner; Paulo Bandiera-Paiva; Rita C. X. Balda; Daniela T. Costa-Nobre; Adriana Sanudo; Carlos Roberto V. Kiffer; Mandira D. Kawakami; Milton H. Miyoshi; Ana Sílvia Scavacini Marinonio; Rosa M. V. Freitas; Liliam C. C. Morais; Monica L. P. Teixeira; Bernadette Waldvogel; Maria Fernanda B. Almeida; Ruth Guinsburg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: In Brazil, secondary data for epidemiology are largely available. However, they are insufficiently prepared for use in research, even when it comes to structured data, since they were often designed for other purposes. To date, few publications focus on the process of preparing secondary data. The present findings can help in orienting future research projects that are based on secondary data.

    Objective: Describe the steps in the process of ensuring the adequacy of a secondary data set for a specific use and identify the challenges of this process.

    Methods: The present study is qualitative and reports methodological issues about secondary data use. The study material comprised 6,059,454 live births and 73,735 infant death records from 2004 to 2013 of children whose mothers resided in the State of São Paulo, Brazil. The challenges and the description of the procedures to ensure data adequacy were undertaken in 6 steps: (1) problem understanding, (2) resource planning, (3) data understanding, (4) data preparation, (5) data validation and (6) data distribution. For each step, the procedures, the challenges encountered, the actions taken to cope with them and partial results were described. To identify the most labor-intensive tasks in this process, the steps were assessed by adding the number of procedures, challenges, and coping actions; the highest values were assumed to indicate the most critical steps.

    Results: In total, 22 procedures and 23 actions were needed to deal with the 27 challenges encountered along the process of ensuring the adequacy of the study material for the intended use. The final product was an organized database for a historical cohort study suitable for the intended use. Data understanding and data preparation were identified as the most critical steps, accounting for about 70% of the challenges observed in data use.

    Conclusion: Significant challenges were encountered in the process of ensuring the adequacy of secondary health data for research use, mainly in the data understanding and data preparation steps. The use of the described steps to approach structured secondary data and knowledge of the potential challenges along the process may contribute to planning health research.

  3. Verst-Maldaun Language Assessment (VMLA) Validation Process Database

    • narcis.nl
    • data.mendeley.com
    Updated Dec 3, 2020
    Cite
    Verst, S (via Mendeley Data) (2020). Verst-Maldaun Language Assessment (VMLA) Validation Process Database [Dataset]. http://doi.org/10.17632/zjhfk7mm7v.3
    Explore at:
    Dataset updated
    Dec 3, 2020
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Verst, S (via Mendeley Data)
    Description

    This paper describes the process of creating the VMLA, a language test intended for use during awake craniotomies. It focuses on the step-by-step process and aims to help other developers build their own assessments. The project was designed as a prospective study and registered with the Ethics Committee of the Educational and Research Institute of Sirio Libanês Hospital (approval number HSL 2018-37 / CAEE 90603318.9.0000.5461). Images were purchased from Shutterstock.com under receipts SSTK-0CA8F-1358 and SSTK-0235F-6FC2. The VMLA is a neuropsychological assessment of language function comprising object naming (ON) and semantic tasks. Originally composed of 420 slides, validation among Brazilian native speakers left 368 figures plus fifteen other elements, such as numbers, sentences and counting. Validation focused on educational level (EL), gender and age. Volunteers were tested in fourteen different states of Brazil, and cultural differences led to improvements in the final answer template. EL and age were identified as factors that influenced VMLA results: highly educated volunteers performed better on both ON and semantic tasks, and people over 50 and 35 years old performed better on ON and semantic tasks, respectively. Further validation in unevaluated regions of Brazil, with a more balanced number of males and females and a more even distribution of age and EL, could confirm our statistical analysis. After validation, the ON-VMLA was framed in batteries of 100 slides each, mixing images from six complexity categories, while the semantic VMLA kept all seventy original verbal and non-verbal combinations. The validation process resulted in increased confidence during intraoperative test application, and we are now able to score and evaluate patients' language deficits. Currently, the VMLA fits its purpose of dynamic application and accuracy during mapping of language areas. It is the first test targeted at Brazilians, representing much of our culture and collective imagery. Our experience may be of value to clinicians and researchers working with awake craniotomy who seek to develop their own language test.

    The test is available for free use at www.vemotests.com (beginning in February, 2021)

  4. Data from: Summary report of the 4th IAEA Technical Meeting on Fusion Data...

    • dataone.org
    • dataverse.harvard.edu
    Updated Sep 24, 2024
    Cite
    S.M. Gonzalez de Vicente, D. Mazon, M. Xu, S. Pinches, M. Churchill, A. Dinklage, R. Fischer, A. Murari, P. Rodriguez-Fernandez, J. Stillerman, J. Vega, G. Verdoolaege (2024). Summary report of the 4th IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis (FDPVA) [Dataset]. http://doi.org/10.7910/DVN/ZZ9UKO
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    S.M. Gonzalez de Vicente, D. Mazon, M. Xu, S. Pinches, M. Churchill, A. Dinklage, R. Fischer, A. Murari, P. Rodriguez-Fernandez, J. Stillerman, J. Vega, G. Verdoolaege
    Description

    The objective of the fourth Technical Meeting on Fusion Data Processing, Validation and Analysis was to provide a platform during which a set of topics relevant to fusion data processing, validation and analysis are discussed with the view of extrapolating needs to next-step fusion devices such as ITER. The validation and analysis of experimental data obtained from diagnostics used to characterize fusion plasmas are crucial for a knowledge-based understanding of the physical processes governing the dynamics of these plasmas. This paper presents the recent progress and achievements in the domain of plasma diagnostics and synthetic diagnostics data analysis (including image processing, regression analysis, inverse problems, deep learning, machine learning, big data and physics-based models for control) reported at the meeting. The progress in these areas highlights trends observed in current major fusion confinement devices. A special focus is dedicated to data analysis requirements for ITER and DEMO, with particular attention paid to artificial intelligence for automation and for improving the reliability of control processes.

  5. Contact Validation & Append | Sagacity Contact Data | UK Wide Coverage |...

    • datarade.ai
    .csv, .xls, .txt
    Updated Nov 9, 2025
    Cite
    Sagacity (2025). Contact Validation & Append | Sagacity Contact Data | UK Wide Coverage | Ensure you can contact all your records | Accurate Multi Channel Data [Dataset]. https://datarade.ai/data-products/contact-validation-append-sagacity-contact-data-uk-wide-sagacity
    Explore at:
    .csv, .xls, .txt. Available download formats
    Dataset updated
    Nov 9, 2025
    Dataset authored and provided by
    Sagacity
    Area covered
    United Kingdom
    Description

    Our Contact Validation and Append solution identifies and fixes errors in your existing customer database whilst appending missing information, including email addresses and telephone numbers. This comprehensive approach allows you to provide excellent customer service, obtain accurate billing information, and achieve high collection rates across all your communications.

    What is it? A combination of cleansing, validation, correction and appending solutions applied to your customer base, whether residential or commercial. The full process involves the following steps:

    • Data Cleansing: Identification and removal of duplicate, outdated, or incorrect contact records
    • Validation: Real-time verification of email addresses and telephone numbers to ensure deliverability
    • Correction: Automatic fixing of common formatting errors and standardisation of contact data
    • Appending: Enhancement of incomplete records with missing email addresses and telephone numbers from our comprehensive database

    This multi-step approach ensures your contact database is not only clean and accurate, but also complete with the most up-to-date information available.
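
    For illustration only, here is a small self-contained pandas sketch of the kind of cleansing, validation, and correction steps listed above applied to a local contact table; it is not Sagacity's product or API, and the regular expression is a simple format check rather than a deliverability test.

```python
# Illustrative cleansing/validation/correction steps on a toy contact table.
import re
import pandas as pd

contacts = pd.DataFrame({
    "name": ["A. Smith", "A. Smith", "B. Jones"],
    "email": ["a.smith@example.com", "a.smith@example.com", "not-an-email"],
    "phone": ["+44 20 7946 0000", "+44 20 7946 0000", "020-7946-0001"],
})

# Cleansing: drop exact duplicate records.
contacts = contacts.drop_duplicates()

# Validation: flag syntactically invalid email addresses (format check only;
# real deliverability checks require an external verification service).
email_ok = contacts["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Correction: standardise phone formatting by stripping separators.
contacts["phone"] = contacts["phone"].str.replace(r"[\s\-()]", "", regex=True)

print(contacts.assign(email_valid=email_ok))
```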

    Use cases:
    • Deliver more messaging to the right customers: ensure your communications reach their intended recipients by maintaining accurate contact details
    • Less wastage for your messaging and marketing: reduce bounce rates and failed delivery attempts, maximising your marketing budget efficiency
    • Increase delivery success and engagement propensity: clean, validated contact data leads to higher open rates, click-through rates, and overall campaign performance
    • Improve customer service delivery: reach customers through their preferred contact methods with confidence in data accuracy
    • Enhance billing and collection processes: accurate contact information supports successful payment reminders and collection activities
    • Maintain GDPR compliance: keep your contact database current and accurate in line with data protection requirements

  6. Data from: Validation Process of a Work Engagement Scale for the Brazilian...

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Cite
    Paulo Cesar Porto Martins; Pedro Guilherme Basso Machado (2023). Validation Process of a Work Engagement Scale for the Brazilian Population [Dataset]. http://doi.org/10.6084/m9.figshare.21431263.v1
    Explore at:
    jpeg. Available download formats
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Paulo Cesar Porto Martins; Pedro Guilherme Basso Machado
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract The objective of this study is divided into two parts: the first part aimed at presenting the process of translation and adaptation of Porto-Martins and Benevides-Pereira's (2008) version of the Utrecht Work Engagement Scale (UWES); the second part focused on the validation of the instrument considering samples from five distinct and independent studies, with N=1,934. The adaptation process was composed of 13 steps, which included, among other steps, translation and back translation. Descriptive, reliability and confirmatory factor analyses were performed. The results denoted statistical quality and agreement with the data in the official UWES manual. Therefore, the validity of the corresponding version of the UWES is concluded, and future studies are encouraged in order to solidify this relevant construct in the national context.

  7. Liquid Chromatography-Mass Spectrometry Method Validation for...

    • data.mendeley.com
    Updated Oct 1, 2025
    Cite
    Yawen Xu (2025). Liquid Chromatography-Mass Spectrometry Method Validation for β-Hydroxythiofentanyl [Dataset]. http://doi.org/10.17632/92jxfc48gg.1
    Explore at:
    Dataset updated
    Oct 1, 2025
    Authors
    Yawen Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset corresponds to the method validation process of a Liquid Chromatography-Mass Spectrometry (LC-MS) analytical approach developed for the detection and quantification of β-Hydroxythiofentanyl—a synthetic opioid with high pharmacological activity and potential forensic/toxicological relevance. The dataset encompasses key validation parameters, including linearity (calibration curve data across relevant concentration ranges), accuracy (recovery rates from matrix-matched samples such as plasma, urine, or tissue homogenates), precision (intra-day and inter-day repeatability with relative standard deviations), selectivity (peak purity data and interference testing against structurally similar opioids), limit of detection (LOD) and limit of quantification (LOQ), matrix effect (ion suppression/enhancement assessment), and stability (data on short-term, long-term, freeze-thaw, and post-preparation stability of β-Hydroxythiofentanyl in matrices). Additionally, the dataset includes critical experimental details: LC conditions (column type, mobile phase composition, flow rate, injection volume), MS parameters (ionization mode, collision energy), sample preparation protocols (extraction method, cleanup steps), and replicate measurement data for quality control (QC) samples. This dataset serves as a foundational resource for researchers in forensic toxicology, clinical chemistry, or analytical chemistry seeking to adopt, verify, or optimize LC-MS methods for β-Hydroxythiofentanyl analysis, and supports reproducibility in studies involving this synthetic opioid.
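
    A few of the validation parameters listed above follow standard formulas (calibration linearity, LOD and LOQ from the ICH 3.3·σ/S and 10·σ/S rules, precision as %RSD); the short numpy sketch below reproduces them on made-up numbers that are not taken from this dataset.

```python
# Standard method-validation calculations on synthetic calibration data.
import numpy as np

# Calibration curve: nominal concentrations (ng/mL) vs. instrument response.
conc = np.array([0.5, 1, 2, 5, 10, 20])
resp = np.array([1.1e3, 2.0e3, 4.1e3, 10.2e3, 20.5e3, 40.8e3])

slope, intercept = np.polyfit(conc, resp, 1)
r = np.corrcoef(conc, resp)[0, 1]
print(f"linearity: slope={slope:.1f}, intercept={intercept:.1f}, R^2={r**2:.4f}")

# LOD/LOQ from the residual standard deviation of the regression.
residuals = resp - (slope * conc + intercept)
sigma = residuals.std(ddof=2)
print(f"LOD ~ {3.3 * sigma / slope:.3f} ng/mL, LOQ ~ {10 * sigma / slope:.3f} ng/mL")

# Intra-day precision for one QC level, reported as relative standard deviation.
qc_replicates = np.array([5.02, 4.95, 5.10, 4.98, 5.05])
rsd = 100 * qc_replicates.std(ddof=1) / qc_replicates.mean()
print(f"precision: RSD = {rsd:.2f}%")
```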

  8. Data from: Rangeland Condition Monitoring Assessment and Projection (RCMAP)...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Rangeland Condition Monitoring Assessment and Projection (RCMAP) Independent Validation Data [Dataset]. https://catalog.data.gov/dataset/rangeland-condition-monitoring-assessment-and-projection-rcmap-independent-validation-data
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    Rangeland ecosystems provide critical wildlife habitat (e.g., greater sage grouse, pronghorn, black-footed ferret), forage for livestock, carbon sequestration, provision of water resources, and recreational opportunities. At the same time, rangelands are vulnerable to climate change, fire, and anthropogenic disturbances. The arid-semiarid climate in most rangelands fluctuates widely, impacting livestock forage availability, wildlife habitat, and water resources. Many of these changes can be subtle or evolve over long time periods, responding to climate, anthropogenic, and disturbance driving forces. To understand vegetation change, scientists from the USGS and Bureau of Land Management (BLM) developed the Rangeland Condition Monitoring Assessment and Projection (RCMAP) project. RCMAP provides robust, long-term, and floristically detailed maps of vegetation cover at yearly time-steps, a critical reference for advancing science in the BLM and assessing Landscape Health standards. RCMAP quantifies the percent cover of ten rangeland components (annual herbaceous, bare ground, herbaceous, litter, non-sagebrush shrub, perennial herbaceous, sagebrush, shrub, and tree cover, plus shrub height) at yearly time-steps across the western U.S. using field training data, Landsat imagery, and machine learning. We utilize an ecologically comprehensive series of field-trained, high-resolution predictions of component cover and BLM Analysis Inventory and Monitoring (AIM) data to train machine learning models predicting component cover over the Landsat time-series. This dataset enables retrospective analysis of vegetation condition, impacts of weather variation and longer-term climatic change, and understanding of the effectiveness of vegetation treatments and altered management practices. RCMAP data can be used to answer critical questions regarding the influence of climate change and the suitability of management practices. Component products can be downloaded at https://www.mrlc.gov/data.

    Independent validation was our primary validation approach, consisting of field measurements of component cover at stratified-random locations. Independent validation point placement used a stratified random design, with two levels of stratified restrictions to simplify the logistics of field sampling (Rigge et al. 2020, Xian et al. 2015). The first level of stratification randomly selected 15 sites, each 8 km in diameter, across each mapping region. First-level sites excluded areas less than 30 km away from training sites and other validation sites. The second-level stratification randomly placed 6-10 points within each 8 km diameter validation site (total n = 2,014 points at n = 229 sites). Only locations on public land, between 100 and 1,000 m from the nearest road, and in rangeland vegetation cover within each site were considered. The random points within a site were evenly allocated to three NDVI thresholds from a leaf-on Landsat image (low, medium, and high). Sites with relatively high spatial variance within a 90 m by 90 m patch (3 × 3 Landsat pixels) were excluded to minimize plot-pixel locational error. Using NDVI as a stratum ensured plot locations were distributed across the range of validation site productivity. At each validation point, we measured component cover using the line-point intercept method along two 30 m transects. Data were collected from the first-hit perspective.
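
    The two-level stratified-random placement described above can be sketched in a few lines of numpy; the snippet below is a simplified illustration with synthetic coordinates, counts and NDVI values, not the RCMAP sampling code.

```python
# Simplified two-level stratified-random point placement with synthetic inputs.
import numpy as np

rng = np.random.default_rng(0)

# Level 1: candidate site centres (km); keep those >= 30 km from any training site.
candidates = rng.uniform(0, 500, size=(1000, 2))
training_sites = rng.uniform(0, 500, size=(50, 2))
dists = np.linalg.norm(candidates[:, None, :] - training_sites[None, :, :], axis=2)
eligible = candidates[dists.min(axis=1) >= 30]
sites = eligible[rng.choice(len(eligible), size=15, replace=False)]

# Level 2: within one site, allocate points evenly across three NDVI strata.
ndvi = rng.uniform(0.1, 0.8, size=300)                          # pixels in the site
strata = np.digitize(ndvi, np.quantile(ndvi, [1 / 3, 2 / 3]))   # 0=low, 1=med, 2=high
points_per_stratum = 3                                          # ~9 points per site
picked = np.concatenate(
    [rng.choice(np.flatnonzero(strata == s), size=points_per_stratum, replace=False)
     for s in range(3)]
)
print(sites.shape, picked)
```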

  9. Data from: Discrete element method model calibration and validation for the...

    • scidb.cn
    • resodate.org
    Updated Sep 30, 2024
    Cite
    Marco Lupo; Sina Zinatlou Ajabshir; Daniele Sofia; Diego Barletta; Massimo Poletto (2024). Discrete element method model calibration and validation for the spreading step of the powder bed fusion process to predict the quality of the layer surface [Dataset]. http://doi.org/10.57760/sciencedb.Partic.00010
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 30, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Marco Lupo; Sina Zinatlou Ajabshir; Daniele Sofia; Diego Barletta; Massimo Poletto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Discrete Element Method model, including interparticle cohesive forces, was calibrated and validated to develop a tool to predict the powder layer’s quality in the powder bed fusion process. An elastic contact model was used to describe cohesive interparticle interactions. The surface energy of the model particles was estimated by assuming that the pull-off force should provide the strength of the material evaluated at low consolidation with shear test experiments. The particle rolling friction was calibrated considering the bulk density of the layer produced by the spreading tool. The model was validated with the experiments by comparing the wavelet power spectra obtained with the simulations with those of the experimental layers illuminated by grazing light. The calibration proposed in this study demonstrated superior performance compared to our previous methods, which relied on measuring the angle of repose and unconfined yield strength.

  10. AWC to 60cm DSM data of the Roper catchment NT generated by the Roper River...

    • data.csiro.au
    • researchdata.edu.au
    Updated Apr 16, 2024
    Cite
    Ian Watson; Mark Thomas; Seonaid Philip; Uta Stockmann; Ross Searle; Linda Gregory; jason hill; Elisabeth Bui; John Gallant; Peter R Wilson; Peter Wilson (2024). AWC to 60cm DSM data of the Roper catchment NT generated by the Roper River Water Resource Assessment [Dataset]. http://doi.org/10.25919/y0v9-7b58
    Explore at:
    Dataset updated
    Apr 16, 2024
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Ian Watson; Mark Thomas; Seonaid Philip; Uta Stockmann; Ross Searle; Linda Gregory; jason hill; Elisabeth Bui; John Gallant; Peter R Wilson; Peter Wilson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 1, 2020 - Jun 30, 2023
    Area covered
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    Northern Territory Department of Environment, Parks and Water Security
    Description

    AWC to 60cm is one of 18 soil attributes chosen to underpin the land suitability assessment of the Roper River Water Resource Assessment (ROWRA) through the digital soil mapping (DSM) process. AWC (available water capacity) indicates the ability of a soil to retain and supply water for plant growth. This AWC raster data represents a modelled dataset of AWC to 60cm (mm of water to 60cm of soil depth) and is derived from analysed site data, spline calculations and environmental covariates. AWC is a parameter used in land suitability assessments for rainfed cropping and for water use efficiency in irrigated land uses. This raster data provides improved soil information used to underpin and identify opportunities and promote detailed investigation for a range of sustainable regional development options, and was created within the ‘Land Suitability’ activity of the CSIRO ROWRA. A companion dataset and statistics reflecting the reliability of this data are also provided and are described in the lineage section of this metadata record. Processing information is supplied in ranger R scripts, and attributes were modelled using a Random Forest approach. The DSM process is described in the CSIRO ROWRA published report ‘Soils and land suitability for the Roper catchment, Northern Territory’, a technical report from the CSIRO Roper River Water Resource Assessment to the Government of Australia. The Roper River Water Resource Assessment provides a comprehensive overview and integrated evaluation of the feasibility of aquaculture and agriculture development in the Roper catchment NT, as well as the ecological, social and cultural (indigenous water values, rights and aspirations) impacts of development.

    Lineage: This AWC to 60cm dataset has been generated from a range of inputs and processing steps; an overview follows. For more information refer to the CSIRO ROWRA published reports, in particular ‘Soils and land suitability for the Roper catchment, Northern Territory’.
    1. Collated existing data (relating to soils, climate, topography, natural resources and remote sensing, in various formats: reports, spatial vector, spatial raster, etc.).
    2. Selection of additional soil and land attribute site data locations by a conditioned Latin hypercube statistical sampling method applied across the covariate data space.
    3. Fieldwork carried out to collect new attribute data and soil samples for analysis, and to build an understanding of geomorphology and landscape processes.
    4. Database analysis performed to extract the data to the specific selection criteria required for the attribute to be modelled.
    5. Attribute computation in the R statistical programming environment. Models were built from selected input data and covariate data using predictive learning from a Random Forest approach implemented in the ranger R package.
    6. Creation of the AWC to 60cm DSM attribute raster dataset. DSM data is a geo-referenced dataset, generated from field observations and laboratory data, coupled with environmental covariate data through quantitative relationships. It applies pedometrics: the use of mathematical and statistical models that combine information from soil observations with information contained in correlated environmental variables, remote sensing images and some geophysical measurements.
    7. Companion predicted reliability data produced from the 500 individual Random Forest attribute models created.
    8. Quality assessment (QA) of this DSM attribute data, conducted by three methods.
    Method 1: Statistical (quantitative) assessment of the model and input data. Testing the quality of the DSM models was carried out using data withheld from model computations, expressed as OOB and R-squared results, giving an estimate of the reliability of the model predictions. These results are supplied.
    Method 2: Statistical (quantitative) assessment of the spatial attribute output data, presented as a raster of the attribute’s “reliability”. This used the 500 individual trees of the attribute’s RF model to generate 500 datasets of the attribute and estimate model reliability for each attribute. For continuous attributes, reliability is estimated as the coefficient of variation. This data is supplied.
    Method 3: Independent external validation site data combined with on-ground expert (qualitative) examination of outputs during validation field trips. Across each of the study areas a two-week validation field trip was conducted using a new validation site set produced by a random sampling design based on conditioned Latin hypercube sampling using the reliability data of the attribute. The modelled DSM attribute value was assessed against the actual on-ground value. These results are published in the report cited in this metadata record.
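
    Method 2 above derives a reliability raster from the spread of the 500 individual Random Forest trees. The project used the ranger R package; the snippet below is a small scikit-learn analog of the same idea (per-tree predictions and their coefficient of variation) on synthetic data.

```python
# Per-tree Random Forest predictions and coefficient of variation, as a
# scikit-learn analog of the reliability method described above; data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))                            # stand-in for covariates
y = X[:, 0] * 20 + rng.normal(scale=5, size=500) + 60    # stand-in for AWC (mm)

model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

X_new = rng.normal(size=(10, 6))                         # stand-in for map locations
per_tree = np.stack([tree.predict(X_new) for tree in model.estimators_])  # (500, 10)

mean_pred = per_tree.mean(axis=0)
cv = per_tree.std(axis=0) / mean_pred                    # coefficient of variation
print(np.round(mean_pred, 1), np.round(cv, 3))
```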

  11. Data from: Training dataset for NABat Machine Learning V1.0

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Training dataset for NABat Machine Learning V1.0 [Dataset]. https://catalog.data.gov/dataset/training-dataset-for-nabat-machine-learning-v1-0
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    Bats play crucial ecological roles and provide valuable ecosystem services, yet many populations face serious threats from various ecological disturbances. The North American Bat Monitoring Program (NABat) aims to assess status and trends of bat populations while developing innovative and community-driven conservation solutions using its unique data and technology infrastructure. To support scalability and transparency in the NABat acoustic data pipeline, we developed a fully-automated machine-learning algorithm. This dataset includes audio files of bat echolocation calls that were considered to develop V1.0 of the NABat machine-learning algorithm; however, the test set (i.e., holdout dataset) has been excluded from this release. These recordings were collected by various bat monitoring partners across North America using ultrasonic acoustic recorders for stationary acoustic and mobile acoustic surveys. For more information on how these surveys may be conducted, see Chapters 4 and 5 of “A Plan for the North American Bat Monitoring Program” (https://doi.org/10.2737/SRS-GTR-208). These data were then post-processed by bat monitoring partners to remove noise files (or those that do not contain recognizable bat calls) and apply a species label to each file. There is undoubtedly variation in the steps that monitoring partners take to apply a species label, but the steps documented in “A Guide to Processing Bat Acoustic Data for the North American Bat Monitoring Program” (https://doi.org/10.3133/ofr20181068) include first processing with an automated classifier and then manually reviewing to confirm or downgrade the suggested species label. Once a manual ID label was applied, audio files of bat acoustic recordings were submitted to the NABat database in Waveform Audio File format. From these available files in the NABat database, we considered files from 35 classes (34 species and a noise class). Files for 4 species were excluded due to low sample size (Corynorhinus rafinesquii, N = 3; Eumops floridanus, N = 3; Lasiurus xanthinus, N = 4; Nyctinomops femorosaccus, N = 11). From this pool, files were randomly selected until files for each species/grid cell combination were exhausted or the number of recordings reached 1,250. The dataset was then randomly split into training, validation, and test sets (i.e., holdout dataset). This data release includes all files considered for training and validation, including files that had been excluded from model development and testing due to low sample size for a given species or because the threshold for species/grid cell combinations had been met. The test set (i.e., holdout dataset) is not included. Audio files are grouped by species, as indicated by the four-letter species code in the name of each folder. Definitions for each four-letter code, including Family, Genus, Species, and Common name, are also included as a dataset in this release.
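
    The selection and splitting logic described above (dropping rare species, capping each species/grid-cell combination at 1,250 recordings, then a random train/validation/test split) can be sketched with pandas; the dataframe, column names, thresholds and split proportions below are hypothetical, not the NABat schema.

```python
# Hypothetical sketch of rare-species exclusion, per-combination capping, and splitting.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
files = pd.DataFrame({
    "path": [f"rec_{i}.wav" for i in range(10_000)],
    "species": rng.choice(["LANO", "EPFU", "MYLU", "NOISE"], size=10_000),
    "grid_cell": rng.integers(1, 50, size=10_000),
})

# Exclude species with very low sample size (threshold illustrative only).
counts = files["species"].value_counts()
files = files[files["species"].isin(counts[counts >= 12].index)]

# Cap each species/grid-cell combination at 1,250 randomly chosen recordings.
capped = (files.groupby(["species", "grid_cell"], group_keys=False)
               .apply(lambda g: g.sample(min(len(g), 1250), random_state=0)))

# Random train/validation/test split (proportions are illustrative).
split = rng.choice(["train", "val", "test"], size=len(capped), p=[0.8, 0.1, 0.1])
capped = capped.assign(split=split)
print(capped["split"].value_counts())
```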

  12. Z

    Translation, cultural adaptation, and validation of Developmental...

    • data.niaid.nih.gov
    Updated Jul 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ferreira, Ana; Santana, Maria Raquel; Martins, Ana Paula (2024). Translation, cultural adaptation, and validation of Developmental Coordination Disorder Questionnaire to European Portuguese [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10982466
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    Polytechnic Institute of Beja
    Authors
    Ferreira, Ana; Santana, Maria Raquel; Martins, Ana Paula
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project corresponds to the data generated throughout the translation, cultural adaptation, and validation process of 'The Developmental Coordination Disorder Questionnaire' to European Portuguese from English.

    This questionnaire was developed to collect parents' perceptions of motor coordination performance in children aged between 5 and 15 years old. It is intended to be completed by parents or caregivers and can be utilized in educational, clinical, therapeutic, or research contexts.

    The data will be presented in three different parts, each corresponding to a specific phase: translation process, pilot study, and validation.

    The methodological steps followed are those recommended by Brenda Wilson, the author of the original questionnaire, who has provided written authorization for this research project.

  13. Nashville Housing Data Cleaning Project

    • kaggle.com
    zip
    Updated Aug 20, 2024
    Cite
    Ahmed Elhelbawy (2024). Nashville Housing Data Cleaning Project [Dataset]. https://www.kaggle.com/datasets/elhelbawylogin/nashville-housing-data-cleaning-project/discussion
    Explore at:
    zip (1282 bytes). Available download formats
    Dataset updated
    Aug 20, 2024
    Authors
    Ahmed Elhelbawy
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Nashville
    Description

    Project Overview: This project demonstrates a thorough data cleaning process for the Nashville Housing dataset using SQL. The script performs various data cleaning and transformation operations to improve the quality and usability of the data for further analysis.

    Technologies Used: SQL Server (T-SQL)

    Dataset: The project uses the Nashville Housing dataset, which contains information about property sales in Nashville, Tennessee. The original dataset includes various fields such as property addresses, sale dates, sale prices, and other relevant real estate information.

    Data Cleaning Operations: The script performs the following data cleaning operations:

    • Date Standardization: Converts the SaleDate column to a standard Date format for consistency and easier manipulation.
    • Populating Missing Property Addresses: Fills in NULL values in the PropertyAddress field using data from other records with the same ParcelID.
    • Breaking Down Address Components: Separates the PropertyAddress and OwnerAddress fields into individual columns for Address, City, and State, improving data granularity and queryability.
    • Standardizing Values: Converts 'Y' and 'N' values to 'Yes' and 'No' in the SoldAsVacant field for clarity and consistency.
    • Removing Duplicates: Identifies and removes duplicate records based on specific criteria to ensure data integrity.
    • Dropping Unused Columns: Removes unnecessary columns to streamline the dataset.
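
    The project itself is a T-SQL script; for readers working in Python, the snippet below is a rough pandas analog of a few of the operations listed above, using the column names from the description and made-up rows.

```python
# Rough pandas analog of several cleaning steps (not the project's SQL script).
import pandas as pd

df = pd.DataFrame({
    "ParcelID": ["007", "007", "008"],
    "PropertyAddress": ["123 Main St, Nashville", None, "9 Oak Ave, Nashville"],
    "SaleDate": ["2013-04-09 00:00:00", "2013-04-09 00:00:00", "2014-01-02 00:00:00"],
    "SoldAsVacant": ["Y", "N", "No"],
})

# Date standardisation: keep only the date part.
df["SaleDate"] = pd.to_datetime(df["SaleDate"]).dt.date

# Populate missing property addresses from other rows with the same ParcelID.
df["PropertyAddress"] = df.groupby("ParcelID")["PropertyAddress"].transform(
    lambda s: s.ffill().bfill()
)

# Break the address into street and city components.
df[["Address", "City"]] = df["PropertyAddress"].str.split(",", n=1, expand=True)

# Standardise Y/N values and drop exact duplicates.
df["SoldAsVacant"] = df["SoldAsVacant"].replace({"Y": "Yes", "N": "No"})
df = df.drop_duplicates()
print(df)
```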

    Key SQL Techniques Demonstrated:

    • Data type conversion
    • Self joins for data population
    • String manipulation (SUBSTRING, CHARINDEX, PARSENAME)
    • CASE statements
    • Window functions (ROW_NUMBER)
    • Common Table Expressions (CTEs)
    • Data deletion
    • Table alterations (adding and dropping columns)

    Important Notes:

    The script includes cautionary comments about data deletion and column dropping, emphasizing the importance of careful consideration in a production environment. This project showcases various SQL data cleaning techniques and can serve as a template for similar data cleaning tasks.

    Potential Improvements:

    • Implement error handling and transaction management for more robust execution.
    • Add data validation steps to ensure the cleaned data meets specific criteria.
    • Consider creating indexes on frequently queried columns for performance optimization.

  14. ckanext-transmute - Extensions - CKAN Ecosystem Catalog Beta

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-transmute - Extensions - CKAN Ecosystem Catalog Beta [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-transmute
    Explore at:
    Dataset updated
    Jun 4, 2025
    Description

    The Transmute extension for CKAN provides a data pipeline for validating and converting data using schemas. It allows users to define schemas that specify validation rules and data transformations, thus ensuring data quality and consistency. The extension enables transformations through an action API with the ability to transform data using defined schemas.

    Key Features:
    • Schema-Driven Validation: Uses schemas to define data types, required fields, and validation rules, providing the opportunity to validate data against these rules.
    • Data Transformation: Supports data transformation based on schemas. This includes modifying fields, adding new fields, and removing unnecessary data to fit the desired output format.
    • Inline Schema Definition: Allows defining schemas directly within the CKAN API calls, providing a convenient way to apply transformations on the fly.
    • Custom Validators: Supports creation of custom validators, enabling tailored data validation logic. The readme specifically identifies "tsm_concat" as an example of a custom validator.
    • Field Weighting: Enables control over the order in which fields are processed during validation and transformation, by specifying weight values.
    • Post-Processing: Provides the option to define steps to execute after processing fields, such as removing fields that are no longer needed after transformation.

    Technical Integration: The Transmute extension integrates with CKAN by adding a new action API called tsm_transmute. This API allows users to submit data and a schema, and the extension applies the schema to validate and transform the data. The extension is enabled by adding transmute to the list of enabled plugins in the CKAN configuration file.

    Benefits & Impact: Implementing the Transmute extension enhances CKAN's data quality control and transformation capabilities. It provides a flexible and configurable way to ensure data consistency and conformity to defined standards, thus improving the overall reliability and usability of datasets managed within CKAN. Furthermore, it automates the data transformation process using defined schemas, which can reduce the manual workload of data administrators.
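
    A hedged sketch of calling the tsm_transmute action from Python via CKAN's standard action API endpoint (/api/3/action/<name>) is shown below; the instance URL and token are placeholders, and the parameter names and inline schema layout are assumptions for illustration only, so check the ckanext-transmute README for the actual schema format.

```python
# Illustrative call to the tsm_transmute action through CKAN's action API.
import requests

CKAN_URL = "https://ckan.example.org"   # hypothetical CKAN instance
API_KEY = "xxxx"                        # hypothetical API token

payload = {
    # Assumed parameter names: the record to transform and an inline schema.
    "data": {"title": "my dataset", "notes": "raw description"},
    "schema": {
        # Illustrative shape only; see the extension's README for the real format.
        "root": "Dataset",
        "types": {"Dataset": {"fields": {"title": {"validators": ["not_empty"]}}}},
    },
}

resp = requests.post(
    f"{CKAN_URL}/api/3/action/tsm_transmute",
    json=payload,
    headers={"Authorization": API_KEY},
    timeout=30,
)
print(resp.json())
```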

  15. Data from: Validation of psychosocial scales for physical activity in...

    • scielo.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Rafael Miranda Tassitano; José Cazuza de Farias Júnior; Cassiano Ricardo Rech; Maria Cecília Marinho Tenório; Poliana Coelho Cabral; Giselia Alves Pontes da Silva (2023). Validation of psychosocial scales for physical activity in university students [Dataset]. http://doi.org/10.6084/m9.figshare.5670355.v1
    Explore at:
    xls. Available download formats
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Rafael Miranda Tassitano; José Cazuza de Farias Júnior; Cassiano Ricardo Rech; Maria Cecília Marinho Tenório; Poliana Coelho Cabral; Giselia Alves Pontes da Silva
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    OBJECTIVE: Translate the Patient-centered Assessment and Counseling for Exercise questionnaire, adapt it cross-culturally and identify the psychometric properties of the psychosocial scales for physical activity in young university students.

    METHODS: The Patient-centered Assessment and Counseling for Exercise questionnaire is made up of 39 items divided into constructs based on the social cognitive theory and the transtheoretical model. The analyzed constructs were as follows: behavior change strategy (15 items), decision-making process (10), self-efficacy (6), support from family (4), and support from friends (4). The validation procedures were conceptual, semantic, operational, and functional equivalences, in addition to the equivalence of the items and of measurements. The conceptual, item and semantic equivalences were assessed by a specialized committee. For measurement equivalence, the instrument was applied to 717 university students. Exploratory factor analysis was used to verify the loading of each item, the explained variance and the internal consistency of the constructs. Reproducibility was measured by means of the intraclass correlation coefficient.

    RESULTS: The two translations were equivalent and the back-translation was similar to the original version, with few adaptations. The layout and the presentation order of the constructs and items were kept in the same form as the original instrument. The sample size was adequate and was evaluated by the Kaiser-Meyer-Olkin test, with values between 0.72 and 0.91. The correlation matrix of the items presented r < 0.8 (p < 0.05). The factor loadings of the items from all the constructs were satisfactory (> 0.40), varying between 0.43 and 0.80, which explained between 45.4% and 59.0% of the variance. Internal consistency was satisfactory (α ≥ 0.70), ranging from 0.70 for support from friends to 0.92 for self-efficacy. Most items (74.3%) presented values above 0.70 for the reproducibility test.

    CONCLUSIONS: The validation process steps were considered satisfactory and adequate for application to the population.
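
    The internal-consistency values reported above (Cronbach's α ≥ 0.70) come from a standard formula that is easy to reproduce; the numpy sketch below applies it to made-up item scores for a single six-item construct and is not the study's data.

```python
# Cronbach's alpha on synthetic Likert-type item scores for one construct.
import numpy as np

rng = np.random.default_rng(1)
# 717 respondents x 6 items (e.g., a self-efficacy-like construct), scored 1-5.
latent = rng.normal(size=(717, 1))
items = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(717, 6))), 1, 5)

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```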

  16. Bioprocess Validation Market Analysis North America, Europe, Asia, Rest of...

    • technavio.com
    pdf
    Updated Jul 3, 2024
    Cite
    Technavio (2024). Bioprocess Validation Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, Germany, China, Canada, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/bioprocess-validation-market-analysis
    Explore at:
    pdf. Available download formats
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2024 - 2028
    Area covered
    Germany, Canada, China, United States, Japan
    Description


    Bioprocess Validation Market Size 2024-2028

    The bioprocess validation market size is forecast to increase by USD 364 billion at a CAGR of 12.88% between 2023 and 2028.

    The market is witnessing significant growth due to the increasing demand for biopharmaceuticals and the adoption of single-use technologies. Biopharmaceuticals are gaining popularity in the healthcare industry due to their ability to treat complex diseases, leading to a surge in demand for their production. Single-use technologies, which offer advantages such as reduced costs, improved product quality, and increased efficiency, are increasingly being adopted for bioprocess validation. However, the high costs associated with bioprocess validation remain a challenge for market growth. Bioprocess validation is a critical step in ensuring the safety and efficacy of biopharmaceuticals, making it essential for market players to invest in advanced technologies and techniques to streamline the validation process and reduce costs. The market is expected to continue its growth trajectory in the coming years, driven by these trends and the increasing focus on developing innovative biopharmaceutical products.
    

    What will be the Size of the Bioprocess Validation Market During the Forecast Period?


    The market encompasses the technologies and services employed to ensure the production of high-quality biopharmaceuticals, including impurities testing for vaccines, drug products, monoclonal antibodies, recombinant proteins, and biosimilars. With the ongoing development of precision medicines and vaccines for chronic diseases, such as the SARS-CoV-2 virus, the market's significance continues to grow. The market consists of various segments, including in-house and outsourcing services, with leading biopharmaceutical companies increasingly relying on outsourcing to manage bioproduction activities. The biopharmaceutical manufacturing sector's expansion is driven by socioeconomic factors, increasing demand for biologic generic drugs, and the need for compatibility, microbiological, physiochemical, and integrity testing services.
    Key components of bioprocess validation include filter elements, mixing systems, and other critical equipment used throughout the bioproduction process. The market's trends include the increasing use of advanced technologies for bioprocess validation, such as automation, artificial intelligence, and machine learning, to improve efficiency and accuracy.
    

    How is this Bioprocess Validation Industry segmented and which is the largest segment?

    The bioprocess validation industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    End-user
      • Pharmaceutical companies
      • Contract development and manufacturing organizations
      • Others

    Type
      • In-house
      • Outsourced

    Geography
      • North America (US, Canada)
      • Europe (Germany)
      • Asia (China, Japan)
      • Rest of World (ROW)

    By End-user Insights

    The pharmaceutical companies segment is estimated to witness significant growth during the forecast period.
    

    The market encompasses the validation of biopharmaceutical manufacturing processes for various pharmaceutical companies, including large enterprises and SMEs. Large companies, such as Pfizer, J&J, and Novartis, contribute significantly to the market due to their extensive resources, expertise, and adherence to industry best practices. They invest heavily in research and development (R&D) expenditure for the production of complex biologics, including vaccines for SARS-CoV-2, monoclonal antibodies, recombinant proteins, and biosimilars. The market includes several segments, including impurities testing, vaccines, drug products, and biosimilars. Validation procedures involve analytical testing methods, cleaning procedures, and compliance with regulatory standards for drug safety. The market also includes services for precision medicines, cell therapy, and gene therapy.

    Contract service providers offer digital tools, continuous process monitoring, real-time release testing, advanced analytics, and modelling techniques. The biopharmaceutical manufacturing sector is driven by the increasing demand for biologic drugs and bioproduction volumes. Automation technologies, including robotics and single-use systems, are also transforming the industry. Socioeconomic factors, such as chronic diseases and aging populations, further fuel market growth. The market includes services for extractable testing, microbiological testing, physiochemical testing, and compatibility testing, as well as bioprocess instruments, such as bioreactors, chromatography systems, and filtration elements.


    The Pharmaceutic

  17. Household Expenditure and Income Survey 2010, Economic Research Forum (ERF)...

    • catalog.ihsn.org
    Updated Mar 29, 2019
    + more versions
    Cite
    The Hashemite Kingdom of Jordan Department of Statistics (DOS) (2019). Household Expenditure and Income Survey 2010, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://catalog.ihsn.org/index.php/catalog/7662
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    The Hashemite Kingdom of Jordan Department of Statistics (DOS)
    Time period covered
    2010 - 2011
    Area covered
    Jordan
    Description

    Abstract

    The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.

    Data collected through the survey helped in achieving the following objectives:
    1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
    2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
    3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
    4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
    5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
    6. Provide the necessary income data to serve in calculating poverty indices and identifying the poor characteristics as well as drawing poverty maps
    7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty

    Geographic coverage

    National

    Analysis unit

    • Households
    • Individuals

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Household Expenditure and Income survey sample for 2010, was designed to serve the basic objectives of the survey through providing a relatively large sample in each sub-district to enable drawing a poverty map in Jordan. The General Census of Population and Housing in 2004 provided a detailed framework for housing and households for different administrative levels in the country. Jordan is administratively divided into 12 governorates, each governorate is composed of a number of districts, each district (Liwa) includes one or more sub-district (Qada). In each sub-district, there are a number of communities (cities and villages). Each community was divided into a number of blocks. Where in each block, the number of houses ranged between 60 and 100 houses. Nomads, persons living in collective dwellings such as hotels, hospitals and prison were excluded from the survey framework.

    A two-stage stratified cluster sampling technique was used. In the first stage, clusters were selected with probability proportional to size, with the number of households in each cluster used as the size measure. In the second stage, a systematic sample of 8 households was selected from each cluster, together with a further 4 households held as a backup for the basic sample. The backup households were to be used during the first visit to the block if any of the originally selected households could not be interviewed for any reason. For the purposes of this survey, each sub-district was treated as a separate stratum to make it possible to produce results at the sub-district level; the sample strata were defined using the frame provided by the General Census of Population and Housing. To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable were calculated for each sub-district from the 2008 Household Expenditure and Income Survey. These results were used to set the sample size in each sub-district so that the coefficient of variation of the expenditure variable would be below 10%, subject to a minimum of 6 clusters per sub-district. This ensures adequate representation of clusters across the different administrative areas, enabling an indicative poverty map to be drawn.
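    The following sketch illustrates the two-stage design described above on a hypothetical cluster frame: probability-proportional-to-size selection of clusters, then a systematic draw of 8 main and 4 backup households per cluster. It is a simplified illustration (PPS with replacement, arbitrary frame, assumed main/backup split), not the Statistical Office's actual selection program.

      import random

      random.seed(42)

      # Hypothetical frame of clusters (blocks): (cluster_id, number_of_households).
      # The real frame comes from the 2004 General Census of Population and Housing.
      frame = [(f"cluster_{i}", random.randint(60, 100)) for i in range(1, 201)]

      # Stage 1: draw clusters with probability proportional to size (PPS), using
      # the household count as the size measure. For brevity this draws with
      # replacement; the survey's actual selection scheme may differ.
      cluster_ids = [cid for cid, _ in frame]
      sizes = [n for _, n in frame]
      selected = random.choices(cluster_ids, weights=sizes, k=6)

      # Stage 2: within each selected cluster, take a systematic sample of 12
      # households and treat the first 8 as the main sample and the last 4 as
      # backups (this split is an assumption for the sketch).
      def systematic_sample(n_households, n_draw=12):
          step = n_households / n_draw
          start = random.uniform(0, step)
          return [int(start + i * step) + 1 for i in range(n_draw)]  # 1-based house numbers

      sizes_by_id = dict(frame)
      for cid in selected:
          draws = systematic_sample(sizes_by_id[cid])
          print(cid, "main:", draws[:8], "backup:", draws[8:])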

    It should be noted that, in addition to the standard non-response rate assumed, higher rates were expected in areas of major cities where poor households are concentrated. These areas were therefore taken into account during the sampling design phase, and a larger number of households was selected from them, to ensure good coverage of all regions where poverty is concentrated.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    • General form
    • Expenditure on food commodities form
    • Expenditure on non-food commodities form

    Cleaning operations

    Raw data:
    • Organizing forms/questionnaires: A compatible archive system was used to classify the forms according to the different rounds throughout the year. A registry was prepared to track each form through the stages of checking, coding and entry until it was returned to the archive system.
    • Data office checking: This phase ran concurrently with field data collection; questionnaires completed in the field were sent immediately for office checking.
    • Data coding: A team was trained to carry out data coding, which in this survey was limited to education specialization, profession and economic activity, using international classifications; coding for the remaining questions was predefined during the design phase.
    • Data entry/validation: A team of system analysts, programmers and data entry personnel worked on the data at this stage. The system analysts and programmers began by identifying the survey frame and questionnaire fields in order to build computerized data entry forms, and a set of validation rules was added to the entry forms to ensure the accuracy of the data entered (see the sketch after this list). A team was then trained to complete the data entry process. Forms prepared for data entry were issued by the archive department to ensure they were correctly extracted from, and returned to, the archive system. A validation process was then run on the data to ensure the entered data were free of errors.
    • Results tabulation and dissemination: After all data processing operations were completed, ORACLE was used to tabulate the survey's final results. The results were cross-checked against similar outputs produced in SPSS to ensure the tabulations were correct, and each table was checked for consistency of the figures presented, together with the required editing of table titles and report formatting.
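    As a concrete illustration of the validation rules mentioned in the data entry/validation step, the sketch below checks entered records against simple range rules. The field names and thresholds are assumptions for illustration, not the survey's actual entry-form rules.

      # Illustrative entry-form validation rules; field names and ranges are
      # assumptions for the sketch, not the survey's actual rules.
      RULES = {
          "age": lambda v: 0 <= v <= 110,
          "household_size": lambda v: 1 <= v <= 30,
          "annual_expenditure": lambda v: v >= 0,
      }

      def validate_record(record):
          """Return (field, value) pairs that are missing or violate a rule."""
          errors = []
          for field, is_valid in RULES.items():
              if field not in record or not is_valid(record[field]):
                  errors.append((field, record.get(field)))
          return errors

      entered = {"age": 135, "household_size": 6, "annual_expenditure": 5400.0}
      print(validate_record(entered))  # -> [('age', 135)]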

    Harmonized data:
    • The Statistical Package for the Social Sciences (SPSS) was used to clean and harmonize the datasets.
    • The harmonization process started with cleaning all raw data files received from the Statistical Office.
    • Cleaned data files were then merged to produce one individual-level data file containing all variables subject to harmonization (a minimal sketch of this merge/recode pattern follows this list).
    • A country-specific program was written for each dataset to generate, compute, recode, rename, format and label the harmonized variables.
    • A post-harmonization cleaning process was run on the data.
    • Harmonized data were saved at both the household and the individual level, in SPSS format and converted to STATA format.
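    Below is a minimal pandas sketch of the merge/recode/rename pattern described above. The variable names and codes are placeholders; the actual harmonization was carried out with country-specific SPSS programs.

      import pandas as pd

      # Tiny stand-ins for the cleaned raw files; the real inputs are the
      # Statistical Office data files, processed with country-specific SPSS programs.
      individuals = pd.DataFrame({
          "household_id": [1, 1, 2],
          "sex_code": [1, 2, 1],
          "educ_code": [3, 2, 5],
      })
      households = pd.DataFrame({
          "household_id": [1, 2],
          "region_code": [11, 23],
      })

      # Merge household-level variables onto the individual-level file.
      merged = individuals.merge(households, on="household_id", how="left")

      # Rename/recode into harmonized variable names (illustrative mapping only).
      harmonized = merged.rename(columns={"sex_code": "gender", "educ_code": "education_level"})
      harmonized["gender"] = harmonized["gender"].map({1: "male", 2: "female"})

      # Save the individual-level output in Stata format, mirroring the
      # SPSS-to-STATA conversion step (the household-level file would be saved the same way).
      harmonized.to_stata("harmonized_individuals.dta", write_index=False)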

  18. ListBuilders US (46 States) Library Staff Contact Data Manually Sourced ....

    • datarade.ai
    Updated Jul 16, 2025
    Cite
    List Builders (2025). ListBuilders US (46 States) Library Staff Contact Data Manually Sourced . Mobile & Email w/16k records [Dataset]. https://datarade.ai/data-products/listbuilders-us-46-states-library-staff-contact-data-manual-list-builders
    Explore at:
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Jul 16, 2025
    Dataset authored and provided by
    List Builders
    Area covered
    United States
    Description

    Experience the transformative power of targeted connectivity with ListBuilders' meticulously curated US Library Contact Data. Our dedicated team has manually sourced data from across 46 states, creating an exceptional dataset that provides unrivaled depth and breadth in Library Staff Data within the US.

    What sets our data apart is the painstaking manual sourcing behind it. This process guarantees the accuracy, timeliness, and reliability of every data point. We have compiled 16,000 records, each featuring Validated Contact Details, meaning every entry is not only accurate but also actionable. This attention to detail and dedication to quality is what makes our data a valuable asset to businesses and researchers.

    Our dataset offers data from a diverse range of library types, including Library Systems Data and Regional Library Systems Data. We have also distinguished between standalone libraries and branches, offering you more granular insights into the library sector landscape. This comprehensive coverage opens up a world of opportunities for creating precision-targeted strategies.

    The primary use-cases for our dataset extend across multiple verticals, making it a versatile tool for a variety of industries. Marketing teams can leverage the Library Staff Data for precision-targeted campaigns. In the hands of HR professionals, the data becomes a resource for targeted recruitment drives. Market researchers will find our data invaluable for gaining in-depth understanding and conducting thorough analyses of the US library sector.

    As a part of our broader data offering, this US Library Contact Data product exemplifies our commitment to quality and detail. We believe in the transformative power of reliable, accurate data in driving strategic decisions. Our dataset, enriched with Validated Contact Details and comprehensive coverage, provides the competitive edge your business needs, irrespective of your industry.

    At ListBuilders, we don't just provide data; we provide reliable, actionable insights. Our commitment to quality is reflected in our validation process. All emails provided in our data have undergone a rigorous Email Verification process, ensuring they are valid, non-role-based, and not associated with free services such as Gmail. This process guarantees that your communications reach the right individuals, enhancing the effectiveness of your outreach efforts.
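    As an illustration of what such screening implies, the sketch below filters out syntactically invalid, role-based, and free-mail addresses. It is a simplified stand-in, not ListBuilders' actual verification pipeline, which presumably also performs mailbox-level checks.

      import re

      # Simplified screening rules; real verification also involves mailbox-level checks.
      ROLE_PREFIXES = {"info", "admin", "office", "contact", "support"}
      FREE_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}
      EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

      def passes_screen(email: str) -> bool:
          """Reject syntactically invalid, role-based, or free-mail addresses."""
          if not EMAIL_RE.match(email):
              return False
          local, domain = email.lower().rsplit("@", 1)
          return local.split("+")[0] not in ROLE_PREFIXES and domain not in FREE_DOMAINS

      emails = ["j.smith@citylibrary.org", "info@countylib.us", "reader99@gmail.com"]
      print([e for e in emails if passes_screen(e)])  # -> ['j.smith@citylibrary.org']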

    As we continue to evolve and expand our data offerings, our dedication to quality, accuracy, and actionable insights remains at the core of our mission. We understand that in the age of information, the right data can make all the difference. That's why we strive to provide the best Library Systems Data to meet all your business needs.

    Experience the power of precision-targeted outreach with our US Library Contact Data. Uncover new opportunities, enhance your strategies, and achieve your business objectives with data you can trust. Discover the ListBuilders difference today and step into a world of data-driven possibilities.

  19. Soil rockiness DSM data of the Roper catchment NT generated by the Roper...

    • data.csiro.au
    • researchdata.edu.au
    Updated Apr 16, 2024
    Cite
    Ian Watson; Mark Thomas; Seonaid Philip; Uta Stockmann; Ross Searle; Linda Gregory; jason hill; Elisabeth Bui; John Gallant; Peter R Wilson; Peter Wilson (2024). Soil rockiness DSM data of the Roper catchment NT generated by the Roper River Water Resource Assessment [Dataset]. http://doi.org/10.25919/42fs-6a23
    Explore at:
    Dataset updated
    Apr 16, 2024
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Ian Watson; Mark Thomas; Seonaid Philip; Uta Stockmann; Ross Searle; Linda Gregory; jason hill; Elisabeth Bui; John Gallant; Peter R Wilson; Peter Wilson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 1, 2020 - Jun 30, 2023
    Area covered
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    Northern Territory Department of Environment, Parks and Water Security
    Description

    Soil rockiness is one of 18 soil attributes chosen to underpin the land suitability assessment of the Roper River Water Resource Assessment (ROWRA) through the digital soil mapping (DSM) process. Soil rockiness represents areas that are excluded from agricultural production due to the abundance and size of rock outcrop, surface coarse fragments, profile coarse fragments and hard segregations. This raster dataset is a modelled product of a set of rules applied to those features for the top 0.10 m of soil and is derived from field-measured site data and environmental covariates. Data values are: 0 = not rocky, 1 = rocky. Descriptions of the rules defining rockiness are supplied with the data. Rockiness is used in land suitability assessments because restrictions relate to the intensity of rock picking required in land preparation and root crop harvesting, reduced crop growth, and limits on the use of agricultural machinery, particularly in the plough zone. This raster data provides improved soil information used to underpin and identify opportunities, and to promote detailed investigation of a range of sustainable regional development options; it was created within the ‘Land Suitability’ activity of the CSIRO ROWRA. A companion dataset and statistics reflecting the reliability of this data are also provided and are described in the lineage section of this metadata record. Processing information is supplied in ranger R scripts, and attributes were modelled using a Random Forest approach. The DSM process is described in the published CSIRO ROWRA report ‘Soils and land suitability for the Roper catchment, Northern Territory’, a technical report from the CSIRO Roper River Water Resource Assessment to the Government of Australia. The Roper River Water Resource Assessment provides a comprehensive overview and integrated evaluation of the feasibility of aquaculture and agriculture development in the Roper catchment NT, as well as the ecological, social and cultural (Indigenous water values, rights and aspirations) impacts of development.
    Lineage: The soil rockiness dataset was generated from a range of inputs and processing steps, summarised below. For more information refer to the published CSIRO ROWRA reports, in particular ‘Soils and land suitability for the Roper catchment, Northern Territory’.
    1. Collated existing data (relating to soils, climate, topography, natural resources and remote sensing, in various formats: reports, spatial vector, spatial raster, etc.).
    2. Selected additional soil and land attribute site data locations using a conditioned Latin hypercube statistical sampling method applied across the covariate data space.
    3. Carried out fieldwork to collect new attribute data and soil samples for analysis, and to build an understanding of geomorphology and landscape processes.
    4. Performed database analysis to extract the data meeting the selection criteria required for the attribute to be modelled.
    5. Used the R statistical programming environment for the attribute computing. Models were built from the selected input data and covariate data using predictive learning with a Random Forest approach implemented in the ranger R package.
    6. Created the soil rockiness Digital Soil Mapping (DSM) attribute raster dataset. DSM data is a geo-referenced dataset generated from field observations and laboratory data, coupled with environmental covariate data through quantitative relationships. It applies pedometrics: the use of mathematical and statistical models that combine information from soil observations with information contained in correlated environmental variables, remote sensing images and some geophysical measurements.
    7. Produced companion predicted-reliability data from the 500 individual Random Forest attribute models created.
    8. Assessed the quality of the DSM attribute data by three methods.
       • Method 1: Statistical (quantitative) assessment of the model and input data. The quality of the DSM models was tested using data withheld from model computations, expressed as out-of-bag (OOB) and confusion matrix results that estimate the reliability of the model predictions. These results are supplied.
       • Method 2: Statistical (quantitative) assessment of the spatial attribute output, presented as a raster of the attribute's "reliability". The 500 individual trees of the attribute's Random Forest model were used to generate 500 datasets of the attribute and estimate model reliability; for categorical attributes the reliability measure is the Confusion Index. This data is supplied.
       • Method 3: Independent external validation site data combined with on-ground expert (qualitative) examination of outputs during validation field trips. Across each study area a two-week validation field trip was conducted using a new validation site set, produced by a random sampling design based on conditioned Latin hypercube sampling of the attribute's reliability data. The modelled DSM attribute value was assessed against the actual on-ground value. These results are published in the report cited in this metadata record.
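    The published workflow used Random Forest models built with the ranger package in R. The sketch below illustrates the same idea in Python with scikit-learn on synthetic covariates: a 500-tree forest predicting the binary rockiness class, an out-of-bag accuracy check, and a simple per-prediction uncertainty measure derived from the split of tree votes, in the spirit of the Confusion Index used for categorical attributes. It is an illustration of the approach, not the project's actual R scripts.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(0)

      # Synthetic site data standing in for field sites and environmental covariates;
      # the binary target mimics the rockiness class (0 = not rocky, 1 = rocky).
      X = rng.normal(size=(500, 5))
      y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

      # 500 trees with out-of-bag (OOB) scoring as an internal quality check,
      # analogous to the OOB results reported for the DSM models.
      model = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
      model.fit(X, y)
      print("OOB accuracy:", round(model.oob_score_, 3))

      # Per-prediction uncertainty from the split of tree votes: probabilities near
      # 0.5 indicate low confidence, in the spirit of a confusion-index measure.
      new_sites = rng.normal(size=(3, 5))
      p_rocky = model.predict_proba(new_sites)[:, 1]
      uncertainty = 1 - 2 * np.abs(p_rocky - 0.5)  # 0 = confident, 1 = maximally uncertain
      print("P(rocky):", p_rocky.round(2), "uncertainty:", uncertainty.round(2))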

  20. Data Versioning for ADAS Datasets Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 3, 2025
    Cite
    Growth Market Reports (2025). Data Versioning for ADAS Datasets Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-versioning-for-adas-datasets-market
    Explore at:
    Available download formats: pptx, csv, pdf
    Dataset updated
    Oct 3, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Versioning for ADAS Datasets Market Outlook

    According to our latest research, the global Data Versioning for ADAS Datasets market size reached USD 1.14 billion in 2024, reflecting the rapidly growing demand for robust data management solutions within automotive development ecosystems. The market is expected to expand at a CAGR of 18.5% from 2025 to 2033, with the projected market size reaching USD 6.17 billion by 2033. This impressive growth is primarily fueled by the increasing sophistication of Advanced Driver Assistance Systems (ADAS) and the surging adoption of autonomous vehicle technologies, which require highly accurate, traceable, and up-to-date datasets to ensure safety, compliance, and innovation.

    One of the primary growth factors propelling the Data Versioning for ADAS Datasets market is the escalating complexity of ADAS and autonomous driving algorithms. As vehicles become more intelligent and capable of making critical decisions in real time, the need for high-quality, version-controlled datasets becomes paramount. The data generated from a multitude of sensors—such as cameras, LiDAR, radar, and ultrasonic devices—must be meticulously managed, annotated, and tracked across various developmental stages. Data versioning platforms enable automotive engineers to efficiently handle dataset iterations, ensuring that modifications, updates, and enhancements are systematically documented. This not only accelerates the pace of innovation but also supports traceability and regulatory compliance, which are vital in the automotive industry where safety standards are uncompromising.
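    To make the mechanism concrete, here is a minimal, tool-agnostic sketch of the core idea behind dataset versioning: content-addressed snapshots of files recorded in a manifest so that every iteration of an ADAS dataset can be traced and reproduced. The function, file and directory names are illustrative assumptions, not any vendor's API.

      import hashlib
      import json
      import time
      from pathlib import Path

      def snapshot(dataset_dir: str, message: str, manifest_path: str = "versions.json"):
          """Record a dataset version as per-file content hashes plus metadata."""
          entry = {
              "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
              "message": message,
              "files": {
                  str(p): hashlib.sha256(p.read_bytes()).hexdigest()
                  for p in sorted(Path(dataset_dir).rglob("*")) if p.is_file()
              },
          }
          manifest = json.loads(Path(manifest_path).read_text()) if Path(manifest_path).exists() else []
          manifest.append(entry)
          Path(manifest_path).write_text(json.dumps(manifest, indent=2))
          return entry

      # Hypothetical usage on a directory of sensor logs and annotations:
      # snapshot("adas_dataset/", "re-annotated pedestrian frames in batch 12")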

    Another significant driver is the increasing regulatory scrutiny and the necessity for data transparency in the automotive sector. Regulatory bodies worldwide are mandating stringent safety standards for ADAS and autonomous vehicles, necessitating rigorous testing and validation processes. Data versioning solutions facilitate the ability to reproduce test scenarios, validate algorithm performance, and provide auditable records for compliance purposes. The traceability offered by these systems is invaluable for automotive OEMs and suppliers, as it allows for the identification of data lineage and the management of data provenance, which are critical when investigating anomalies or addressing recalls. As regulatory frameworks continue to evolve, the reliance on sophisticated data versioning tools is expected to intensify, further boosting market growth.

    Technological advancements in cloud computing and artificial intelligence are also playing a pivotal role in shaping the Data Versioning for ADAS Datasets market. The integration of AI-driven data management tools with scalable cloud infrastructure enables organizations to handle vast volumes of multimodal data efficiently. Cloud-based solutions offer flexibility, scalability, and remote accessibility, making it easier for global teams to collaborate on dataset curation, annotation, and version control. Furthermore, the adoption of machine learning techniques for automated data labeling and quality assurance is streamlining the data preparation process, reducing manual labor, and minimizing errors. These technological trends are creating new avenues for market expansion, attracting investments from both established players and innovative startups.

    Regionally, North America and Europe are leading the adoption of data versioning solutions for ADAS datasets, driven by the presence of major automotive OEMs, advanced research institutes, and supportive regulatory environments. Asia Pacific is emerging as a lucrative market, fueled by the rapid growth of the automotive sector, increasing investments in smart mobility, and the proliferation of connected vehicles. The Middle East & Africa and Latin America are also witnessing gradual adoption, supported by government initiatives and the entry of global automotive players. The global landscape is characterized by a dynamic interplay of technological innovation, regulatory compliance, and competitive strategies, positioning the Data Versioning for ADAS Datasets market for robust growth over the forecast period.
