86 datasets found
  1. Data from: Engineering Test Report Dataset

    • kaggle.com
    zip
    Updated Jul 24, 2025
    Cite
    Ziya (2025). Engineering Test Report Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/engineering-test-report-dataset
    Explore at:
Available download formats: zip (53,060 bytes)
    Dataset updated
    Jul 24, 2025
    Authors
    Ziya
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is designed to support research and development in automated test report generation and quality assessment within engineering domains. It contains 2,454 test report records, each simulating the output of system-level testing across components like sensor modules, brake systems, and control boards.

    Each entry includes technical attributes such as execution time, defect severity, test environment, and report length, as well as qualitative scores like clarity, conciseness, and tester confidence. The goal is to provide a comprehensive set of features that represent both objective system metrics and subjective report quality.

    A key label, Is_High_Impact_Report, indicates whether a report holds high value in terms of diagnostic importance, based on a combination of severity, clarity, and label quality.

Test report generation applied specifically to engineering systems, such as software engineering, embedded systems, hardware validation, and automated quality assurance in engineering workflows.

🔍 Key Features
- Test_Report_ID: Unique ID for each report
- Component: Engineering subsystem tested (e.g., Sensor Module, Engine Unit)
- Test_Case_ID: Identifier of the executed test case
- Execution_Time(s): Time taken to complete the test
- Defect_Detected: Indicates if a defect was found
- Defect_Severity: Severity of detected defect: Low, Medium, High, Critical, or None
- Defect_Variability: Recurrence score of the defect across tests (0.0–1.0)
- Log_Length: Number of lines in the report log
- Report_Clarity_Score: Clarity score of the report text (0.0–1.0)
- Report_Conciseness_Score: Conciseness rating of the report (0.0–1.0)
- Tester_Confidence_Level: Confidence level of the person executing the test (1–5)
- Test_Environment: Environment where the test occurred: Simulation, Lab, or Field
- Auto_Label_Quality: Expert quality rating for the report (1–10)
- Timestamp: Date and time when the test was conducted
- Is_High_Impact_Report: Target label indicating whether the report is considered impactful

✅ Use Cases

Enhancing test documentation processes

    Analyzing defect characteristics and report relevance

    Supporting quality assurance workflows

    Building datasets for exploratory or statistical analysis in engineering testing.
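
For a quick first look, the sketch below loads the CSV with pandas and inspects the target label; the filename is an assumption, and the column names follow the feature list above.

```python
import pandas as pd

# Hypothetical filename; the Kaggle download is a zip containing the report CSV.
df = pd.read_csv("engineering_test_report_dataset.csv")

print(df.shape)  # expected roughly (2454, 15) per the description

# Distribution of the target label and a simple severity/clarity cross-check.
print(df["Is_High_Impact_Report"].value_counts())
print(df.groupby("Defect_Severity")["Report_Clarity_Score"].mean())
```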

  2. Complete Schema Markup Implementation Strategy for Local Businesses

    • caseysseo.com
    jsonld
    Updated Jul 30, 2025
    Cite
    Casey Miller (2025). Complete Schema Markup Implementation Strategy for Local Businesses [Dataset]. https://caseysseo.com/complete-schema-markup-implementation-strategy-for-local-businesses
    Explore at:
Available download formats: jsonld
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    Casey's SEO
    Authors
    Casey Miller
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Variables measured
    Review Count, Event Attendance, Service Bookings, Content Engagement, Local Pack Ranking, Review Star Rating, Local Search Visibility, Voice Search Visibility
    Measurement technique
    Customer surveys, Industry benchmarking, Google Search Console data analysis, Controlled testing and experimentation
    Description

    Detailed dataset covering comprehensive schema markup implementation methodology, LocalBusiness schema setup, and advanced structured data strategies for local businesses.
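
To make "LocalBusiness schema setup" concrete, here is a minimal illustrative sketch that emits a LocalBusiness JSON-LD block; the business details are placeholders, not values from this dataset.

```python
import json

# Placeholder business details; replace with real values for an actual site.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Plumbing Co.",
    "url": "https://www.example.com",
    "telephone": "+1-555-000-0000",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Springfield",
        "addressRegion": "IL",
        "postalCode": "62701",
        "addressCountry": "US",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.8",
        "reviewCount": "132",
    },
}

# The <script type="application/ld+json"> tag is what gets embedded in the page <head>.
print('<script type="application/ld+json">')
print(json.dumps(local_business, indent=2))
print("</script>")
```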

  3. Schema Markup for Hotel Websites Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 4, 2025
    Cite
    Growth Market Reports (2025). Schema Markup for Hotel Websites Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/schema-markup-for-hotel-websites-market
    Explore at:
Available download formats: csv, pptx, pdf
    Dataset updated
    Oct 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Schema Markup for Hotel Websites Market Outlook



    According to our latest research, the global Schema Markup for Hotel Websites market size reached USD 1.3 billion in 2024, with a robust compound annual growth rate (CAGR) of 13.2% expected through the forecast period. By 2033, the market is projected to attain a value of USD 3.8 billion, driven by the increasing emphasis on digital visibility, structured data adoption, and the need for enhanced user experiences in the hospitality industry. The primary growth factor remains the rapid digital transformation of the hotel sector, where schema markup plays a pivotal role in improving search engine rankings and boosting direct bookings.



    One of the most significant growth drivers for the Schema Markup for Hotel Websites market is the escalating competition among hotels to secure top positions in search engine results. As travelers increasingly rely on search engines to discover and book accommodations, hotels are compelled to implement advanced SEO strategies, where schema markup is a cornerstone. Schema markup enables search engines to better understand website content, resulting in rich snippets, enhanced visibility, and higher click-through rates. This trend is further accelerated by the growing use of mobile devices for travel planning, which demands more precise and accessible information presentation. In addition, Google’s ongoing updates to its search algorithms have made structured data not just a recommendation but a necessity for hotels aiming to maintain or improve their digital footprint.



    Another key factor fueling market growth is the proliferation of online reviews and user-generated content, which have become central to the decision-making process for travelers. Hotels are increasingly utilizing review schema and event schema to highlight guest experiences and promote special events directly in search results. This not only builds trust and credibility but also encourages more direct engagement with potential guests. The ability to display ratings, availability, pricing, and special offers in search listings provides hotels with a competitive edge, leading to higher conversion rates. Furthermore, the integration of schema markup with booking engines and property management systems is streamlining operations and enhancing the guest journey from discovery to post-stay feedback.



    The surge in cloud-based deployment models is also propelling the market forward. Cloud-based schema management solutions offer scalability, ease of updates, and integration with various digital marketing platforms. This is particularly advantageous for hotel chains and large properties that require consistent schema implementation across multiple locations. The rise of AI-driven content management systems, capable of automating schema generation and updates, is making it easier for hotels of all sizes to adopt structured data practices. As a result, the barrier to entry for small and medium hotels is diminishing, democratizing access to advanced SEO techniques and leveling the playing field in the digital hospitality marketplace.



    Regionally, North America holds the largest share of the Schema Markup for Hotel Websites market, attributed to the high adoption rate of digital marketing technologies and the presence of leading hospitality brands and technology providers. Europe follows closely, driven by a strong tourism sector and regulatory emphasis on data transparency. The Asia Pacific region is experiencing the fastest growth, supported by rapid urbanization, increasing internet penetration, and a burgeoning middle class with rising travel aspirations. Latin America and the Middle East & Africa are also witnessing steady adoption, though at a comparatively nascent stage, as hotels in these regions increasingly recognize the value of enhanced online visibility and structured data for global competitiveness.





    Type Analysis



    The Type segment of the Schema Markup for Hotel Websites market is categorized into Local Business Schem

  4. Nexdata | Korean Test Questions Structured Analysis Processing Data | 2.4...

    • data.nexdata.ai
    Updated Nov 7, 2025
    Cite
    Nexdata (2025). Nexdata | Korean Test Questions Structured Analysis Processing Data | 2.4 million| Large Language Model(LLM) Data [Dataset]. https://data.nexdata.ai/products/nexdata-korean-test-questions-structured-analysis-processin-nexdata
    Explore at:
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    South Korea
    Description

    Korean Test Questions Structured Analysis Processing Data, around 2.4 million questions, contains question types, questions, answers, explanations, etc.

  5. Nexdata | Korean Test Questions Structured Analysis Processing Data | 2.4...

    • datarade.ai
    Updated Nov 7, 2025
    Cite
    Nexdata (2025). Nexdata | Korean Test Questions Structured Analysis Processing Data | 2.4 million [Dataset]. https://datarade.ai/data-products/nexdata-korean-test-questions-structured-analysis-processin-nexdata
    Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    Korea (Republic of)
    Description

Korean Test Questions Structured Analysis Processing Data contains around 2.4 million questions, including question types, questions, answers, explanations, etc. Subjects covered: [Primary School] Korean, Mathematics, English, Social Studies, Science; [Middle School] Korean, English, Mathematics, Science, Social Studies; [High School] Korean, English, Mathematics, Physics, Chemistry, Biology, History, Geography. Question types include single-choice, fill-in, true-or-false, and short-answer questions, among others. This dataset can be used for large-scale subject knowledge enhancement tasks.

Data content: Korean K12 and university test questions

Amount: around 2.4 million questions

Data fields: question types, questions, answers, explanations, etc.

Subject and grade level: K12 and university; includes math, physics, chemistry, biology

Question types: single-choice, fill-in, true-or-false, short-answer, etc.

Format: JSONL

Language: Korean
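
Since the delivery format is JSONL, a minimal parsing sketch is shown below; the filename and field names (e.g., question_type) are assumptions based on the listed data fields, so adjust them to the actual keys.

```python
import json
from collections import Counter

question_type_counts = Counter()

# Hypothetical filename; field names are guesses based on the listed data fields.
with open("korean_test_questions.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        question_type_counts[record.get("question_type", "unknown")] += 1

print(question_type_counts.most_common())
```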

  6. structured-instructions-test

    • huggingface.co
    + more versions
    Cite
    Maximilian Schall, structured-instructions-test [Dataset]. https://huggingface.co/datasets/Maxscha/structured-instructions-test
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Maximilian Schall
    Description

    Dataset Card for "structured-instructions-test"

    More Information needed

  7. Data from: Develop and verify soil/structure interaction for pile/foundation...

    • catalog.data.gov
    • data.openei.org
    • +2more
    Updated Jun 11, 2023
    Cite
    Wind Energy Technologies Office (WETO) (2023). Develop and verify soil/structure interaction for pile/foundation interaction [Dataset]. https://catalog.data.gov/dataset/verification-of-a-new-soil-structure-interaction-model
    Explore at:
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Wind Energy Technologies Office (WETO)
    Description

Overview

Phase II of the Offshore Code Comparison Collaboration, Continued, with Correlation and unCertainty (OC6) project was used to verify the implementation of a new soil-structure interaction (SSI) model for use within offshore wind turbine modeling software. The REDWIN Macro-element model implemented and verified in this study enables a computationally efficient way to model the linear and nonlinear SSI problem, including hysteretic damping, of a monopile structure. The modeling approach was integrated into several modeling tools, and a series of increasingly complex simulations was conducted using the IEA 10MW reference turbine mounted on a monopile support structure to verify the coupling between the tools and the REDWIN Macro-element SSI model. This campaign includes only numerical verification between various software and modeling approaches, so no experimental measurements are available.

The load cases (LC) considered include:
- LC1 – static response of the tower and substructure
- LC2 – frequency and mode-shape analysis of the tower and substructure
- LC3 – response of the tower and substructure due to wind-only loading
- LC4 – response of the tower and substructure due to wave-only loading
- LC5 – response of the tower and substructure due to wind and wave loading

Detailed properties of the modeled system are found in the following reference: Bergua, Roger, Amy Robertson, Jason Jonkman, and Andy Platt. 2021. "Specification Document for OC6 Phase II: Verification of an Advanced Soil-Structure Interaction Model for Offshore Wind Turbines." Golden, CO: National Renewable Energy Laboratory. NREL/TP-5000-79938. https://www.nrel.gov/docs/fy21osti/79938.pdf. Details on the results from the OC6 Phase II project can be found in: Bergua R, Robertson A, Jonkman J, et al. "OC6 Phase II: Integration and verification of a new soil–structure interaction model for offshore wind design." Wind Energy. 2022;25(5):793-810. doi:10.1002/we.2698

Data Details

Nineteen academic and industrial partners performed simulations as part of this project, and their simulation results are available on this website. The naming of the data files follows the convention oc6.phase2.participant.loadcase.txt. Also included are the wind files used by participants to prescribe forces and moments at the tower-top yaw bearing for average hub-height wind speeds of 9.06 m/s and 20.09 m/s. These files are named "IEA-10.0-198-RWT_Uref09p06.txt" and "IEA-10.0-198-RWT_Uref20p09.txt", respectively. OC6 Phase II data files have an identifier after the participant corresponding to the modeling approach used. These identifiers are defined as follows:
- M1: Apparent Fixity (AF)
- M2: Coupled Springs (CS)
- M3: Distributed Springs (DS)
- M4: REDWIN

Data Quality

This was a verification study with only simulation results. Data quality and uncertainty statements apply only to experimental data.
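
As an illustration of the stated naming convention, the sketch below indexes result files by load case; the directory name is hypothetical, and it assumes participant identifiers contain no extra dots.

```python
from pathlib import Path

# Index OC6 Phase II result files by load case, following the convention
# oc6.phase2.participant.loadcase.txt (a modeling-approach identifier such as
# M1-M4 may appear after the participant). Directory name is hypothetical.
by_loadcase = {}
for path in Path("oc6_phase2_results").glob("oc6.phase2.*.txt"):
    parts = path.stem.split(".")  # e.g. ['oc6', 'phase2', 'participant07', 'M4', 'lc3']
    participant, loadcase = parts[2], parts[-1]
    by_loadcase.setdefault(loadcase, []).append(participant)

for loadcase, participants in sorted(by_loadcase.items()):
    print(f"{loadcase}: {len(participants)} result files")
```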

  8. TargetDB: Structural Genomics Target Search

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). TargetDB: Structural Genomics Target Search [Dataset]. http://identifiers.org/RRID:SCR_007960
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

TargetDB, a target registration database, provides information on the experimental progress and status of targets selected for structure determination. It allows searching sequences from the PSI Structural Genomics Centers and other Structural Genomics projects. For more information about how these proteins were cloned, expressed, or purified, or for other experimental protocols, please go to the Protein Expression, Purification, and Crystallization DataBase.

  9. Data from: Introducing evidence based medicine to the journal club, using a...

    • catalog.data.gov
    • odgavaprod.ogopendata.com
    Updated Sep 6, 2025
    + more versions
    Cite
    National Institutes of Health (2025). Introducing evidence based medicine to the journal club, using a structured pre and post test: a cohort study [Dataset]. https://catalog.data.gov/dataset/introducing-evidence-based-medicine-to-the-journal-club-using-a-structured-pre-and-post-te
    Explore at:
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

Background

Journal Club at a University-based residency program was restructured to introduce, reinforce, and evaluate residents' understanding of the concepts of Evidence Based Medicine.

Methods

Over the course of a year, structured pre- and post-tests were developed for use during each Journal Club. Questions were derived from the articles being reviewed, and performance on the key concepts of Evidence Based Medicine was assessed. Study subjects were 35 PGY2 and PGY3 residents in a University-based Family Practice Program.

Results

Performance on the pre-test demonstrated a significant improvement from a median of 54.5% to 78.9% over the course of the year (F 89.17, p < .001). The post-test results also exhibited a significant increase, from 63.6% to 81.6% (F 85.84, p < .001).

Conclusions

Following organizational revision, the introduction of a pre-test/post-test instrument supported achievement of the learning objectives, with a better understanding and utilization of the concepts of Evidence Based Medicine.

  10. Google Search Console Field Reference Available options

    • windsor.ai
    json
    Updated Sep 1, 2022
    + more versions
    Cite
    Windsor.ai (2022). Google Search Console Field Reference Available options [Dataset]. https://windsor.ai/data-field/searchconsole/
    Explore at:
Available download formats: json
    Dataset updated
    Sep 1, 2022
    Dataset provided by
    Windsor.ai
    Variables measured
    Include Fresh Data
    Description

Auto-generated structured data of the Google Search Console Field Reference, from the table "Available options".

  11. Coding Questions Dataset

    • kaggle.com
    zip
    Updated Oct 24, 2025
    Cite
    Kartikeya Pandey (2025). Coding Questions Dataset [Dataset]. https://www.kaggle.com/datasets/guitaristboy/coding-questions-dataset
    Explore at:
Available download formats: zip (135,582 bytes)
    Dataset updated
    Oct 24, 2025
    Authors
    Kartikeya Pandey
    License

MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains a curated collection of programming questions, each paired with example inputs/outputs, constraints, and test cases.

    It is designed for use in machine learning research, code generation models, natural language processing (NLP) tasks, or simply as a question bank for learners and educators.

    Dataset Highlights:

    📘 616 questions with titles, descriptions, and difficulty levels (Easy, Medium, Hard)

    💡 Each question includes examples, constraints, and test cases stored as structured JSON

    🧠 Useful for LLM fine-tuning, question answering, and automated code evaluation tasks

    🧩 Ideal for creating or benchmarking AI coding assistants and educational apps

    Source: Collected from a structured internal question database built for educational and evaluation purposes.

    Format: CSV file with the following columns: id, title, description, difficulty_level, created_at, updated_at, examples, constraints, test_cases
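
Because the examples, constraints, and test_cases columns are stored as structured JSON, a small loading sketch may help; the CSV filename inside the zip is an assumption.

```python
import json
import pandas as pd

# Hypothetical filename for the CSV inside the Kaggle zip.
df = pd.read_csv("coding_questions.csv")

# Parse the JSON-encoded columns described above into Python objects.
for col in ["examples", "constraints", "test_cases"]:
    df[col] = df[col].apply(json.loads)

print(df["difficulty_level"].value_counts())
print(df.loc[0, "title"])
print(df.loc[0, "test_cases"][:1])  # first test case of the first question
```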

  12. Data from: ClaimsKG - A Knowledge Graph of Fact-Checked Claims

    • zenodo.org
    • explore.openaire.eu
    • +1more
    zip
    Updated Oct 18, 2022
    Cite
Andon Tchechmedjiev; Pavlos Fafalios; Konstantin Todorov; Stefan Dietze; Katarina Boland; Benjamin Zapilko (2022). ClaimsKG - A Knowledge Graph of Fact-Checked Claims [Dataset]. http://doi.org/10.5281/zenodo.3518960
    Explore at:
Available download formats: zip
    Dataset updated
    Oct 18, 2022
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
Andon Tchechmedjiev; Pavlos Fafalios; Konstantin Todorov; Stefan Dietze; Katarina Boland; Benjamin Zapilko
    License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The latest release of ClaimsKG is available in Datorium.

    ClaimsKG is a knowledge graph of metadata information for thousands of fact-checked claims which facilitates structured queries about their truth values, authors, dates, and other kinds of metadata. ClaimsKG is generated through a (semi-)automated pipeline, which harvests claim-related data from popular fact-checking web sites, annotates them with related entities from DBpedia, and lifts all data to RDF using an RDF/S model that makes use of established vocabularies (such as schema.org).

    ClaimsKG does NOT contain the text of the reviews from the fact-checking web sites; it only contains structured metadata information and links to the reviews.

More information, such as statistics, query examples, and a user-friendly interface to explore the knowledge graph, is available at: https://data.gesis.org/claimskg/site
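
As one way to issue the structured queries mentioned above, here is a hedged Python/SPARQL sketch; the endpoint URL and the exact schema.org property names are assumptions, so check the ClaimsKG site for the live endpoint and data model.

```python
from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

# Endpoint URL is an assumption; see https://data.gesis.org/claimskg/site for the current one.
sparql = SPARQLWrapper("https://data.gesis.org/claimskg/sparql")
sparql.setReturnFormat(JSON)

# Property names assume the schema.org ClaimReview modeling mentioned in the description.
sparql.setQuery("""
PREFIX schema: <http://schema.org/>
SELECT ?review ?date WHERE {
  ?review a schema:ClaimReview ;
          schema:datePublished ?date .
} LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["date"]["value"], row["review"]["value"])
```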

    If you use ClaimsKG, please cite the below paper:

    Tchechmedjiev, Andon, Pavlos Fafalios, Katarina Boland, Malo Gasquet, Matthäus Zloch, Benjamin Zapilko, Stefan Dietze, and Konstantin Todorov. "ClaimsKG: a Knowledge Graph of Fact-Checked Claims." In International Semantic Web Conference, pp. 309-324. Springer, Cham, 2019. https://doi.org/10.1007/978-3-030-30796-7_20
    [pdf, bib]

  13. Data from: A consensus compound/bioactivity dataset for data-driven drug...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated May 13, 2022
    Cite
Laura Isigkeit; Apirat Chaikuad; Daniel Merk (2022). A consensus compound/bioactivity dataset for data-driven drug design and chemogenomics [Dataset]. http://doi.org/10.5281/zenodo.6320761
    Explore at:
Available download formats: zip
    Dataset updated
    May 13, 2022
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
Laura Isigkeit; Apirat Chaikuad; Daniel Merk
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Information

The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1,144,803 compounds with 10,915,362 bioactivities on 5,613 targets (including defined macromolecular targets as well as cell lines and phenotypic readouts). It also provides simplified information on the assay types underlying the bioactivity data and on bioactivity confidence by comparing data from different sources. We have unified the source databases, brought them into a common format, and combined them, enabling generic use in multiple applications such as chemogenomics and data-driven drug design.

    The consensus dataset provides increased target coverage and contains a higher number of molecules compared to the source databases which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks with flags for divergent data from different sources may help data selection and further accurate curation.

    Structure and content of the dataset

Dataset structure: the file contains the following columns: ChEMBL ID, PubChem ID, IUPHAR ID, Target, Activity type, Assay type, Unit, Mean C, Mean PC, Mean B, Mean I, Mean PD, Activity check annotation, Ligand names, Canonical SMILES (one column per source database), Structure check, Source.

    The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV-file and a compressed CSV-file.

    Except for the canonical SMILES columns, all columns are filled with the datatype ‘string’. The datatype for the canonical SMILES columns is the smiles-format. We recommend the File Reader node for using the dataset in KNIME. With the help of this node the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.

    Column content:

    • ChEMBL ID, PubChem ID, IUPHAR ID: chemical identifier of the databases
    • Target: biological target of the molecule expressed as the HGNC gene symbol
    • Activity type: for example, pIC50
    • Assay type: Simplification/Classification of the assay into cell-free, cellular, functional and unspecified
    • Unit: unit of bioactivity measurement
    • Mean columns of the databases: mean of bioactivity values or activity comments denoted with the frequency of their occurrence in the database, e.g. Mean C = 7.5 *(15) -> the value for this compound-target pair occurs 15 times in ChEMBL database
    • Activity check annotation: a bioactivity check was performed by comparing values from the different sources and adding an activity check annotation to provide automated activity validation for additional confidence
      • no comment: bioactivity values are within one log unit;
      • check activity data: bioactivity values are not within one log unit;
      • only one data point: only one value was available, no comparison and no range calculated;
      • no activity value: no precise numeric activity value was available;
      • no log-value could be calculated: no negative decadic logarithm could be calculated, e.g., because the reported unit was not a compound concentration
    • Ligand names: all unique names contained in the five source databases are listed
    • Canonical SMILES columns: Molecular structure of the compound from each database
    • Structure check: To denote matching or differing compound structures in different source databases
      • match: molecule structures are the same between different sources;
      • no match: the structures differ;
      • 1 source: no structure comparison is possible, because the molecule comes from only one source database.
    • Source: From which databases the data come from
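
For those preferring Python over the recommended KNIME File Reader node, a minimal pandas sketch is shown below; the filename and exact column header spellings are assumptions based on the listing above.

```python
import pandas as pd

# Hypothetical filename; per the description, all non-SMILES columns are strings.
df = pd.read_csv("consensus_compound_bioactivity_dataset.csv", dtype=str)

# Keep rows whose structures agree across sources and whose bioactivity values
# are consistent ("no comment" in the activity check annotation).
consistent = df[
    (df["Structure check"] == "match")
    & (df["Activity check annotation"] == "no comment")
]
print(f"{len(df)} rows total, {len(consistent)} rows with cross-source agreement")
```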

  14. test-text-clustering-structured-batched-v0.1

    • huggingface.co
    Updated Sep 5, 2024
    + more versions
    Cite
    Agustín Piqueres Lajarín (2024). test-text-clustering-structured-batched-v0.1 [Dataset]. https://huggingface.co/datasets/plaguss/test-text-clustering-structured-batched-v0.1
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 5, 2024
    Authors
    Agustín Piqueres Lajarín
    Description

    Dataset Card for test-text-clustering-structured-batched-v0.1

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/plaguss/test-text-clustering-structured-batched-v0.1/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline… See the full description on the dataset page: https://huggingface.co/datasets/plaguss/test-text-clustering-structured-batched-v0.1.

  15. Abstracts for scoping review on automated fact-checking

    • figshare.com
    xlsx
    Updated Feb 28, 2024
    Cite
    Lasha Kavtaradze (2024). Abstracts for scoping review on automated fact-checking [Dataset]. http://doi.org/10.6084/m9.figshare.25305199.v1
    Explore at:
Available download formats: xlsx
    Dataset updated
    Feb 28, 2024
    Dataset provided by
Figshare (http://figshare.com/)
    Authors
    Lasha Kavtaradze
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is curated for a scoping literature review focusing on automated fact-checking. It comprises metadata extracted from 338 papers sourced from 10 databases, all centered around automated information verification. Following inclusion and exclusion criteria, 199 abstracts were chosen for subsequent disciplinary and thematic analysis.

  16. Wikimedia Structured Dataset Navigator (JSONL)

    • kaggle.com
    zip
    Updated Apr 23, 2025
    Cite
    Mehranism (2025). Wikimedia Structured Dataset Navigator (JSONL) [Dataset]. https://www.kaggle.com/datasets/mehranism/wikimedia-structured-dataset-navigator-jsonl
    Explore at:
Available download formats: zip (266,196,504 bytes)
    Dataset updated
    Apr 23, 2025
    Authors
    Mehranism
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📚 Overview: This dataset provides a compact and efficient way to explore the massive "Wikipedia Structured Contents" dataset by Wikimedia Foundation, which consists of 38 large JSONL files (each ~2.5GB). Loading these directly in Kaggle or Colab is impractical due to resource constraints. This file index solves that problem.

    🔍 What’s Inside: This dataset includes a single JSONL file named wiki_structured_dataset_navigator.jsonl that contains metadata for every file in the English portion of the Wikimedia dataset.

Each line in the JSONL file is a JSON object with the following fields:
- file_name: the actual filename in the source dataset (e.g., enwiki_namespace_0_0.jsonl)
- file_index: the numeric row index of the file
- name: the Wikipedia article title or identifier
- url: a link to the full article on Wikipedia
- description: a short description or abstract of the article (when available)

    🛠 Use Case: Use this dataset to search by keyword, article name, or description to find which specific files from the full Wikimedia dataset contain the topics you're interested in. You can then download only the relevant file(s) instead of the entire dataset.

⚡️ Benefits:
- Lightweight (~MBs vs. GBs)
- Easy to load and search
- Great for indexing, previewing, and subsetting the Wikimedia dataset
- Saves time, bandwidth, and compute resources

📎 Example Usage (Python):

```python
import json

import kagglehub
import pandas as pd
from tqdm import tqdm


def read_jsonl(file_path, max_records=None):
    # Read a JSONL file into a list of dicts, optionally limiting the record count.
    data = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for i, line in enumerate(tqdm(f)):
            if max_records and i >= max_records:
                break
            data.append(json.loads(line))
    return data


# Download only the navigator file from the Kaggle dataset.
file_path = kagglehub.dataset_download(
    "mehranism/wikimedia-structured-dataset-navigator-jsonl",
    path="wiki_structured_dataset_navigator.jsonl",
)
data = read_jsonl(file_path)
print(f"Successfully loaded {len(data)} records")

df = pd.DataFrame(data)
print(f"Dataset shape: {df.shape}")
print("Columns in the dataset:")
for col in df.columns:
    print(f"- {col}")
```
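
Building on the df loaded above, the short sketch below illustrates the keyword search described under Use Case; the search term is an arbitrary placeholder.

```python
# Continues from the df built in the example above.
keyword = "quantum computing"  # placeholder search term

mask = (
    df["name"].str.contains(keyword, case=False, na=False)
    | df["description"].str.contains(keyword, case=False, na=False)
)
matches = df.loc[mask, ["file_name", "name", "url"]]

# Which source files (e.g., enwiki_namespace_0_0.jsonl) contain matching articles?
print(matches["file_name"].value_counts())
```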

    
This dataset is perfect for developers working on:
- Retrieval-Augmented Generation (RAG)
- Large Language Model (LLM) fine-tuning
- Search and filtering pipelines
- Academic research on structured Wikipedia content

💡 Tip: Pair this index with the original [Wikipedia Structured Contents dataset](https://www.kaggle.com/datasets/wikimedia-foundation/wikipedia-structured-contents) for full article access.

📃 Format:
- File: `wiki_structured_dataset_navigator.jsonl`
- Format: JSON Lines (1 object per line)
- Encoding: UTF-8

Tags: wikipedia, wikimedia, jsonl, structured-data, search-index, metadata, file-catalog, dataset-index, large-language-models, machine-learning

    Licensing

    CC0: Public Domain Dedication
    

    (Recommended for open indexing tools with no sensitive data.)

  17. Dataset of book subjects that contain Test process improvement : a practical...

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain Test process improvement : a practical step-by-step guide to structured testing [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Test+process+improvement+:+a+practical+step-by-step+guide+to+structured+testing&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

This dataset is about book subjects. It has 1 row and is filtered where the book is Test process improvement : a practical step-by-step guide to structured testing. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  18. LUMO - Leibniz University Test Structure for Monitoring

    • service.tib.eu
    Updated Jul 23, 2021
    + more versions
    Cite
(2021). LUMO - Leibniz University Test Structure for Monitoring [Dataset]. https://service.tib.eu/ldmservice/dataset/lumo-leibniz-universtity-test-structure-for-monitoring
    Explore at:
    Dataset updated
    Jul 23, 2021
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The published dataset comprises long-term structural measurements of a lattice tower to support and facilitate research in the field of Structural Health Monitoring (SHM). The structure is located near Hanover in Northern Germany and is equipped with reversible damage mechanisms at multiple positions, which have been repeatedly activated and reactivated during the measurement period from August to December 2020. Meteorological measurements have been conducted in parallel by the Institute of Meteorology and Climatology (IMUK) of Leibniz University Hannover and are provided in the same file format as the structural data. The data can be accessed through https://data.uni-hannover.de:8080/dataset/upload/users/isd/lumo/. To unlock the meteorological data, please send an informal request to public.data(at)isd.uni-hannover.de.

  19. Global Big Data Analytics Software for Test and Measurement Market Research...

    • wiseguyreports.com
    Updated Oct 14, 2025
    + more versions
    Cite
    (2025). Global Big Data Analytics Software for Test and Measurement Market Research Report: By Application (Quality Assurance, Predictive Maintenance, Product Testing, Compliance Testing), By Deployment Type (On-Premises, Cloud-Based, Hybrid), By End User (Manufacturing, Telecommunications, Healthcare, Automotive), By Data Type (Structured Data, Unstructured Data, Semi-Structured Data) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/big-data-analytics-software-for-test-and-measurement-market
    Explore at:
    Dataset updated
    Oct 14, 2025
    License

https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Oct 25, 2025
    Area covered
    Global
    Description
BASE YEAR: 2024
HISTORICAL DATA: 2019 - 2023
REGIONS COVERED: North America, Europe, APAC, South America, MEA
REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2024: 2.48 (USD Billion)
MARKET SIZE 2025: 2.64 (USD Billion)
MARKET SIZE 2035: 5.0 (USD Billion)
SEGMENTS COVERED: Application, Deployment Type, End User, Data Type, Regional
COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
KEY MARKET DYNAMICS: Growing demand for real-time analytics, Increasing reliance on data-driven decisions, Advancements in machine learning algorithms, Rise of IoT applications, Need for regulatory compliance and standards
MARKET FORECAST UNITS: USD Billion
KEY COMPANIES PROFILED: Qlik, SAS Institute, Domo, Micro Focus, SAP, Teradata, TIBCO Software, Tableau Software, Microsoft, Alteryx, IBM, Oracle
MARKET FORECAST PERIOD: 2025 - 2035
KEY MARKET OPPORTUNITIES: Emerging IoT integration, Increasing demand for real-time analysis, Adoption in quality assurance processes, Growth in automated testing solutions, Advancements in machine learning techniques
COMPOUND ANNUAL GROWTH RATE (CAGR): 6.6% (2025 - 2035)
  20. Accessory Structure Self-verification Form

    • maine.hub.arcgis.com
    Updated Dec 3, 2024
    Cite
    State of Maine (2024). Accessory Structure Self-verification Form [Dataset]. https://maine.hub.arcgis.com/maps/6b46baf3ee3f428498390d110c454b7a
    Explore at:
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    State of Maine
    License

MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Description

    Accessory structures often do not require a permit but still must meet certain land use standards. Before beginning construction, property owners must complete this self-verification form indicating that the structure will comply with the relevant rules. This dataset depicts where self-verification that a structure meets the relevant land use standards has been completed.
