License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is designed to support research and development in automated test report generation and quality assessment within engineering domains. It contains 2,454 test report records, each simulating the output of system-level testing across components like sensor modules, brake systems, and control boards.
Each entry includes technical attributes such as execution time, defect severity, test environment, and report length, as well as qualitative scores like clarity, conciseness, and tester confidence. The goal is to provide a comprehensive set of features that represent both objective system metrics and subjective report quality.
A key label, Is_High_Impact_Report, indicates whether a report holds high value in terms of diagnostic importance, based on a combination of severity, clarity, and label quality.
Test report generation applied specifically to engineering systems, such as software engineering, embedded systems, hardware validation, or automated quality assurance in engineering workflows.
🔍 Key Features

| Feature Name | Description |
|---|---|
| Test_Report_ID | Unique ID for each report |
| Component | Engineering subsystem tested (e.g., Sensor Module, Engine Unit) |
| Test_Case_ID | Identifier of the executed test case |
| Execution_Time(s) | Time taken to complete the test, in seconds |
| Defect_Detected | Indicates whether a defect was found |
| Defect_Severity | Severity of the detected defect: Low, Medium, High, Critical, or None |
| Defect_Variability | Recurrence score of the defect across tests (0.0–1.0) |
| Log_Length | Number of lines in the report log |
| Report_Clarity_Score | Clarity score of the report text (0.0–1.0) |
| Report_Conciseness_Score | Conciseness rating of the report (0.0–1.0) |
| Tester_Confidence_Level | Confidence level of the person executing the test (1–5) |
| Test_Environment | Environment where the test occurred: Simulation, Lab, or Field |
| Auto_Label_Quality | Expert quality rating for the report (1–10) |
| Timestamp | Date and time when the test was conducted |
| Is_High_Impact_Report | Target label indicating whether the report is considered impactful |
✅ Use Cases

- Enhancing test documentation processes
- Analyzing defect characteristics and report relevance
- Supporting quality assurance workflows
- Building datasets for exploratory or statistical analysis in engineering testing (see the sketch below)
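As a starting point for the modeling use cases above, here is a minimal sketch of a baseline classifier for the Is_High_Impact_Report label. The CSV file name is an assumption, and the feature subset and model choice are illustrative only; column names follow the feature table above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed file name; adjust to the actual file in the dataset download.
df = pd.read_csv("engineering_test_reports.csv")

# A few numeric features taken from the documented schema.
features = [
    "Execution_Time(s)", "Defect_Variability", "Log_Length",
    "Report_Clarity_Score", "Report_Conciseness_Score",
    "Tester_Confidence_Level", "Auto_Label_Quality",
]
X, y = df[features], df["Is_High_Impact_Report"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.3f}")
```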
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Detailed dataset covering comprehensive schema markup implementation methodology, LocalBusiness schema setup, and advanced structured data strategies for local businesses.
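For illustration, here is a minimal sketch of the kind of LocalBusiness structured data such a methodology covers, built as JSON-LD in Python; all business details are hypothetical.

```python
import json

# Hypothetical business details, for illustration only.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Bakery",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main Street",
        "addressLocality": "Springfield",
        "postalCode": "12345",
    },
    "telephone": "+1-555-0100",
    "openingHours": "Mo-Sa 08:00-18:00",
}

# Serialize for embedding in a <script type="application/ld+json"> tag.
print(json.dumps(local_business, indent=2))
```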
According to our latest research, the global Schema Markup for Hotel Websites market size reached USD 1.3 billion in 2024, with a robust compound annual growth rate (CAGR) of 13.2% expected through the forecast period. By 2033, the market is projected to attain a value of USD 3.8 billion, driven by the increasing emphasis on digital visibility, structured data adoption, and the need for enhanced user experiences in the hospitality industry. The primary growth factor remains the rapid digital transformation of the hotel sector, where schema markup plays a pivotal role in improving search engine rankings and boosting direct bookings.
One of the most significant growth drivers for the Schema Markup for Hotel Websites market is the escalating competition among hotels to secure top positions in search engine results. As travelers increasingly rely on search engines to discover and book accommodations, hotels are compelled to implement advanced SEO strategies, where schema markup is a cornerstone. Schema markup enables search engines to better understand website content, resulting in rich snippets, enhanced visibility, and higher click-through rates. This trend is further accelerated by the growing use of mobile devices for travel planning, which demands more precise and accessible information presentation. In addition, Google’s ongoing updates to its search algorithms have made structured data not just a recommendation but a necessity for hotels aiming to maintain or improve their digital footprint.
Another key factor fueling market growth is the proliferation of online reviews and user-generated content, which have become central to the decision-making process for travelers. Hotels are increasingly utilizing review schema and event schema to highlight guest experiences and promote special events directly in search results. This not only builds trust and credibility but also encourages more direct engagement with potential guests. The ability to display ratings, availability, pricing, and special offers in search listings provides hotels with a competitive edge, leading to higher conversion rates. Furthermore, the integration of schema markup with booking engines and property management systems is streamlining operations and enhancing the guest journey from discovery to post-stay feedback.
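To make the review-oriented markup described above concrete, here is a hedged sketch of Hotel schema with an aggregate rating; all values are hypothetical.

```python
import json

# Hypothetical hotel details, for illustration only.
hotel = {
    "@context": "https://schema.org",
    "@type": "Hotel",
    "name": "Example Seaside Hotel",
    "priceRange": "$$",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "312",
    },
}

# Embedded as JSON-LD, this markup can surface star ratings in search listings.
print(json.dumps(hotel, indent=2))
```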
The surge in cloud-based deployment models is also propelling the market forward. Cloud-based schema management solutions offer scalability, ease of updates, and integration with various digital marketing platforms. This is particularly advantageous for hotel chains and large properties that require consistent schema implementation across multiple locations. The rise of AI-driven content management systems, capable of automating schema generation and updates, is making it easier for hotels of all sizes to adopt structured data practices. As a result, the barrier to entry for small and medium hotels is diminishing, democratizing access to advanced SEO techniques and leveling the playing field in the digital hospitality marketplace.
Regionally, North America holds the largest share of the Schema Markup for Hotel Websites market, attributed to the high adoption rate of digital marketing technologies and the presence of leading hospitality brands and technology providers. Europe follows closely, driven by a strong tourism sector and regulatory emphasis on data transparency. The Asia Pacific region is experiencing the fastest growth, supported by rapid urbanization, increasing internet penetration, and a burgeoning middle class with rising travel aspirations. Latin America and the Middle East & Africa are also witnessing steady adoption, though at a comparatively nascent stage, as hotels in these regions increasingly recognize the value of enhanced online visibility and structured data for global competitiveness.
The Type segment of the Schema Markup for Hotel Websites market is categorized into Local Business Schema …
Korean Test Questions Structured Analysis Processing Data contains around 2.4 million questions, including question types, questions, answers, explanations, etc. Subjects include [Primary School] Korean, Mathematics, English, Social Studies, and Science; [Middle School] Korean, English, Mathematics, Science, and Social Studies; [High School] Korean, English, Mathematics, Physics, Chemistry, Biology, History, and Geography. Question types include single-choice, fill-in, true-or-false, and short-answer questions, etc. This dataset can be used for large-scale subject knowledge enhancement tasks.
| Attribute | Description |
|---|---|
| Data content | Korean K12 and university test questions |
| Amount | Around 2.4 million questions |
| Data fields | Question types, questions, answers, explanations, etc. |
| Subject and grade level | K12 and university; includes math, physics, chemistry, biology |
| Question types | Single-choice, fill-in, true-or-false, short-answer, etc. |
| Format | JSONL |
| Language | Korean |
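Since the data ships as JSONL, a minimal reading sketch follows; the file name and record keys are assumptions, while the documented fields are question types, questions, answers, and explanations.

```python
import json

# Assumed file name; actual key names in the records may differ.
with open("korean_questions.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        # Expected fields per the spec: question type, question,
        # answer, and explanation.
        print(record)
        break  # inspect only the first record
```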
Dataset Card for "structured-instructions-test"
More Information needed
Overview

Phase II of the Offshore Code Comparison Collaboration, Continued, with Correlation and unCertainty (OC6) project was used to verify the implementation of a new soil-structure interaction (SSI) model for use within offshore wind turbine modeling software. The REDWIN Macro-element model implemented and verified in this study enables a computationally efficient way to model the linear and nonlinear SSI problem, including hysteretic damping, of a monopile structure. The modeling approach was integrated into several modeling tools, and a series of increasingly complex simulations was conducted using the IEA 10MW reference turbine mounted on a monopile support structure to verify the coupling between the tools and the REDWIN Macro-element SSI model. This campaign includes only numerical verification between various software and modeling approaches, so no experimental measurements are available.

The load cases (LC) considered include:

- LC1: static response of the tower and substructure
- LC2: frequency and mode-shape analysis of the tower and substructure
- LC3: response of the tower and substructure due to wind-only loading
- LC4: response of the tower and substructure due to wave-only loading
- LC5: response of the tower and substructure due to wind and wave loading

Detailed properties of the modeled system are found in the following reference: Bergua, Roger, Amy Robertson, Jason Jonkman, and Andy Platt. 2021. "Specification Document for OC6 Phase II: Verification of an Advanced Soil-Structure Interaction Model for Offshore Wind Turbines." Golden, CO: National Renewable Energy Laboratory. NREL/TP-5000-79938. https://www.nrel.gov/docs/fy21osti/79938.pdf. Details on the results from the OC6 Phase II project can be found in the following reference: Bergua R, Robertson A, Jonkman J, et al. "OC6 Phase II: Integration and verification of a new soil–structure interaction model for offshore wind design." Wind Energy. 2022;25(5):793-810. doi:10.1002/we.2698

Data Details

Nineteen academic and industrial partners performed simulations as part of this project, and their simulation results are available on this website. The naming of the datafiles follows the convention oc6.phase2.participant.loadcase.txt. Also included are the wind files used by participants to prescribe forces and moments at the tower-top yaw bearing for average hub-height wind speeds of 9.06 m/s and 20.09 m/s; these files are named "IEA-10.0-198-RWT_Uref09p06.txt" and "IEA-10.0-198-RWT_Uref20p09.txt", respectively. OC6 Phase II data files have an identifier after the participant corresponding to the modeling approach used. These identifiers are defined as follows:

- M1: Apparent Fixity (AF)
- M2: Coupled Springs (CS)
- M3: Distributed Springs (DS)
- M4: REDWIN

Data Quality

This was a verification study with only simulation results. Data quality and uncertainty statements apply only to experimental data.
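For working with the downloaded results, here is a small sketch of selecting files by load case under the documented naming convention; the local directory and the exact load-case token in the filenames are assumptions.

```python
from pathlib import Path

# Assumed local folder holding the downloaded OC6 Phase II files.
data_dir = Path("oc6_phase2")

# Filenames follow oc6.phase2.<participant>.<loadcase>.txt; the exact
# load-case token (e.g., "lc3") is an assumption here.
for path in sorted(data_dir.glob("oc6.phase2.*.lc3*.txt")):
    print(path.name)
```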
TargetDB, a target registration database, provides information on the experimental progress and status of targets selected for structure determination. It allows searching sequences from the PSI Structural Genomics Centers and other Structural Genomics projects. For more information about how these proteins were cloned, expressed, purified, or other experimental protocols, please go to the Protein expression, purification, and crystallization DataBase.
Background: Journal Club at a university-based residency program was restructured to introduce, reinforce, and evaluate residents' understanding of the concepts of Evidence Based Medicine.

Methods: Over the course of a year, structured pre- and post-tests were developed for use during each Journal Club. Questions were derived from the articles being reviewed, and performance on the key concepts of Evidence Based Medicine was assessed. Study subjects were 35 PGY2 and PGY3 residents in a university-based Family Practice Program.

Results: Performance on the pre-test demonstrated a significant improvement from a median of 54.5% to 78.9% over the course of the year (F = 89.17, p < .001). The post-test results also exhibited a significant increase, from 63.6% to 81.6% (F = 85.84, p < .001).

Conclusions: Following organizational revision, the introduction of a pre-test/post-test instrument supported achievement of the learning objectives, with a better understanding and utilization of the concepts of Evidence Based Medicine.
Auto-generated structured data of the Google Search Console Field Reference, derived from the "Available options" table.
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains a curated collection of programming questions, each paired with example inputs/outputs, constraints, and test cases.
It is designed for use in machine learning research, code generation models, natural language processing (NLP) tasks, or simply as a question bank for learners and educators.
Dataset Highlights:
📘 616 questions with titles, descriptions, and difficulty levels (Easy, Medium, Hard)
💡 Each question includes examples, constraints, and test cases stored as structured JSON
🧠 Useful for LLM fine-tuning, question answering, and automated code evaluation tasks
🧩 Ideal for creating or benchmarking AI coding assistants and educational apps
Source: Collected from a structured internal question database built for educational and evaluation purposes.
Format: CSV file with the following columns: id, title, description, difficulty_level, created_at, updated_at, examples, constraints, test_cases
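A short loading sketch for this layout follows; the CSV file name is an assumption, and it assumes the JSON columns hold non-empty JSON strings as described.

```python
import json

import pandas as pd

# Assumed file name; adjust to the actual CSV in the dataset.
df = pd.read_csv("programming_questions.csv")

# examples, constraints, and test_cases are stored as structured JSON.
for col in ("examples", "constraints", "test_cases"):
    df[col] = df[col].apply(json.loads)

easy = df[df["difficulty_level"] == "Easy"]
print(f"{len(easy)} easy questions; first title: {easy.iloc[0]['title']}")
```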
License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
The latest release of ClaimsKG is available in Datorium.
ClaimsKG is a knowledge graph of metadata information for thousands of fact-checked claims which facilitates structured queries about their truth values, authors, dates, and other kinds of metadata. ClaimsKG is generated through a (semi-)automated pipeline, which harvests claim-related data from popular fact-checking web sites, annotates them with related entities from DBpedia, and lifts all data to RDF using an RDF/S model that makes use of established vocabularies (such as schema.org).
ClaimsKG does NOT contain the text of the reviews from the fact-checking web sites; it only contains structured metadata information and links to the reviews.
More information, such as statistics, query examples, and a user-friendly interface to explore the knowledge graph, is available at: https://data.gesis.org/claimskg/site
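Because the graph uses established vocabularies such as schema.org, truth values, authors, and dates can be retrieved with SPARQL. A hedged sketch using SPARQLWrapper follows; the endpoint URL and the exact class usage are assumptions, so check the ClaimsKG site for the authoritative query interface.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed endpoint URL; see the ClaimsKG site for the official one.
sparql = SPARQLWrapper("https://data.gesis.org/claimskg/sparql")
sparql.setReturnFormat(JSON)

# Counting fact-checking reviews; schema:ClaimReview usage is an
# assumption based on the schema.org-based model described above.
sparql.setQuery("""
    PREFIX schema: <http://schema.org/>
    SELECT (COUNT(?review) AS ?n)
    WHERE { ?review a schema:ClaimReview . }
""")
result = sparql.query().convert()
print(result["results"]["bindings"][0]["n"]["value"])
```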
If you use ClaimsKG, please cite the paper below:
Tchechmedjiev, Andon, Pavlos Fafalios, Katarina Boland, Malo Gasquet, Matthäus Zloch, Benjamin Zapilko, Stefan Dietze, and Konstantin Todorov. "ClaimsKG: a Knowledge Graph of Fact-Checked Claims." In International Semantic Web Conference, pp. 309-324. Springer, Cham, 2019. https://doi.org/10.1007/978-3-030-30796-7_20
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Information
The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. We have therefore combined and curated information from five established databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1,144,803 compounds with 10,915,362 bioactivities on 5,613 targets (including defined macromolecular targets as well as cell lines and phenotypic readouts). It also provides simplified information on the assay types underlying the bioactivity data and on bioactivity confidence, obtained by comparing data from different sources. We have unified the source databases, brought them into a common format, and combined them, enabling ease of generic use in multiple applications such as chemogenomics and data-driven drug design.

The consensus dataset provides increased target coverage and contains a higher number of molecules than the source databases, which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve the robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks, with flags for divergent data from different sources, may help with data selection and further accurate curation.
Structure and content of the dataset
| ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0) | ... | Mean PC (0) | ... | Mean B (0) | ... | Mean I (0) | ... | Mean PD (0) | ... | Activity check annotation | Ligand names | Canonical SMILES C | ... | Structure check | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV file and a compressed CSV file.

Except for the canonical SMILES columns, all columns hold the datatype 'string'; the canonical SMILES columns use the SMILES format. We recommend the File Reader node for using the dataset in KNIME: it allows the data types of the columns to be adjusted exactly, and it is the only node that can read the compressed format.
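Outside KNIME, the exported CSV can also be read with standard tooling; a minimal pandas sketch follows, with the file name assumed.

```python
import pandas as pd

# Assumed file name for the exported CSV; adjust to the actual download.
# dtype=str mirrors the 'string' datatype of the columns; compression is
# inferred, so the compressed CSV works too.
df = pd.read_csv("consensus_dataset.csv", dtype=str, compression="infer")

print(df.shape)
print(df.columns.tolist()[:10])
```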
Column content:
Dataset Card for "test-text-clustering-structured-batched-v0.1"
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/plaguss/test-text-clustering-structured-batched-v0.1/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline… See the full description on the dataset page: https://huggingface.co/datasets/plaguss/test-text-clustering-structured-batched-v0.1.
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset is curated for a scoping literature review focusing on automated fact-checking. It comprises metadata extracted from 338 papers sourced from 10 databases, all centered around automated information verification. Following inclusion and exclusion criteria, 199 abstracts were chosen for subsequent disciplinary and thematic analysis.
License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
📚 Overview: This dataset provides a compact and efficient way to explore the massive "Wikipedia Structured Contents" dataset by Wikimedia Foundation, which consists of 38 large JSONL files (each ~2.5GB). Loading these directly in Kaggle or Colab is impractical due to resource constraints. This file index solves that problem.
🔍 What’s Inside:
This dataset includes a single JSONL file named wiki_structured_dataset_navigator.jsonl that contains metadata for every file in the English portion of the Wikimedia dataset.
Each line in the JSONL file is a JSON object with the following fields:
- file_name: the actual filename in the source dataset (e.g., enwiki_namespace_0_0.jsonl)
- file_index: the numeric row index of the file
- name: the Wikipedia article title or identifier
- url: a link to the full article on Wikipedia
- description: a short description or abstract of the article (when available)
🛠 Use Case: Use this dataset to search by keyword, article name, or description to find which specific files from the full Wikimedia dataset contain the topics you're interested in. You can then download only the relevant file(s) instead of the entire dataset.
⚡️ Benefits:
- Lightweight (~MBs vs. GBs)
- Easy to load and search
- Great for indexing, previewing, and subsetting the Wikimedia dataset
- Saves time, bandwidth, and compute resources
📎 Example Usage (Python):
```python
import json

import kagglehub
import pandas as pd
from tqdm import tqdm


def read_jsonl(file_path, max_records=None):
    """Read a JSONL file into a list of dicts, optionally capped at max_records."""
    data = []
    with open(file_path, "r", encoding="utf-8") as f:
        for i, line in enumerate(tqdm(f)):
            if max_records and i >= max_records:
                break
            data.append(json.loads(line))
    return data


file_path = kagglehub.dataset_download(
    "mehranism/wikimedia-structured-dataset-navigator-jsonl",
    path="wiki_structured_dataset_navigator.jsonl",
)
data = read_jsonl(file_path)
print(f"Successfully loaded {len(data)} records")

df = pd.DataFrame(data)
print(f"Dataset shape: {df.shape}")
print("Columns in the dataset:")
for col in df.columns:
    print(f"- {col}")
```
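Building on the DataFrame loaded above, the keyword-search use case can be sketched like this; the search term is an arbitrary example.

```python
# Continues from the df built in the example above.
keyword = "physics"  # arbitrary example term

mask = (
    df["name"].str.contains(keyword, case=False, na=False)
    | df["description"].str.contains(keyword, case=False, na=False)
)
matches = df[mask]

# file_name tells you which source file to download from the full dataset.
print(matches[["file_name", "name", "url"]].head())
```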
This dataset is perfect for developers working on:
- Retrieval-Augmented Generation (RAG)
- Large Language Model (LLM) fine-tuning
- Search and filtering pipelines
- Academic research on structured Wikipedia content
💡 Tip:
Pair this index with the original [Wikipedia Structured Contents dataset](https://www.kaggle.com/datasets/wikimedia-foundation/wikipedia-structured-contents) for full article access.
📃 Format:
- File: `wiki_structured_dataset_navigator.jsonl`
- Format: JSON Lines (1 object per line)
- Encoding: UTF-8
---
### **Tags**
wikipedia, wikimedia, jsonl, structured-data, search-index, metadata, file-catalog, dataset-index, large-language-models, machine-learning
CC0: Public Domain Dedication
(Recommended for open indexing tools with no sensitive data.)
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset is about book subjects. It has 1 row and is filtered to the book Test process improvement: a practical step-by-step guide to structured testing. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The published dataset comprises long-term structural measurements of a lattice tower to support and facilitate research in the field of Structural Health Monitoring (SHM). The structure is located near Hanover in Northern Germany and is equipped with reversible damage mechanisms at multiple positions, which were repeatedly activated and deactivated during the measurement period from August to December 2020. Meteorological measurements were conducted in parallel by the Institute of Meteorology and Climatology (IMUK) of Leibniz University Hannover and are provided in the same file format as the structural data.

The data can be accessed through: https://data.uni-hannover.de:8080/dataset/upload/users/isd/lumo/ To unlock the meteorological data, please send an informal request to public.data(at)isd.uni-hannover.de
Privacy policy: https://www.wiseguyreports.com/pages/privacy-policy
| Report Attribute | Details |
|---|---|
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019–2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | USD 2.48 billion |
| MARKET SIZE 2025 | USD 2.64 billion |
| MARKET SIZE 2035 | USD 5.0 billion |
| SEGMENTS COVERED | Application, Deployment Type, End User, Data Type, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Growing demand for real-time analytics, increasing reliance on data-driven decisions, advancements in machine learning algorithms, rise of IoT applications, need for regulatory compliance and standards |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | Qlik, SAS Institute, Domo, Micro Focus, SAP, Teradata, TIBCO Software, Tableau Software, Microsoft, Alteryx, IBM, Oracle |
| MARKET FORECAST PERIOD | 2025–2035 |
| KEY MARKET OPPORTUNITIES | Emerging IoT integration, increasing demand for real-time analysis, adoption in quality assurance processes, growth in automated testing solutions, advancements in machine learning techniques |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 6.6% (2025–2035) |
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Accessory structures often do not require a permit but still must meet certain land use standards. Before beginning construction, property owners must complete this self-verification form indicating that the structure will comply with the relevant rules. This dataset depicts where self-verification that a structure meets the relevant land use standards has been completed.