100+ datasets found
  1. F# Data: Making structured data first-class

    • figshare.com
    bin
    Updated Jan 19, 2016
    Cite
    Tomas Petricek (2016). F# Data: Making structured data first-class [Dataset]. http://doi.org/10.6084/m9.figshare.1169941.v1
    Explore at:
Available download formats: bin
    Dataset updated
    Jan 19, 2016
    Dataset provided by
Figshare (http://figshare.com/)
    Authors
    Tomas Petricek
    License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accessing data in structured formats such as XML, CSV and JSON in statically typed languages is difficult, because the languages do not understand the structure of the data. Dynamically typed languages make this syntactically easier, but lead to error-prone code. Despite numerous efforts, most of the data available on the web do not come with a schema. The only information available to developers is a set of examples, such as typical server responses. We describe an inference algorithm that infers a type of structured formats including CSV, XML and JSON. The algorithm is based on finding a common supertype of types representing individual samples (or values in collections). We use the algorithm as a basis for an F# type provider that integrates the inference into the F# type system. As a result, users can access CSV, XML and JSON data in a statically-typed fashion just by specifying a representative sample document.
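The core idea, folding the types of individual samples into a common supertype, can be sketched outside F#. The Python sketch below is ours, not F# Data's implementation; the helper names `infer` and `unify` are hypothetical, and the real algorithm handles many more cases (e.g. numeric widening and heterogeneous collections).

```python
def infer(value):
    """Infer a simple structural type for one JSON-like value."""
    if value is None:
        return ("null",)
    if isinstance(value, bool):          # check bool before int: bool is a subclass of int
        return ("bool",)
    if isinstance(value, (int, float)):
        return ("number",)
    if isinstance(value, str):
        return ("string",)
    if isinstance(value, list):
        elem = ("bottom",)               # supertype of zero samples
        for v in value:
            elem = unify(elem, infer(v))
        return ("list", elem)
    if isinstance(value, dict):
        return ("record", {k: infer(v) for k, v in value.items()})
    raise TypeError(value)

def unify(a, b):
    """Compute a common supertype of two inferred types."""
    if a == b:
        return a
    if a == ("bottom",):
        return b
    if b == ("bottom",):
        return a
    if a == ("null",):                   # null merged with T becomes optional T
        return ("option", b)
    if b == ("null",):
        return ("option", a)
    if a[0] == "list" and b[0] == "list":
        return ("list", unify(a[1], b[1]))
    if a[0] == "record" and b[0] == "record":
        fields = {}
        for k in set(a[1]) | set(b[1]):
            if k in a[1] and k in b[1]:
                fields[k] = unify(a[1][k], b[1][k])
            else:                        # field missing in one sample -> optional
                fields[k] = ("option", a[1].get(k, b[1].get(k)))
        return ("record", fields)
    return ("any",)                      # no structure in common

# Two samples: the second lacks "age", so it is inferred as optional.
samples = [{"name": "Ada", "age": 36}, {"name": "Alan"}]
t = ("bottom",)
for s in samples:
    t = unify(t, infer(s))
```

After the loop, `t` is a record type with a required string `name` and an optional numeric `age`, mirroring how the type provider surfaces fields that are absent in some samples.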

  2. Administrative Enforcement Agency real estate data download (CSV, JSON, and...

    • data.gov.tw
    csv, json, zip
    Updated Oct 4, 2023
    Cite
    Administrative Enforcement Agency (2023). Administrative Enforcement Agency real estate data download (CSV, JSON, and XML) for the period from year 112 to the end of the third quarter in the fiscal year. [Dataset]. https://data.gov.tw/en/datasets/165951
    Explore at:
Available download formats: zip, csv, json
    Dataset updated
    Oct 4, 2023
    Dataset authored and provided by
    Administrative Enforcement Agency
    License

https://data.gov.tw/license

    Description

Data on real estate auction targets that have already been recorded.

  3. Download the real estate information of the sub-offices auctioned by the...

    • data.gov.tw
    csv, json, zip
    Cite
    Administrative Enforcement Agency, Download the real estate information of the sub-offices auctioned by the Administrative and Executive Department from 2011 to the first quarter (CSV, JSON and XML) [Dataset]. https://data.gov.tw/en/datasets/151945
    Explore at:
Available download formats: json, zip, csv
    Dataset authored and provided by
    Administrative Enforcement Agency
    License

https://data.gov.tw/license

    Description

    Public information: refers to various financial statistical reports, policy news, information services, laws and other related data sets

  4. CityPropertyMailingListOwnerAbsentee

    • data.milwaukee.gov
    csv
    Updated Jun 27, 2025
    Cite
    Information Technology and Management Division (2025). CityPropertyMailingListOwnerAbsentee [Dataset]. https://data.milwaukee.gov/dataset/citypropertymailinglistownerabsentee
    Explore at:
Available download formats: csv
    Dataset updated
    Jun 27, 2025
    Dataset authored and provided by
    Information Technology and Management Division
    License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To download XML and JSON files, click the CSV option below and click the down arrow next to the Download button in the upper right on its page.

  5. Download Cities Database

    • download-cities-data.org
    xlsx
    Updated Jun 10, 2025
    Cite
    Download Cities Database (2025). Download Cities Database [Dataset]. https://www.download-cities-data.org
    Explore at:
Available download formats: xlsx
    Dataset updated
    Jun 10, 2025
    Dataset authored and provided by
    Download Cities Database
    Time period covered
    2024
    Area covered
    World Cities Database
    Description

    Paid dataset with city names, coordinates, regions, and administrative divisions of World Cities. Available in Excel (.xlsx), CSV, JSON, XML, and SQL formats after purchase.

  6. Global Data Element Market Research Report: By Data Source (Relational...

    • wiseguyreports.com
    Updated Jul 23, 2024
    Cite
Wiseguy Research Consultants Pvt Ltd (2024). Global Data Element Market Research Report: By Data Source (Relational Databases, NoSQL Databases, Big Data Platforms, Cloud-based Data Warehouses), By Type (Structured Data, Unstructured Data, Semi-Structured Data), By Format (XML, JSON, CSV, Parquet), By Purpose (Data Analysis, Machine Learning, Data Visualization, Data Governance), By Deployment Model (On-premises, Cloud-based, Hybrid) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/reports/data-element-market
    Explore at:
    Dataset updated
    Jul 23, 2024
    Dataset authored and provided by
Wiseguy Research Consultants Pvt Ltd
    License

https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 7, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 7.6 (USD Billion)
    MARKET SIZE 2024: 8.66 (USD Billion)
    MARKET SIZE 2032: 24.7 (USD Billion)
    SEGMENTS COVERED: Data Source, Type, Format, Purpose, Deployment Model, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: AI-driven data element management; data privacy and regulations; cloud-based data element platforms; data sharing and collaboration; increasing demand for real-time data
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Informatica, Micro Focus, IBM, SAS, Denodo, Oracle, TIBCO, Talend, SAP
    MARKET FORECAST PERIOD: 2024 - 2032
    KEY MARKET OPPORTUNITIES: 1. Adoption of AI and ML; 2. Growing demand for data analytics; 3. Increasing cloud adoption; 4. Data privacy and security concerns; 5. Integration with emerging technologies
    COMPOUND ANNUAL GROWTH RATE (CAGR): 13.99% (2024 - 2032)
  7. Ecuador Cities Database

    • download-cities-data.org
    xlsx
    Updated Jun 12, 2025
    Cite
    Download Cities Database (2025). Ecuador Cities Database [Dataset]. https://www.download-cities-data.org/Ecuador.php
    Explore at:
Available download formats: xlsx
    Dataset updated
    Jun 12, 2025
    Dataset authored and provided by
    Download Cities Database
    Time period covered
    2024
    Area covered
    Ecuador
    Description

    Paid dataset with city names, coordinates, regions, and administrative divisions of Ecuador. Available in Excel (.xlsx), CSV, JSON, XML, and SQL formats after purchase.

  8. The Administrative Enforcement Agency has finalized the download of real...

    • data.gov.tw
    csv, json, zip
    Cite
    Administrative Enforcement Agency, The Administrative Enforcement Agency has finalized the download of real estate data (CSV, JSON, and XML) for the period from the 112th year to the end of the fourth quarter. [Dataset]. https://data.gov.tw/en/datasets/167258
    Explore at:
Available download formats: json, csv, zip
    Dataset authored and provided by
    Administrative Enforcement Agency
    License

https://data.gov.tw/license

    Description

Data on real estate that has been auctioned off.

  9. 충북대학교_도서 및 논문원문정보 (Chungbuk National University: book and thesis full-text information)

    • data.go.kr
    csv
    Updated Sep 11, 2019
    Cite
    충북대학교 (2019). 충북대학교_도서 및 논문원문정보 [Dataset]. https://www.data.go.kr/data/3058186/fileData.do
    Explore at:
Available download formats: csv
    Dataset updated
    Sep 11, 2019
    Dataset authored and provided by
충북대학교 (Chungbuk National University)
    License

https://data.go.kr/ugs/selectPortalPolicyView.do

    Description

Book title, author, publisher, physical description, notes, full-text information, etc.

  10. The government execution office's attached branches from the 113th year to...

    • data.gov.tw
    csv, json, zip
    Updated Oct 1, 2024
    Cite
    Administrative Enforcement Agency (2024). The government execution office's attached branches from the 113th year to the end of the third quarter have confirmed the download of real estate data (CSV, JSON, and XML). [Dataset]. https://data.gov.tw/en/datasets/171005
    Explore at:
Available download formats: csv, json, zip
    Dataset updated
    Oct 1, 2024
    Dataset authored and provided by
    Administrative Enforcement Agency
    License

https://data.gov.tw/license

    Description

Data on real estate that has been auctioned off.

  11. CityPropertyMailingListBusiness

    • data.milwaukee.gov
    csv
    Updated Jun 30, 2025
    Cite
    Information Technology and Management Division (2025). CityPropertyMailingListBusiness [Dataset]. https://data.milwaukee.gov/dataset/citypropertymailinglistbusiness
    Explore at:
Available download formats: csv
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Information Technology and Management Division
    License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To download XML and JSON files, click the CSV option below and click the down arrow next to the Download button in the upper right on its page.

  12. ckanext-tdt

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-tdt [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-tdt
    Explore at:
    Dataset updated
    Jun 4, 2025
    Description

    The tdt (The DataTank) extension for CKAN enhances the accessibility of datasets by automatically integrating them with The DataTank when a dataset is added and is determined to be proxyable. This makes the data available in various formats, such as XML, JSON, and CSV, through The DataTank's adapters. By reconfiguring the dataset's URL to use The DataTank, the extension simplifies the consumption of CKAN-managed data in multiple formats.

    Key features:

    • Automatic data format conversion: when a dataset is created and detected as proxyable, the system automatically activates The DataTank adapters to provide XML, JSON, and CSV formats.
    • URL reconfiguration: the core functionality lies in rewriting the dataset URL to point to The DataTank access point.
    • Integration with The DataTank: operates as a hook that utilizes The DataTank's proxy capabilities.

    Technical integration: the extension requires configuration values to be added to the CKAN .ini file (typically /etc/ckan/default/development.ini). These settings are essential for the extension to correctly identify and interact with the datasets to be proxied through The DataTank.

    Benefits and impact: the primary benefit of the tdt extension is streamlined data access for end users. By transparently integrating with The DataTank, it simplifies consuming CKAN-managed data in multiple, commonly used formats without manual conversion.

  13. Download the real estate information downloaded by the sub-offices of the...

    • data.gov.tw
    csv, json, zip
    Cite
    Administrative Enforcement Agency, Download the real estate information downloaded by the sub-offices of the Administrative and Executive Department from 2011 to the third quarter (CSV, JSON and XML) [Dataset]. https://data.gov.tw/en/datasets/158132
    Explore at:
Available download formats: json, csv, zip
    Dataset authored and provided by
    Administrative Enforcement Agency
    License

https://data.gov.tw/license

    Description

Real estate bidding data (CSV).

  14. Europe PMC Full Text Corpus

    • figshare.com
    zip
    Updated May 30, 2023
    Cite
    Santosh Tirunagari; Xiao Yang; Shyamasree Saha; Aravind Venkatesan; Vid Vartak; Johanna McEntyre (2023). Europe PMC Full Text Corpus [Dataset]. http://doi.org/10.6084/m9.figshare.22848380.v2
    Explore at:
Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Santosh Tirunagari; Xiao Yang; Shyamasree Saha; Aravind Venkatesan; Vid Vartak; Johanna McEntyre
    License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the Europe PMC full text corpus, a collection of 300 articles from the Europe PMC Open Access subset. Each article contains 3 core entity types, manually annotated by curators: Gene/Protein, Disease and Organism.

    Corpus Directory Structure

    annotations/: contains annotations of the 300 full-text articles in the Europe PMC corpus. Annotations are provided in 3 different formats.

    • hypothesis/csv/: raw annotations fetched from the annotation platform Hypothes.is, in comma-separated values (CSV) format.
      • GROUP0/: raw manual annotations made by curator GROUP0.
      • GROUP1/: raw manual annotations made by curator GROUP1.
      • GROUP2/: raw manual annotations made by curator GROUP2.
    • IOB/: annotations automatically extracted from the raw manual annotations in hypothesis/csv/, in Inside-Outside-Beginning (IOB) tagging format.
      • dev/: IOB-format annotations of 45 articles, intended as the dev set for machine learning tasks.
      • test/: IOB-format annotations of 45 articles, intended as the test set.
      • train/: IOB-format annotations of 210 articles, intended as the training set.
    • JSON/: annotations automatically extracted from the raw manual annotations in hypothesis/csv/, in JSON format.
    • README.md: a detailed description of all the annotation formats.

    articles/: contains the full-text articles annotated in the Europe PMC corpus.

    • Sentencised/: XML articles whose text has been split into sentences using the Europe PMC sentenciser.
    • XML/: XML articles fetched directly using the Europe PMC Articles RESTful API.
    • README.md: a detailed description of the sentencising and fetching of XML articles.

    docs/: contains related documents that were used for generating the corpus.

    • Annotation guideline.pdf: the annotation guideline provided to curators to assist the manual annotation.
    • demo to molecular conenctions.pdf: the annotation platform guideline provided to curators to help them get familiar with the Hypothes.is platform.
    • Training set development.pdf: the initial document that details the paper selection procedures.

    pilot/: contains annotations and articles that were used in a pilot study.

    • annotations/csv/: raw annotations fetched from the annotation platform Hypothes.is, in CSV format.
    • articles/: the full-text articles annotated in the pilot study.
      • Sentencised/: XML articles whose text has been split into sentences using the Europe PMC sentenciser.
      • XML/: XML articles fetched directly using the Europe PMC Articles RESTful API.
    • README.md: a detailed description of the sentencising and fetching of XML articles.

    src/: source code for cleaning annotations and generating IOB files.

    • metrics/ner_metrics.py: Python script containing the SemEval evaluation metrics.
    • annotations.py: Python script used to extract annotations from raw Hypothes.is annotations.
    • generate_IOB_dataset.py: Python script used to convert JSON-format annotations to IOB tagging format.
    • generate_json_dataset.py: Python script used to extract annotations to JSON format.
    • hypothesis.py: Python script used to fetch raw Hypothes.is annotations.
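For readers unfamiliar with the Inside-Outside-Beginning scheme used in IOB/, here is a toy encoder. It sketches the tagging scheme only; it is not the repository's generate_IOB_dataset.py, and the token and entity example is invented.

```python
def to_iob(tokens, entities):
    """Tag tokens with IOB labels.

    entities: list of (start_token, end_token_exclusive, label) spans.
    The first token of a span gets B-<label>, later tokens I-<label>,
    and everything outside any span stays O.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return list(zip(tokens, tags))

# Invented example with two entity spans: a Gene/Protein and a Disease.
tokens = ["BRCA1", "mutations", "cause", "breast", "cancer"]
entities = [(0, 1, "GP"), (3, 5, "DS")]
tagged = to_iob(tokens, entities)
# [("BRCA1", "B-GP"), ("mutations", "O"), ("cause", "O"),
#  ("breast", "B-DS"), ("cancer", "I-DS")]
```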

    License

CC BY

    Feedback

For any comments, questions, or suggestions, please contact us at helpdesk@europepmc.org or via the Europe PMC contact page.

  15. Data from: KGCW 2023 Challenge @ ESWC 2023

    • zenodo.org
    • investigacion.usc.gal
    application/gzip
    Updated Apr 15, 2024
    + more versions
    Cite
Dylan Van Assche; David Chaves-Fraga; Anastasia Dimou; Umutcan Şimşek; Ana Iglesias (2024). KGCW 2023 Challenge @ ESWC 2023 [Dataset]. http://doi.org/10.5281/zenodo.7837289
    Explore at:
Available download formats: application/gzip
    Dataset updated
    Apr 15, 2024
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
Dylan Van Assche; David Chaves-Fraga; Anastasia Dimou; Umutcan Şimşek; Ana Iglesias
    License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Knowledge Graph Construction Workshop 2023: challenge

    Knowledge graph construction of heterogeneous data has seen a lot of uptake
    in the last decade from compliance to performance optimizations with respect
    to execution time. Besides execution time as a metric for comparing knowledge
    graph construction, other metrics e.g. CPU or memory usage are not considered.
    This challenge aims at benchmarking systems to find which RDF graph
    construction system optimizes for metrics e.g. execution time, CPU,
    memory usage, or a combination of these metrics.

    Task description

    The task is to reduce and report the execution time and computing resources
    (CPU and memory usage) for the parameters listed in this challenge, compared
    to the state-of-the-art of the existing tools and the baseline results provided
    by this challenge. This challenge is not limited to execution times to create
    the fastest pipeline, but also computing resources to achieve the most efficient
    pipeline.

    We provide a tool which can execute such pipelines end-to-end. This tool also
    collects and aggregates the metrics such as execution time, CPU and memory
    usage, necessary for this challenge as CSV files. Moreover, the information
    about the hardware used during the execution of the pipeline is available as
    well to allow fairly comparing different pipelines. Your pipeline should consist
    of Docker images which can be executed on Linux to run the tool. The tool is
    already tested with existing systems, relational databases e.g. MySQL and
    PostgreSQL, and triplestores e.g. Apache Jena Fuseki and OpenLink Virtuoso
    which can be combined in any configuration. It is strongly encouraged to use
    this tool for participating in this challenge. If you prefer to use a different
    tool or our tool imposes technical requirements you cannot solve, please contact
    us directly.

    Part 1: Knowledge Graph Construction Parameters

These parameters are evaluated using synthetically generated data to gain more
insight into their influence on the pipeline.

    Data

    • Number of data records: scaling the data size vertically by the number of records with a fixed number of data properties (10K, 100K, 1M, 10M records).
    • Number of data properties: scaling the data size horizontally by the number of data properties with a fixed number of data records (1, 10, 20, 30 columns).
    • Number of duplicate values: scaling the number of duplicate values in the dataset (0%, 25%, 50%, 75%, 100%).
    • Number of empty values: scaling the number of empty values in the dataset (0%, 25%, 50%, 75%, 100%).
    • Number of input files: scaling the number of datasets (1, 5, 10, 15).

    Mappings

    • Number of subjects: scaling the number of subjects with a fixed number of predicates and objects (1, 10, 20, 30 TMs).
    • Number of predicates and objects: scaling the number of predicates and objects with a fixed number of subjects (1, 10, 20, 30 POMs).
    • Number of and type of joins: scaling the number of joins and type of joins (1-1, N-1, 1-N, N-M)

    Part 2: GTFS-Madrid-Bench

    The GTFS-Madrid-Bench provides insights in the pipeline with real data from the
    public transport domain in Madrid.

    Scaling

    • GTFS-1 SQL
    • GTFS-10 SQL
    • GTFS-100 SQL
    • GTFS-1000 SQL

    Heterogeneity

    • GTFS-100 XML + JSON
    • GTFS-100 CSV + XML
    • GTFS-100 CSV + JSON
    • GTFS-100 SQL + XML + JSON + CSV

    Example pipeline

    The ground truth dataset and baseline results are generated in different steps
    for each parameter:

    1. The provided CSV files and SQL schema are loaded into a MySQL relational database.
    2. Mappings are executed by accessing the MySQL relational database to construct a knowledge graph in N-Triples as RDF format.
    3. The constructed knowledge graph is loaded into a Virtuoso triplestore, tuned according to the Virtuoso documentation.
    4. The provided SPARQL queries are executed on the SPARQL endpoint exposed by Virtuoso.

    The pipeline is executed 5 times from which the median execution time of each
    step is calculated and reported. Each step with the median execution time is
    then reported in the baseline results with all its measured metrics.
    Query timeout is set to 1 hour and knowledge graph construction timeout
to 24 hours. The execution is performed with the following tool: https://github.com/kg-construct/challenge-tool;
you can adapt the execution plans for this example pipeline to your own needs.

    Each parameter has its own directory in the ground truth dataset with the
    following files:

    • Input dataset as CSV.
    • Mapping file as RML.
    • Queries as SPARQL.
    • Execution plan for the pipeline in metadata.json.

    Datasets

    Knowledge Graph Construction Parameters

    The dataset consists of:

    • Input dataset as CSV for each parameter.
    • Mapping file as RML for each parameter.
    • SPARQL queries to retrieve the results for each parameter.
    • Baseline results for each parameter with the example pipeline.
    • Ground truth dataset for each parameter generated with the example pipeline.

    Format

All input datasets are provided as CSV; depending on the parameter being
evaluated, the number of rows and columns may differ. The first row is always
the header of the CSV.

    GTFS-Madrid-Bench

    The dataset consists of:

• Input dataset as CSV with SQL schema for the scaling, and a combination of XML, CSV, and JSON for the heterogeneity.
    • Mapping file as RML for both scaling and heterogeneity.
    • SPARQL queries to retrieve the results.
    • Baseline results with the example pipeline.
    • Ground truth dataset generated with the example pipeline.

    Format

    CSV datasets always have a header as their first row.
    JSON and XML datasets have their own schema.

    Evaluation criteria

    Submissions must evaluate the following metrics:

    • Execution time of all the steps in the pipeline. The execution time of a step is the difference between the begin and end time of a step.
    • CPU time as the time spent in the CPU for all steps of the pipeline. The CPU time of a step is the difference between the begin and end CPU time of a step.
    • Minimal and maximal memory consumption for each step of the pipeline. The minimal and maximal memory consumption of a step is the minimum and maximum calculated of the memory consumption during the execution of a step.
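The per-step metric collection described above can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions: the challenge's tool measures whole Docker containers, whereas here a hypothetical `measure_step` wraps an in-process callable, with tracemalloc's peak Python-heap figure standing in for real process-memory sampling, and the CSV column names are our invention.

```python
import csv
import io
import time
import tracemalloc

def measure_step(name, step, *args):
    """Run one pipeline step; record wall-clock time, CPU time,
    and peak (Python-heap) memory observed during its execution."""
    tracemalloc.start()
    t_wall = time.perf_counter()
    t_cpu = time.process_time()
    result = step(*args)
    metrics = {
        "step": name,
        "wall_s": time.perf_counter() - t_wall,
        "cpu_s": time.process_time() - t_cpu,
        "peak_bytes": tracemalloc.get_traced_memory()[1],  # (current, peak)
    }
    tracemalloc.stop()
    return result, metrics

def metrics_csv(rows):
    """Aggregate per-step metrics into a CSV report, in the spirit of
    the challenge's CSV metric files."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["step", "wall_s", "cpu_s", "peak_bytes"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Stand-in for a real construction step (e.g. executing mappings).
_, m = measure_step("construct_kg", lambda: [i * i for i in range(100_000)])
report = metrics_csv([m])
```

Each step of an actual pipeline would be wrapped the same way, and the rows for all steps written into one report per run.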

    Expected output

    Duplicate values

    Scale | Number of triples
    0 percent | 2000000 triples
    25 percent | 1500020 triples
    50 percent | 1000020 triples
    75 percent | 500020 triples
    100 percent | 20 triples

    Empty values

    Scale | Number of triples
    0 percent | 2000000 triples
    25 percent | 1500000 triples
    50 percent | 1000000 triples
    75 percent | 500000 triples
    100 percent | 0 triples

    Mappings

    Scale | Number of triples
    1 TM + 15 POM | 1500000 triples
    3 TM + 5 POM | 1500000 triples
    5 TM + 3 POM | 1500000 triples
    15 TM + 1 POM | 1500000 triples

    Properties

    Scale | Number of triples
    1M rows, 1 column | 1000000 triples
    1M rows, 10 columns | 10000000 triples
    1M rows, 20 columns | 20000000 triples
    1M rows, 30 columns | 30000000 triples

    Records

    Scale | Number of triples
    10K rows, 20 columns | 200000 triples
    100K rows, 20 columns | 2000000 triples
    1M rows, 20 columns | 20000000 triples
    10M rows, 20 columns | 200000000 triples

    Joins

    1-1 joins

    Scale | Number of triples
    0 percent | 0 triples
    25 percent | 125000 triples
    50 percent | 250000 triples
    75 percent | 375000 triples
    100 percent | 500000 triples

    1-N joins

    Scale | Number of triples
    1-10, 0 percent | 0 triples
    1-10, 25 percent | 125000 triples
    1-10, 50 percent | 250000 triples
    1-10, 75 percent | 375000 triples

  16. CityPropertyMailingList

    • data.milwaukee.gov
    csv
    Updated Jul 3, 2025
    Cite
    Information Technology and Management Division (2025). CityPropertyMailingList [Dataset]. https://data.milwaukee.gov/dataset/citypropertymailinglist
    Explore at:
Available download formats: csv (89063167)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Information Technology and Management Division
    License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To download XML and JSON files, click the CSV option below and click the down arrow next to the Download button in the upper right on its page.

  17. Download the real estate information downloaded by the sub-offices of the...

    • data.gov.tw
    csv, json, zip
    Updated Jan 3, 2022
    Cite
    Administrative Enforcement Agency (2022). Download the real estate information downloaded by the sub-offices of the Administrative and Executive Office from 2011 to the fourth quarter (CSV, JSON and XML) [Dataset]. https://data.gov.tw/en/datasets/147153
    Explore at:
Available download formats: csv, json, zip
    Dataset updated
    Jan 3, 2022
    Dataset authored and provided by
    Administrative Enforcement Agency
    License

https://data.gov.tw/license

    Description

    Public information: refers to various financial statistical reports, policy news, information services, laws and other related data sets

  18. United States Minor Outlying Islands Cities Database

    • download-cities-data.org
    xlsx
    Updated Jun 10, 2025
    Cite
    Download Cities Database (2025). United States Minor Outlying Islands Cities Database [Dataset]. https://www.download-cities-data.org/United_States_Minor_Outlying_Islands.php
    Explore at:
Available download formats: xlsx
    Dataset updated
    Jun 10, 2025
    Dataset authored and provided by
    Download Cities Database
    Time period covered
    2024
    Area covered
    United States Minor Outlying Islands
    Description

    Paid dataset with city names, coordinates, regions, and administrative divisions of United States Minor Outlying Islands. Available in Excel (.xlsx), CSV, JSON, XML, and SQL formats after purchase.

  19. Data from: KGCW 2024 Challenge @ ESWC 2024

    • investigacion.usc.gal
    • data.niaid.nih.gov
    Updated 2024
    + more versions
    Cite
Van Assche, Dylan; Chaves-Fraga, David; Dimou, Anastasia; Serles, Umutcan; Iglesias, Ana (2024). KGCW 2024 Challenge @ ESWC 2024 [Dataset]. https://investigacion.usc.gal/documentos/668fc40fb9e7c03b01bd3810
    Explore at:
    Dataset updated
    2024
    Authors
Van Assche, Dylan; Chaves-Fraga, David; Dimou, Anastasia; Serles, Umutcan; Iglesias, Ana
    Description

    Knowledge Graph Construction Workshop 2024: challenge

Knowledge graph construction of heterogeneous data has seen a lot of uptake in the last decade from compliance to performance optimizations with respect to execution time. Besides execution time as a metric for comparing knowledge graph construction, other metrics e.g. CPU or memory usage are not considered. This challenge aims at benchmarking systems to find which RDF graph construction system optimizes for metrics e.g. execution time, CPU, memory usage, or a combination of these metrics.

    Task description

The task is to reduce and report the execution time and computing resources (CPU and memory usage) for the parameters listed in this challenge, compared to the state-of-the-art of the existing tools and the baseline results provided by this challenge. This challenge is not limited to execution times to create the fastest pipeline, but also computing resources to achieve the most efficient pipeline.

We provide a tool which can execute such pipelines end-to-end. This tool also collects and aggregates the metrics such as execution time, CPU and memory usage, necessary for this challenge as CSV files. Moreover, the information about the hardware used during the execution of the pipeline is available as well to allow fairly comparing different pipelines. Your pipeline should consist of Docker images which can be executed on Linux to run the tool. The tool is already tested with existing systems, relational databases e.g. MySQL and PostgreSQL, and triplestores e.g. Apache Jena Fuseki and OpenLink Virtuoso which can be combined in any configuration. It is strongly encouraged to use this tool for participating in this challenge. If you prefer to use a different tool or our tool imposes technical requirements you cannot solve, please contact us directly.

    Track 1: Conformance

    The new set of specifications for the RDF Mapping Language (RML), established by the W3C Community Group on Knowledge Graph Construction, provides a set of test-cases for each module:

    RML-Core

    RML-IO

    RML-CC

    RML-FNML

    RML-Star

    These test-cases are evaluated in this Track of the Challenge to determine their feasibility, correctness, etc. by applying them in implementations. This Track is in Beta status because these new specifications have not seen any implementation yet, so the test-cases may still contain bugs and issues. If you find problems with the mappings, output, etc., please report them to the corresponding repository of each module.

    Note: validating the output of the RML Star module automatically through the provided tooling is currently not possible, see https://github.com/kg-construct/challenge-tool/issues/1.

    Through this Track we aim to spark development of implementations for the new specifications and to improve the test-cases. Let us know about any problems with the test-cases and we will try to find a solution.

    Track 2: Performance

    Part 1: Knowledge Graph Construction Parameters

    These parameters are evaluated using synthetically generated data to gain more insight into their influence on the pipeline.

    Data

    Number of data records: scaling the data size vertically by the number of records with a fixed number of data properties (10K, 100K, 1M, 10M records).

    Number of data properties: scaling the data size horizontally by the number of data properties with a fixed number of data records (1, 10, 20, 30 columns).

    Number of duplicate values: scaling the number of duplicate values in the dataset (0%, 25%, 50%, 75%, 100%).

    Number of empty values: scaling the number of empty values in the dataset (0%, 25%, 50%, 75%, 100%).

    Number of input files: scaling the number of datasets (1, 5, 10, 15).
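
    To make the four data parameters above concrete, here is a small hypothetical generator sketch. This is not the challenge's own data generator; the column names and value layout are invented for illustration.

```python
import csv
import io

def generate_csv(rows, cols, duplicate_frac=0.0, empty_frac=0.0):
    """Hypothetical generator in the spirit of the challenge's data
    parameters (records, properties, duplicates, empty values).
    NOT the challenge's own tooling; names and layout are made up."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id"] + [f"p{c}" for c in range(1, cols + 1)])
    n_dup = int(rows * duplicate_frac)
    n_empty = int(rows * empty_frac)
    for r in range(rows):
        if r < n_dup:
            record = ["dup"] * cols                          # duplicate values
        elif r < n_dup + n_empty:
            record = [""] * cols                             # empty values
        else:
            record = [f"v{r}_{c}" for c in range(1, cols + 1)]
        writer.writerow([r] + record)
    return buf.getvalue()

print(generate_csv(rows=4, cols=2, duplicate_frac=0.5))
```

    Scaling `rows` and `cols` corresponds to the records and properties parameters; the two fractions correspond to the duplicate- and empty-value percentages.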

    Mappings

    Number of subjects: scaling the number of subjects (Triples Maps) with a fixed number of predicates and objects (1, 10, 20, 30 TMs).

    Number of predicates and objects: scaling the number of predicates and objects (Predicate-Object Maps) with a fixed number of subjects (1, 10, 20, 30 POMs).

    Number and type of joins: scaling the number of joins and the type of joins (1-1, N-1, 1-N, N-M).

    Part 2: GTFS-Madrid-Bench

    The GTFS-Madrid-Bench provides insights into the pipeline with real data from the public transport domain in Madrid.

    Scaling

    GTFS-1 SQL

    GTFS-10 SQL

    GTFS-100 SQL

    GTFS-1000 SQL

    Heterogeneity

    GTFS-100 XML + JSON

    GTFS-100 CSV + XML

    GTFS-100 CSV + JSON

    GTFS-100 SQL + XML + JSON + CSV

    Example pipeline

    The ground truth dataset and baseline results are generated for each parameter in the following steps:

    The provided CSV files and SQL schema are loaded into a MySQL relational database.

    Mappings are executed against the MySQL relational database to construct a knowledge graph in the N-Triples RDF format.

    The pipeline is executed 5 times, and the median execution time of each step is calculated and reported. Each step is then reported in the baseline results with all its measured metrics, using the run with the median execution time. The knowledge graph construction timeout is set to 24 hours. The execution is performed with the following tool: https://github.com/kg-construct/challenge-tool; you can adapt the execution plans of this example pipeline to your own needs.
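
    The median-of-5 reporting described above can be sketched as follows. The step names and timings are invented, and the challenge tool's actual CSV schema is not reproduced here.

```python
from statistics import median

# Invented timings (seconds) for two pipeline steps over 5 runs; the
# challenge tool collects real measurements like these as CSV files
# (its exact schema is not reproduced here).
runs = {
    "load-mysql":  [12.1, 11.8, 12.5, 11.9, 12.0],
    "execute-rml": [95.0, 97.2, 94.1, 96.3, 95.5],
}

# Median over the 5 runs, mirroring the baseline reporting above.
medians = {step: median(times) for step, times in runs.items()}
print(medians)  # {'load-mysql': 12.0, 'execute-rml': 95.5}
```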

    Each parameter has its own directory in the ground truth dataset with the following files:

    Input dataset as CSV.

    Mapping file as RML.

    Execution plan for the pipeline in metadata.json.

    Datasets

    Knowledge Graph Construction Parameters

    The dataset consists of:

    Input dataset as CSV for each parameter.

    Mapping file as RML for each parameter.

    Baseline results for each parameter with the example pipeline.

    Ground truth dataset for each parameter generated with the example pipeline.

    Format

    All input datasets are provided as CSV. Depending on the parameter being evaluated, the number of rows and columns may differ. The first row is always the header of the CSV.

    GTFS-Madrid-Bench

    The dataset consists of:

    Input dataset as CSV with SQL schema for the scaling; a combination of XML, CSV, and JSON for the heterogeneity.

    Mapping file as RML for both scaling and heterogeneity.

    SPARQL queries to retrieve the results.

    Baseline results with the example pipeline.

    Ground truth dataset generated with the example pipeline.

    Format

    CSV datasets always have a header as their first row. JSON and XML datasets have their own schema.

    Evaluation criteria

    Submissions must evaluate the following metrics:

    Execution time of all the steps in the pipeline. The execution time of a step is the difference between the begin and end time of a step.

    CPU time as the time spent in the CPU for all steps of the pipeline. The CPU time of a step is the difference between the begin and end CPU time of a step.

    Minimal and maximal memory consumption for each step of the pipeline, i.e. the minimum and maximum of the memory consumption measured during the execution of a step.
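
    As a rough in-process illustration of these three metrics (the challenge tool measures them externally, per container), here is a sketch using only Python's standard library; tracemalloc's peak merely stands in for the maximal memory sample.

```python
import time
import tracemalloc

def measure(step):
    """Measure one step roughly as the evaluation criteria describe:
    execution time and CPU time as end-minus-begin differences, plus a
    memory figure. The challenge tool samples memory externally per
    container; tracemalloc's peak only stands in for the maximal
    memory consumption here."""
    tracemalloc.start()
    wall0, cpu0 = time.perf_counter(), time.process_time()
    result = step()
    wall1, cpu1 = time.perf_counter(), time.process_time()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "execution_time_s": wall1 - wall0,
        "cpu_time_s": cpu1 - cpu0,
        "peak_memory_bytes": peak,
        "result": result,
    }

metrics = measure(lambda: sum(i * i for i in range(100_000)))
print({k: v for k, v in metrics.items() if k != "result"})
```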

    Expected output

    Duplicate values

    Scale Number of Triples

    0 percent 2000000 triples

    25 percent 1500020 triples

    50 percent 1000020 triples

    75 percent 500020 triples

    100 percent 20 triples

    Empty values

    Scale Number of Triples

    0 percent 2000000 triples

    25 percent 1500000 triples

    50 percent 1000000 triples

    75 percent 500000 triples

    100 percent 0 triples
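
    Assuming the underlying input is the 100K-row, 20-column dataset (so the 0% baselines are 2,000,000 triples), both tables above fit a simple pattern. The collapse-into-one-row reading of the duplicates table is our interpretation, not something the challenge states.

```python
ROWS, COLS = 100_000, 20   # assumed input shape: the 0% rows give 2,000,000 triples

def duplicate_triples(pct):
    # One reading consistent with the table: all duplicated rows collapse
    # into a single shared row, leaving (distinct rows + 1) * COLS triples.
    if pct == 0:
        return ROWS * COLS
    distinct = int(ROWS * (1 - pct / 100))
    return (distinct + 1) * COLS

def empty_triples(pct):
    # Empty cells simply produce no triples at all.
    return int(ROWS * (1 - pct / 100)) * COLS

print(duplicate_triples(25), empty_triples(25))  # 1500020 1500000
```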

    Mappings

    Scale Number of Triples

    1TM + 15POM 1500000 triples

    3TM + 5POM 1500000 triples

    5TM + 3POM 1500000 triples

    15TM + 1POM 1500000 triples

    Properties

    Scale Number of Triples

    1M rows 1 column 1000000 triples

    1M rows 10 columns 10000000 triples

    1M rows 20 columns 20000000 triples

    1M rows 30 columns 30000000 triples

    Records

    Scale Number of Triples

    10K rows 20 columns 200000 triples

    100K rows 20 columns 2000000 triples

    1M rows 20 columns 20000000 triples

    10M rows 20 columns 200000000 triples
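
    Both the properties and records tables above are consistent with one generated triple per data cell (rows times columns); this reading is ours, not stated by the challenge.

```python
def expected_triples(rows, cols):
    # One consistent reading of both tables: each data cell
    # (row x column) yields exactly one triple.
    return rows * cols

print(expected_triples(10_000, 20))  # 200000 (records table, first row)
```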

    Joins

    1-1 joins

    Scale Number of Triples

    0 percent 0 triples

    25 percent 125000 triples

    50 percent 250000 triples

    75 percent 375000 triples

    100 percent 500000 triples

    1-N joins

    Scale Number of Triples

    1-10 0 percent 0 triples

    1-10 25 percent 125000 triples

    1-10 50 percent 250000 triples

    1-10 75 percent 375000 triples

    1-10 100 percent 500000 triples

    1-5 50 percent 250000 triples

    1-10 50 percent 250000 triples

    1-15 50 percent 250005 triples

    1-20 50 percent 250000 triples

    N-1 joins

    Scale Number of Triples

    10-1 0 percent 0 triples

    10-1 25 percent 125000 triples

    10-1 50 percent 250000 triples

    10-1 75 percent 375000 triples

    10-1 100 percent 500000 triples

    5-1 50 percent 250000 triples

    10-1 50 percent 250000 triples

    15-1 50 percent 250005 triples

    20-1 50 percent 250000 triples

    N-M joins

    Scale Number of Triples

    5-5 50 percent 1374085 triples

    10-5 50 percent 1375185 triples

    5-10 50 percent 1375290 triples

    5-5 25 percent 718785 triples

    5-5 50 percent 1374085 triples

    5-5 75 percent 1968100 triples

    5-5 100 percent 2500000 triples

    5-10 25 percent 719310 triples

    5-10 50 percent 1375290 triples

    5-10 75 percent 1967660 triples

    5-10 100 percent 2500000 triples

    10-5 25 percent 719370 triples

    10-5 50 percent 1375185 triples

    10-5 75 percent 1968235 triples

    10-5 100 percent 2500000 triples

    GTFS-Madrid-Bench

    Generated Knowledge Graph

    Scale Number of Triples

    1 395953 triples

    10 3959530 triples

    100 39595300 triples

    1000 395953000 triples
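
    The generated graph sizes in the table above scale linearly with the GTFS scale factor, which can be checked directly:

```python
BASE = 395_953  # triples in the generated knowledge graph at scale 1

# The table is consistent with purely linear scaling.
for scale in (1, 10, 100, 1000):
    print(f"GTFS-{scale}: {BASE * scale} triples")
```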

    Queries

    Query Scale 1 Scale 10 Scale 100 Scale 1000

    Q1 58540 results 585400 results No results available No results available

    Q2 636 results 11998 results 125565 results 1261368 results

    Q3 421 results 4207 results 42067 results 420667 results

    Q4 13 results 130 results 1300 results 13000 results

    Q5 35 results 350 results 3500 results 35000 results

    Q6 1 result 1 result 1 result 1 result

    Q7 68 results 67 results 67 results 53 results

    Q8 35460 results 354600 results No results available No results available

    Q9 130 results 1300

  20.

    Montenegro Cities Database

    • download-cities-data.org
    xlsx
    Updated Jun 10, 2025
    Download Cities Database (2025). Montenegro Cities Database [Dataset]. https://www.download-cities-data.org/Montenegro.php
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 10, 2025
    Dataset authored and provided by
    Download Cities Database
    Time period covered
    2024
    Area covered
    Montenegro
    Description

    Paid dataset with city names, coordinates, regions, and administrative divisions of Montenegro. Available in Excel (.xlsx), CSV, JSON, XML, and SQL formats after purchase.
