100+ datasets found
  1. example-space-to-dataset-json

    • huggingface.co
    Cite
    Lucain Pouget, example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json
    Authors
    Lucain Pouget
  2. Ecommerce Data | Product & Customer Review Data | Scrape Any Website | FREE...

    • datarade.ai
    .json, .xml, .csv
    Updated Nov 30, 2023
    Cite
    PromptCloud (2023). Ecommerce Data | Product & Customer Review Data | Scrape Any Website | FREE Sample Available | Custom Scraping Services | PromptCloud [Dataset]. https://datarade.ai/data-products/ecommerce-data-product-and-customer-review-dataset-from-eco-promptcloud
    Available download formats: .json, .xml, .csv
    Dataset updated
    Nov 30, 2023
    Dataset authored and provided by
    PromptCloud
    Area covered
    Bahrain, Niger, Peru, Mayotte, Nigeria, Monaco, Botswana, Latvia, Wallis and Futuna, Estonia
    Description

    PromptCloud offers specialized data extraction services for eCommerce businesses, focusing on acquiring detailed product and customer review datasets from a variety of eCommerce websites. This service is instrumental for businesses aiming to refine their eCommerce strategies through in-depth market analysis, competitive research, and enhanced customer insights.

    Customization is a key aspect of PromptCloud's offerings. PromptCloud provides bespoke scraping services, tailored to the unique requirements of each business. This adaptability is especially beneficial for companies seeking a competitive advantage in the dynamic eCommerce market. A distinctive feature of PromptCloud's approach is the provision of a free sample, allowing potential clients to experience the quality and accuracy of their data firsthand. This commitment to quality is reflected in their use of advanced technologies that ensure the delivery of precise, up-to-date data.

    PromptCloud's versatility extends to data delivery, offering various formats like JSON, CSV, and XML. This flexibility facilitates seamless integration of data into different business systems, highlighting their focus on creating user-friendly and effective solutions.

    PromptCloud positions itself as a vital resource for eCommerce businesses looking to utilize data for strategic planning and customer understanding. Their tailored scraping services, combined with a commitment to delivering current and accurate data, make PromptCloud the best option for businesses seeking to improve their market presence and deepen their understanding of customer behavior.

    We are committed to putting data at the heart of your business. Reach out for a no-frills PromptCloud experience: professional, technologically advanced, and reliable.

  3. JSON Repository

    • data.amerigeoss.org
    • cloud.csiss.gmu.edu
    • +2 more
    csv, geojson, json +1
    Updated Feb 26, 2025
    Cite
    UN Humanitarian Data Exchange (2025). JSON Repository [Dataset]. https://data.amerigeoss.org/dataset/json-repository
    Available download formats: csv, geojson, json, topojson (multiple resources of varying sizes)
    Dataset updated
    Feb 26, 2025
    Dataset provided by
    United Nations (http://un.org/)
    Description

    This dataset contains resources transformed from other datasets on HDX. They exist here only in a format modified to support visualization on HDX and may not be as up to date as the source datasets from which they are derived.

    Source datasets: https://data.hdx.rwlabs.org/dataset/idps-data-by-region-in-mali

  4. Sample of Drugs from QHP drug.json files

    • healthdata.demo.socrata.com
    csv, xlsx, xml
    Updated Apr 16, 2016
    Cite
    (2016). Sample of Drugs from QHP drug.json files [Dataset]. https://healthdata.demo.socrata.com/CMS-Insurance-Plans/Sample-of-Drugs-from-QHP-drug-json-files/jaa8-k3k2
    Available download formats: csv, xlsx, xml
    Dataset updated
    Apr 16, 2016
  5. example-space-to-dataset-json

    • huggingface.co
    Updated Jun 8, 2024
    Cite
    t (2024). example-space-to-dataset-json [Dataset]. https://huggingface.co/datasets/taichi256/example-space-to-dataset-json
    Dataset updated
    Jun 8, 2024
    Authors
    t
    Description

    taichi256/example-space-to-dataset-json dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. JSON 1db5b7f3-712a-4977-820d-c3d33eb6db86 JSON download

    • gimi9.com
    Updated Sep 1, 2024
    Cite
    (2024). JSON 1db5b7f3-712a-4977-820d-c3d33eb6db86 JSON download [Dataset]. https://gimi9.com/dataset/eu_5ec3a046c9e9abed50d770a9
    Dataset updated
    Sep 1, 2024
    Description

    This dataset contains the articles from the Covid-19 FAQ for companies published by the Directorate-General for Enterprises at https://info-entreprises-covid19.economie.fr. The data are presented in the JSON format as follows:

    [
      {
        "title": "Example article for documentation",
        "content": [
          "this is the first page of the article.",
          "here the second,",
          "<div>these articles incorporate some HTML formatting</div>"
        ],
        "path": ["File to visit in the FAQ", "to join the article"]
      },
      ...
    ]

    The update is done every day at 06:00 UTC. The data is extracted directly from the site; the source code of the script used to extract the data is available here: https://github.com/chrnin/docCovidDGE
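
    As a rough illustration, the articles can be consumed with a few lines of Python (a minimal sketch; the local filename faq_articles.json is hypothetical and stands for the downloaded JSON file):

    import json

    # Load the FAQ articles (filename is hypothetical; use the downloaded file).
    with open("faq_articles.json", encoding="utf-8") as f:
        articles = json.load(f)

    # Each article carries a title, a list of content pages, and its path in the FAQ.
    for article in articles:
        print(article["title"], "--", " > ".join(article["path"]))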

  7. Complete News Data Extracted from CNBC in JSON Format: Covering Business,...

    • crawlfeeds.com
    json, zip
    Updated Mar 24, 2025
    Cite
    Complete News Data Extracted from CNBC in JSON Format: Covering Business, Finance, Technology, and Global Trends for Europe, US, and UK Audiences [Dataset]. https://crawlfeeds.com/datasets/complete-news-data-extracted-from-cnbc-in-json-format-covering-business-finance-technology-and-global-trends-for-europe-us-and-uk-audiences
    Available download formats: zip, json
    Dataset updated
    Mar 24, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Area covered
    United States
    Description

    We have successfully extracted a comprehensive news dataset from CNBC, covering not only financial updates but also an extensive range of news categories relevant to diverse audiences in Europe, the US, and the UK. This dataset includes over 500,000 records, meticulously structured in JSON format for seamless integration and analysis.

    Diverse News Segments for In-Depth Analysis

    This extensive extraction spans multiple segments, such as:

    • Business and Market Analysis: Stay updated on major companies, mergers, and acquisitions.
    • Technology and Innovation: Explore developments in AI, cybersecurity, and digital transformation.
    • Economic Forecasts: Access insights into GDP, employment rates, inflation, and other economic indicators.
    • Geopolitical Developments: Understand the impact of political events and global trade dynamics on markets.
    • Personal Finance: Learn about saving strategies, investment tips, and real estate trends.

    Each record in the dataset is enriched with metadata tags, enabling precise filtering by region, sector, topic, and publication date.
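
    As a hedged illustration, such filtering might look as follows in Python (a sketch only; the filename cnbc_news.json and the field names region, sector, and published_at are assumptions, not the documented schema):

    import json

    # Load the news records (filename and field names are assumptions).
    with open("cnbc_news.json", encoding="utf-8") as f:
        records = json.load(f)

    # Filter by region, sector, and publication date using the metadata tags.
    uk_tech = [
        r for r in records
        if r.get("region") == "UK"
        and r.get("sector") == "Technology"
        and r.get("published_at", "").startswith("2024")
    ]
    print(len(uk_tech), "matching articles")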

    Why Choose This Dataset?

    The comprehensive news dataset provides real-time insights into global developments, corporate strategies, leadership changes, and sector-specific trends. Designed for media analysts, research firms, and businesses, it empowers users to perform:

    • Trend Analysis
    • Sentiment Analysis
    • Predictive Modeling

    Additionally, the JSON format ensures easy integration with analytics platforms for advanced processing.

    Access More News Datasets

    Looking for a rich repository of structured news data? Visit our news dataset collection to explore additional offerings tailored to your analysis needs.

    Sample Dataset Available

    To get a preview, check out the CSV sample of the CNBC economy articles dataset.

  8. Sample of Providers from QHP provider.json files

    • healthdata.demo.socrata.com
    csv, xlsx, xml
    Updated Apr 16, 2016
    Cite
    (2016). Sample of Providers from QHP provider.json files [Dataset]. https://healthdata.demo.socrata.com/CMS-Insurance-Plans/Sample-of-Providers-from-QHP-provider-json-files/axbq-xnwy
    Available download formats: xlsx, xml, csv
    Dataset updated
    Apr 16, 2016
  9. Example FAIRtracks JSON document - augmented

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 20, 2023
    Cite
    Example FAIRtracks JSON document - augmented [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3984946
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    Daniel Zerbino
    Sveinung Gundersen
    Finn Drabløs
    Dmytro Titov
    José M. Fernández
    Kieron Taylor
    Salvador Capella-Gutierrez
    Eivind Hovig
    Radmila Kompova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    Many types of data from genomic analyses can be represented as genomic tracks, i.e. features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, or RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as locating the raw data, reducing the value of the generated information.

    FAIRtracks software ecosystem

    We have, as an output of the ELIXIR Implementation Study "FAIRification of Genomic Tracks", developed a basic set of recommendations for genomic track metadata together with an implementation called FAIRtracks in the form of a JSON Schema. We propose FAIRtracks as a draft standard for genomic track metadata in order to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable). We have demonstrated practical usage of this approach by designing a software ecosystem around the FAIRtracks draft standard, integrating globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories into a novel track search service, called TrackFind. The software ecosystem also includes the FAIRtracks augmentation service, which assists metadata producers by automatically augmenting minimal machine-readable metadata with their human-readable counterparts, as well as the FAIRtracks validation service, which extends basic JSON Schema validation to include FAIR-related features (global identifiers, ontology terms, and object references). Finally, we have implemented track metadata search and import functionality into relevant analytical tools: EPICO and the GSuite HyperBrowser. For an overview of the FAIRtracks software ecosystem, please visit: http://fairtracks.github.io/

    Example FAIRtracks JSON document - augmented

    The "Example FAIRtracks JSON document - augmented" is generated as part of the build process of the FAIRtracks draft standard JSON Schema (source code: https://github.com/fairtracks/fairtracks_standard/). The example FAIRtracks document contains a small selection of tracks and objects from the ENCODE project metadata (https://www.encodeproject.org/), adapted to align with the FAIRtracks draft standard. In addition to being available in the above-mentioned GitHub repository, the "Example FAIRtracks JSON document - augmented" is also published here on Zenodo in order for the document to be globally uniquely identifiable by a Digital Object Identifier (DOI).

  10. F# Data: Making structured data first-class

    • figshare.com
    bin
    Updated Jan 19, 2016
    Cite
    Tomas Petricek (2016). F# Data: Making structured data first-class [Dataset]. http://doi.org/10.6084/m9.figshare.1169941.v1
    Available download formats: bin
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    figshare
    Authors
    Tomas Petricek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accessing data in structured formats such as XML, CSV and JSON in statically typed languages is difficult, because the languages do not understand the structure of the data. Dynamically typed languages make this syntactically easier, but lead to error-prone code. Despite numerous efforts, most of the data available on the web do not come with a schema. The only information available to developers is a set of examples, such as typical server responses. We describe an inference algorithm that infers a type of structured formats including CSV, XML and JSON. The algorithm is based on finding a common supertype of types representing individual samples (or values in collections). We use the algorithm as a basis for an F# type provider that integrates the inference into the F# type system. As a result, users can access CSV, XML and JSON data in a statically-typed fashion just by specifying a representative sample document.
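
    The central step, computing a common supertype of the types inferred from individual samples, can be sketched in Python (an illustrative simplification of the idea described in the abstract, not the F# Data implementation):

    def infer_type(value):
        # Map a single JSON value to a simple type description.
        if isinstance(value, bool):
            return "bool"
        if isinstance(value, int):
            return "int"
        if isinstance(value, float):
            return "float"
        if isinstance(value, str):
            return "string"
        if value is None:
            return "null"
        if isinstance(value, list):
            return ("list", common_supertype([infer_type(v) for v in value]))
        if isinstance(value, dict):
            return ("record", {k: infer_type(v) for k, v in value.items()})
        return "any"

    def unify(a, b):
        # Common supertype of two type descriptions.
        if a == b:
            return a
        if a in ("int", "float") and b in ("int", "float"):
            return "float"  # numeric widening
        if "null" in (a, b):
            return ("option", b if a == "null" else a)  # missing values make a type optional
        if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] == "record":
            keys = set(a[1]) | set(b[1])
            return ("record", {k: unify(a[1].get(k, "null"), b[1].get(k, "null")) for k in keys})
        if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] == "list":
            return ("list", unify(a[1], b[1]))
        return "any"  # fall back to the top type

    def common_supertype(types):
        result = "any" if not types else types[0]
        for t in types[1:]:
            result = unify(result, t)
        return result

    # Two samples: the second adds a field and widens "price" to float.
    samples = [{"price": 1}, {"price": 2.5, "tag": "new"}]
    print(common_supertype([infer_type(s) for s in samples]))
    # e.g. ('record', {'price': 'float', 'tag': ('option', 'string')})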

  11. Json file from Twitter API used for benchmarking Jsonpath

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 19, 2022
    Cite
    Paperman, Charles (2022). Json file from Twitter API used for benchmarking Jsonpath [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7225576
    Dataset updated
    Oct 19, 2022
    Dataset authored and provided by
    Paperman, Charles
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A JSON file used as an example to illustrate queries and to benchmark JSONPath tools.
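
    For context, a JSONPath query over such a file could be run in Python with the third-party jsonpath-ng package (a sketch; the filename twitter.json and the field layout of the file are assumptions):

    import json
    from jsonpath_ng import parse  # pip install jsonpath-ng

    with open("twitter.json", encoding="utf-8") as f:
        data = json.load(f)

    # Example query: screen names of all users in the response
    # (field names assume a typical Twitter API search payload).
    expression = parse("statuses[*].user.screen_name")
    print([match.value for match in expression.find(data)])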

  12. TrainingDML-AI JSON Encoding

    • figshare.com
    txt
    Updated Jun 1, 2022
    Cite
    Peng Yue; Boyi Shangguan (2022). TrainingDML-AI JSON Encoding [Dataset]. http://doi.org/10.6084/m9.figshare.16625071.v2
    Available download formats: txt
    Dataset updated
    Jun 1, 2022
    Dataset provided by
    figshare
    Authors
    Peng Yue; Boyi Shangguan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Prototype of the TrainingDML-AI JSON schema and examples from the paper "Towards an interoperable training data markup language for artificial intelligence in earth observation".

  13. The DREAM Dataset: Behavioural data from robot enhanced therapies for...

    • b2find.dkrz.de
    Updated Jul 1, 2024
    Cite
    (2024). The DREAM Dataset: Behavioural data from robot enhanced therapies for children with autism spectrum disorder [Dataset]. https://b2find.dkrz.de/dataset/26ebbc54-d162-5fde-9cce-03f72eb5ab4b
    Dataset updated
    Jul 1, 2024
    Description

    This dataset comprises behavioural data recorded from 61 children diagnosed with Autism Spectrum Disorder (ASD). The data was collected during a large-scale evaluation of Robot Enhanced Therapy (RET). The dataset covers over 3000 therapy sessions and more than 300 hours of therapy. Half of the children interacted with the social robot NAO supervised by a therapist. The other half, constituting a control group, interacted directly with a therapist. Both groups followed the Applied Behavior Analysis (ABA) protocol. Each session was recorded with three RGB cameras and two RGBD (Kinect) cameras, providing detailed information on children's behaviour during therapy.

    This public release of the dataset does not include video recordings or other personal information. Instead, it comprises body motion, head position and orientation, and eye gaze variables, all specified as 3D data in a joint frame of reference. In addition, metadata including participant age, gender, and autism diagnosis (ADOS) variables are included.

    All data in this dataset is stored in JavaScript Object Notation (JSON) and can be downloaded here as DREAMdataset.zip. A much smaller archive comprising example data recorded from a single session is provided in DREAMdata-example.zip. The JSON format is specified in detail by the JSON Schema (dream.1.1.json) provided with this dataset. JSON data can be read using standard libraries in most programming languages. Basic instructions on how to load and plot the data using Python and Jupyter are available in DREAMdata-documentation.zip, attached with this dataset. Please refer to https://github.com/dream2020/data for more details. The DREAM Dataset can be visualized using the DREAM Data Visualizer, an open-source tool available at https://github.com/dream2020/DREAM-data-visualizer. The DREAM RET System that was used for collecting this dataset is available at https://github.com/dream2020/DREAM.
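
    A minimal sketch of loading a session file with Python's standard library is shown below (the filename and field names are assumptions; the authoritative structure is defined by the dream.1.1.json schema and the documentation archive):

    import json

    # Load one recorded session (filename is hypothetical; see DREAMdata-example.zip).
    with open("session.json", encoding="utf-8") as f:
        session = json.load(f)

    # Field names are assumptions; consult dream.1.1.json for the actual
    # layout of the body motion, head pose, and eye gaze variables.
    print(sorted(session.keys()))
    gaze = session.get("eye_gaze", {})
    print("gaze samples:", len(gaze.get("rx", [])))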

  14. Example Microscopy Metadata JSON files produced using Micro-Meta App to...

    • zenodo.org
    • data.niaid.nih.gov
    json, tiff
    Updated Jul 19, 2024
    Cite
    Karl Bellve; Alessandro Rigano; Kevin Fogarty; Caterina Strambio-De-Castillia (2024). Example Microscopy Metadata JSON files produced using Micro-Meta App to document the acquisition of example images using a custom-built TIRF Epifluorescence Structured Illumination Microscope [Dataset]. http://doi.org/10.5281/zenodo.4891883
    Available download formats: json, tiff
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Karl Bellve; Alessandro Rigano; Kevin Fogarty; Caterina Strambio-De-Castillia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example Microscopy Metadata JSON files produced using the Micro-Meta App documenting an example raw-image file acquired using the custom-built TIRF Epifluorescence Structured Illumination Microscope.

    For this use case, which is presented in Figure 5 of Rigano et al., 2021, Micro-Meta App was utilized to document:

    1) The Hardware Specifications of the custom-built TIRF Epifluorescence Structured light Microscope (TESM; Navaroli et al., 2010), developed and built on the basis of the Olympus IX71 microscope stand, and owned by the Biomedical Imaging Group (http://big.umassmed.edu/) at the Program in Molecular Medicine of the University of Massachusetts Medical School. Because TESM was custom-built, the most appropriate documentation level is Tier 3 (Manufacturing/Technical Development/Full Documentation) as specified by the 4DN-BINA-OME Microscopy Metadata model (Hammer et al., 2021).

    The TESM Hardware Specifications are stored in: Rigano et al._Figure 5_UseCase_Biomedical Imaging Group_TESM.JSON

    2) The Image Acquisition Settings that were applied to the TESM microscope for the acquisition of an example image (FSWT-6hVirus-10minFIX-stk_4-EPI.tif.ome.tif) obtained by Nicholas Vecchietti and Caterina Strambio-De-Castillia. For this image, TZM-bl human cells were infected with HIV-1 retroviral three-part vector (FSWT+PAX2+pMD2.G). Six hours post-infection cells were fixed for 10 min with 1% formaldehyde in PBS, and permeabilized. Cells were stained with mouse anti-p24 primary antibody followed by DyLight488-anti-Mouse secondary antibody, to detect HIV-1 viral Capsid. In addition, cells were counterstained using rabbit anti-Lamin B1 primary antibody followed by DyLight649-anti-Rabbit secondary antibody, to visualize the nuclear envelope and with DAPI to visualize the nuclear chromosomal DNA.

    The Image Acquisition Settings used to acquire the FSWT-6hVirus-10minFIX-stk_4-EPI.tif.ome.tif image are stored in: Rigano et al._Figure 5_UseCase_AS_fswt-6hvirus-10minfix-stk_4-epi.tif.JSON

    Instructional video tutorials on how to use these example data files:
    Use these videos to get started with using Micro-Meta App after downloading the example data files available here.

  15. top50_hashtags.json

    • figshare.com
    txt
    Updated Jan 16, 2019
    Cite
    Bjarke Mønsted (2019). top50_hashtags.json [Dataset]. http://doi.org/10.6084/m9.figshare.6007322.v3
    Available download formats: txt
    Dataset updated
    Jan 16, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Bjarke Mønsted
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contains hourly counts of the 50 most common hashtags on Twitter from 2013 through 2016, obtained from a 10% sample of all tweets.

  16. Unicode 10.0 Character Database in JSON

    • kaggle.com
    Updated Dec 15, 2017
    Cite
    Rachael Tatman (2017). Unicode 10.0 Character Database in JSON [Dataset]. https://www.kaggle.com/datasets/rtatman/unicode-100-character-database-in-json/discussion?sortBy=hot
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 15, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rachael Tatman
    Description

    Context:

    In working on Unicode implementations, it is often useful to access the full content of the Unicode Character Database (UCD). For example, in establishing mappings from characters to glyphs in fonts, it is convenient to see the character scalar value, the character name, the character East Asian width, along with the shape and metrics of the proposed glyph to map to; looking at all this data simultaneously helps in evaluating the mapping.

    This is a machine-readable version of the Unicode Character Database in JSON format.

    Content:

    The majority of information about individual codepoints is represented using properties. Each property, except for the Special_Case_Condition and Name_Alias properties, is represented by an attribute. In an XML data file, the absence of an attribute (possibly only on some code points) means that the document does not express the value of the corresponding property. Conversely, the presence of an attribute is an expression of the corresponding property value; the implied null value is represented by the empty string.

    The Name_Alias property is represented by zero or more name-alias child elements. Unlike the situation for properties represented by attributes, it is not possible to determine whether all of the aliases have been represented in a data file by inspecting that data file.

    The name of an attribute is the abbreviated name of the property as given in the file PropertyAliases.txt in version 6.1.0 of the UCD. For the Unihan properties, the name is that given in the various versions of the Unihan database (some properties are no longer present in version 6.1.0).

    For catalog and enumerated properties, the values are those listed in the file PropertyValueAliases.txt in version 6.1.0 of the UCD; if there is an abbreviated name, it is used, otherwise the long name is used. Note that the set of possible values for a property captured in this schema may change from one version to the next.
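
    As an illustration, a property lookup against a JSON encoding of the UCD could look like this in Python (a sketch; the filename ucd.json and the keying of the top-level object are assumptions, while the abbreviated property names na, gc, and ea come from PropertyAliases.txt):

    import json

    # Load the JSON-encoded Unicode Character Database
    # (filename and top-level layout are assumptions about this packaging).
    with open("ucd.json", encoding="utf-8") as f:
        ucd = json.load(f)

    # Abbreviated property names per PropertyAliases.txt:
    # "na" = Name, "gc" = General_Category, "ea" = East_Asian_Width.
    codepoint = ucd["U+4E00"]  # keying by "U+xxxx" is an assumption
    print(codepoint["na"], codepoint["gc"], codepoint["ea"])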

    The following properties are associated with code points:

    • Age property
    • Name properties
    • Name Aliases
    • Block
    • General Category
    • Combining properties
    • Bidirectionality properties
    • Decomposition properties
    • Numeric Properties
    • Joining properties
    • Linebreak properties
    • East Asian Width property
    • Case properties
    • Script properties
    • ISO Comment properties
    • Hangul properties
    • Indic properties
    • Identifier and Pattern and programming language properties
    • Properties related to function and graphic characteristics
    • Properties related to boundaries
    • Properties related to ideographs
    • Miscellaneous properties
    • Unihan properties
    • Tangut data
    • Nushu data

    For additional information, please consult the full documentation on the Unicode website.

    Acknowledgements:

    Copyright © 1991-2017 Unicode, Inc. All rights reserved. Distributed under the Terms of Use in http://www.unicode.org/copyright.html.

    Permission is hereby granted, free of charge, to any person obtaining a copy of the Unicode data files and any associated documentation (the "Data Files") or Unicode software and any associated documentation (the "Software") to deal in the Data Files or Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Data Files or Software, and to permit persons to whom the Data Files or Software are furnished to do so, provided that either (a) this copyright and permission notice appear with all copies of the Data Files or Software, or (b) this copyright and permission notice appear in associated Documentation.

  17. Collections database | gimi9.com

    • gimi9.com
    Updated Nov 1, 2013
    Cite
    (2013). Collections database | gimi9.com [Dataset]. https://gimi9.com/dataset/uk_collections-database
    Dataset updated
    Nov 1, 2013
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We offer two data formats: A richer dataset is provided in the JSON format, which is organised by the directory structure of the Git repository. JSON supports more hierarchical or nested information such as subjects. We also provide CSVs of flattened data, which is less comprehensive but perhaps easier to grok. The CSVs provide a good introduction to the overall contents of the Tate metadata and create opportunities for artistic pivot tables.

    JSON: Each artist has his or her own JSON file. They are found in the artists folder, then filed away by first letter of the artist's surname. Artworks are found in the artworks folder. They are filed away by accession number. This is the unique identifier given to artworks when they come into the Tate collection. In many cases, the format has significance. For example, the ar accession number prefix indicates that the artwork is part of the ARTIST ROOMS collection. The n prefix indicates works that once were part of the National Gallery collection.

    CSV: There is one CSV file for artists (artist_data.csv) and one (very large) for artworks (artwork_data.csv), which we may one day break up into more manageable chunks. The CSV headings should be helpful. Let us know if not. Entrepreneurial hackers could use the CSVs as an index to the JSON collections if they wanted richer data, as sketched after these guidelines.

    Usage guidelines for open data: These usage guidelines are based on goodwill. They are not a legal contract, but Tate requests that you follow these guidelines if you use Metadata from our Collection dataset. The Metadata published by Tate is available free of restrictions under the Creative Commons Zero Public Domain Dedication. This means that you can use it for any purpose without having to give attribution. However, Tate requests that you actively acknowledge and give attribution to Tate wherever possible. Attribution supports future efforts to release other data. It also reduces the amount of 'orphaned data', helping retain links to authoritative sources.

    Give attribution to Tate: Make sure that others are aware of the rights status of Tate and are aware of these guidelines by keeping intact links to the Creative Commons Zero Public Domain Dedication. If for technical or other reasons you cannot include all the links to all sources of the Metadata and rights information directly with the Metadata, you should consider including them separately, for example in a separate document that is distributed with the Metadata or dataset. Alternatively, you may consider linking only to the Metadata source on Tate's website, where all available sources and rights information can be found, including in machine-readable formats.

    Metadata is dynamic: When working with Metadata obtained from Tate, please be aware that this Metadata is not static. It sometimes changes daily. Tate continuously updates its Metadata in order to correct mistakes and include new and additional information. Museum collections are under constant study and research, and new information is frequently added to objects in the collection.

    Mention your modifications of the Metadata and contribute your modified Metadata back: Whenever you transform, translate or otherwise modify the Metadata, make it clear that the resulting Metadata has been modified by you. If you enrich or otherwise modify Metadata, consider publishing the derived Metadata without reuse restrictions, preferably via the Creative Commons Zero Public Domain Dedication.

    Be responsible: Ensure that you do not use the Metadata in a way that suggests any official status or that Tate endorses you or your use of the Metadata, unless you have prior permission to do so. Ensure that you do not mislead others or misrepresent the Metadata or its sources. Ensure that your use of the Metadata does not breach any national legislation based thereon, notably concerning (but not limited to) data protection, defamation or copyright. Please note that you use the Metadata at your own risk. Tate offers the Metadata as-is and makes no representations or warranties of any kind concerning any Metadata published by Tate.

    The writers of these guidelines are deeply indebted to the Smithsonian Cooper-Hewitt, National Design Museum; and Europeana.
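
    A sketch of that CSV-as-index pattern in Python (directory layout as described above; the accession-number column name and the exact nesting under the artworks folder are assumptions, so the sketch searches for the file rather than constructing its path):

    import csv
    import json
    from pathlib import Path

    # Read the flattened artwork index (the column name is an assumption).
    with open("artwork_data.csv", newline="", encoding="utf-8") as f:
        artworks = list(csv.DictReader(f))

    # Artworks are filed in the artworks folder by accession number; locate
    # the matching JSON file and load the richer record.
    acno = artworks[0]["accession_number"]
    match = next(Path("artworks").rglob(f"{acno.lower()}*.json"))
    record = json.loads(match.read_text(encoding="utf-8"))
    print(record.get("title"))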

  18. kat-comp.dist_analysis.json

    • doi.ipk-gatersleben.de
    Updated Oct 22, 2019
    Cite
    Thomas Schmutzer; Sebastian Beier; Chris Ulpinnis; Markus Schwalbe; Thomas Münch; Uwe Scholz (2019). kat-comp.dist_analysis.json [Dataset]. https://doi.ipk-gatersleben.de/DOI/10fdd0bb-825f-459a-9d08-7c04066208f0/ec324188-ccf2-4de3-8b38-73c53c7eee5e/1
    Dataset updated
    Oct 22, 2019
    Dataset provided by
    e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany
    Authors
    Thomas Schmutzer; Sebastian Beier; Chris Ulpinnis; Markus Schwalbe; Thomas Münch; Uwe Scholz
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Three data sets providing assistance and guidance for data processing with Kmasker plants. We also compare the functionality of Kmasker plants with the KAT tool; this DOI holds the input and output data of that analysis. Example_1 uses an Aegilops speltoides dataset, and results show that the tested repeat sequences have B chromosome origin. Example_2 uses the winter barley specific gene VRN-H2, and results show that it is absent in the spring barley cultivar Morex. Example_3 uses the full barley gene set and compares winter and spring barley presence/absence. Related commands and updates of this tutorial are provided on GitHub in the tutorial section of Kmasker plants. For the most recent version of this tutorial, please have a look at the project page (https://github.com/tschmutzer/kmasker).

  19. Startup Data | Company Data | Refreshed 2x/Mo | Delivery Hourly via...

    • datarade.ai
    .json, .csv, .sql
    Cite
    Forager.ai, Startup Data | Company Data | Refreshed 2x/Mo | Delivery Hourly via CSV/JSON/PostgreSQL DB Delivery [Dataset]. https://datarade.ai/data-products/startup-data-company-data-refreshed-2x-mo-delivery-hour-forager-ai
    Available download formats: .json, .csv, .sql
    Dataset provided by
    Forager.ai
    Area covered
    Angola, Bangladesh, Cameroon, Northern Mariana Islands, Saint Vincent and the Grenadines, Dominica, New Zealand, Oman, Swaziland, Somalia
    Description

    The Forager.ai Global Dataset is a leading source of firmographic data, backed by advanced AI and offering the highest refresh rate in the industry.

    | Volume and Stats |

    • Over 60M total records, the highest volume in the industry today.
    • Every company record refreshed twice a month, offering an unparalleled update frequency.
    • Delivery is made every hour, ensuring you have the latest data at your fingertips.
    • Each record is the result of an advanced AI-driven process, ensuring high-quality, accurate data.

    | Use Cases |

    Sales Platforms, ABM and Intent Data Platforms, Identity Platforms, Data Vendors:

    Example applications include:

    1. Uncover trending technologies or tools gaining popularity.

    2. Pinpoint lucrative business prospects by identifying similar solutions utilized by a specific company.

    3. Study a company's tech stacks to understand the technical capability and skills available within that company.

    B2B Tech Companies:

    • Enrich leads that sign up through the Company Search API (available separately).
    • Identify and map every company that fits your core personas and ICP.
    • Build audiences to target, using key fields like location, company size, industry, and description.

    Venture Capital and Private Equity:

    • Discover new investment opportunities using company descriptions and industry-level data.
    • Review the growth of private companies and benchmark their strength against competitors.
    • Create high-level views of companies competing in popular verticals for investment.

    | Delivery Options |

    • Flat files via S3 or GCP
    • PostgreSQL Shared Database
    • PostgreSQL Managed Database
    • API
    • Other options available upon request, depending on the scale required

    Our dataset provides a unique blend of volume, freshness, and detail that is perfect for Sales Platforms, B2B Tech, VCs & PE firms, Marketing Automation, ABM & Intent. It stands as a cornerstone in our broader data offering, ensuring you have the information you need to drive decision-making and growth.

    Tags: Company Data, Company Profiles, Employee Data, Firmographic Data, AI-Driven Data, High Refresh Rate, Company Classification, Private Market Intelligence, Workforce Intelligence, Public Companies.

  20. AIT Log Data Set V2.0

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 28, 2024
    Cite
    Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber (2024). AIT Log Data Set V2.0 [Dataset]. http://doi.org/10.5281/zenodo.5789064
    Available download formats: zip
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AIT Log Data Sets

    This repository contains synthetic log data suitable for evaluation of intrusion detection systems, federated learning, and alert aggregation. A detailed description of the dataset is available in [1]. The logs were collected from eight testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by [2]. Please cite these papers if the data is used for academic publications.

    In brief, each of the datasets corresponds to a testbed representing a small enterprise network including mail server, file share, WordPress server, VPN, firewall, etc. Normal user behavior is simulated to generate background noise over a time span of 4-6 days. At some point, a sequence of attack steps is launched against the network. Log data is collected from all hosts and includes Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, horde logs, exim logs, syslog, and system monitoring logs. Separate ground truth files are used to label events that are related to the attacks. Compared to the AIT-LDSv1.1, a more complex network and diverse user behavior is simulated, and logs are collected from all hosts in the network. If you are only interested in network traffic analysis, we also provide the AIT-NDS containing the labeled netflows of the testbed networks. We also provide the AIT-ADS, an alert data set derived by forensically applying open-source intrusion detection systems on the log data.

    The datasets in this repository have the following structure:

    • The gather directory contains all logs collected from the testbed. Logs collected from each host are located in gather/.
    • The labels directory contains the ground truth of the dataset that indicates which events are related to attacks. The directory mirrors the structure of the gather directory so that each label file is located at the same path and has the same name as the corresponding log file. Each line in the label files references the log event corresponding to an attack by the line number counted from the beginning of the file ("line"), the labels assigned to the line that state the respective attack step ("labels"), and the labeling rules that assigned the labels ("rules"). An example is provided below.
    • The processing directory contains the source code that was used to generate the labels.
    • The rules directory contains the labeling rules.
    • The environment directory contains the source code that was used to deploy the testbed and run the simulation using the Kyoushi Testbed Environment.
    • The dataset.yml file specifies the start and end time of the simulation.

    The following list summarizes relevant properties of the datasets:

    • fox
      • Simulation time: 2022-01-15 00:00 - 2022-01-20 00:00
      • Attack time: 2022-01-18 11:59 - 2022-01-18 13:15
      • Scan volume: High
      • Unpacked size: 26 GB
    • harrison
      • Simulation time: 2022-02-04 00:00 - 2022-02-09 00:00
      • Attack time: 2022-02-08 07:07 - 2022-02-08 08:38
      • Scan volume: High
      • Unpacked size: 27 GB
    • russellmitchell
      • Simulation time: 2022-01-21 00:00 - 2022-01-25 00:00
      • Attack time: 2022-01-24 03:01 - 2022-01-24 04:39
      • Scan volume: Low
      • Unpacked size: 14 GB
    • santos
      • Simulation time: 2022-01-14 00:00 - 2022-01-18 00:00
      • Attack time: 2022-01-17 11:15 - 2022-01-17 11:59
      • Scan volume: Low
      • Unpacked size: 17 GB
    • shaw
      • Simulation time: 2022-01-25 00:00 - 2022-01-31 00:00
      • Attack time: 2022-01-29 14:37 - 2022-01-29 15:21
      • Scan volume: Low
      • Data exfiltration is not visible in DNS logs
      • Unpacked size: 27 GB
    • wardbeck
      • Simulation time: 2022-01-19 00:00 - 2022-01-24 00:00
      • Attack time: 2022-01-23 12:10 - 2022-01-23 12:56
      • Scan volume: Low
      • Unpacked size: 26 GB
    • wheeler
      • Simulation time: 2022-01-26 00:00 - 2022-01-31 00:00
      • Attack time: 2022-01-30 07:35 - 2022-01-30 17:53
      • Scan volume: High
      • No password cracking in attack chain
      • Unpacked size: 30 GB
    • wilson
      • Simulation time: 2022-02-03 00:00 - 2022-02-09 00:00
      • Attack time: 2022-02-07 10:57 - 2022-02-07 11:49
      • Scan volume: High
      • Unpacked size: 39 GB

    The following attacks are launched in the network:

    • Scans (nmap, WPScan, dirb)
    • Webshell upload (CVE-2020-24186)
    • Password cracking (John the Ripper)
    • Privilege escalation
    • Remote command execution
    • Data exfiltration (DNSteal)

    Note that attack parameters and their execution orders vary in each dataset. Labeled log files are trimmed to the simulation time to ensure that their labels (which reference the related event by the line number in the file) are not misleading. Other log files, however, also contain log events generated before or after the simulation time and may therefore be affected by testbed setup or data collection. It is therefore recommended to only consider logs with timestamps within the simulation time for analysis.

    The structure of the labels is explained below, using the audit logs from the intranet server in the russellmitchell dataset as an example. The first four labels in the labels/intranet_server/logs/audit/audit.log file are as follows:

    {"line": 1860, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1861, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1862, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1863, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    Each JSON object in this file assigns a label to one specific log line in the corresponding log file located at gather/intranet_server/logs/audit/audit.log. The field "line" in each JSON object specifies the line number of the respective event in the original log file, while the field "labels" comprises the corresponding labels. For example, the lines in the sample above provide the information that lines 1860-1863 in the gather/intranet_server/logs/audit/audit.log file are labeled with "attacker_change_user" and "escalate", corresponding to the attack step where the attacker receives escalated privileges. Inspecting these lines shows that they indeed correspond to the user authenticating as root:

    type=USER_AUTH msg=audit(1642999060.603:2226): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:authentication acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=USER_ACCT msg=audit(1642999060.603:2227): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:accounting acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=CRED_ACQ msg=audit(1642999060.615:2228): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:setcred acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=USER_START msg=audit(1642999060.627:2229): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:session_open acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    The same applies to all other labels for this log file and all other log files. There are no labels for logs generated by "normal" (i.e., non-attack) behavior; instead, all log events that have no corresponding JSON object in one of the files from the labels directory, such as the lines 1-1859 in the example above, can be considered to be labeled as "normal". This means that in order to figure out the labels for the log data it is necessary to store the line numbers when processing the original logs from the gather directory and see if these line numbers also appear in the corresponding file in the labels directory.
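
    A sketch of that bookkeeping in Python, using the audit log paths from the example above (a minimal illustration, not part of the dataset's own tooling):

    import json

    log_path = "gather/intranet_server/logs/audit/audit.log"
    label_path = "labels/intranet_server/logs/audit/audit.log"

    # Read the label file: one JSON object per line, keyed by log line number.
    labels = {}
    with open(label_path, encoding="utf-8") as f:
        for raw in f:
            if raw.strip():
                entry = json.loads(raw)
                labels[entry["line"]] = entry["labels"]

    # Walk the original log and attach labels; lines with no entry in the
    # labels directory are considered "normal".
    with open(log_path, encoding="utf-8") as f:
        for lineno, event in enumerate(f, start=1):
            event_labels = labels.get(lineno, ["normal"])
            # ... process (event, event_labels) ...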

    Besides the attack labels, a general overview of the exact times when specific attack steps are launched is available in gather/attacker_0/logs/attacks.log. An enumeration of all hosts and their IP addresses is stated in processing/config/servers.yml. Moreover, configurations of each host are provided in gather/ and gather/.

    Version history:

    • AIT-LDS-v1.x: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.
    • AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.

    Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU projects GUARD (833456) and PANDORA (SI2.835928).

    If you use the dataset, please cite the following publications:

    [1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner, M. Wurzenberger, and A. Rauber: Maintainable Log Datasets for Evaluation of Intrusion Detection Systems. IEEE Transactions on Dependable and Secure Computing, 2023.

    [2] M. Landauer, F. Skopik, M. Wurzenberger, W. Hotwagner, and A. Rauber: Have It Your Way: Generating Customized Log Datasets with a Model-Driven Simulation Testbed. IEEE Transactions on Reliability, 2021.
