100+ datasets found
  1. Datasets for PySpark project

    • kaggle.com
    zip
    Updated Sep 23, 2022
    Cite
    Towhidul.Tonmoy (2022). Datasets for PySpark project [Dataset]. https://www.kaggle.com/datasets/towhidultonmoy/datasets-for-pyspark-project
    Explore at:
    zip(264895 bytes)Available download formats
    Dataset updated
    Sep 23, 2022
    Authors
    Towhidul.Tonmoy
    Description

    Dataset

    This dataset was created by Towhidul.Tonmoy

    Contents

  2. Data from: PySpark SQL Dataset

    • kaggle.com
    zip
    Updated Jan 24, 2023
    + more versions
    Cite
    Rashid60 (2023). PySpark SQL Dataset [Dataset]. https://www.kaggle.com/datasets/rashid60/pyspark-sql-dataset
    Explore at:
    zip(4531876 bytes)Available download formats
    Dataset updated
    Jan 24, 2023
    Authors
    Rashid60
    Description

    Dataset

    This dataset was created by Rashid60

    Contents

  3. Spark-Data

    • huggingface.co
    Updated Sep 29, 2025
    Cite
    Intern Large Models (2025). Spark-Data [Dataset]. https://huggingface.co/datasets/internlm/Spark-Data
    Explore at:
    Dataset updated
    Sep 29, 2025
    Dataset authored and provided by
    Intern Large Models
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Spark-Data

    Paper | Github Repository | Models

      Data Introduction
    

    This repository stores the datasets used for training 🤗Spark-VL-7B and Spark-VL-32B, as well as a collection of multiple mathematical benchmarks covered in the SPARK: Synergistic Policy And Reward Co-Evolving Framework paper. infer_data_ViRL_19k_h.json is used for training Spark-VL-7B. infer_data_ViRL_hard_24k_h.json is used for training Spark-VL-32B. benchmark_combine.json and… See the full description on the dataset page: https://huggingface.co/datasets/internlm/Spark-Data.

  4. Dataset for class comment analysis

    • data.niaid.nih.gov
    Updated Feb 22, 2022
    Cite
    Pooja Rani (2022). Dataset for class comment analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4311838
    Explore at:
    Dataset updated
    Feb 22, 2022
    Dataset provided by
    University of Bern
    Authors
    Pooja Rani
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.

    Structure

    Projects/
      Java_projects/
        eclipse.zip
        guava.zip
        guice.zip
        hadoop.zip
        spark.zip
        vaadin.zip
    
      Pharo_projects/
        images/
          GToolkit.zip
          Moose.zip
          PetitParser.zip
          Pillar.zip
          PolyMath.zip
          Roassal2.zip
          Seaside.zip
    
        vm/
          70-x64/Pharo
    
        Scripts/
          ClassCommentExtraction.st
          SampleSelectionScript.st    
    
      Python_projects/
        django.zip
        ipython.zip
        Mailpile.zip
        pandas.zip
        pipenv.zip
        pytorch.zip   
        requests.zip 
      
    

    Contents of the Replication Package

    Projects/ contains the raw projects of each language that are used to analyze class comments.

    • Java_projects/

      • eclipse.zip - Eclipse project downloaded from GitHub. More detail about the project is available on GitHub Eclipse.
      • guava.zip - Guava project downloaded from GitHub. More detail about the project is available on GitHub Guava.
      • guice.zip - Guice project downloaded from GitHub. More detail about the project is available on GitHub Guice.
      • hadoop.zip - Apache Hadoop project downloaded from GitHub. More detail about the project is available on GitHub Apache Hadoop.
      • spark.zip - Apache Spark project downloaded from GitHub. More detail about the project is available on GitHub Apache Spark.
      • vaadin.zip - Vaadin project downloaded from GitHub. More detail about the project is available on GitHub Vaadin.

    • Pharo_projects/

      • images/ -

        • GToolkit.zip - Gtoolkit project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Moose.zip - Moose project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • PetitParser.zip - Petit Parser project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Pillar.zip - Pillar project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • PolyMath.zip - PolyMath project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Roassal2.zip - Roassal2 project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Seaside.zip - Seaside project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
      • vm/ -

      • 70-x64/Pharo - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the images/ folder. The user can run the vm on macOS and select any of the Pharo images.

      • Scripts/ - It contains the sample Smalltalk scripts to extract class comments from various projects.

      • ClassCommentExtraction.st - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.

      • SampleSelectionScript.st - A Smalltalk script showing how sample class comments of Pharo projects are selected. This script can be run in any of the Pharo images given in the images/ folder.

    • Python_projects/

      • django.zip - Django project downloaded from GitHub. More detail about the project is available on GitHub Django
      • ipython.zip - IPython project downloaded from GitHub. More detail about the project is available on GitHub IPython
      • Mailpile.zip - Mailpile project downloaded from GitHub. More detail about the project is available on GitHub Mailpile
      • pandas.zip - pandas project downloaded from GitHub. More detail about the project is available on GitHub pandas
      • pipenv.zip - Pipenv project downloaded from GitHub. More detail about the project is available on GitHub Pipenv
      • pytorch.zip - PyTorch project downloaded from GitHub. More detail about the project is available on GitHub PyTorch
      • requests.zip - Requests project downloaded from GitHub. More detail about the project is available on GitHub Requests
  5. A sample medical dataset.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    + more versions
    Cite
    Farough Ashkouti; Keyhan Khamforoosh (2023). A sample medical dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Farough Ashkouti; Keyhan Khamforoosh
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recently, big data and its applications have seen sharp growth in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data poses enormous challenges to the architecture, infrastructure, and computing capacity of IT systems, so the scientific and industrial community has a compelling need for large-scale, robust computing systems. Since one of the characteristics of big data is value, data should be published so that analysts can extract useful patterns from it. However, data publishing may lead to the disclosure of individuals’ private information. Among modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability by introducing the resilient distributed dataset (RDD). In terms of performance, due to in-memory computation it can be up to 100 times faster than Hadoop. Apache Spark is therefore one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming model of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. The computing model has three phases of in-memory computation to address the runtime, scalability, and performance of large-scale data anonymization. It supports partition-based data clustering algorithms to preserve the λ-diversity privacy model using transformations and actions on RDDs. The authors investigate a Spark-based implementation for preserving the λ-diversity privacy model with two designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline allowing researchers to apply Apache Spark in their own research.
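The description names two designed record-distance functions, City block and Pearson, used by the partition-based clustering. As a rough, hypothetical sketch in plain Python (the paper's exact formulations, e.g. how categorical attributes are handled, are not given here):

```python
# Hypothetical sketch of the two distance functions named in the description.
# The paper's actual definitions may differ in detail.
import math

def cityblock_distance(a, b):
    """City block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    """1 - Pearson correlation, so perfectly correlated records get distance 0."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (norm_a * norm_b)
```

City block sums absolute differences, while the Pearson variant turns correlation into a dissimilarity: identical trends give distance 0, opposite trends give distance 2.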

  6. ORBITAAL: cOmpRehensive BItcoin daTaset for temorAl grAph anaLysis - Dataset...

    • cryptodata.center
    Updated Dec 4, 2024
    + more versions
    Cite
    (2024). ORBITAAL: cOmpRehensive BItcoin daTaset for temorAl grAph anaLysis - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/orbitaal-comprehensive-bitcoin-dataset-for-temoral-graph-analysis
    Explore at:
    Dataset updated
    Dec 4, 2024
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Construction

    This dataset captures the temporal network of Bitcoin (BTC) flows exchanged between entities at the finest time resolution (UNIX timestamps). Its construction is based on the blockchain covering the period from January 3rd, 2009 to January 25th, 2021. The blockchain extraction was made using the bitcoin-etl (https://github.com/blockchain-etl/bitcoin-etl) Python package. The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1] as well as popular Bitcoin users' addresses provided by https://www.walletexplorer.com/

    [1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.

    Dataset Description

    Temporal coverage: from 3 January 2009 to 25 January 2021. All dates were derived from block UNIX timestamps in the GMT timezone.

    Overview: This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a significant temporal span, from the inception of Bitcoin to recent years. It encompasses various temporal resolutions and representations to facilitate Bitcoin transaction network analysis in the context of temporal graphs.

    Contents: The dataset is distributed across several compressed archives. All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries, and can be read with the pyspark Python package.

    • orbitaal-stream_graph.tar.gz: Root directory STREAM_GRAPH/. Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (averaging approximately 10 minutes). The stream graph is divided into 13 Parquet files, one per year, named orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] is the year and [ID] is an integer from 1 to N (the number of files) such that sorting by increasing [ID] also sorts by increasing year. These files are in the subdirectory STREAM_GRAPH/EDGES/.
    • orbitaal-snapshot-all.tar.gz: Root directory SNAPSHOT/. Contains the snapshot network representing all transactions aggregated over the whole dataset period (Jan. 2009 to Jan. 2021), as the Parquet file orbitaal-snapshot-all.snappy.parquet in the subdirectory SNAPSHOT/EDGES/ALL/.
    • orbitaal-snapshot-year.tar.gz: Root directory SNAPSHOT/. Contains the yearly-resolution snapshot networks as Parquet files named orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/year/.
    • orbitaal-snapshot-month.tar.gz: Root directory SNAPSHOT/. Contains the monthly-resolution snapshot networks as Parquet files named orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, where [YYYY] and [MM] stand for year and month, in the subdirectory SNAPSHOT/EDGES/month/.
    • orbitaal-snapshot-day.tar.gz: Root directory SNAPSHOT/. Contains the daily-resolution snapshot networks as Parquet files named orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/day/.
    • orbitaal-snapshot-hour.tar.gz: Root directory SNAPSHOT/. Contains the hourly-resolution snapshot networks as Parquet files named orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/hour/.
    • orbitaal-nodetable.tar.gz: Root directory NODE_TABLE/. Contains two Parquet files: one giving information on nodes present in the stream graphs and snapshots, such as period of activity and associated global Bitcoin balance, and one containing the list of all associated Bitcoin addresses.
    • Small samples in CSV format: orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv, stream graph representations around a halving event in 2016.
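The snapshot file names follow a regular convention, so they can be parsed programmatically before loading the Parquet data with pyspark or any other Parquet reader. A small hypothetical helper, assuming only the naming scheme described above:

```python
import re

# Pattern for snapshot file names such as
# orbitaal-snapshot-date-2016-07-08-file-id-3.snappy.parquet
# (date precision varies by resolution: year, month, day, or hour).
SNAPSHOT_RE = re.compile(
    r"orbitaal-snapshot-date-(?P<date>[\d-]+)-file-id-(?P<id>\d+)\.snappy\.parquet"
)

def parse_snapshot_name(name):
    """Return (date_parts, file_id) parsed from an ORBITAAL snapshot file name."""
    m = SNAPSHOT_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not an ORBITAAL snapshot file: {name}")
    return tuple(int(p) for p in m.group("date").split("-")), int(m.group("id"))
```

For example, `parse_snapshot_name("orbitaal-snapshot-date-2016-07-08-file-id-3.snappy.parquet")` yields `((2016, 7, 8), 3)`, which makes sorting files by date and [ID] straightforward.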

  7. Data from: Spark project dataset

    • kaggle.com
    zip
    Updated May 15, 2025
    Cite
    YuanKefan (2025). Spark project dataset [Dataset]. https://www.kaggle.com/datasets/yuankefan555/spark-project-dataset
    Explore at:
    zip(2028920 bytes)Available download formats
    Dataset updated
    May 15, 2025
    Authors
    YuanKefan
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by YuanKefan

    Released under Apache 2.0

    Contents

  8. 🎧📻 Spotify data from PySpark course

    • kaggle.com
    zip
    Updated Mar 29, 2025
    Cite
    Alexander Kapturov (2025). 🎧📻 Spotify data from PySpark course [Dataset]. https://www.kaggle.com/datasets/kapturovalexander/spotify-data-from-pyspark-course
    Explore at:
    zip(12969103 bytes)Available download formats
    Dataset updated
    Mar 29, 2025
    Authors
    Alexander Kapturov
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Each row in the dataset corresponds to a track, with variables such as the title, artist, and year located in their respective columns. Aside from the fundamental variables, musical elements of each track, such as the tempo, danceability, and key, were likewise extracted; these values were generated by Spotify's algorithm based on a range of technical parameters.


    • id: str, identifier of the track.
    • name: str, name of the track.
    • artists: str, artists of the track.
    • duration_ms: float, duration of the track in milliseconds.
    • release_date: date, release date of the track.
    • year: int, release year of the track.
    • acousticness: float, measure of acousticness of the track.
    • danceability: float, measure of danceability of the track.
    • energy: float, measure of energy of the track.
    • instrumentalness: float, measure of instrumental elements in the track.
    • liveness: float, measure of liveness of the track.
    • loudness: float, loudness of the track.
    • speechiness: float, measure of speechiness in the track.
    • tempo: float, tempo of the track.
    • valence: float, measure of valence (positivity) of the track.
    • mode: int, mode of the track (major or minor).
    • key: int, key of the track.
    • popularity: int, popularity score of the track.
    • explicit: int, indication of explicit content presence (explicit or implicit).
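The field list above maps directly to Python types when loading the raw CSV into a typed structure (for instance before building a PySpark schema). A minimal sketch; the raw file is assumed to arrive as string-valued rows, and `release_date` is kept as a string here for simplicity:

```python
# Column -> type mapping transcribed from the field list above.
SPOTIFY_SCHEMA = {
    "id": str, "name": str, "artists": str,
    "duration_ms": float, "release_date": str, "year": int,
    "acousticness": float, "danceability": float, "energy": float,
    "instrumentalness": float, "liveness": float, "loudness": float,
    "speechiness": float, "tempo": float, "valence": float,
    "mode": int, "key": int, "popularity": int, "explicit": int,
}

def coerce_row(raw):
    """Cast a dict of raw string values according to SPOTIFY_SCHEMA."""
    return {col: SPOTIFY_SCHEMA[col](val) for col, val in raw.items()}
```

The same mapping translates one-to-one into a `StructType` of `StringType`/`IntegerType`/`DoubleType` fields if the CSV is loaded with PySpark instead.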


  9. Dataset of books series that contain Spark

    • workwithdata.com
    Updated Nov 25, 2024
    + more versions
    Cite
    Work With Data (2024). Dataset of books series that contain Spark [Dataset]. https://www.workwithdata.com/datasets/book-series?f=1&fcol0=j0-book&fop0=%3D&fval0=Spark&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book series. It has 3 rows and is filtered to series where the book is Spark. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  10. Big Data Machine Learning Benchmark on Spark

    • ieee-dataport.org
    Updated Jun 6, 2019
    Cite
    Jairson Rodrigues (2019). Big Data Machine Learning Benchmark on Spark [Dataset]. https://ieee-dataport.org/open-access/big-data-machine-learning-benchmark-spark
    Explore at:
    Dataset updated
    Jun 6, 2019
    Authors
    Jairson Rodrigues
    Description

    net traffic

  11. SPark Sites

    • data.sanantonio.gov
    • hub.arcgis.com
    • +1more
    Updated Oct 20, 2025
    Cite
    GIS Data (2025). SPark Sites [Dataset]. https://data.sanantonio.gov/dataset/spark-sites
    Explore at:
    arcgis geoservices rest api, csv, geojson, zip, gdb, gpkg, kml, txt, xlsx, htmlAvailable download formats
    Dataset updated
    Oct 20, 2025
    Dataset provided by
    City of San Antonio
    Authors
    GIS Data
    Description

    This is a geographic database of SPark Sites within the City of San Antonio.

  12. Fire Spark Dataset

    • universe.roboflow.com
    zip
    Updated Oct 14, 2024
    + more versions
    Cite
    AI2 (2024). Fire Spark Dataset [Dataset]. https://universe.roboflow.com/ai2-5ihol/fire-spark/dataset/2
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 14, 2024
    Dataset authored and provided by
    AI2
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Fire Spark Bounding Boxes
    Description

    Fire Spark

    ## Overview
    
    Fire Spark is a dataset for object detection tasks - it contains Fire Spark annotations for 337 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  13. Data from: Spark Detector Dataset

    • universe.roboflow.com
    zip
    Updated Dec 5, 2023
    Cite
    Avian AG (2023). Spark Detector Dataset [Dataset]. https://universe.roboflow.com/avian-ag-sd77w/spark-detector/model/1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Dec 5, 2023
    Dataset authored and provided by
    Avian AG
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Sparks Bounding Boxes
    Description

    Spark Detector

    ## Overview
    
    Spark Detector is a dataset for object detection tasks - it contains Sparks annotations for 8,212 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  14. Spark 2 Dataset

    • universe.roboflow.com
    zip
    Updated Dec 17, 2023
    Cite
    AI Squad (2023). Spark 2 Dataset [Dataset]. https://universe.roboflow.com/ai-squad/spark-2/model/5
    Explore at:
    zipAvailable download formats
    Dataset updated
    Dec 17, 2023
    Dataset authored and provided by
    AI Squad
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Mobil Dzw0 Bounding Boxes
    Description

    SPark 2

    ## Overview
    
    SPark 2 is a dataset for object detection tasks - it contains Mobil Dzw0 annotations for 3,843 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  15. Super Spark An Com Dataset

    • universe.roboflow.com
    zip
    Updated May 13, 2025
    Cite
    tnh tin cm (2025). Super Spark An Com Dataset [Dataset]. https://universe.roboflow.com/tnh-tin-cm/super-spark-an-com
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 13, 2025
    Dataset authored and provided by
    tnh tin cm
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Objects Polygons
    Description

    Super SPARK An Com

    ## Overview
    
    Super SPARK An Com is a dataset for instance segmentation tasks - it contains Objects annotations for 885 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
    
  16. thomas-2018-spark-all

    • huggingface.co
    + more versions
    Cite
    SCBIR Lab, thomas-2018-spark-all [Dataset]. https://huggingface.co/datasets/scbirlab/thomas-2018-spark-all
    Explore at:
    Dataset authored and provided by
    SCBIR Lab
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    SPARK: Human-curated and standardized MICs

    These data were collated by the authors of:

    Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell Shared Platform for Antibiotic Research and Knowledge: A Collaborative Tool to SPARK Antibiotic Discovery ACS Infectious Diseases 2018 4 (11), 1536-1539 DOI: 10.1021/acsinfecdis.8b00193

    We cleaned the original SPARK dataset to subset the most relevant columns, remove empty values, give succinct column titles, and split by species. The… See the full description on the dataset page: https://huggingface.co/datasets/scbirlab/thomas-2018-spark-all.

  17. Ocs Spark Dataset

    • universe.roboflow.com
    zip
    Updated May 13, 2025
    Cite
    OCS (2025). Ocs Spark Dataset [Dataset]. https://universe.roboflow.com/ocs-on0hi/ocs-spark/dataset/3
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 13, 2025
    Dataset authored and provided by
    OCS
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Spark Bounding Boxes
    Description

    OCS Spark

    ## Overview
    
    OCS Spark is a dataset for object detection tasks - it contains Spark annotations for 372 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  18. Self-adaptive Executors for Big Data Processing

    • figshare.com
    text/x-diff
    Updated Jul 28, 2020
    Cite
    S. (Sobhan) Omranian Khorasani (2020). Self-adaptive Executors for Big Data Processing [Dataset]. http://doi.org/10.4121/uuid:38529ffe-00d0-42b0-9b3c-29d192262686
    Explore at:
    text/x-diffAvailable download formats
    Dataset updated
    Jul 28, 2020
    Dataset provided by
    4TU.ResearchData
    Authors
    S. (Sobhan) Omranian Khorasani
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains the measurements obtained with Apache Spark using different strategies for adapting the number of executor threads to reduce I/O contention. The two main strategies explored are a static solution (number of executor threads for I/O intensive tasks pre-determined) and a dynamic solution that employs an active control loop to measure epoll_wait time.

  19. spark

    • huggingface.co
    Updated Aug 15, 2024
    Cite
    d.s. spero (2024). spark [Dataset]. https://huggingface.co/datasets/baebee/spark
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 15, 2024
    Authors
    d.s. spero
    Description

    baebee/spark dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. Spark.gg Dataset

    • universe.roboflow.com
    zip
    Updated Nov 9, 2024
    Cite
    precel (2024). Spark.gg Dataset [Dataset]. https://universe.roboflow.com/precel/spark.gg/model/1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Nov 9, 2024
    Dataset authored and provided by
    precel
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Players Bounding Boxes
    Description

    Spark.gg

    ## Overview
    
    Spark.gg is a dataset for object detection tasks - it contains Players annotations for 323 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    