100+ datasets found

Datasets for PySpark project
kaggle.com
zip
Updated Sep 23, 2022
Cite
Towhidul.Tonmoy (2022). Datasets for PySpark project [Dataset]. https://www.kaggle.com/datasets/towhidultonmoy/datasets-for-pyspark-project
Explore at:
Available download formats: zip (264895 bytes)
Dataset updated
Sep 23, 2022
Authors
Towhidul.Tonmoy
Description
Dataset

This dataset was created by Towhidul.Tonmoy

Data from: PySpark SQL Dataset
kaggle.com
zip
Updated Jan 24, 2023
+ more versions
Cite
Rashid60 (2023). PySpark SQL Dataset [Dataset]. https://www.kaggle.com/datasets/rashid60/pyspark-sql-dataset
Explore at:
Available download formats: zip (4531876 bytes)
Dataset updated
Jan 24, 2023
Authors
Rashid60
Description
Dataset

This dataset was created by Rashid60

Spark-Data
huggingface.co
Updated Sep 29, 2025
Cite
Intern Large Models (2025). Spark-Data [Dataset]. https://huggingface.co/datasets/internlm/Spark-Data
Explore at:
Dataset updated
Sep 29, 2025
Dataset authored and provided by
Intern Large Models
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Spark-Data

Paper | Github Repository | Models

Data Introduction

This repository stores the datasets used for training 🤗Spark-VL-7B and Spark-VL-32B, as well as a collection of multiple mathematical benchmarks covered in the SPARK: Synergistic Policy And Reward Co-Evolving Framework paper. infer_data_ViRL_19k_h.json is used for training Spark-VL-7B. infer_data_ViRL_hard_24k_h.json is used for training Spark-VL-32B. benchmark_combine.json and… See the full description on the dataset page: https://huggingface.co/datasets/internlm/Spark-Data.
Dataset for class comment analysis
data.niaid.nih.gov
Updated Feb 22, 2022
Cite
Pooja Rani (2022). Dataset for class comment analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4311838
Explore at:
Dataset updated
Feb 22, 2022
Dataset provided by
University of Bern
Authors
Pooja Rani
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.

Structure

Projects/
  Java_projects/
    eclipse.zip
    guava.zip
    guice.zip
    hadoop.zip
    spark.zip
    vaadin.zip
  Pharo_projects/
    images/
      GToolkit.zip
      Moose.zip
      PetitParser.zip
      Pillar.zip
      PolyMath.zip
      Roassal2.zip
      Seaside.zip
    vm/
      70-x64/Pharo
    Scripts/
      ClassCommentExtraction.st
      SampleSelectionScript.st
  Python_projects/
    django.zip
    ipython.zip
    Mailpile.zip
    pandas.zip
    pipenv.zip
    pytorch.zip
    requests.zip

Contents of the Replication Package

Projects/ contains the raw projects of each language that are used to analyze class comments.

Java_projects/

eclipse.zip - Eclipse project downloaded from GitHub. More detail about the project is available on GitHub (Eclipse).

guava.zip - Guava project downloaded from GitHub. More detail about the project is available on GitHub (Guava).

guice.zip - Guice project downloaded from GitHub. More detail about the project is available on GitHub (Guice).

hadoop.zip - Apache Hadoop project downloaded from GitHub. More detail about the project is available on GitHub (Apache Hadoop).

spark.zip - Apache Spark project downloaded from GitHub. More detail about the project is available on GitHub (Apache Spark).

vaadin.zip - Vaadin project downloaded from GitHub. More detail about the project is available on GitHub (Vaadin).

Pharo_projects/

images/ -

GToolkit.zip - Gtoolkit project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.

Moose.zip - Moose project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.

PetitParser.zip - Petit Parser project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.

Pillar.zip - Pillar project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.

PolyMath.zip - PolyMath project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.

Roassal2.zip - Roassal2 project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.

Seaside.zip - Seaside project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.

vm/ -

70-x64/Pharo - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the images/ folder. The user can run the VM on macOS and select any of the Pharo images.

Scripts/ - It contains the sample Smalltalk scripts to extract class comments from various projects.

ClassCommentExtraction.st - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.

SampleSelectionScript.st - A Smalltalk script to show how sample class comments of Pharo projects are selected. This script can be run in any of the Pharo images given in the images/ folder.

Python_projects/

django.zip - Django project downloaded from GitHub. More detail about the project is available on GitHub (Django).

ipython.zip - IPython project downloaded from GitHub. More detail about the project is available on GitHub (IPython).

Mailpile.zip - Mailpile project downloaded from GitHub. More detail about the project is available on GitHub (Mailpile).

pandas.zip - pandas project downloaded from GitHub. More detail about the project is available on GitHub (pandas).

pipenv.zip - Pipenv project downloaded from GitHub. More detail about the project is available on GitHub (Pipenv).

pytorch.zip - PyTorch project downloaded from GitHub. More detail about the project is available on GitHub (PyTorch).

requests.zip - Requests project downloaded from GitHub. More detail about the project is available on GitHub (Requests).
A sample medical dataset.
plos.figshare.com
xls
Updated May 31, 2023
+ more versions
Cite
Farough Ashkouti; Keyhan Khamforoosh (2023). A sample medical dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0285212.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0285212.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Farough Ashkouti; Keyhan Khamforoosh
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Recently, big data and its applications have seen sharp growth in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data has posed enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. The scientific and industrial community therefore has a compelling need for large-scale, robust computing systems. Since one of the characteristics of big data is value, data should be published so that analysts can extract useful patterns from them. However, data publishing may lead to the disclosure of individuals’ private information. Among the modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability by introducing resilient distributed datasets (RDDs). In terms of performance, due to in-memory computations, it can be up to 100 times faster than Hadoop. Therefore, Apache Spark is one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three phases of in-memory computation to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms to preserve the λ-diversity privacy model by using transformations and actions on RDDs. The authors have therefore investigated a Spark-based implementation for preserving the λ-diversity privacy model using two designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline allowing researchers to apply Apache Spark in their own research.
ORBITAAL: cOmpRehensive BItcoin daTaset for temorAl grAph anaLysis - Dataset...
cryptodata.center
Updated Dec 4, 2024
+ more versions
Cite
(2024). ORBITAAL: cOmpRehensive BItcoin daTaset for temorAl grAph anaLysis - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/orbitaal-comprehensive-bitcoin-dataset-for-temoral-graph-analysis
Explore at:
Dataset updated
Dec 4, 2024
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Construction

This dataset captures the temporal network of Bitcoin (BTC) flow exchanged between entities at the finest time resolution, in UNIX timestamps. Its construction is based on the blockchain covering the period from January 3, 2009 to January 25, 2021. The blockchain extraction was made using the bitcoin-etl (https://github.com/blockchain-etl/bitcoin-etl) Python package. The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1] as well as popular Bitcoin users' addresses provided by https://www.walletexplorer.com/

[1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.

Dataset Description

Bitcoin Activity Temporal Coverage: From 03 January 2009 to 25 January 2021

Overview: This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a significant temporal span, from the inception of Bitcoin to recent years. It encompasses various temporal resolutions and representations to facilitate Bitcoin transaction network analysis in the context of temporal graphs. All dates were retrieved from block UNIX timestamps, in the GMT timezone.

Contents: The dataset is distributed across the compressed archives listed below. All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries. It can be used with the pyspark Python package.

orbitaal-stream_graph.tar.gz: The root directory is STREAM_GRAPH/. Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (averaging approximately 10 minutes). The stream graph is divided into 13 files, one for each year. File format is parquet. Name format is orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year. These files are in the subdirectory STREAM_GRAPH/EDGES/.

orbitaal-snapshot-all.tar.gz: The root directory is SNAPSHOT/. Contains the snapshot network representing all transactions aggregated over the whole dataset period (from Jan. 2009 to Jan. 2021). File format is parquet. Name format is orbitaal-snapshot-all.snappy.parquet. These files are in the subdirectory SNAPSHOT/EDGES/ALL/.

orbitaal-snapshot-year.tar.gz: The root directory is SNAPSHOT/. Contains the yearly-resolution snapshot networks. File format is parquet. Name format is orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year. These files are in the subdirectory SNAPSHOT/EDGES/year/.

orbitaal-snapshot-month.tar.gz: The root directory is SNAPSHOT/. Contains the monthly-resolution snapshot networks. File format is parquet. Name format is orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, where [YYYY] and [MM] stand for the corresponding year and month, and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year and month. These files are in the subdirectory SNAPSHOT/EDGES/month/.

orbitaal-snapshot-day.tar.gz: The root directory is SNAPSHOT/. Contains the daily-resolution snapshot networks. File format is parquet. Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, where [YYYY], [MM], and [DD] stand for the corresponding year, month, and day, and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, and day. These files are in the subdirectory SNAPSHOT/EDGES/day/.

orbitaal-snapshot-hour.tar.gz: The root directory is SNAPSHOT/. Contains the hourly-resolution snapshot networks. File format is parquet. Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, where [YYYY], [MM], [DD], and [hh] stand for the corresponding year, month, day, and hour, and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, day, and hour. These files are in the subdirectory SNAPSHOT/EDGES/hour/.

orbitaal-nodetable.tar.gz: The root directory is NODE_TABLE/. Contains two files in parquet format. The first gives information related to the nodes present in the stream graphs and snapshots, such as period of activity and associated global Bitcoin balance; the other contains the list of all associated Bitcoin addresses.

Small samples in CSV format: orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv. These two CSV files are related to stream graph representations of a halvening that happened in 2016.
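The file-name formats described above can be checked mechanically. The following is a minimal sketch, not part of the dataset's own tooling: the function name `parse_orbitaal_name` and the sample file ID are hypothetical, and the pattern assumes that names follow the templates exactly as quoted in the description.

```python
import re
from typing import Optional

# Matches names such as
#   orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet
# The date part carries one to four numeric fields: year, month, day, hour.
NAME_RE = re.compile(
    r"^orbitaal-(?P<kind>stream_graph|snapshot)-date-"
    r"(?P<date>\d{4}(?:-\d{2}){0,3})-file-id-(?P<file_id>\d+)\.snappy\.parquet$"
)


def parse_orbitaal_name(name: str) -> Optional[dict]:
    """Split a file name into its kind, file ID, and date fields.

    Returns None when the name does not follow the dated template
    (for example orbitaal-snapshot-all.snappy.parquet, which has no date).
    """
    match = NAME_RE.match(name)
    if match is None:
        return None
    info = {"kind": match.group("kind"), "file_id": int(match.group("file_id"))}
    # Pair each date field with its meaning; zip stops at the shorter sequence,
    # so a yearly file only yields "year", a monthly file "year" and "month", etc.
    fields = [int(part) for part in match.group("date").split("-")]
    info.update(zip(["year", "month", "day", "hour"], fields))
    return info


# Hypothetical monthly snapshot name (the ID 91 is made up for illustration).
example = parse_orbitaal_name("orbitaal-snapshot-date-2016-07-file-id-91.snappy.parquet")
```

Since the files are Parquet, a directory such as SNAPSHOT/EDGES/month/ could then presumably be loaded with PySpark's `spark.read.parquet`, with a parser like this used to select files for a given period.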
Data from: Spark project dataset
kaggle.com
zip
Updated May 15, 2025
Cite
YuanKefan (2025). Spark project dataset [Dataset]. https://www.kaggle.com/datasets/yuankefan555/spark-project-dataset
Explore at:
Available download formats: zip (2028920 bytes)
Dataset updated
May 15, 2025
Authors
YuanKefan
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by YuanKefan

Released under Apache 2.0

🎧📻 Spotify data from PySpark course
kaggle.com
zip
Updated Mar 29, 2025
Cite
Alexander Kapturov (2025). 🎧📻 Spotify data from PySpark course [Dataset]. https://www.kaggle.com/datasets/kapturovalexander/spotify-data-from-pyspark-course
Explore at:
Available download formats: zip (12969103 bytes)
Dataset updated
Mar 29, 2025
Authors
Alexander Kapturov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Each row in the dataset corresponds to a track, with variables such as the title, artist, and year located in their respective columns. Aside from these fundamental variables, musical elements of each track, such as the tempo, danceability, and key, were also extracted; these values were generated by Spotify's algorithms based on a range of technical parameters.


id: str, identifier of the track.

name: str, name of the track.

artists: str, artists of the track.

duration_ms: float, duration of the track in milliseconds.

release_date: date, release date of the track.

year: int, release year of the track.

acousticness: float, measure of acousticness of the track.

danceability: float, measure of danceability of the track.

energy: float, measure of energy of the track.

instrumentalness: float, measure of instrumental elements in the track.

liveness: float, measure of liveness of the track.

loudness: float, loudness of the track.

speechiness: float, measure of speechiness in the track.

tempo: float, tempo of the track.

valence: float, measure of valence (positivity) of the track.

mode: int, mode of the track (major or minor).

key: int, key of the track.

popularity: int, popularity score of the track.

explicit: int, indication of explicit content presence (explicit or implicit).
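The column list above amounts to a schema. As a minimal sketch, assuming the data ships as a CSV file whose header uses exactly these column names (the source does not state the file name or the date format), each row can be cast to the listed types with the standard library alone. The names `COLUMN_TYPES`, `cast_row`, and `read_tracks` are illustrative, not part of the dataset.

```python
import csv
import io

# Column name -> converter, following the types given in the description.
COLUMN_TYPES = {
    "id": str,
    "name": str,
    "artists": str,
    "duration_ms": float,
    "release_date": str,  # listed as "date"; kept as text because the format is not specified
    "year": int,
    "acousticness": float,
    "danceability": float,
    "energy": float,
    "instrumentalness": float,
    "liveness": float,
    "loudness": float,
    "speechiness": float,
    "tempo": float,
    "valence": float,
    "mode": int,
    "key": int,
    "popularity": int,
    "explicit": int,
}


def cast_row(row):
    """Convert the string values of one CSV row to the described types.

    Columns that are not part of the description are dropped.
    """
    return {col: COLUMN_TYPES[col](value) for col, value in row.items() if col in COLUMN_TYPES}


def read_tracks(text):
    """Parse CSV text with a header line into a list of typed rows."""
    return [cast_row(row) for row in csv.DictReader(io.StringIO(text))]


# Made-up sample row, for illustration only.
sample = "id,name,year,tempo,explicit\nabc,Song,1999,120.5,0\n"
rows = read_tracks(sample)
```

The same mapping could serve as the basis for an explicit schema when loading the file in PySpark, rather than relying on type inference.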

Dataset of books series that contain Spark
workwithdata.com
Updated Nov 25, 2024
+ more versions
Cite
Work With Data (2024). Dataset of books series that contain Spark [Dataset]. https://www.workwithdata.com/datasets/book-series?f=1&fcol0=j0-book&fop0=%3D&fval0=Spark&j=1&j0=books
Explore at:
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book series. It has 3 rows and is filtered to entries where the book is Spark. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
Big Data Machine Learning Benchmark on Spark
ieee-dataport.org
Updated Jun 6, 2019
Cite
Jairson Rodrigues (2019). Big Data Machine Learning Benchmark on Spark [Dataset]. https://ieee-dataport.org/open-access/big-data-machine-learning-benchmark-spark
Explore at:
Dataset updated
Jun 6, 2019
Authors
Jairson Rodrigues
Description
net traffic
SPark Sites
data.sanantonio.gov
hub.arcgis.com
+1 more
Updated Oct 20, 2025
Cite
GIS Data (2025). SPark Sites [Dataset]. https://data.sanantonio.gov/dataset/spark-sites
Explore at:
Available download formats: arcgis geoservices rest api, csv, geojson, zip, gdb, gpkg, kml, txt, xlsx, html
Dataset updated
Oct 20, 2025
Dataset provided by
City of San Antonio
Authors
GIS Data
Description
This is a geographic database of SPark Sites within the City of San Antonio
Fire Spark Dataset
universe.roboflow.com
zip
Updated Oct 14, 2024
+ more versions
Cite
AI2 (2024). Fire Spark Dataset [Dataset]. https://universe.roboflow.com/ai2-5ihol/fire-spark/dataset/2
Explore at:
Available download formats: zip
Dataset updated
Oct 14, 2024
Dataset authored and provided by
AI2
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Fire Spark Bounding Boxes
Description
Fire Spark

## Overview

Fire Spark is a dataset for object detection tasks - it contains Fire Spark annotations for 337 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Data from: Spark Detector Dataset
universe.roboflow.com
zip
Updated Dec 5, 2023
Cite
Avian AG (2023). Spark Detector Dataset [Dataset]. https://universe.roboflow.com/avian-ag-sd77w/spark-detector/model/1
Explore at:
Available download formats: zip
Dataset updated
Dec 5, 2023
Dataset authored and provided by
Avian AG
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Sparks Bounding Boxes
Description
Spark Detector

## Overview

Spark Detector is a dataset for object detection tasks - it contains Sparks annotations for 8,212 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Spark 2 Dataset
universe.roboflow.com
zip
Updated Dec 17, 2023
Cite
AI Squad (2023). Spark 2 Dataset [Dataset]. https://universe.roboflow.com/ai-squad/spark-2/model/5
Explore at:
Available download formats: zip
Dataset updated
Dec 17, 2023
Dataset authored and provided by
AI Squad
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Mobil Dzw0 Bounding Boxes
Description
SPark 2

## Overview

SPark 2 is a dataset for object detection tasks - it contains Mobil Dzw0 annotations for 3,843 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Super Spark An Com Dataset
universe.roboflow.com
zip
Updated May 13, 2025
Cite
tnh tin cm (2025). Super Spark An Com Dataset [Dataset]. https://universe.roboflow.com/tnh-tin-cm/super-spark-an-com
Explore at:
Available download formats: zip
Dataset updated
May 13, 2025
Dataset authored and provided by
tnh tin cm
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Variables measured
Objects Polygons
Description
Super SPARK An Com

## Overview

Super SPARK An Com is a dataset for instance segmentation tasks - it contains Objects annotations for 885 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
thomas-2018-spark-all
huggingface.co
+ more versions
Cite
SCBIR Lab, thomas-2018-spark-all [Dataset]. https://huggingface.co/datasets/scbirlab/thomas-2018-spark-all
Explore at:
Dataset authored and provided by
SCBIR Lab
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
SPARK: Human-curated and standardized MICs

These data were collated by the authors of:

Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell. Shared Platform for Antibiotic Research and Knowledge: A Collaborative Tool to SPARK Antibiotic Discovery. ACS Infectious Diseases 2018, 4 (11), 1536-1539. DOI: 10.1021/acsinfecdis.8b00193

We cleaned the original SPARK dataset to subset the most relevant columns, remove empty values, give succinct column titles, and split by species. The… See the full description on the dataset page: https://huggingface.co/datasets/scbirlab/thomas-2018-spark-all.
Ocs Spark Dataset
universe.roboflow.com
zip
Updated May 13, 2025
Cite
OCS (2025). Ocs Spark Dataset [Dataset]. https://universe.roboflow.com/ocs-on0hi/ocs-spark/dataset/3
Explore at:
Available download formats: zip
Dataset updated
May 13, 2025
Dataset authored and provided by
OCS
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Spark Bounding Boxes
Description
OCS Spark

## Overview

OCS Spark is a dataset for object detection tasks - it contains Spark annotations for 372 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Self-adaptive Executors for Big Data Processing
figshare.com
text/x-diff
Updated Jul 28, 2020
Cite
S. (Sobhan) Omranian Khorasani (2020). Self-adaptive Executors for Big Data Processing [Dataset]. http://doi.org/10.4121/uuid:38529ffe-00d0-42b0-9b3c-29d192262686
Explore at:
Available download formats: text/x-diff
Unique identifier
https://doi.org/10.4121/uuid:38529ffe-00d0-42b0-9b3c-29d192262686
Dataset updated
Jul 28, 2020
Dataset provided by
4TU.ResearchData
Authors
S. (Sobhan) Omranian Khorasani
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains the measurements obtained with Apache Spark using different strategies for adapting the number of executor threads to reduce I/O contention. The two main strategies explored are a static solution (number of executor threads for I/O intensive tasks pre-determined) and a dynamic solution that employs an active control loop to measure epoll_wait time.
spark
huggingface.co
Updated Aug 15, 2024
Cite
d.s. spero (2024). spark [Dataset]. https://huggingface.co/datasets/baebee/spark
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 15, 2024
Authors
d.s. spero
Description
baebee/spark dataset hosted on Hugging Face and contributed by the HF Datasets community
Spark.gg Dataset
universe.roboflow.com
zip
Updated Nov 9, 2024
Cite
precel (2024). Spark.gg Dataset [Dataset]. https://universe.roboflow.com/precel/spark.gg/model/1
Explore at:
Available download formats: zip
Dataset updated
Nov 9, 2024
Dataset authored and provided by
precel
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Players Bounding Boxes
Description
Spark.gg

## Overview

Spark.gg is a dataset for object detection tasks - it contains Players annotations for 323 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
