Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Survival after open versus endovascular repair of abdominal aortic aneurysm. Polish population analysis. (in press)
Raw Data in .csv format for use with the R data wrangling scripts.
Imagery acquired with unmanned aerial systems (UAS) and coupled with structure-from-motion (SfM) photogrammetry can produce high-resolution topographic and visual reflectance datasets that rival or exceed those from lidar and orthoimagery. These techniques are particularly useful for data collection in coastal systems, which requires datasets of high temporal and spatial resolution. The U.S. Geological Survey worked in collaboration with members of the Marine Biological Laboratory and Woods Hole Analytics at Black Beach, in Falmouth, Massachusetts, to explore scientific research demands on UAS technology for topographic and habitat mapping applications. This project explored the application of consumer-grade UAS platforms as a cost-effective alternative to lidar and aerial/satellite imagery for coastal studies requiring high-resolution elevation or remote sensing data. A small UAS was used to capture low-altitude photographs, and GPS devices were used to survey reference points. These data were processed in an SfM workflow to create an elevation point cloud, an orthomosaic image, and a digital elevation model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: none of the data sets published here contain actual data; they are for testing purposes only.
This data repository contains graph datasets, where each graph is represented by two CSV files: one for node information and another for edge details. The files belonging to the same graph share a common identifier in their names, based on the number of nodes. For example:
dataset_30_nodes_interactions.csv: contains 30 rows (nodes).
dataset_30_edges_interactions.csv: contains 47 rows (edges).
dataset_30 refers to the same graph.
Each node file contains the following columns:
Name of the Column | Type | Description |
UniProt ID | string | protein identification |
label | string | protein label (type of node) |
properties | string | a dictionary containing properties related to the protein. |
Each edge file contains the following columns:
Name of the Column | Type | Description |
Relationship ID | string | relationship identification |
Source ID | string | identification of the source protein in the relationship |
Target ID | string | identification of the target protein in the relationship |
label | string | relationship label (type of relationship) |
properties | string | a dictionary containing properties related to the relationship. |
Graph | Number of Nodes | Number of Edges | Sparse graph
dataset_30* | 30 | 47 | Y
dataset_60* | 60 | 181 | Y
dataset_120* | 120 | 689 | Y
dataset_240* | 240 | 2819 | Y
dataset_300* | 300 | 4658 | Y
dataset_600* | 600 | 18004 | Y
dataset_1200* | 1200 | 71785 | Y
dataset_2400* | 2400 | 288600 | Y
dataset_3000* | 3000 | 449727 | Y
dataset_6000* | 6000 | 1799413 | Y
dataset_12000* | 12000 | 7199863 | Y
dataset_24000* | 24000 | 28792361 | Y
dataset_30000* | 30000 | 44991744 | Y
This repository includes two (2) additional tiny graph datasets to experiment with before dealing with the larger datasets.
Each dummy node file contains the following columns:
Name of the Column | Type | Description |
ID | string | node identification |
label | string | node label (type of node) |
properties | string | a dictionary containing properties related to the node. |
Each dummy edge file contains the following columns:
Name of the Column | Type | Description |
ID | string | relationship identification |
source | string | identification of the source node in the relationship |
target | string | identification of the target node in the relationship |
label | string | relationship label (type of relationship) |
properties | string | a dictionary containing properties related to the relationship. |
Graph | Number of Nodes | Number of Edges | Sparse graph |
dataset_dummy* | 3 | 6 | N |
dataset_dummy2* | 3 | 6 | N |
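As a quick illustration of how the paired files can be stitched back into a graph, here is a minimal Python sketch. It assumes pandas and networkx are available, uses the column names listed above, treats edges as directed, and assumes the properties column holds Python-style dictionary strings; all of these are assumptions to adjust against the actual files.

```python
# Minimal sketch: load one graph from its paired node/edge CSV files.
import ast

import networkx as nx
import pandas as pd

nodes = pd.read_csv("dataset_30_nodes_interactions.csv")
edges = pd.read_csv("dataset_30_edges_interactions.csv")

G = nx.DiGraph()  # edges carry source/target, so a directed graph is assumed

for _, row in nodes.iterrows():
    # "properties" is assumed to be a dictionary-like string; switch to
    # json.loads if the files actually store JSON.
    G.add_node(row["UniProt ID"], label=row["label"],
               properties=ast.literal_eval(row["properties"]))

for _, row in edges.iterrows():
    G.add_edge(row["Source ID"], row["Target ID"], label=row["label"],
               properties=ast.literal_eval(row["properties"]))

print(G.number_of_nodes(), G.number_of_edges())  # expect 30 and 47
```

The dummy datasets use the generic ID/source/target column names instead, so the lookups above would change accordingly.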
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is our complete database in CSV format (with gene names, IDs, annotations, lengths, cluster sizes, and taxonomic classifications), which can be queried on our website. The difference is that it does not include the sequences; those can be downloaded in other files on figshare. This file, like those, can be parsed and linked by the gene identifier. We recommend downloading this database and parsing it yourself if you attempt to run a query that is too large for our servers to handle.
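As a rough sketch of that parse-and-link step, the snippet below joins the annotation table to a downloaded sequence file on the shared gene identifier using pandas. The file names and the column names (gene_id, cluster_size) are placeholders, since the record does not spell out the schema.

```python
# Minimal sketch: parse the annotation database locally and join it to a
# sequence table on the shared gene identifier. Names are placeholders.
import pandas as pd

annotations = pd.read_csv("database.csv")
sequences = pd.read_csv("sequences.csv")

# Link the two tables on the shared gene identifier column.
merged = annotations.merge(sequences, on="gene_id", how="left")

# Example of a query too large for the servers: filter by cluster size.
big = merged[merged["cluster_size"] >= 100]
print(len(big))
```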
This dataset was created by Dilip Srivastava
https://crawlfeeds.com/privacy_policy
The Dog Food Data Extracted from Chewy (USA) dataset contains 4,500 detailed records of dog food products sourced from one of the leading pet supply platforms in the United States, Chewy. This dataset is ideal for businesses, researchers, and data analysts who want to explore and analyze the dog food market, including product offerings, pricing strategies, brand diversity, and customer preferences within the USA.
The dataset includes essential information such as product names, brands, prices, ingredient details, product descriptions, weight options, and availability. Organized in a CSV format for easy integration into analytics tools, this dataset provides valuable insights for those looking to study the pet food market, develop marketing strategies, or train machine learning models.
Key Features:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains all the citation data (in CSV format) included in COCI, released on 23 January 2023. In particular, each line of the CSV file defines a citation, and includes the following information:
[field "oci"] the Open Citation Identifier (OCI) for the citation; [field "citing"] the DOI of the citing entity; [field "cited"] the DOI of the cited entity; [field "creation"] the creation date of the citation (i.e. the publication date of the citing entity); [field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity); [field "journal_sc"] it records whether the citation is a journal self-citations (i.e. the citing and the cited entities are published in the same journal); [field "author_sc"] it records whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).
This version of the dataset contains:
1,463,920,523 citations; 77,045,952 bibliographic resources.
The size of the zipped archive is 37.5 GB, while the size of the unzipped CSV file is 238.5 GB.
Additional information about COCI can be found at the official webpage.
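Given that the unzipped data are 238.5 GB, streaming the file row by row is the practical access pattern. The sketch below is a minimal example under two assumptions to verify against the actual dump: that each CSV carries a header row with the field names listed above, and that journal_sc holds yes/no values.

```python
# Minimal sketch: stream a COCI CSV and count journal self-citations.
import csv

total = 0
journal_self = 0

# "coci.csv" is a placeholder; the release ships as CSV file(s) inside
# the zipped archive.
with open("coci.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        total += 1
        if row["journal_sc"] == "yes":  # assumed yes/no encoding
            journal_self += 1

print(f"{journal_self} journal self-citations out of {total} citations")
```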
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
CSV files of the data, including the translation of the FCS raw data files. Pre-processing files are also included.
Free, daily updated MAC prefix and vendor CSV database. Download now for accurate device identification.
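As a hedged sketch of how such a database might be used, the snippet below loads a prefix-to-vendor CSV and looks up the 24-bit OUI of a MAC address; the file name and the prefix/vendor column names are assumptions, not the actual schema of this product.

```python
# Minimal sketch: look up the vendor of a MAC address via a prefix CSV.
import csv

def load_prefixes(path="mac_prefixes.csv"):
    # Assumes columns named "prefix" and "vendor" (placeholders).
    table = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            key = row["prefix"].replace(":", "").replace("-", "").upper()
            table[key] = row["vendor"]
    return table

def vendor_for(mac, table):
    oui = mac.replace(":", "").replace("-", "").upper()[:6]  # 24-bit OUI
    return table.get(oui, "unknown")

prefixes = load_prefixes()
print(vendor_for("00:1A:2B:3C:4D:5E", prefixes))
```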
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset consists of a CSV file containing around 32,000 Twitter tweets, which was split into 100 CSV files of 320 tweets each. Those 100 CSV files are used to validate and test (performance/load testing) the data pipeline components.
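For reference, a split like the one described can be reproduced with a few lines of Python; the input and output file names below are placeholders, and only the 320-rows-per-file figure comes from the description above.

```python
# Minimal sketch: split one tweets CSV into files of 320 rows each,
# repeating the header in every output file.
import csv
import itertools

with open("tweets.csv", newline="", encoding="utf-8") as src:
    reader = csv.reader(src)
    header = next(reader)  # assumes the source file has a header row
    for i in itertools.count():
        chunk = list(itertools.islice(reader, 320))
        if not chunk:
            break
        with open(f"tweets_{i:03d}.csv", "w", newline="",
                  encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(chunk)
```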
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Errata: On Dec 2nd, 2018, several yearly statistics files were replaced with new versions to correct an inconsistency related to the computation of the "dma8epax" statistics. As written in Schultz et al. (2017) [https://doi.org/10.1525/elementa.244], Supplement 1, Table 6: "When the aggregation period is “seasonal”, “summer”, or “annual”, the 4th highest daily 8-hour maximum of the aggregation period will be computed." The data values for these aggregation periods are correct; however, the header information in the original files stated that the respective data column would contain the "average daily maximum 8-hour ozone mixing ratio (nmol mol-1)". Therefore, the header of the seasonal, summer, and annual files has been corrected. Furthermore, the "dma8epax" column in the monthly files erroneously contained 4th highest daily maximum 8-hour average values, while it should have listed monthly average values instead. The data of this metric in the monthly files have therefore been replaced. The new column header reads "avgdma8epax". The updated files contain a version label "1.1" and a brief description of the error. If you have made use of previous TOAR data files with the "dma8epax" metric, please replace them with the updated files.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Acknowledgement
These data are a product of a research activity conducted in the context of the RAILS (Roadmaps for AI integration in the raiL Sector) project, which has received funding from the Shift2Rail Joint Undertaking under the European Union's Horizon 2020 research and innovation programme under grant agreement no. 881782 (RAILS). The JU receives support from the European Union's Horizon 2020 research and innovation programme and the Shift2Rail JU members other than the Union.
Disclaimers
The information and views set out in this document are those of the author(s) and do not necessarily reflect the official opinion of Shift2Rail Joint Undertaking. The JU does not guarantee the accuracy of the data included in this document. Neither the JU nor any person acting on the JU’s behalf may be held responsible for the use which may be made of the information contained therein.
This "dataset" has been created for scientific purposes only - and WITHOUT ANY COMMERCIAL purposes - to study the potentials of Deep Learning and Transfer Learning approaches. We are NOT re-distributing any video or audio; our files just contain pointers and indications needed to reproduce our study. The authors DO NOT ASSUME any responsibility for the use that other researchers or users will make of these data.
General Info
The CSV files contained in this folder (and subfolders) compose the Level Crossing (LC) Warning Bell (WB) Dataset.
When using any of these data, please mention:
De Donato, L., Marrone, S., Flammini, F., Sansone, C., Vittorini, V., Nardone, R., Mazzariello, C., and Bernaudine, F., "Intelligent Detection of Warning Bells at Level Crossings through Deep Transfer Learning for Smarter Railway Maintenance", Engineering Applications of Artificial Intelligence, Elsevier, 2023
Content of the folder
This folder contains the following subfolders and files.
"Data Files" contains all the CSV files related to the data composing the LCWB Dataset:
"LCWB Dataset" contains all the JSON files that show how the aforementioned data have been distributed among training, validation, and test sets:
"Additional Files" contains some CSV files related to data we adopted to further test the deep neural network leveraged in the aforementioned manuscript:
CSV Files Structure
Each "XX_labels.csv" file contains, for each entry, the following information:
It is worth mentioning that sub-classes do not serve a specific purpose in our task. They have been kept to preserve, as much as possible, the structure of the "class_labels_indices.csv" file provided by AudioSet. The same applies to the "XX_data.csv" files, which have roughly the same structure as the "Evaluation", "Balanced train", and "Unbalanced train" AudioSet CSV files.
Indeed, each "XX_data.csv" file contains, for each entry, the following information:
Credits
The structure of the CSV files contained in this dataset, as well as part of their content, was inspired by the CSV files composing the AudioSet dataset which is made available by Google Inc. under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, while its ontology is available under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
In particular, from AudioSet we retrieved:
Pointers contained in the "XX_data.csv" files other than GE_data.csv were collected manually from scratch; the related "XX_labels.csv" files were then created accordingly.
More about downloading the AudioSet dataset can be found here.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A diverse selection of 1000 empirical time series, along with the results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney. The results of the computation are in the hctsa file HCTSA_Empirical1000.mat, for use in Matlab with v1.06 of hctsa. The same data are also provided in .csv format: hctsa_datamatrix.csv holds the results of the feature computation, with information about rows (time series) in hctsa_timeseries-info.csv, information about columns (features) in hctsa_features.csv (and the corresponding hctsa code used to compute each feature in hctsa_masterfeatures.csv); the data of the individual time series (one time series per line, as described in hctsa_timeseries-info.csv) are in hctsa_timeseries-data.csv. These .csv files were produced by running >> OutputToCSV(HCTSA_Empirical1000.mat,true,true); in hctsa. The input file, INP_Empirical1000.mat, is for use with hctsa and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as >> TS_Init('INP_Empirical1000.mat'); Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be performed using TS_PlotTimeSeries from the hctsa package. See the links in the references for more comprehensive documentation on performing methodological comparison using this dataset, and on how to download and use v1.06 of hctsa.
The objectives of the survey were to provide bathymetric and sidescan sonar data for sediment transport studies and coastal change model development for ongoing studies of nearshore coastal dynamics along Sandwich Town Neck Beach, MA. The data collection equipment used for this investigation is mounted on an unmanned surface vehicle (USV) uniquely adapted from a commercially sold gas-powered kayak, termed the "jetyak". The jetyak design is the result of a collaborative effort between USGS and Woods Hole Oceanographic Institution (WHOI) scientists.
The ESS-DIVE reporting format for Comma-separated Values (CSV) file structure is based on a combination of existing guidelines and recommendations, including some found within the Earth Science community, with valuable input from the Environmental Systems Science (ESS) community. The CSV reporting format is designed to promote interoperability and machine-readability of CSV data files while also facilitating the collection of some file-level metadata content. Tabular data in the form of rows and columns should be archived in its simplest form, and we recommend submitting these tabular data following the ESS-DIVE reporting format for generic comma-separated values (CSV) text format files. In general, a CSV file is more likely to remain accessible to future systems than a proprietary format, and CSV files are easier to exchange between different programs, increasing the interoperability of a data file. By defining the reporting format and providing guidelines for how to structure CSV files and some of the field content within them, the format increases the machine-readability of the data files for extracting, compiling, and comparing data across files and systems. Data package files are in .csv, .png, and .md formats. Open the .csv files with, e.g., Microsoft Excel, LibreOffice, or Google Sheets. Open the .md files by downloading them and using a text editor (e.g., Notepad or TextEdit). Open the .png files in, e.g., a web browser, photo viewer/editor, or Google Drive.
The anion data for the East River Watershed, Colorado, consist of fluoride, chloride, sulfate, nitrate, and phosphate concentrations collected at multiple long-term monitoring sites that include stream, groundwater, and spring sampling locations. These locations represent important and/or unique end-member locations for which solute concentrations can be diagnostic of the connection between terrestrial and aquatic systems. Such locations include drainages underlain entirely or largely by shale bedrock, land cover dominated by conifers, aspens, or meadows, and drainages impacted by historic mining activity and the presence of naturally mineralized rock. Developing a long-term record of solute concentrations from a diversity of environments is a critical component of quantifying the impacts of both climate change and discrete climate perturbations, such as drought, forest mortality, and wildfire, on the riverine export of multiple anionic species. These data may be combined with stream gauging stations co-located at each monitoring site to directly quantify the seasonal and annual mass flux of these anionic species out of the watershed. This data package contains (1) a zip file (anion_data_2014-2022.zip) containing a total of 345 data files of anion data from across the Lawrence Berkeley National Laboratory (LBNL) Watershed Function Scientific Focus Area (SFA), reported in .csv files per location; (2) a file-level metadata (flmd.csv) file that lists each file contained in the dataset with associated metadata; and (3) a data dictionary (dd.csv) file that contains the terms/column headers used throughout the files along with a definition, units, and data type. Update on 2022-06-10: versioned updates to this dataset were made with the following changes: (1) updated anion data for all locations up to 2021-12-31, (2) removal of units from column headers in the data files, (3) addition of a row underneath the headers containing the units of the variables, (4) restructuring of units to comply with the CSV reporting format requirements, and (5) addition of the file-level metadata (flmd.csv) and data dictionary (dd.csv) files to comply with the File-Level Metadata Reporting Format. Update on 2022-09-09: updates were made to the reporting-format-specific files (file-level metadata and data dictionary) to correct swapped file names, add additional detail to the metadata descriptions in both files, add a header_row column to enable parsing, and add a version number and date to the file names (v2_20220909_flmd.csv and v2_20220909_dd.csv). Update on 2022-12-20: updates were made to both the data files and the reporting-format-specific files. Conversion issues affecting anion data at ER-PLM locations were resolved in the data files. Additionally, the flmd and dd files were updated to reflect the updated versions of these files. Data were added up through 2022-03-14.
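Because the 2022-06-10 update moved units out of the column headers and into a dedicated row directly beneath them, a reader has to treat that row separately. Here is a minimal pandas sketch under that assumption; the file name is a placeholder for any per-location file in the zip.

```python
# Minimal sketch: load a data file whose second row holds units,
# keeping the units separate from the numeric data.
import pandas as pd

path = "anion_data_location.csv"  # placeholder name

units = pd.read_csv(path, nrows=1)      # the units row beneath the header
data = pd.read_csv(path, skiprows=[1])  # the measurements, minus that row

print(dict(zip(units.columns, units.iloc[0])))
print(data.head())
```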
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified into 5 different activities (Video, Bulk, Idle, Web, and Interactive), and the label is shown in the filename. There is also a file (mapping.csv) with the mapping between the host's IP address, the csv/pcap filename, and the activity label.
Activities:
Interactive: applications that perform real-time interactions to provide a suitable user experience, such as editing a file in Google Docs or remote CLI sessions over SSH.
Bulk data transfer: applications that transfer large-volume files over the network, for example SCP/FTP applications and direct downloads of large files from web servers such as Mediafire, Dropbox, or the university repository.
Web browsing: all the traffic generated while searching and consuming different web pages, such as several blogs, news sites, and the university's Moodle.
Video playback: traffic from applications that consume video in streaming or pseudo-streaming. The best-known servers used are Twitch and YouTube, but the university's online classroom has also been used.
Idle behaviour: the background traffic generated by the user's computer while the user is idle. This traffic has been captured with every application closed and with some pages open, such as Google Docs, YouTube, and several web pages, but always without user interaction.
The capture is performed on a network probe attached, via a SPAN port, to the router that forwards the user's network traffic. The traffic is stored in pcap format with the full packet payload. In the csv files, every non-TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the csv files are the following (one line per packet): timestamp, protocol, payload size, source and destination IP addresses, and source and destination UDP/TCP ports. The fields are also included as a header in every csv file.
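As an illustration of how the per-packet csv files might be consumed, the sketch below computes a trace's duration and total payload volume. The file name and the exact header spellings (timestamp, payload_size) are assumptions to adjust against the real header row.

```python
# Minimal sketch: summarize one per-packet trace CSV.
import csv

with open("web_trace_01.csv", newline="") as f:  # placeholder file name
    rows = list(csv.DictReader(f))

if rows:
    timestamps = [float(r["timestamp"]) for r in rows]
    duration_s = max(timestamps) - min(timestamps)
    payload_bytes = sum(int(r["payload_size"]) for r in rows)
    print(f"{len(rows)} packets, {duration_s:.1f} s, "
          f"{payload_bytes} payload bytes")
```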
The amount of data is as follows:
Bulk: 19 traces, 3599 s of total duration, 8704 MBytes of pcap files
Video: 23 traces, 4496 s, 1405 MBytes
Web: 23 traces, 4203 s, 148 MBytes
Interactive: 42 traces, 8934 s, 30.5 MBytes
Idle: 52 traces, 6341 s, 0.69 MBytes
The code of our machine learning approach is also included. There is a README.txt file with the documentation of how to use the code.
https://crawlfeeds.com/privacy_policy
Download the comprehensive Waitrose Products Information Dataset in CSV format.
This detailed dataset includes product titles, prices, brands, descriptions, ingredients, nutritional information, and more. Ideal for data analysis, market research, and e-commerce applications.
Get accurate and up-to-date product data from Waitrose.
The Sea Surface Temperature (SST) data of the Arctic show temperature ranges in degrees C using points whose locations correspond to the centroids of the AVHRR Pathfinder version 5 monthly, global, 4 km data set (PFSST V50). The Pathfinder rasters are available from the NOAA National Oceanographic Data Center (NODC) and from the Physical Oceanography Distributed Active Archive Center (PO.DAAC), hosted by NASA JPL. Furthermore, each point in the SST dataset is categorized by the ecoregion in which it is located. This classification is based on the Marine Ecoregions Of the World (MEOW) developed and distributed by The Nature Conservancy. These data have been QA'd in that only data values with associated quality flags of 4-7 were selected. Points with no data are not included here.