100+ datasets found

GitTables 1M - CSV files
zenodo.org
explore.openaire.eu
zip
Updated Jun 6, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth (2022). GitTables 1M - CSV files [Dataset]. http://doi.org/10.5281/zenodo.6515973
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6515973
Dataset updated
Jun 6, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains >800K CSV files behind the GitTables 1M corpus.

For more information about the GitTables corpus, visit:

- our website for GitTables, or

- the main GitTables download page on Zenodo.

Sample Graph Datasets in CSV Format

zenodo.org

csv

Updated Dec 9, 2024

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Edwin Carreño; Edwin Carreño (2024). Sample Graph Datasets in CSV Format [Dataset]. http://doi.org/10.5281/zenodo.14330132

Explore at:

csvAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.14330132

Dataset updated

Dec 9, 2024

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Edwin Carreño; Edwin Carreño

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Sample Graph Datasets in CSV Format

Note: none of the data sets published here contain actual data, they are for testing purposes only.

Description

This data repository contains graph datasets, where each graph is represented by two CSV files: one for node information and another for edge details. To link the files to the same graph, their names include a common identifier based on the number of nodes. For example:

dataset_30_nodes_interactions.csv:contains 30 rows (nodes).
dataset_30_edges_interactions.csv: contains 47 rows (edges).
the common identifier dataset_30 refers to the same graph.

CSV nodes

Each dataset contains the following columns:

Name of the Column	Type	Description
UniProt ID	string	protein identification
label	string	protein label (type of node)
properties	string	a dictionary containing properties related to the protein.

CSV edges

Each dataset contains the following columns:

Name of the Column	Type	Description
Relationship ID	string	relationship identification
Source ID	string	identification of the source protein in the relationship
Target ID	string	identification of the target protein in the relationship
label	string	relationship label (type of relationship)
properties	string	a dictionary containing properties related to the relationship.

Metadata

Graph	Number of Nodes	Number of Edges	Sparse graph
dataset_30*	30	47	Y
dataset_60*	60	181	Y
dataset_120*	120	689	Y
dataset_240*	240	2819	Y
dataset_300*	300	4658	Y
dataset_600*	600	18004	Y
dataset_1200*	1200	71785	Y
dataset_2400*	2400	288600	Y
dataset_3000*	3000	449727	Y
dataset_6000*	6000	1799413	Y
dataset_12000*	12000	7199863	Y
dataset_24000*	24000	28792361	Y

This repository include two (2) additional tiny graph datasets to experiment before dealing with larger datasets.

CSV nodes (tiny graphs)

Each dataset contains the following columns:

Name of the Column	Type	Description
ID	string	node identification
label	string	node label (type of node)
properties	string	a dictionary containing properties related to the node.

CSV edges (tiny graphs)

Each dataset contains the following columns:

Name of the Column	Type	Description
ID	string	relationship identification
source	string	identification of the source node in the relationship
target	string	identification of the target node in the relationship
label	string	relationship label (type of relationship)
properties	string	a dictionary containing properties related to the relationship.

Metadata (tiny graphs)

Graph	Number of Nodes	Number of Edges	Sparse graph
dataset_dummy*	3	6	N
dataset_dummy2*	3	6	N

d
Gravity Data for Island of Hawai`i.csv
catalog.data.gov
data.usgs.gov
+2more
Updated Jul 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Gravity Data for Island of Hawai`i.csv [Dataset]. https://catalog.data.gov/dataset/gravity-data-for-island-of-hawaii-csv
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Area covered
Island of Hawai'i, Hawaii
Description
This data set includes gravity measurements for the Island of Hawai`i collected as the source data for "Deep magmatic structures of Hawaiian volcanoes, imaged by three-dimensional gravity models" (Kauahikaua, Hildenbrand, and Webring, 2000). Data for 3,611 observations are stored as a single table and disseminated in .CSV format. Each observation record includes values for field station ID, latitude and longitude (in both Old Hawaiian and WGS84 projections), elevation, and Observed Gravity value. See associated publication for reduction and interpretation of these data.
Gene expression csv files
figshare.com
txt
Updated Jun 12, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cristina Alvira (2023). Gene expression csv files [Dataset]. http://doi.org/10.6084/m9.figshare.21861975.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21861975.v1
Dataset updated
Jun 12, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Cristina Alvira
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Csv files containing all detectable genes.
POCI CSV dataset of all the citation data
figshare.com
zip
Updated Dec 27, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
OpenCitations (2022). POCI CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.21776351.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21776351.v1
Dataset updated
Dec 27, 2022
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains all the citation data (in CSV format) included in POCI, released on 27 December 2022. In particular, each line of the CSV file defines a citation, and includes the following information:

[field "oci"] the Open Citation Identifier (OCI) for the citation; [field "citing"] the PMID of the citing entity; [field "cited"] the PMID of the cited entity; [field "creation"] the creation date of the citation (i.e. the publication date of the citing entity); [field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity); [field "journal_sc"] it records whether the citation is a journal self-citations (i.e. the citing and the cited entities are published in the same journal); [field "author_sc"] it records whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

This version of the dataset contains:

717,654,703 citations; 26,024,862 bibliographic resources.

The size of the zipped archive is 9.6 GB, while the size of the unzipped CSV file is 50 GB. Additional information about POCI at official webpage.
car data.csv_Final
kaggle.com
zip
Updated Feb 9, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
kkaranismm (2024). car data.csv_Final [Dataset]. https://www.kaggle.com/datasets/kkaranismm/car-data-csv-final
Explore at:
zip(2555 bytes)Available download formats
Dataset updated
Feb 9, 2024
Authors
kkaranismm
Description
Dataset

This dataset was created by kkaranismm

Contents
a
PopCenterCounty US CSV
tndata-myutk.opendata.arcgis.com
arc-gis-hub-home-arcgishub.hub.arcgis.com
Updated Jan 24, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
University of Tennessee (2022). PopCenterCounty US CSV [Dataset]. https://tndata-myutk.opendata.arcgis.com/datasets/f94180ba29c543d5b989604c36bf2c11
Explore at:
Dataset updated
Jan 24, 2022
Dataset authored and provided by
University of Tennessee
License
Public Domain Mark 1.0https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Area covered
Pacific Ocean, North Pacific Ocean
Description
The mean "Center of Population" for each county in 2000, 2010 and 2020, as published by the United State Census Bureau, is shown in this layer..The population center for each county represents the point where a flat and rigid representation of the county would balance if identical weights for each person were placed at their residence. Looking at the movement of the point over time helps convey information about the predominant population trend in the county including areas loosing population or areas gaining population.For each location, the nearest feature from the US Geologic Survey Geographic Names Information System (GNIS) was appended along with the GNIS feature class and county where the point was located.
POCI CSV dataset of the provenance information of all the citation data
figshare.com
zip
Updated Dec 27, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
POCI CSV dataset of the provenance information of all the citation data [Dataset]. https://figshare.com/articles/dataset/POCI_CSV_dataset_of_the_provenance_information_of_all_the_citation_data/21776456
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21776456.v1
Dataset updated
Dec 27, 2022
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains the provenance information (in CSV format) of all the citation data included in POCI, released on 27 December 2022. In particular, each line of the CSV file defines a citation, and includes the following information:

[field "oci"] the Open Citation Identifier (OCI) for the citation; [field "snapshot"] the identifier of the snapshot; [field "agent"] the name of the agent that have created the citation data; [field "source"] the URL of the source dataset from where the citation data have been extracted; [field "created"] the creation time of the citation data. [field "invalidated"] the start of the destruction, cessation, or expiry of an existing entity by an activity; [field "description"] a textual description of the activity made; [field "update"] the UPDATE SPARQL query that keeps track of which metadata have been modified.

The size of the zipped archive is 5 GB, while the size of the unzipped CSV file is 122 GB.Additional information about POCI at official webpage.
d
Comma separated value (CSV) text files of navigation and elevation data...
catalog.data.gov
Updated Jul 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Comma separated value (CSV) text files of navigation and elevation data collected by the U.S. Geological Survey during field activity 2016-030-FA offshore Sandwich Beach, MA in June 2016 [Dataset]. https://catalog.data.gov/dataset/comma-separated-value-csv-text-files-of-navigation-and-elevation-data-collected-by-the-u-s
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
East Sandwich Beach
Description
The objectives of the survey were to provide bathymetric and sidescan sonar data for sediment transport studies and coastal change model development for ongoing studies of nearshore coastal dynamics along Sandwich Town Neck Beach, MA. Data collection equipment used for this investigation are mounted on an unmanned surface vehicle (USV) uniquely adapted from a commercially sold gas-powered kayak and termed the "jetyak". The jetyak design is the result of a collaborative effort between USGS and Woods Hole Oceanographic Institution (WHOI) scientists.
d
socialLoafingExperiment_2023-05-31.csv - Dataset - data.govt.nz - discover...
catalogue.data.govt.nz
Updated Feb 1, 2001
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2001). socialLoafingExperiment_2023-05-31.csv - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-25751550
Explore at:
Dataset updated
Feb 1, 2001
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In my dissertation, I explored the interplay between queue configuration and server performance, with the aim to identify the underlying mechanisms driving distinct behaviors.In the first lab experiment, I investigate the group dynamics of servers operating in various queue configurations. I discover that shared queue structures tend to heighten servers' perceptions that their individual efforts cannot be identified, and that their contributions are dispensable, both of which can demotivate servers and lead to a decrease in their working speed.This file is the raw data for my first experiment. This is the csv file from M-turk. The other file is my code in R to analyse the dataset.
UCI and OpenML Data Sets for Ordinal Quantification
zenodo.org
zip
Updated Jul 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8177302
Dataset updated
Jul 25, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.

We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.

Usage

You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

Data Extraction: In your terminal, you can call either

make

(recommended), or

julia --project="." --eval "using Pkg; Pkg.instantiate()" julia --project="." extract-oq.jl

Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.

Further Reading

Implementation of our experiments: https://github.com/mirkobunse/regularized-oq
d
Open Data T3 2021 (format csv)
data.gouv.fr
csv
Updated Dec 16, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Avicca (2021). Open Data T3 2021 (format csv) [Dataset]. https://www.data.gouv.fr/en/datasets/open-data-t3-2021-format-csv/
Explore at:
csv(11971583)Available download formats
Dataset updated
Dec 16, 2021
Dataset authored and provided by
Avicca
Description
Open Data T3 2021 (format csv)
l
Drug consumption database: original.csv
figshare.le.ac.uk
figshare.com
txt
Updated May 30, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Drug consumption database: original.csv [Dataset]. https://figshare.le.ac.uk/articles/dataset/Drug_consumption_database_original_csv/7588415
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.25392/leicester.data.7588415.v1
Dataset updated
May 30, 2023
Dataset provided by
University of Leicester
Authors
Elaine Fehrman; Vincent Egan; Evgeny Mirkes
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Drug consumption database with original values of attributes. DescriptionDB.pdf contains detailed description of database.
csv file for kaggle by muni
kaggle.com
zip
Updated Jul 23, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MUNEERA (2019). csv file for kaggle by muni [Dataset]. https://www.kaggle.com/muneeramoinudheen/csv-file-for-kaggle-by-muni
Explore at:
zip(267 bytes)Available download formats
Dataset updated
Jul 23, 2019
Authors
MUNEERA
Description
Dataset

This dataset was created by MUNEERA

Contents
Z
Brussel mobility Twitter sentiment analysis CSV Dataset
data.niaid.nih.gov
zenodo.org
Updated May 31, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Betancur Arenas, Juliana (2024). Brussel mobility Twitter sentiment analysis CSV Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11401123
Explore at:
Dataset updated
May 31, 2024
Dataset provided by
van Vessem, Charlotte
Ginis, Vincent
Tori, Floriano
Betancur Arenas, Juliana
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Brussels
Description
SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project, engaging directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU’s transition to carbon neutrality. SSH CENTRE is based in a range of activities related to Open Science, inclusivity and diversity – especially with regards Southern and Eastern Europe and different career stages – including: development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and building stakeholder synergies through regular Policy Insight events.This is captured in a high-profile virtual SSH CENTRE generating and sharing best practice for SSH policy advice, overcoming fragmentation to accelerate the EU’s journey to a sustainable future.The documents uploaded here are part of WP2 whereby novel, interdisciplinary teams were provided funding to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each of these policy recommendations, and the activities that inform them, will be written-up as a chapter in an edited book collection. Three books will make up this edited collection - one on climate, one on energy and one on mobility. As part of writing a chapter for the SSH CENTRE book on ‘Mobility’, we set out to analyse the sentiment of users on Twitter regarding shared and active mobility modes in Brussels. This involved us collecting tweets between 2017-2022. A tweet was collected if it contained a previously defined mobility keyword (for example: metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The files attached to this Zenodo webpage is a csv files containing the tweets collected.”.
a
TMS daily traffic counts CSV
hub.arcgis.com
opendata-nzta.opendata.arcgis.com
Updated Aug 30, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Waka Kotahi (2020). TMS daily traffic counts CSV [Dataset]. https://hub.arcgis.com/datasets/9cb86b342f2d4f228067a7437a7f7313
Explore at:
Dataset updated
Aug 30, 2020
Dataset authored and provided by
Waka Kotahi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
You can also access an API version of this dataset.

TMS

(traffic monitoring system) daily-updated traffic counts API

Important note: due to the size of this dataset, you won't be able to open it fully in Excel. Use notepad / R / any software package which can open more than a million rows.

Data reuse caveats: as per license.

Data quality

statement: please read the accompanying user manual, explaining:

how

this data is collected identification of count stations traffic monitoring technology monitoring hierarchy and conventions typical survey specification data calculation TMS operation.

Traffic

monitoring for state highways: user manual

[PDF 465 KB]

The data is at daily granularity. However, the actual update

frequency of the data depends on the contract the site falls within. For telemetry

sites it's once a week on a Wednesday. Some regional sites are fortnightly, and

some monthly or quarterly. Some are only 4 weeks a year, with timing depending

on contractors’ programme of work.

Data quality caveats: you must use this data in

conjunction with the user manual and the following caveats.

The

road sensors used in data collection are subject to both technical errors and environmental interference.Data is compiled from a variety of sources. Accuracy may vary and the data should only be used as a guide.As not all road sections are monitored, a direct calculation of Vehicle Kilometres Travelled (VKT) for a region is not possible.Data is sourced from Waka Kotahi New Zealand Transport Agency TMS data.For sites that use dual loops classification is by length. Vehicles with a length of less than 5.5m are classed as light vehicles. Vehicles over 11m long are classed as heavy vehicles. Vehicles between 5.5 and 11m are split 50:50 into light and heavy.In September 2022, the National Telemetry contract was handed to a new contractor. During the handover process, due to some missing documents and aged technology, 40 of the 96 national telemetry traffic count sites went offline. Current contractor has continued to upload data from all active sites and have gradually worked to bring most offline sites back online. Please note and account for possible gaps in data from National Telemetry Sites.

The NZTA Vehicle

Classification Relationships diagram below shows the length classification (typically dual loops) and axle classification (typically pneumatic tube counts),

and how these map to the Monetised benefits and costs manual, table A37,

page 254.

Monetised benefits and costs manual [PDF 9 MB]

For the full TMS

classification schema see Appendix A of the traffic counting manual vehicle

classification scheme (NZTA 2011), below.

Traffic monitoring for state highways: user manual [PDF 465 KB]

State highway traffic monitoring (map)

State highway traffic monitoring sites
Z
Data from: Bio-logger Ethogram Benchmark: A benchmark for computational...
data.niaid.nih.gov
portalcientifico.unileon.es
+3more
Updated Apr 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zacarian, Katherine (2024). Bio-logger Ethogram Benchmark: A benchmark for computational analysis of animal behavior, using animal-borne tags [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7807280
Explore at:
Dataset updated
Apr 19, 2024
Dataset provided by
Vainio, Outi
Mata-Silva, Vicente
Vehkaoja, Antti
Baglione, Vittorio
Trapote, Eva
Zacarian, Katherine
Jeantet, Lorène
Ladds, Monique A.
Moreno-González, Víctor
Maekawa, Takuya
Hoffman, Benjamin
DeSantis, Dominic L.
Chevallier, Damien
Friedlaender, Ari
Cusimano, Maddie
Canestrari, Daniela
Yoda, Ken
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains the datasets and experiment results presented in our arxiv paper:

B. Hoffman, M. Cusimano, V. Baglione, D. Canestrari, D. Chevallier, D. DeSantis, L. Jeantet, M. Ladds, T. Maekawa, V. Mata-Silva, V. Moreno-González, A. Pagano, E. Trapote, O. Vainio, A. Vehkaoja, K. Yoda, K. Zacarian, A. Friedlaender, "A benchmark for computational analysis of animal behavior, using animal-borne tags," 2023.

Standardized code to implement, train, and evaluate models can be found at https://github.com/earthspecies/BEBE/.

Please note the licenses in each dataset folder.

Zip folders beginning with "formatted": These are the datasets we used to run the experiments reported in the benchmark paper.

Zip folders beginning with "raw": These are the unprocessed datasets used in BEBE. Code to process these raw datasets into the formatted ones used by BEBE can be found at https://github.com/earthspecies/BEBE-datasets/.

Zip folders beginning with "experiments": Results of the cross-validation experiments reported in the paper, as well as hyperparameter optimization. Confusion matrices for all experiments can also be found here. Note that dt, rf, and svm refer to the feature set from Nathan et al., 2012.

Results used in Fig. 4 of arxiv paper (deep neural networks vs. classical models){dataset}_ harnet_nogyr{dataset}_CRNN{dataset}_CNN{dataset}_dt{dataset}_rf{dataset}_svm{dataset}_wavelet_dt{dataset}_wavelet_rf{dataset}_wavelet_svm

Results used in Fig. 5D of arxiv paper (full data setting)If dataset contains gyroscope (HAR, jeantet_turtles, vehkaoja_dogs):{dataset}_harnet_nogyr{dataset}_harnet_random_nogyr{dataset}_harnet_unfrozen_nogyr{dataset}_RNN_nogyr{dataset}_CRNN_nogyr{dataset}_rf_nogyrOtherwise:{dataset}_harnet_nogyr{dataset}_harnet_unfrozen_nogyr{dataset}_harnet_random_nogyr{dataset}_RNN_nogyr{dataset}_CRNN{dataset}_rf

Results used in Fig. 5E of arxiv paper (reduced data setting)If dataset contains gyroscope (HAR, jeantet_turtles, vehkaoja_dogs):{dataset}_harnet_low_data_nogyr{dataset}_harnet_random_low_data_nogyr{dataset}_harnet_unfrozen_low_data_nogyr{dataset}_RNN_low_data_nogyr{dataset}_wavelet_RNN_low_data_nogyr{dataset}_CRNN_low_data_nogyr{dataset}_rf_low_data_nogyr

Otherwise:{dataset}_harnet_low_data_nogyr{dataset}_harnet_random_low_data_nogyr{dataset}_harnet_unfrozen_low_data_nogyr{dataset}_RNN_low_data_nogyr{dataset}_wavelet_RNN_low_data_nogyr{dataset}_CRNN_low_data{dataset}_rf_low_data

CSV files: we also include summaries of the experimental results in experiments_summary.csv, experiments_by_fold_individual.csv, experiments_by_fold_behavior.csv.

experiments_summary.csv - results averaged over individuals and behavior classesdataset (str): name of datasetexperiment (str): name of model with experiment setting fig4 (bool): True if dataset+experiment was used in figure 4 of arxiv paperfig5d (bool): True if dataset+experiment was used in figure 5d of arxiv paperfig5e (bool): True if dataset+experiment was used in figure 5e of arxiv paperf1_mean (float): mean of macro-averaged F1 score, averaged over individuals in test foldsf1_std (float): standard deviation of macro-averaged F1 score, computed over individuals in test foldsprec_mean, prec_std (float): analogous for precisionrec_mean, rec_std (float): analogous for recallexperiments_by_fold_individual.csv - results per individual in the test foldsdataset (str): name of datasetexperiment (str): name of model with experiment setting fig4 (bool): True if dataset+experiment was used in figure 4 of arxiv paperfig5d (bool): True if dataset+experiment was used in figure 5d of arxiv paperfig5e (bool): True if dataset+experiment was used in figure 5e of arxiv paperfold (int): test fold indexindividual (int): individuals are numbered zero-indexed, starting from fold 1f1 (float): macro-averaged f1 score for this individualprecision (float): macro-averaged precision for this individualrecall (float): macro-averaged recall for this individual

experiments_by_fold_behavior.csv - results per behavior class, for each test folddataset (str): name of datasetexperiment (str): name of model with experiment setting fig4 (bool): True if dataset+experiment was used in figure 4 of arxiv paperfig5d (bool): True if dataset+experiment was used in figure 5d of arxiv paperfig5e (bool): True if dataset+experiment was used in figure 5e of arxiv paperfold (int): test fold indexbehavior_class (str): name of behavior classf1 (float): f1 score for this behavior, averaged over individuals in the test foldprecision (float): precision for this behavior, averaged over individuals in the test foldrecall (float): recall for this behavior, averaged over individuals in the test foldtrain_ground_truth_label_counts (int): number of timepoints labeled with this behavior class, in the training set
d
Pre-compiled metrics data sets, links to yearly statistics files in CSV...
b2find.dkrz.de
Updated Dec 2, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2018). Pre-compiled metrics data sets, links to yearly statistics files in CSV format - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/a5b35769-bca3-51fc-846c-94256507be1e
Explore at:
Dataset updated
Dec 2, 2018
Description
Errata: On Dec 2nd, 2018, several yearly statistics files were replaced with new versions to correct an inconsistency related to the computation of the "dma8epax" statistics. As written in Schultz et al. (2017) [https://doi.org/10.1525/elementa.244], Supplement 1, Table 6: "When the aggregation period is “seasonal”, “summer”, or “annual”, the 4th highest daily 8-hour maximum of the aggregation period will be computed.". The data values for these aggregation periods are correct, however, the header information in the original files stated that the respective data column would contain "average daily maximum 8-hour ozone mixing ratio (nmol mol-1)". Therefore, the header of the seasonal, summer, and annual files has been corrected. Furthermore, the "dma8epax" column in the monthly files erroneously contained 4th highest daily maximum 8-hour average values, while it should have listed monthly average values instead. The data of this metric in the monthly files have therefore been replaced. The new column header reads "avgdma8epax". The updated files contain a version label "1.1" and a brief description of the error. If you have made use of previous TOAR data files with the "dma8epax" metric, please exchange your data files.
h
truthy-dpo-csv
huggingface.co
Updated Jan 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CultriX (2024). truthy-dpo-csv [Dataset]. https://huggingface.co/datasets/CultriX/truthy-dpo-csv
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 23, 2024
Authors
CultriX
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
CultriX/truthy-dpo-csv dataset hosted on Hugging Face and contributed by the HF Datasets community
u
Inclusive design and dissemination in digital scholarly editing : CSV...
repository.uantwerpen.be
works.hcommons.org
Updated 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bleeker, Elli; Dillen, Wout; Kelly, Aodhán; Martinez, Merisa; Sichani, Anna-Maria (2019). Inclusive design and dissemination in digital scholarly editing : CSV dataset [Dataset]. http://doi.org/10.17613/C3M9-KQ76
Explore at:
Unique identifier
https://doi.org/10.17613/C3M9-KQ76
Dataset updated
2019
Dataset provided by
University of Antwerp
Faculty of Arts. Literature
Authors
Bleeker, Elli; Dillen, Wout; Kelly, Aodhán; Martinez, Merisa; Sichani, Anna-Maria
Description
In 2017, the authors designed a survey titled Inclusive Design and Dissemination in Digital Scholarly Editions. The survey was designed and hosted using SurveyMonkey (https://www.surveymonkey.com) and was open from 1 July to 31 November 2017. The survey received 219 responses, 109 of which completed every required question in the survey – resulting in a completion rate of 49,7%. At the 2017 ADHO conference in Montreal (Canada), the authors participated in a panel discussion on the subject, where they discussed some preliminary survey results (Sichani et al. 2017). A more detailed treatment of the complete survey results will be published Variants 14 (https://journals.openedition.org/variants/), the journal of the ESTS (Martinez et al. forthcoming). In view of this publication, the authors have deposited the survey results as data sets here. These include a CSV file of the survey’s data (scrubbed of respondents’ personal information), and the current PDF with graphical representations of the survey’s statistics. Both files present the survey’s raw, uncorrected (albeit redacted) data, as recorded and automatically analyzed by SurveyMonkey, including response rates per question and diagrams. As the uncorrected survey results, some of the data offered in these files may differ slightly from those presented in the forthcoming Variants article. For their qualitative analysis of the survey’s data in that publication, the authors corrected the data (e.g. excluding invalid answers, or reclassifying incorrectly classified answers), and interpreted them (e.g. creating categories for similar responses). Such interventions were justified in the relevant sections of the Variants article. Rather than depositing the corrected version of the survey’s results in the Humanities Commons repository, the authors decided to publish the uncorrected results instead, so as not to force their interpretation of the survey’s data on future research.

Facebook

Twitter

Click to copy link

Link copied

Cite

Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth (2022). GitTables 1M - CSV files [Dataset]. http://doi.org/10.5281/zenodo.6515973

GitTables 1M - CSV files

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.6515973

Dataset updated

Jun 6, 2022

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Madelon Hulsebos; Çağatay Demiralp; Paul Groth; Madelon Hulsebos; Çağatay Demiralp; Paul Groth

License

CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

This dataset contains >800K CSV files behind the GitTables 1M corpus.

For more information about the GitTables corpus, visit:

- our website for GitTables, or

- the main GitTables download page on Zenodo.

Clear search

Close search

Google apps

Main menu

GitTables 1M - CSV files

Sample Graph Datasets in CSV Format

Sample Graph Datasets in CSV Format

Description

CSV nodes

CSV edges

Metadata

CSV nodes (tiny graphs)

CSV edges (tiny graphs)

Metadata (tiny graphs)

Gravity Data for Island of Hawai`i.csv

Gene expression csv files

POCI CSV dataset of all the citation data

car data.csv_Final

Dataset

Contents

PopCenterCounty US CSV

POCI CSV dataset of the provenance information of all the citation data

Comma separated value (CSV) text files of navigation and elevation data...

socialLoafingExperiment_2023-05-31.csv - Dataset - data.govt.nz - discover...

UCI and OpenML Data Sets for Ordinal Quantification

Open Data T3 2021 (format csv)

Drug consumption database: original.csv

csv file for kaggle by muni

Dataset

Contents

Brussel mobility Twitter sentiment analysis CSV Dataset

TMS daily traffic counts CSV

Data from: Bio-logger Ethogram Benchmark: A benchmark for computational...

Pre-compiled metrics data sets, links to yearly statistics files in CSV...

truthy-dpo-csv

Inclusive design and dissemination in digital scholarly editing : CSV...

GitTables 1M - CSV files