100+ datasets found
  1. GitTables 1M - CSV files

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Jun 6, 2022
    Cite
    Madelon Hulsebos; Çağatay Demiralp; Paul Groth (2022). GitTables 1M - CSV files [Dataset]. http://doi.org/10.5281/zenodo.6515973
    Available download formats: zip
    Dataset updated
    Jun 6, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Madelon Hulsebos; Çağatay Demiralp; Paul Groth
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains >800K CSV files behind the GitTables 1M corpus.

    For more information about the GitTables corpus, visit:

    - our website for GitTables, or

    - the main GitTables download page on Zenodo.

  2. Sample Graph Datasets in CSV Format

    • zenodo.org
    csv
    Updated Dec 9, 2024
    Cite
    Edwin Carreño (2024). Sample Graph Datasets in CSV Format [Dataset]. http://doi.org/10.5281/zenodo.14330132
    Available download formats: csv
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Edwin Carreño
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Sample Graph Datasets in CSV Format

    Note: none of the datasets published here contain real data; they are for testing purposes only.

    Description

    This data repository contains graph datasets, where each graph is represented by two CSV files: one for node information and another for edge details. To link the files to the same graph, their names include a common identifier based on the number of nodes. For example:

    • dataset_30_nodes_interactions.csv: contains 30 rows (nodes).
    • dataset_30_edges_interactions.csv: contains 47 rows (edges).
    • the common identifier dataset_30 refers to the same graph.

    CSV nodes

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    UniProt ID | string | protein identification
    label | string | protein label (type of node)
    properties | string | a dictionary containing properties related to the protein.

    CSV edges

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    Relationship ID | string | relationship identification
    Source ID | string | identification of the source protein in the relationship
    Target ID | string | identification of the target protein in the relationship
    label | string | relationship label (type of relationship)
    properties | string | a dictionary containing properties related to the relationship.
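A minimal Python sketch of loading one node/edge file pair into an adjacency map, assuming comma-delimited files with a header row whose column names match the lists above (`UniProt ID`, `Relationship ID`, `Source ID`, `Target ID`). The helper name `load_graph` is hypothetical, and the `properties` column is left unparsed because its exact dictionary encoding is not specified.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def load_graph(node_lines: Iterable[str],
               edge_lines: Iterable[str]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Build a directed adjacency map from one nodes CSV and one edges CSV.

    Accepts open file handles or any iterable of CSV lines. Assumes the
    header names described above; the 'properties' column is not parsed.
    """
    nodes = {row["UniProt ID"] for row in csv.DictReader(node_lines)}
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(edge_lines):
        source, target = row["Source ID"], row["Target ID"]
        # Every edge endpoint should appear in the matching nodes file.
        if source not in nodes or target not in nodes:
            raise ValueError(f"edge {row['Relationship ID']} references an unknown node")
        adjacency[source].add(target)
    return nodes, dict(adjacency)
```

To load dataset_30, pass handles opened with `newline=""` for dataset_30_nodes_interactions.csv and dataset_30_edges_interactions.csv. The tiny graphs use different headers (`ID`, `source`, `target`), so the column names would need adjusting for them.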

    Metadata

    Graph | Number of Nodes | Number of Edges | Sparse graph
    dataset_30* | 30 | 47 | Y
    dataset_60* | 60 | 181 | Y
    dataset_120* | 120 | 689 | Y
    dataset_240* | 240 | 2819 | Y
    dataset_300* | 300 | 4658 | Y
    dataset_600* | 600 | 18004 | Y
    dataset_1200* | 1200 | 71785 | Y
    dataset_2400* | 2400 | 288600 | Y
    dataset_3000* | 3000 | 449727 | Y
    dataset_6000* | 6000 | 1799413 | Y
    dataset_12000* | 12000 | 7199863 | Y
    dataset_24000* | 24000 | 28792361 | Y
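The "Sparse graph" column can be sanity-checked by computing edge density from the node and edge counts in the metadata table. This sketch assumes edges are directed (each edge row has a Source ID and a Target ID) and that there are no self-loops; neither assumption is stated in the source. Under those assumptions, every graph in the table comes out at a directed density of roughly 0.05.

```python
from typing import List, Tuple

# (graph, nodes, edges) copied from the metadata table above.
GRAPHS: List[Tuple[str, int, int]] = [
    ("dataset_30", 30, 47),
    ("dataset_60", 60, 181),
    ("dataset_120", 120, 689),
    ("dataset_240", 240, 2819),
    ("dataset_300", 300, 4658),
    ("dataset_600", 600, 18004),
    ("dataset_1200", 1200, 71785),
    ("dataset_2400", 2400, 288600),
    ("dataset_3000", 3000, 449727),
    ("dataset_6000", 6000, 1799413),
    ("dataset_12000", 12000, 7199863),
    ("dataset_24000", 24000, 28792361),
]


def directed_density(num_nodes: int, num_edges: int) -> float:
    """Share of the n * (n - 1) possible directed edges (no self-loops) present."""
    return num_edges / (num_nodes * (num_nodes - 1))


for name, n, m in GRAPHS:
    print(f"{name}: density = {directed_density(n, m):.4f}")
```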

    This repository includes two additional tiny graph datasets for experimenting before working with the larger ones.

    CSV nodes (tiny graphs)

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    ID | string | node identification
    label | string | node label (type of node)
    properties | string | a dictionary containing properties related to the node.

    CSV edges (tiny graphs)

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    ID | string | relationship identification
    source | string | identification of the source node in the relationship
    target | string | identification of the target node in the relationship
    label | string | relationship label (type of relationship)
    properties | string | a dictionary containing properties related to the relationship.

    Metadata (tiny graphs)

    Graph | Number of Nodes | Number of Edges | Sparse graph
    dataset_dummy* | 3 | 6 | N
    dataset_dummy2* | 3 | 6 | N
  3. Gravity Data for Island of Hawai`i.csv

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Gravity Data for Island of Hawai`i.csv [Dataset]. https://catalog.data.gov/dataset/gravity-data-for-island-of-hawaii-csv
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Island of Hawai'i, Hawaii
    Description

    This data set includes gravity measurements for the Island of Hawai`i collected as the source data for "Deep magmatic structures of Hawaiian volcanoes, imaged by three-dimensional gravity models" (Kauahikaua, Hildenbrand, and Webring, 2000). Data for 3,611 observations are stored as a single table and disseminated in .CSV format. Each observation record includes values for field station ID, latitude and longitude (in both Old Hawaiian and WGS84 projections), elevation, and Observed Gravity value. See associated publication for reduction and interpretation of these data.

  4. Gene expression csv files

    • figshare.com
    txt
    Updated Jun 12, 2023
    Cite
    Cristina Alvira (2023). Gene expression csv files [Dataset]. http://doi.org/10.6084/m9.figshare.21861975.v1
    Available download formats: txt
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Cristina Alvira
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    CSV files containing all detectable genes.

  5. POCI CSV dataset of all the citation data

    • figshare.com
    zip
    Updated Dec 27, 2022
    Cite
    OpenCitations (2022). POCI CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.21776351.v1
    Available download formats: zip
    Dataset updated
    Dec 27, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains all the citation data (in CSV format) included in POCI, released on 27 December 2022. Each line of the CSV file defines a citation and includes the following fields:

    • "oci": the Open Citation Identifier (OCI) for the citation;
    • "citing": the PMID of the citing entity;
    • "cited": the PMID of the cited entity;
    • "creation": the creation date of the citation (i.e. the publication date of the citing entity);
    • "timespan": the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity);
    • "journal_sc": whether the citation is a journal self-citation (i.e. the citing and the cited entities are published in the same journal);
    • "author_sc": whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

    This version of the dataset contains:

    717,654,703 citations; 26,024,862 bibliographic resources.

    The zipped archive is 9.6 GB, while the unzipped CSV file is 50 GB. Additional information about POCI is available on the official webpage.
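At 50 GB unzipped, the file is best processed as a stream rather than loaded into memory. A minimal sketch that reads the CSV straight out of the zip archive and tallies self-citations, assuming the CSV is the first member of the archive and that `journal_sc` and `author_sc` hold the strings `yes`/`no` (neither detail is confirmed above). Both function names are hypothetical.

```python
import csv
import io
import zipfile
from typing import BinaryIO, Dict, Iterable, Iterator, Union


def iter_zip_member_lines(zip_source: Union[str, BinaryIO]) -> Iterator[str]:
    """Yield text lines from the first file inside a zip archive.

    ASSUMPTION: the POCI archive holds the citation CSV as its first member.
    """
    with zipfile.ZipFile(zip_source) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            # Decode on the fly; newline="" is what the csv module expects.
            with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
                yield from text


def count_self_citations(lines: Iterable[str]) -> Dict[str, int]:
    """Stream citation rows and tally journal and author self-citations.

    ASSUMPTION: 'journal_sc' and 'author_sc' contain 'yes' or 'no'.
    Rows are read one at a time, so memory use does not grow with file size.
    """
    totals = {"citations": 0, "journal_sc": 0, "author_sc": 0}
    for row in csv.DictReader(lines):
        totals["citations"] += 1
        for flag in ("journal_sc", "author_sc"):
            if row[flag].strip().lower() == "yes":
                totals[flag] += 1
    return totals
```

Usage: `count_self_citations(iter_zip_member_lines(path_to_zip))`. If the archive turns out to contain several CSV files, loop over `archive.namelist()` instead of taking only the first entry.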

  6. car data.csv_Final

    • kaggle.com
    zip
    Updated Feb 9, 2024
    Cite
    kkaranismm (2024). car data.csv_Final [Dataset]. https://www.kaggle.com/datasets/kkaranismm/car-data-csv-final
    Available download formats: zip (2555 bytes)
    Dataset updated
    Feb 9, 2024
    Authors
    kkaranismm
    Description

    Dataset

    This dataset was created by kkaranismm


  7. PopCenterCounty US CSV

    • tndata-myutk.opendata.arcgis.com
    • arc-gis-hub-home-arcgishub.hub.arcgis.com
    Updated Jan 24, 2022
    Cite
    University of Tennessee (2022). PopCenterCounty US CSV [Dataset]. https://tndata-myutk.opendata.arcgis.com/datasets/f94180ba29c543d5b989604c36bf2c11
    Dataset updated
    Jan 24, 2022
    Dataset authored and provided by
    University of Tennessee
    License

    Public Domain Mark 1.0 (https://creativecommons.org/publicdomain/mark/1.0/)
    License information was derived automatically

    Area covered
    Pacific Ocean, North Pacific Ocean
    Description

    The mean "Center of Population" for each county in 2000, 2010 and 2020, as published by the United States Census Bureau, is shown in this layer. The population center for each county represents the point where a flat and rigid representation of the county would balance if identical weights for each person were placed at their residence. Tracking the movement of this point over time helps convey the predominant population trend in the county, including areas losing or gaining population. For each location, the nearest feature from the U.S. Geological Survey (USGS) Geographic Names Information System (GNIS) was appended, along with the GNIS feature class and the county where the point was located.
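The balance-point definition above amounts to a population-weighted average of residence coordinates. A minimal sketch, assuming (latitude, longitude, population) triples as input, for example one per census block. Following the Census Bureau's published centroid method as we understand it, latitude is a plain weighted mean while longitude is additionally weighted by cos(latitude); treat that weighting as an assumption and check the Bureau's documentation before relying on it. The function name is hypothetical.

```python
import math
from typing import Iterable, Tuple


def center_of_population(points: Iterable[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Population-weighted center (latitude, longitude) in decimal degrees.

    points: (latitude, longitude, population) triples.
    Latitude is a plain weighted mean; longitude is additionally weighted by
    cos(latitude), since meridians converge toward the poles.
    """
    weight_sum = lat_sum = lon_sum = lon_weight_sum = 0.0
    for lat, lon, pop in points:
        cos_lat = math.cos(math.radians(lat))
        weight_sum += pop
        lat_sum += pop * lat
        lon_sum += pop * lon * cos_lat
        lon_weight_sum += pop * cos_lat
    if weight_sum <= 0:
        raise ValueError("total population must be positive")
    return lat_sum / weight_sum, lon_sum / lon_weight_sum
```

For two equally populated points on the equator at longitudes 0 and 10, this returns (0.0, 5.0), the midpoint, as the balance-point picture suggests.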

  8. POCI CSV dataset of the provenance information of all the citation data

    • figshare.com
    zip
    Updated Dec 27, 2022
    Cite
    POCI CSV dataset of the provenance information of all the citation data [Dataset]. https://figshare.com/articles/dataset/POCI_CSV_dataset_of_the_provenance_information_of_all_the_citation_data/21776456
    Available download formats: zip
    Dataset updated
    Dec 27, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains the provenance information (in CSV format) of all the citation data included in POCI, released on 27 December 2022. In particular, each line of the CSV file defines a citation, and includes the following information:

    • "oci": the Open Citation Identifier (OCI) for the citation;
    • "snapshot": the identifier of the snapshot;
    • "agent": the name of the agent that created the citation data;
    • "source": the URL of the source dataset from which the citation data were extracted;
    • "created": the creation time of the citation data;
    • "invalidated": the start of the destruction, cessation, or expiry of an existing entity by an activity;
    • "description": a textual description of the activity performed;
    • "update": the SPARQL UPDATE query that keeps track of which metadata have been modified.

    The zipped archive is 5 GB, while the unzipped CSV file is 122 GB. Additional information about POCI is available on the official webpage.

  9. Comma separated value (CSV) text files of navigation and elevation data...

    • catalog.data.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Comma separated value (CSV) text files of navigation and elevation data collected by the U.S. Geological Survey during field activity 2016-030-FA offshore Sandwich Beach, MA in June 2016 [Dataset]. https://catalog.data.gov/dataset/comma-separated-value-csv-text-files-of-navigation-and-elevation-data-collected-by-the-u-s
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    East Sandwich Beach
    Description

    The objectives of the survey were to provide bathymetric and sidescan sonar data for sediment transport studies and coastal change model development for ongoing studies of nearshore coastal dynamics along Sandwich Town Neck Beach, MA. The data collection equipment used for this investigation is mounted on an unmanned surface vehicle (USV) uniquely adapted from a commercially sold gas-powered kayak and termed the "jetyak". The jetyak design is the result of a collaborative effort between USGS and Woods Hole Oceanographic Institution (WHOI) scientists.

  10. socialLoafingExperiment_2023-05-31.csv - Dataset - data.govt.nz - discover...

    • catalogue.data.govt.nz
    Updated Feb 1, 2001
    Cite
    (2001). socialLoafingExperiment_2023-05-31.csv - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-25751550
    Dataset updated
    Feb 1, 2001
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    In my dissertation, I explored the interplay between queue configuration and server performance, with the aim of identifying the underlying mechanisms driving distinct behaviors. In the first lab experiment, I investigated the group dynamics of servers operating in various queue configurations. I found that shared queue structures tend to heighten servers' perceptions that their individual efforts cannot be identified and that their contributions are dispensable, both of which can demotivate servers and reduce their working speed. This file contains the raw data for my first experiment: the CSV file exported from Amazon Mechanical Turk (MTurk). The other file contains my R code for analysing the dataset.

  11. UCI and OpenML Data Sets for Ordinal Quantification

    • zenodo.org
    zip
    Updated Jul 25, 2023
    Cite
    Mirko Bunse; Alejandro Moreo; Fabrizio Sebastiani; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
    Available download formats: zip
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mirko Bunse; Alejandro Moreo; Fabrizio Sebastiani; Martin Senz
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

    With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.

    We complement this data set with the indices of the data items that appear in each sample of our evaluation, so you can replicate our samples exactly by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. Each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

    Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
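The two sampling protocols can be sketched in Python. APP draws prevalence vectors uniformly from the probability simplex; normalizing independent Exp(1) draws (equivalent to a flat Dirichlet) is one standard way to do this. For APP-OQ, the description above only says that the "smoothest 20%" of samples are kept; the smoothness score used here (sum of squared second differences between neighboring class prevalences) is an assumption for illustration and may differ from the authors' measure. All function names are hypothetical, and this sketch does not reproduce the published index files.

```python
import random
from typing import List


def sample_app(n_classes: int, n_samples: int, seed: int = 0) -> List[List[float]]:
    """APP: draw prevalence vectors uniformly at random from the simplex."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Normalized Exp(1) draws follow a flat Dirichlet, i.e. uniform on the simplex.
        draws = [rng.expovariate(1.0) for _ in range(n_classes)]
        total = sum(draws)
        samples.append([d / total for d in draws])
    return samples


def jaggedness(prevalences: List[float]) -> float:
    """ASSUMED smoothness score: sum of squared second differences (lower = smoother)."""
    return sum(
        (prevalences[i - 1] - 2 * prevalences[i] + prevalences[i + 1]) ** 2
        for i in range(1, len(prevalences) - 1)
    )


def sample_app_oq(n_classes: int, n_samples: int, keep: float = 0.2,
                  seed: int = 0) -> List[List[float]]:
    """APP-OQ-style filter: keep the smoothest `keep` fraction of APP samples."""
    pool = sample_app(n_classes, n_samples, seed)
    pool.sort(key=jaggedness)
    return pool[: int(len(pool) * keep)]
```

The filter exploits the ordinal assumption stated above: if neighboring classes are similar, realistic prevalence vectors should not jump sharply from one class to the next.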

    Usage

    You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

    Preliminaries: You need a working Julia installation. We used Julia v1.6.5 in our experiments.

    Data Extraction: In your terminal, you can call either

    make

    (recommended), or

    julia --project="." --eval "using Pkg; Pkg.instantiate()"
    julia --project="." extract-oq.jl

    Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.
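Because quantification targets the label distribution rather than individual labels, a natural first check on an extracted file is its class prevalence vector. A minimal sketch that reads the `class_label` column described above, assuming the labels are integers (the source only says they are ordinal classes obtained by binning) and that the files are comma-delimited; `class_prevalence` is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def class_prevalence(lines: Iterable[str]) -> Dict[int, float]:
    """Fraction of rows per ordinal class, read from the 'class_label' column.

    ASSUMPTION: labels are integers, so they are parsed with int() and the
    result is ordered from the lowest to the highest class.
    """
    counts = Counter(int(row["class_label"]) for row in csv.DictReader(lines))
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}
```

Pass an open file handle (opened with `newline=""`) for any of the four extracted CSV files.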

    Further Reading

    Implementation of our experiments: https://github.com/mirkobunse/regularized-oq

  12. Open Data T3 2021 (format csv)

    • data.gouv.fr
    csv
    Updated Dec 16, 2021
    Cite
    Avicca (2021). Open Data T3 2021 (format csv) [Dataset]. https://www.data.gouv.fr/en/datasets/open-data-t3-2021-format-csv/
    Available download formats: csv (11971583)
    Dataset updated
    Dec 16, 2021
    Dataset authored and provided by
    Avicca
    Description

    Open Data T3 2021 (format csv)

  13. Drug consumption database: original.csv

    • figshare.le.ac.uk
    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Drug consumption database: original.csv [Dataset]. https://figshare.le.ac.uk/articles/dataset/Drug_consumption_database_original_csv/7588415
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    University of Leicester
    Authors
    Elaine Fehrman; Vincent Egan; Evgeny Mirkes
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Drug consumption database with the original values of the attributes. DescriptionDB.pdf contains a detailed description of the database.

  14. csv file for kaggle by muni

    • kaggle.com
    zip
    Updated Jul 23, 2019
    Cite
    MUNEERA (2019). csv file for kaggle by muni [Dataset]. https://www.kaggle.com/muneeramoinudheen/csv-file-for-kaggle-by-muni
    Available download formats: zip (267 bytes)
    Dataset updated
    Jul 23, 2019
    Authors
    MUNEERA
    Description

    Dataset

    This dataset was created by MUNEERA


  15. Brussel mobility Twitter sentiment analysis CSV Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 31, 2024
    Cite
    Betancur Arenas, Juliana (2024). Brussel mobility Twitter sentiment analysis CSV Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11401123
    Dataset updated
    May 31, 2024
    Dataset provided by
    van Vessem, Charlotte
    Ginis, Vincent
    Tori, Floriano
    Betancur Arenas, Juliana
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Brussels
    Description

    SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project that engages directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU’s transition to carbon neutrality. SSH CENTRE is built on a range of activities related to Open Science, inclusivity and diversity (especially with regard to Southern and Eastern Europe and different career stages), including: the development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and building stakeholder synergies through regular Policy Insight events. This is captured in a high-profile virtual SSH CENTRE that generates and shares best practice for SSH policy advice, overcoming fragmentation to accelerate the EU’s journey to a sustainable future.

    The documents uploaded here are part of WP2, in which novel, interdisciplinary teams received funding to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each of these policy recommendations, and the activities that inform them, will be written up as a chapter in an edited book collection. Three books will make up this collection: one on climate, one on energy and one on mobility. As part of writing a chapter for the SSH CENTRE book on ‘Mobility’, we set out to analyse the sentiment of Twitter users regarding shared and active mobility modes in Brussels. This involved collecting tweets posted between 2017 and 2022.

    A tweet was collected if it contained a predefined mobility keyword (for example: metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The files attached to this Zenodo page are CSV files containing the collected tweets.
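The collection rule above is a conjunction of two keyword tests. A minimal Python sketch, with case-insensitive whole-word matching as an assumption (the source does not say how matching was done); the function names are hypothetical, and any keyword lists other than "metro" (which the source gives as an example) are placeholders you would supply.

```python
import re
from typing import Iterable


def contains_any(text: str, terms: Iterable[str]) -> bool:
    """True if any term occurs in text as a whole word, ignoring case."""
    # Lookarounds instead of \b so that terms ending in punctuation still match.
    return any(
        re.search(rf"(?<!\w){re.escape(term)}(?!\w)", text, flags=re.IGNORECASE)
        for term in terms
    )


def keep_tweet(text: str, mobility_keywords: Iterable[str],
               context_terms: Iterable[str]) -> bool:
    """Collection rule: a mobility keyword AND a politician, place, or provider name.

    context_terms should combine the politician, neighbourhood/municipality,
    and (shared) mobility provider lists into one collection.
    """
    return contains_any(text, mobility_keywords) and contains_any(text, context_terms)
```

For example, `keep_tweet("Metro delays in Ixelles again", ["metro"], ["Ixelles"])` returns `True`, whereas the same call on "Metro delays again" returns `False` because no context term is present.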

  16. TMS daily traffic counts CSV

    • hub.arcgis.com
    • opendata-nzta.opendata.arcgis.com
    Updated Aug 30, 2020
    Cite
    Waka Kotahi (2020). TMS daily traffic counts CSV [Dataset]. https://hub.arcgis.com/datasets/9cb86b342f2d4f228067a7437a7f7313
    Dataset updated
    Aug 30, 2020
    Dataset authored and provided by
    Waka Kotahi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    You can also access an API version of this dataset.

    TMS (traffic monitoring system) daily-updated traffic counts API

    Important note: due to the size of this dataset, you won't be able to open it fully in Excel. Use Notepad, R, or any other software package that can open more than a million rows.

    Data reuse caveats: as per license.

    Data quality statement: please read the accompanying user manual, which explains:

    • how this data is collected
    • identification of count stations
    • traffic monitoring technology
    • monitoring hierarchy and conventions
    • typical survey specification
    • data calculation
    • TMS operation.

    Traffic monitoring for state highways: user manual [PDF 465 KB]

    The data is at daily granularity. However, the actual update frequency of the data depends on the contract the site falls within. For telemetry sites it's once a week on a Wednesday. Some regional sites are fortnightly, and some monthly or quarterly. Some are only 4 weeks a year, with timing depending on contractors’ programme of work.

    Data quality caveats: you must use this data in conjunction with the user manual and the following caveats.

    • The road sensors used in data collection are subject to both technical errors and environmental interference.
    • Data is compiled from a variety of sources. Accuracy may vary and the data should only be used as a guide.
    • As not all road sections are monitored, a direct calculation of Vehicle Kilometres Travelled (VKT) for a region is not possible.
    • Data is sourced from Waka Kotahi New Zealand Transport Agency TMS data.
    • For sites that use dual loops, classification is by length. Vehicles shorter than 5.5 m are classed as light vehicles. Vehicles over 11 m long are classed as heavy vehicles. Vehicles between 5.5 and 11 m are split 50:50 into light and heavy.
    • In September 2022, the National Telemetry contract was handed to a new contractor. During the handover process, due to some missing documents and aged technology, 40 of the 96 national telemetry traffic count sites went offline. The current contractor has continued to upload data from all active sites and has gradually worked to bring most offline sites back online. Please note and account for possible gaps in data from National Telemetry Sites.
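The dual-loop length rule can be expressed as a small function that turns measured vehicle lengths into light and heavy counts, with each vehicle between 5.5 and 11 m contributing half to each class. The source does not say which side the exact boundary values (5.5 m and 11 m) fall on, so treating them as part of the split band is an assumption; `light_heavy_counts` is a hypothetical name.

```python
from typing import Iterable, Tuple


def light_heavy_counts(lengths_m: Iterable[float]) -> Tuple[float, float]:
    """Split vehicles into light/heavy counts using the dual-loop length rule.

    < 5.5 m -> light; > 11 m -> heavy; otherwise counted half light, half heavy.
    """
    light = heavy = 0.0
    for length in lengths_m:
        if length < 5.5:
            light += 1
        elif length > 11:
            heavy += 1
        else:
            # ASSUMPTION: exactly 5.5 m or 11 m also falls in the 50:50 band.
            light += 0.5
            heavy += 0.5
    return light, heavy
```

Because of the 50:50 split, counts can be fractional: four vehicles measuring 4, 6, 8 and 12 m yield 2.0 light and 2.0 heavy.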
    

    The NZTA Vehicle Classification Relationships diagram below shows the length classification (typically dual loops) and axle classification (typically pneumatic tube counts), and how these map to the Monetised benefits and costs manual, table A37, page 254.

    Monetised benefits and costs manual [PDF 9 MB]

    For the full TMS classification schema, see Appendix A of the traffic counting manual vehicle classification scheme (NZTA 2011), below.

    Traffic monitoring for state highways: user manual [PDF 465 KB]

    State highway traffic monitoring (map)

    State highway traffic monitoring sites

  17. Data from: Bio-logger Ethogram Benchmark: A benchmark for computational...

    • data.niaid.nih.gov
    • portalcientifico.unileon.es
    • +3 more
    Updated Apr 19, 2024
    Cite
    Zacarian, Katherine (2024). Bio-logger Ethogram Benchmark: A benchmark for computational analysis of animal behavior, using animal-borne tags [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7807280
    Dataset updated
    Apr 19, 2024
    Dataset provided by
    Vainio, Outi
    Mata-Silva, Vicente
    Vehkaoja, Antti
    Baglione, Vittorio
    Trapote, Eva
    Zacarian, Katherine
    Jeantet, Lorène
    Ladds, Monique A.
    Moreno-González, Víctor
    Maekawa, Takuya
    Hoffman, Benjamin
    DeSantis, Dominic L.
    Chevallier, Damien
    Friedlaender, Ari
    Cusimano, Maddie
    Canestrari, Daniela
    Yoda, Ken
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repository contains the datasets and experiment results presented in our arXiv paper:

    B. Hoffman, M. Cusimano, V. Baglione, D. Canestrari, D. Chevallier, D. DeSantis, L. Jeantet, M. Ladds, T. Maekawa, V. Mata-Silva, V. Moreno-González, A. Pagano, E. Trapote, O. Vainio, A. Vehkaoja, K. Yoda, K. Zacarian, A. Friedlaender, "A benchmark for computational analysis of animal behavior, using animal-borne tags," 2023.

    Standardized code to implement, train, and evaluate models can be found at https://github.com/earthspecies/BEBE/.

    Please note the licenses in each dataset folder.

    Zip folders beginning with "formatted": These are the datasets we used to run the experiments reported in the benchmark paper.

    Zip folders beginning with "raw": These are the unprocessed datasets used in BEBE. Code to process these raw datasets into the formatted ones used by BEBE can be found at https://github.com/earthspecies/BEBE-datasets/.

    Zip folders beginning with "experiments": Results of the cross-validation experiments reported in the paper, as well as hyperparameter optimization. Confusion matrices for all experiments can also be found here. Note that dt, rf, and svm refer to the feature set from Nathan et al., 2012.

    Results used in Fig. 4 of the arXiv paper (deep neural networks vs. classical models):
    {dataset}_harnet_nogyr, {dataset}_CRNN, {dataset}_CNN, {dataset}_dt, {dataset}_rf, {dataset}_svm, {dataset}_wavelet_dt, {dataset}_wavelet_rf, {dataset}_wavelet_svm

    Results used in Fig. 5D of the arXiv paper (full data setting):
    • If dataset contains gyroscope (HAR, jeantet_turtles, vehkaoja_dogs): {dataset}_harnet_nogyr, {dataset}_harnet_random_nogyr, {dataset}_harnet_unfrozen_nogyr, {dataset}_RNN_nogyr, {dataset}_CRNN_nogyr, {dataset}_rf_nogyr
    • Otherwise: {dataset}_harnet_nogyr, {dataset}_harnet_unfrozen_nogyr, {dataset}_harnet_random_nogyr, {dataset}_RNN_nogyr, {dataset}_CRNN, {dataset}_rf

    Results used in Fig. 5E of the arXiv paper (reduced data setting):
    • If dataset contains gyroscope (HAR, jeantet_turtles, vehkaoja_dogs): {dataset}_harnet_low_data_nogyr, {dataset}_harnet_random_low_data_nogyr, {dataset}_harnet_unfrozen_low_data_nogyr, {dataset}_RNN_low_data_nogyr, {dataset}_wavelet_RNN_low_data_nogyr, {dataset}_CRNN_low_data_nogyr, {dataset}_rf_low_data_nogyr
    • Otherwise: {dataset}_harnet_low_data_nogyr, {dataset}_harnet_random_low_data_nogyr, {dataset}_harnet_unfrozen_low_data_nogyr, {dataset}_RNN_low_data_nogyr, {dataset}_wavelet_RNN_low_data_nogyr, {dataset}_CRNN_low_data, {dataset}_rf_low_data

    CSV files: we also include summaries of the experimental results in experiments_summary.csv, experiments_by_fold_individual.csv, experiments_by_fold_behavior.csv.

    experiments_summary.csv: results averaged over individuals and behavior classes
    • dataset (str): name of dataset
    • experiment (str): name of model with experiment setting
    • fig4 (bool): True if dataset+experiment was used in figure 4 of the arXiv paper
    • fig5d (bool): True if dataset+experiment was used in figure 5d of the arXiv paper
    • fig5e (bool): True if dataset+experiment was used in figure 5e of the arXiv paper
    • f1_mean (float): mean of macro-averaged F1 score, averaged over individuals in test folds
    • f1_std (float): standard deviation of macro-averaged F1 score, computed over individuals in test folds
    • prec_mean, prec_std (float): analogous for precision
    • rec_mean, rec_std (float): analogous for recall

    experiments_by_fold_individual.csv: results per individual in the test folds
    • dataset (str): name of dataset
    • experiment (str): name of model with experiment setting
    • fig4 (bool): True if dataset+experiment was used in figure 4 of the arXiv paper
    • fig5d (bool): True if dataset+experiment was used in figure 5d of the arXiv paper
    • fig5e (bool): True if dataset+experiment was used in figure 5e of the arXiv paper
    • fold (int): test fold index
    • individual (int): individuals are numbered zero-indexed, starting from fold 1
    • f1 (float): macro-averaged F1 score for this individual
    • precision (float): macro-averaged precision for this individual
    • recall (float): macro-averaged recall for this individual

    experiments_by_fold_behavior.csv: results per behavior class, for each test fold
    • dataset (str): name of dataset
    • experiment (str): name of model with experiment setting
    • fig4 (bool): True if dataset+experiment was used in figure 4 of the arXiv paper
    • fig5d (bool): True if dataset+experiment was used in figure 5d of the arXiv paper
    • fig5e (bool): True if dataset+experiment was used in figure 5e of the arXiv paper
    • fold (int): test fold index
    • behavior_class (str): name of behavior class
    • f1 (float): F1 score for this behavior, averaged over individuals in the test fold
    • precision (float): precision for this behavior, averaged over individuals in the test fold
    • recall (float): recall for this behavior, averaged over individuals in the test fold
    • train_ground_truth_label_counts (int): number of timepoints labeled with this behavior class, in the training set
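A minimal sketch of reading experiments_summary.csv to find the best-scoring experiment per dataset for one figure, using the column names listed above. It assumes the boolean columns are written as `True`/`False` strings, which is typical of Python/pandas exports but not confirmed by the source; `best_by_dataset` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, Tuple


def best_by_dataset(lines: Iterable[str],
                    figure_column: str = "fig4") -> Dict[str, Tuple[str, float]]:
    """Return {dataset: (experiment, f1_mean)} for the top f1_mean per dataset.

    Only rows whose figure flag (fig4, fig5d or fig5e) is true are considered.
    ASSUMPTION: boolean columns are stored as the strings 'True'/'False'.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.DictReader(lines):
        if row[figure_column].strip().lower() != "true":
            continue
        score = float(row["f1_mean"])
        current = best.get(row["dataset"])
        if current is None or score > current[1]:
            best[row["dataset"]] = (row["experiment"], score)
    return best
```

Passing `figure_column="fig5d"` or `"fig5e"` restricts the comparison to the experiments behind those panels instead.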

  18. Pre-compiled metrics data sets, links to yearly statistics files in CSV...

    • b2find.dkrz.de
    Updated Dec 2, 2018
    (2018). Pre-compiled metrics data sets, links to yearly statistics files in CSV format - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/a5b35769-bca3-51fc-846c-94256507be1e
    Dataset updated
    Dec 2, 2018
    Description

    Errata: On Dec 2nd, 2018, several yearly statistics files were replaced with new versions to correct an inconsistency in the computation of the "dma8epax" statistics. As stated in Schultz et al. (2017) [https://doi.org/10.1525/elementa.244], Supplement 1, Table 6: "When the aggregation period is “seasonal”, “summer”, or “annual”, the 4th highest daily 8-hour maximum of the aggregation period will be computed." The data values for these aggregation periods are correct; however, the header information in the original files stated that the respective data column would contain "average daily maximum 8-hour ozone mixing ratio (nmol mol-1)". The header of the seasonal, summer, and annual files has therefore been corrected. Furthermore, the "dma8epax" column in the monthly files erroneously contained 4th highest daily maximum 8-hour average values, when it should have listed monthly average values. The data of this metric in the monthly files have therefore been replaced, and the new column header reads "avgdma8epax". The updated files carry the version label "1.1" and a brief description of the error. If you have used earlier TOAR data files with the "dma8epax" metric, please replace them with the updated versions.
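    To make the distinction in the erratum concrete, here is a minimal Python sketch of the two aggregations applied to a series of daily maximum 8-hour averages: the 4th highest value (what "dma8epax" holds for seasonal, summer, and annual periods) and the plain average (what the corrected monthly "avgdma8epax" column holds). The helper names are hypothetical, and the sketch assumes complete hourly data and forward-looking 8-hour windows attributed to the day in which they start. It does not implement the data-capture criteria or the exact window conventions of the TOAR metrics, so Schultz et al. (2017) remains the authoritative definition.

```python
from typing import List


def daily_max_8h(hourly: List[float]) -> List[float]:
    """Daily maximum of 8-hour running means (simplified, hypothetical helper).

    Assumes `hourly` holds complete hourly values for consecutive days
    (24 per day) plus 7 trailing hours, so every window that starts on the
    last day is complete. Each window is attributed to the day of its first hour.
    """
    n_days = (len(hourly) - 7) // 24
    result = []
    for day in range(n_days):
        window_means = [
            sum(hourly[start:start + 8]) / 8
            for start in range(day * 24, day * 24 + 24)
        ]
        result.append(max(window_means))
    return result


def fourth_highest(daily_maxima: List[float]) -> float:
    """Seasonal/summer/annual "dma8epax": 4th highest daily 8-hour maximum."""
    return sorted(daily_maxima, reverse=True)[3]  # needs at least 4 days


def monthly_average(daily_maxima: List[float]) -> float:
    """Monthly "avgdma8epax": mean of the daily 8-hour maxima."""
    return sum(daily_maxima) / len(daily_maxima)


# Toy series (hypothetical values): five days of constant, decreasing ozone.
hourly = [v for k in range(5) for v in [50.0 - 10 * k] * 24] + [0.0] * 7
maxima = daily_max_8h(hourly)
print(maxima, fourth_highest(maxima), monthly_average(maxima))
```

    On the same month of data the two statistics generally differ, which is why the mislabeled monthly column had to be recomputed rather than merely renamed.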

  19. truthy-dpo-csv

    • huggingface.co
    Updated Jan 23, 2024
    CultriX (2024). truthy-dpo-csv [Dataset]. https://huggingface.co/datasets/CultriX/truthy-dpo-csv
    Dataset updated
    Jan 23, 2024
    Authors
    CultriX
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The CultriX/truthy-dpo-csv dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  20. Inclusive design and dissemination in digital scholarly editing : CSV...

    • repository.uantwerpen.be
    • works.hcommons.org
    Updated 2019
    Bleeker, Elli; Dillen, Wout; Kelly, Aodhán; Martinez, Merisa; Sichani, Anna-Maria (2019). Inclusive design and dissemination in digital scholarly editing : CSV dataset [Dataset]. http://doi.org/10.17613/C3M9-KQ76
    Dataset updated
    2019
    Dataset provided by
    University of Antwerp
    Faculty of Arts. Literature
    Authors
    Bleeker, Elli; Dillen, Wout; Kelly, Aodhán; Martinez, Merisa; Sichani, Anna-Maria
    Description

    In 2017, the authors designed a survey titled Inclusive Design and Dissemination in Digital Scholarly Editions. The survey was built and hosted on SurveyMonkey (https://www.surveymonkey.com) and was open from 1 July to 31 November 2017. It received 219 responses, 109 of which answered every required question, a completion rate of 49.7%. At the 2017 ADHO conference in Montreal (Canada), the authors participated in a panel discussion on the subject, where they discussed some preliminary survey results (Sichani et al. 2017). A more detailed treatment of the complete survey results will be published in Variants 14 (https://journals.openedition.org/variants/), the journal of the ESTS (Martinez et al. forthcoming).

    In anticipation of this publication, the authors have deposited the survey results as data sets here. These include a CSV file of the survey's data (scrubbed of respondents' personal information) and the current PDF with graphical representations of the survey's statistics. Both files present the survey's raw, uncorrected (albeit redacted) data, as recorded and automatically analyzed by SurveyMonkey, including response rates per question and diagrams. Because these are the uncorrected results, some of the data in these files may differ slightly from those presented in the forthcoming Variants article. For their qualitative analysis in that publication, the authors corrected the data (e.g., excluding invalid answers or reclassifying incorrectly classified answers) and interpreted them (e.g., creating categories for similar responses); these interventions are justified in the relevant sections of the Variants article. Rather than depositing the corrected version of the survey's results in the Humanities Commons repository, the authors chose to publish the uncorrected results, so as not to impose their interpretation of the survey's data on future research.

