http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Emmanuel Arias
Released under the Open Database License (database) and the Database Contents License (contents).
Example of a CSV file exported from the database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Survival after open versus endovascular repair of abdominal aortic aneurysm. Polish population analysis. (in press)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 7 pre-generated CSV files with realistic synthetic person records, ranging from 10,000 to 10,000,000 records. Perfect for development, testing, prototyping, and data analysis workflows without privacy concerns.
Each CSV file contains complete demographic information:
- person_id: unique identifier
- firstname, lastname: realistic names (international)
- gender, age: demographics
- street, streetnumber, address_unit, postalcode, city: complete addresses
- phone: realistic phone numbers
- email: valid email addresses
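Given that column layout, a minimal Python sketch for loading one of the files and verifying the expected columns (the file name below is a placeholder for whichever size you download):

import pandas as pd

expected = ["person_id", "firstname", "lastname", "gender", "age",
            "street", "streetnumber", "address_unit", "postalcode",
            "city", "phone", "email"]

# "persons_10000.csv" is a placeholder, not the actual download name.
df = pd.read_csv("persons_10000.csv")
missing = set(expected) - set(df.columns)
print(f"{len(df):,} rows loaded; missing columns: {missing or 'none'}")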
✓ No privacy concerns: completely synthetic data
✓ Perfect for database testing and imports
✓ Ideal for ML model training and prototyping
✓ Ready-to-use CSV format
✓ Multiple sizes for different use cases
License: CC BY 4.0 (Please attribute to Swain / Swainlabs when sharing)
Free, daily updated MAC prefix and vendor CSV database. Download now for accurate device identification.
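A minimal Python sketch of the device-identification lookup such a file enables (the file name and the prefix/vendor column names are assumptions about the download):

import csv

# "mac_vendors.csv" and its column names are assumptions; adjust to the actual file.
with open("mac_vendors.csv", newline="", encoding="utf-8") as f:
    vendors = {row["prefix"].upper().replace(":", ""): row["vendor"]
               for row in csv.DictReader(f)}

mac = "44:38:39:FF:EF:57"
prefix = mac.upper().replace(":", "")[:6]  # the first three octets identify the vendor
print(vendors.get(prefix, "unknown vendor"))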
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is our complete database in CSV format (with gene names, IDs, annotations, lengths, cluster sizes, and taxonomic classifications) that can be queried on our website. The difference is that it does not include the sequences; those can be downloaded in other files on figshare. This file, as well as those, can be parsed and linked by the gene identifier. We recommend downloading this database and parsing it yourself if you need to run a query that is too large for our servers to handle.
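For such oversized queries, a minimal pandas sketch that parses the file locally in chunks (the file name and the column used in the filter are assumptions about the schema):

import pandas as pd

# File and column names are assumptions; adjust to the figshare download.
matches = []
for chunk in pd.read_csv("complete_database.csv", chunksize=100_000):
    matches.append(chunk[chunk["taxonomy"].str.contains("Bacteroides", na=False)])

result = pd.concat(matches)
print(len(result), "matching genes")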
I don't know SQLite; I use PostgreSQL. I needed to work with this dataset in pgAdmin, so I converted Gabriele Baldassarre's Board Games Dataset (https://www.kaggle.com/datasets/gabrio/board-games-dataset/data) from .sqlite files into UTF-8 .csv files to create my own database in pgAdmin. I uploaded them here to make it easier for anyone else who wants to do the same.
The board_games.csv file likely contains all the information you are looking for.
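For reference, a minimal Python sketch of this kind of SQLite-to-CSV conversion (the .sqlite file name is a placeholder):

import csv
import sqlite3

# "database.sqlite" is a placeholder for the original dataset file.
con = sqlite3.connect("database.sqlite")
cur = con.cursor()
tables = [r[0] for r in cur.execute("SELECT name FROM sqlite_master WHERE type='table'")]

# Export every table to a UTF-8 CSV named after it.
for table in tables:
    rows = cur.execute(f'SELECT * FROM "{table}"')
    with open(f"{table}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([d[0] for d in rows.description])
        writer.writerows(rows)
con.close()

The resulting CSVs can then be loaded into PostgreSQL with pgAdmin's import tool or the COPY command.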
A CSV file of our study database, which we used for the analyses in this manuscript.
https://crawlfeeds.com/privacy_policy
The Dog Food Data Extracted from Chewy (USA) dataset contains 4,500 detailed records of dog food products sourced from one of the leading pet supply platforms in the United States, Chewy. This dataset is ideal for businesses, researchers, and data analysts who want to explore and analyze the dog food market, including product offerings, pricing strategies, brand diversity, and customer preferences within the USA.
The dataset includes essential information such as product names, brands, prices, ingredient details, product descriptions, weight options, and availability. Organized in a CSV format for easy integration into analytics tools, this dataset provides valuable insights for those looking to study the pet food market, develop marketing strategies, or train machine learning models.
Key Features:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A set of Python scripts to generate data for benchmarks: equivalents of the ADNI_Clin_6800_geno.csv, PTDEMOG.csv, MicroarrayExpression_fixed.csv and Probes.csv files, the dummy.csv and dummy2.csv files, and the microbenchmark CSV files. (ZIP)
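The actual generators are in the ZIP; purely to illustrate the approach, a minimal sketch that writes a dummy CSV of random values (the columns and row count here are invented):

import csv
import random

# Invented columns and row count, for illustration only; the real scripts in
# the ZIP reproduce the structure of the benchmark files listed above.
with open("dummy.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value_a", "value_b"])
    for i in range(10_000):
        writer.writerow([i, random.random(), random.randint(0, 100)])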
http://www.gnu.org/licenses/lgpl-3.0.html
On the official website, the dataset is available via a SQL Server instance (localhost) and as CSVs, to be used with Power BI Desktop running on the Virtual Lab (virtual machine). The first two steps of importing the data were executed in the virtual lab, and the resulting Power BI tables were then copied into CSVs. Records were added up to the year 2022, as required.
This dataset is helpful if you want to work offline with the Adventure Works data in Power BI Desktop to follow the lab instructions in the training material on the official website. It is also useful if you want to work through the Power BI Desktop sales analysis example from Microsoft's PL-300 learning path.
Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created by the first two data-import steps in the PL-300 Microsoft Power BI Data Analyst exam lab.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
arXiv publications dataset with simulated citation relationships.
https://github.com/jacekmiecznikowski/neo4index (an app that evaluates scientific research impact using author-level metrics, h-index and more)
This collection contains data acquired from arXiv.org via the OAI2 protocol. arXiv does not provide citation metadata, so this data was pseudo-randomly simulated. We evaluated scientific research impact using six popular author-level metrics: h-index, m quotient, e-index, m-index, r-index, and ar-index.
Source: https://arxiv.org/help/bulk_data (downloaded 2018-03-23; over 1.3 million publications)
Files:
* arxiv_bulk_metadata_2018-03-23.tar.gz: downloaded using oai-harvester; contains metadata of all arXiv publications to date.
* categories.csv: categories from arXiv, with the category/subcategory division.
* publications.csv: information about articles, such as id, title, abstract, url, categories and date.
* authors.csv: author data, such as first name, last name and the id of the published article.
* citations.csv: simulated relationships between all publications, generated using arxivCite.
* indices.csv: the six author-level metrics calculated on the database using neo4index.
Statistics:
Metric | Average | Median | Mode
h-index | 3.5836524733724495 | 1.0 | 1.0
m quotient | 0.5831426366846965 | 0.4167 | 1.0
e-index | 7.9260187734579075 | 5.3852 | 0.0
m-index | 29.436844659143155 | 17.0 | 0.0
r-index | 8.931101630575293 | 5.831 | 0.0
ar-index | 3.5439082808721025 | 2.7928 | 0.0
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
The HWRT database of handwritten symbols contains on-line (pen-stroke) data of handwritten symbols, including all alphanumeric characters, arrows, Greek characters, and mathematical symbols such as the integral sign.
The database can be downloaded in the form of bzip2-compressed tar files. Each tar file contains:
symbols.csv: A CSV file with the columns symbol_id, latex, training_samples and test_samples. The symbol_id is an integer, the latex column contains the LaTeX code of the symbol, and the training_samples and test_samples columns contain the number of labeled samples as integers.
train-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.
test-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.
All CSV files use ";" as the delimiter and "'" as the quote character. The data column is given in YAML format as a list of lists of dictionaries. Each dictionary has the keys "x", "y" and "time"; (x, y) are coordinates and time is the UNIX time.
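A minimal Python sketch for parsing train-data.csv according to this format (assumes the PyYAML package for the stroke data):

import csv
import yaml  # PyYAML

with open("train-data.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f, delimiter=";", quotechar="'")
    row = next(reader)
    strokes = yaml.safe_load(row["data"])  # list of strokes, each a list of points
    point = strokes[0][0]
    print(row["symbol_id"], point["x"], point["y"], point["time"])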
About 90% of the data was made available by Daniel Kirsch via github.com/kirel/detexify-data. Thank you very much, Daniel!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains:
This dataset was created by Sadia Khan
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet limited data exist on the association between in-the-wild, large-scale physical activity patterns, sleep, stress, and overall health on the one hand and behavioral patterns and psychological measurements on the other, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset: a multi-modal, longitudinal, and geographically distributed dataset containing a plethora of anthropological data, collected unobtrusively over a total course of more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
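A minimal sketch of this (the file name below is a placeholder for one of the provided daily or hourly CSV files):

import pandas as pd

# Placeholder file name; substitute any of the provided daily/hourly CSVs.
df = pd.read_csv("daily_fitbit_data.csv")
print(df.shape)
print(df.columns.tolist())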
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend using the raw, complete data by importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools (which include mongorestore) installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit <path-to-fitbit.bson>
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema <path-to-sema.bson>
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys <path-to-surveys.bson>
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
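For instance (credentials and path are placeholders):

mongorestore --host localhost:27017 --username <your-username> --password <your-password> -d rais_anonymized -c fitbit <path-to-fitbit.bson>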
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:
{
_id:
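Once the collections are restored, a minimal pymongo sketch for inspecting the documents (the connection string assumes the default local setup used in the commands above):

from pymongo import MongoClient

# Assumes the default local mongorestore setup shown above.
client = MongoClient("mongodb://localhost:27017")
db = client["rais_anonymized"]
print(db["fitbit"].find_one())
print(db["fitbit"].estimated_document_count(), "fitbit documents")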
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset of plant species in Glenelg Shire, Australia.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database collates 3552 development indicators from different studies, with data by country and year, including single-year and multi-year time series. The data is presented as charts; the underlying data can be downloaded from the linked project pages/references for each set, and the data for each chart is available as a CSV file, along with a visual download of the graph (both available via the download link under each chart).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
From 2016 to 2018, we surveyed the world’s largest natural history museum collections to begin mapping this globally distributed scientific infrastructure. The resulting dataset includes 73 institutions across the globe. It has:
Basic institution data for the 73 contributing institutions, including estimated total collection sizes, geographic locations (to the city) and latitude/longitude, and Research Organization Registry (ROR) identifiers where available.
Resourcing information, covering the numbers of research, collections and volunteer staff in each institution.
Indicators of the presence and size of collections within each institution broken down into a grid of 19 collection disciplines and 16 geographic regions.
Measures of the depth and breadth of individual researcher experience across the same disciplines and geographic regions.
This dataset contains the data (raw and processed) collected for the survey, and specifications for the schema used to store the data. It includes:
A diagram of the MySQL database schema.
A SQL dump of the MySQL database schema, excluding the data.
A SQL dump of the MySQL database schema with all data. This may be imported into an instance of MySQL Server to create a complete reconstruction of the database (a minimal import command is sketched after this list).
Raw data from each database table in CSV format.
A set of more human-readable views of the data in CSV format. These correspond to the database tables, but foreign keys are replaced with values from the linked tables to make the data easier to read and analyse.
A text file containing the definitions of the size categories used in the collection_unit table.
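A minimal import of the full dump, assuming a local MySQL Server (the database name and dump file name are placeholders):

mysql -u <username> -p -e "CREATE DATABASE collections_survey"
mysql -u <username> -p collections_survey < <path-to-schema-and-data-dump>.sql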
The global collections data may also be accessed at https://rebrand.ly/global-collections. This is a preliminary dashboard, constructed and published using Microsoft Power BI, that enables the exploration of the data through a set of visualisations and filters. The dashboard consists of three pages:
Institutional profile: Enables the selection of a specific institution and provides summary information on the institution and its location, staffing, total collection size, collection breakdown and researcher expertise.
Overall heatmap: Supports an interactive exploration of the global picture, including a heatmap of collection distribution across the discipline and geographic categories, and visualisations that demonstrate the relative breadth of collections across institutions and correlations between collection size and breadth. Various filters allow the focus to be refined to specific regions and collection sizes.
Browse: Provides some alternative methods of filtering and visualising the global dataset to look at patterns in the distribution and size of different types of collections across the global view.
This zip archive contains all of the trait records in EOL's graph database. It contains five .csv files: pages.csv, listing taxa and their names; traits.csv, with trait records; metadata.csv, with auxiliary records referred to by trait records; inferred.csv (see below); and terms.csv, listing all of the relationship URIs in the database. For a description of the schema, see https://github.com/EOL/eol_website/blob/master/doc/trait-schema.md
inferred.csv lists additional taxa to which a trait record applies by taxonomic inference, in addition to the ancestral taxon to which it is attached. For instance, the record describing locomotion=flight for Aves is also inferred to apply to most of the descendants of Aves, except for any flightless subclades that are excluded from the inference pattern. All the trait records referred to in the 2nd column of the inferred file have full records available in the traits file.
THIS RESOURCE IS UPDATED MONTHLY. It is not archived regularly. Please save your download if you want to be able to refer to it at a later date.
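A minimal pandas sketch of linking the inferred file back to the full trait records (the join column names here are assumptions; check the schema document linked above):

import pandas as pd

# Column names are assumptions about the schema, not confirmed identifiers.
traits = pd.read_csv("traits.csv")
inferred = pd.read_csv("inferred.csv")
linked = inferred.merge(traits, left_on="trait_eol_pk", right_on="eol_pk")
print(len(linked), "inferred rows linked to full trait records")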