http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Emmanuel Arias
Released under the Open Database License (database) and the Database Contents License (contents).
Example of a CSV file exported from the database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Survival after open versus endovascular repair of abdominal aortic aneurysm. Polish population analysis. (in press)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 7 pre-generated CSV files with realistic synthetic person records, ranging from 10,000 to 10,000,000 records. Perfect for development, testing, prototyping, and data analysis workflows without privacy concerns.
Each CSV file contains complete demographic information:
- person_id: unique identifier
- firstname, lastname: realistic names (international)
- gender, age: demographics
- street, streetnumber, address_unit, postalcode, city: complete addresses
- phone: realistic phone numbers
- email: valid email addresses
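Given that column layout, a minimal Python sketch for loading one of the files and verifying the expected columns (the file name below is a placeholder for whichever size you download):

import pandas as pd

expected = ["person_id", "firstname", "lastname", "gender", "age",
            "street", "streetnumber", "address_unit", "postalcode",
            "city", "phone", "email"]

# "persons_10000.csv" is a placeholder, not the actual download name.
df = pd.read_csv("persons_10000.csv")
missing = set(expected) - set(df.columns)
print(f"{len(df):,} rows loaded; missing columns: {missing or 'none'}")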
✓ No privacy concerns: completely synthetic data
✓ Perfect for database testing and imports
✓ Ideal for ML model training and prototyping
✓ Ready-to-use CSV format
✓ Multiple sizes for different use cases
License: CC BY 4.0 (Please attribute to Swain / Swainlabs when sharing)
Free, daily updated MAC prefix and vendor CSV database. Download now for accurate device identification.
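A minimal Python sketch of the device-identification lookup such a file enables (the file name and the prefix/vendor column names are assumptions about the download):

import csv

# "mac_vendors.csv" and its column names are assumptions; adjust to the actual file.
with open("mac_vendors.csv", newline="", encoding="utf-8") as f:
    vendors = {row["prefix"].upper().replace(":", ""): row["vendor"]
               for row in csv.DictReader(f)}

mac = "44:38:39:FF:EF:57"
prefix = mac.upper().replace(":", "")[:6]  # the first three octets identify the vendor
print(vendors.get(prefix, "unknown vendor"))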
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is our complete database in CSV format (with gene names, IDs, annotations, lengths, cluster sizes, and taxonomic classifications) that can be queried on our website. The difference is that it does not include the sequences; those can be downloaded in other files on figshare. This file, as well as those, can be parsed and linked by the gene identifier. We recommend downloading this database and parsing it yourself if you need to run a query that is too large for our servers to handle.
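For such oversized queries, a minimal pandas sketch that parses the file locally in chunks (the file name and the column used in the filter are assumptions about the schema):

import pandas as pd

# File and column names are assumptions; adjust to the figshare download.
matches = []
for chunk in pd.read_csv("complete_database.csv", chunksize=100_000):
    matches.append(chunk[chunk["taxonomy"].str.contains("Bacteroides", na=False)])

result = pd.concat(matches)
print(len(result), "matching genes")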
I don't know SQLite; I use PostgreSQL. I needed to work with this dataset in pgAdmin, so I converted Gabriele Baldassarre's Board Games Dataset (https://www.kaggle.com/datasets/gabrio/board-games-dataset/data) from .sqlite files into UTF-8 .csv files to create my own database in pgAdmin. I uploaded them here to make it easier for anyone else who wants to do the same.
The board_games.csv file likely contains all the information you are looking for.
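For reference, a minimal Python sketch of this kind of SQLite-to-CSV conversion (the .sqlite file name is a placeholder):

import csv
import sqlite3

# "database.sqlite" is a placeholder for the original dataset file.
con = sqlite3.connect("database.sqlite")
cur = con.cursor()
tables = [r[0] for r in cur.execute("SELECT name FROM sqlite_master WHERE type='table'")]

# Export every table to a UTF-8 CSV named after it.
for table in tables:
    rows = cur.execute(f'SELECT * FROM "{table}"')
    with open(f"{table}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([d[0] for d in rows.description])
        writer.writerows(rows)
con.close()

The resulting CSVs can then be loaded into PostgreSQL with pgAdmin's import tool or the COPY command.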
A CSV file of our study database, which we used for the analyses in this manuscript.
https://crawlfeeds.com/privacy_policy
The Dog Food Data Extracted from Chewy (USA) dataset contains 4,500 detailed records of dog food products sourced from one of the leading pet supply platforms in the United States, Chewy. This dataset is ideal for businesses, researchers, and data analysts who want to explore and analyze the dog food market, including product offerings, pricing strategies, brand diversity, and customer preferences within the USA.
The dataset includes essential information such as product names, brands, prices, ingredient details, product descriptions, weight options, and availability. Organized in a CSV format for easy integration into analytics tools, this dataset provides valuable insights for those looking to study the pet food market, develop marketing strategies, or train machine learning models.
Key Features:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A set of Python scripts to generate data for benchmarks: equivalents of the ADNI_Clin_6800_geno.csv, PTDEMOG.csv, MicroarrayExpression_fixed.csv and Probes.csv files, the dummy.csv and dummy2.csv files, and the microbenchmark CSV files. (ZIP)
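The actual generators are in the ZIP; purely to illustrate the approach, a minimal sketch that writes a dummy CSV of random values (the columns and row count here are invented):

import csv
import random

# Invented columns and row count, for illustration only; the real scripts in
# the ZIP reproduce the structure of the benchmark files listed above.
with open("dummy.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value_a", "value_b"])
    for i in range(10_000):
        writer.writerow([i, random.random(), random.randint(0, 100)])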
http://www.gnu.org/licenses/lgpl-3.0.html
On the official website, the dataset is available via a SQL Server instance (localhost) and as CSVs, to be used with Power BI Desktop running on the Virtual Lab (virtual machine). The first two steps of importing the data were executed in the virtual lab, and the resulting Power BI tables were then copied into CSVs. Records were added up to the year 2022, as required.
This dataset is helpful if you want to work offline with the Adventure Works data in Power BI Desktop to follow the lab instructions in the training material on the official website. It is also useful if you want to work through the Power BI Desktop sales analysis example from Microsoft's PL-300 learning path.
Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created by the first two data-import steps in the PL-300 Microsoft Power BI Data Analyst exam lab.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
arXiv publications dataset with simulated citation relationships.
https://github.com/jacekmiecznikowski/neo4index (an app that evaluates scientific research impact using author-level metrics, h-index and more)
This collection contains data acquired from arXiv.org via the OAI2 protocol. arXiv does not provide citation metadata, so this data was pseudo-randomly simulated. We evaluated scientific research impact using six popular author-level metrics: h-index, m quotient, e-index, m-index, r-index, and ar-index.
Source: https://arxiv.org/help/bulk_data (downloaded 2018-03-23; over 1.3 million publications)
Files:
* arxiv_bulk_metadata_2018-03-23.tar.gz: downloaded using oai-harvester; contains metadata of all arXiv publications to date.
* categories.csv: categories from arXiv, with the category/subcategory division.
* publications.csv: information about articles, such as id, title, abstract, url, categories and date.
* authors.csv: author data, such as first name, last name and the id of the published article.
* citations.csv: simulated relationships between all publications, generated using arxivCite.
* indices.csv: the six author-level metrics calculated on the database using neo4index.
Statistics:
Metric | Average | Median | Mode
h-index | 3.5836524733724495 | 1.0 | 1.0
m quotient | 0.5831426366846965 | 0.4167 | 1.0
e-index | 7.9260187734579075 | 5.3852 | 0.0
m-index | 29.436844659143155 | 17.0 | 0.0
r-index | 8.931101630575293 | 5.831 | 0.0
ar-index | 3.5439082808721025 | 2.7928 | 0.0
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
The HWRT database of handwritten symbols contains on-line (pen-stroke) data of handwritten symbols, including all alphanumeric characters, arrows, Greek characters, and mathematical symbols such as the integral sign.
The database can be downloaded in the form of bzip2-compressed tar files. Each tar file contains:
symbols.csv: A CSV file with the columns symbol_id, latex, training_samples and test_samples. The symbol_id is an integer, the latex column contains the LaTeX code of the symbol, and the training_samples and test_samples columns contain the number of labeled samples as integers.
train-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.
test-data.csv: A CSV file with the columns symbol_id, user_id, user_agent and data.
All CSV files use ";" as the delimiter and "'" as the quote character. The data column is given in YAML format as a list of lists of dictionaries. Each dictionary has the keys "x", "y" and "time"; (x, y) are coordinates and time is the UNIX time.
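A minimal Python sketch for parsing train-data.csv according to this format (assumes the PyYAML package for the stroke data):

import csv
import yaml  # PyYAML

with open("train-data.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f, delimiter=";", quotechar="'")
    row = next(reader)
    strokes = yaml.safe_load(row["data"])  # list of strokes, each a list of points
    point = strokes[0][0]
    print(row["symbol_id"], point["x"], point["y"], point["time"])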
About 90% of the data was made available by Daniel Kirsch via github.com/kirel/detexify-data. Thank you very much, Daniel!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains:
This dataset was created by Sadia Khan
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet limited data exist on the association between in-the-wild, large-scale physical activity patterns, sleep, stress, and overall health on the one hand and behavioral patterns and psychological measurements on the other, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset: a multi-modal, longitudinal, and geographically distributed dataset containing a plethora of anthropological data, collected unobtrusively over a total course of more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
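A minimal sketch of this (the file name below is a placeholder for one of the provided daily or hourly CSV files):

import pandas as pd

# Placeholder file name; substitute any of the provided daily/hourly CSVs.
df = pd.read_csv("daily_fitbit_data.csv")
print(df.shape)
print(df.columns.tolist())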
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend using the raw, complete data by importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools (which include mongorestore) installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit <path-to-fitbit.bson>
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema <path-to-sema.bson>
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys <path-to-surveys.bson>
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
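For instance (credentials and path are placeholders):

mongorestore --host localhost:27017 --username <your-username> --password <your-password> -d rais_anonymized -c fitbit <path-to-fitbit.bson>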
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:
{
_id:
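Once the collections are restored, a minimal pymongo sketch for inspecting the documents (the connection string assumes the default local setup used in the commands above):

from pymongo import MongoClient

# Assumes the default local mongorestore setup shown above.
client = MongoClient("mongodb://localhost:27017")
db = client["rais_anonymized"]
print(db["fitbit"].find_one())
print(db["fitbit"].estimated_document_count(), "fitbit documents")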
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset of plant species in Glenelg Shire, Australia.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database collates 3552 development indicators from different studies, with data by country and year, including single-year and multi-year time series. The data is presented as charts; the underlying data can be downloaded from the linked project pages/references for each set, and the data for each chart is available as a CSV file, along with a visual download of the graph (both available via the download link under each chart).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
From 2016 to 2018, we surveyed the world’s largest natural history museum collections to begin mapping this globally distributed scientific infrastructure. The resulting dataset includes 73 institutions across the globe. It has:
Basic institution data for the 73 contributing institutions, including estimated total collection sizes, geographic locations (to the city) and latitude/longitude, and Research Organization Registry (ROR) identifiers where available.
Resourcing information, covering the numbers of research, collections and volunteer staff in each institution.
Indicators of the presence and size of collections within each institution broken down into a grid of 19 collection disciplines and 16 geographic regions.
Measures of the depth and breadth of individual researcher experience across the same disciplines and geographic regions.
This dataset contains the data (raw and processed) collected for the survey, and specifications for the schema used to store the data. It includes:
A diagram of the MySQL database schema.
A SQL dump of the MySQL database schema, excluding the data.
A SQL dump of the MySQL database schema with all data. This may be imported into an instance of MySQL Server to create a complete reconstruction of the database (a minimal import command is sketched after this list).
Raw data from each database table in CSV format.
A set of more human-readable views of the data in CSV format. These correspond to the database tables, but foreign keys are replaced with values from the linked tables to make the data easier to read and analyse.
A text file containing the definitions of the size categories used in the collection_unit table.
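A minimal import of the full dump, assuming a local MySQL Server (the database name and dump file name are placeholders):

mysql -u <username> -p -e "CREATE DATABASE collections_survey"
mysql -u <username> -p collections_survey < <path-to-schema-and-data-dump>.sql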
The global collections data may also be accessed at https://rebrand.ly/global-collections. This is a preliminary dashboard, constructed and published using Microsoft Power BI, that enables the exploration of the data through a set of visualisations and filters. The dashboard consists of three pages:
Institutional profile: Enables the selection of a specific institution and provides summary information on the institution and its location, staffing, total collection size, collection breakdown and researcher expertise.
Overall heatmap: Supports an interactive exploration of the global picture, including a heatmap of collection distribution across the discipline and geographic categories, and visualisations that demonstrate the relative breadth of collections across institutions and correlations between collection size and breadth. Various filters allow the focus to be refined to specific regions and collection sizes.
Browse: Provides some alternative methods of filtering and visualising the global dataset to look at patterns in the distribution and size of different types of collections across the global view.
This zip archive contains all of the trait records in EOL's graph database. It contains five .csv files: pages.csv, listing taxa and their names; traits.csv, with trait records; metadata.csv, with auxiliary records referred to by trait records; inferred.csv (see below); and terms.csv, listing all of the relationship URIs in the database. For a description of the schema, see https://github.com/EOL/eol_website/blob/master/doc/trait-schema.md
inferred.csv lists additional taxa to which a trait record applies by taxonomic inference, in addition to the ancestral taxon to which it is attached. For instance, the record describing locomotion=flight for Aves is also inferred to apply to most of the descendants of Aves, except for any flightless subclades that are excluded from the inference pattern. All the trait records referred to in the 2nd column of the inferred file have full records available in the traits file.
THIS RESOURCE IS UPDATED MONTHLY. It is not archived regularly. Please save your download if you want to be able to refer to it at a later date.
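A minimal pandas sketch of linking the inferred file back to the full trait records (the join column names here are assumptions; check the schema document linked above):

import pandas as pd

# Column names are assumptions about the schema, not confirmed identifiers.
traits = pd.read_csv("traits.csv")
inferred = pd.read_csv("inferred.csv")
linked = inferred.merge(traits, left_on="trait_eol_pk", right_on="eol_pk")
print(len(linked), "inferred rows linked to full trait records")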