67 datasets found
  1. VIERS- User Preference Service

    • datahub.va.gov
    • data.va.gov
    • +1 more
    application/rdfxml +5
    Updated Sep 12, 2019
    Cite
    (2019). VIERS- User Preference Service [Dataset]. https://www.datahub.va.gov/dataset/VIERS-User-Preference-Service/ffxm-y9uj
    Explore at:
    Available download formats: xml, tsv, json, application/rdfxml, csv, application/rssxml
    Dataset updated
    Sep 12, 2019
    Description

    The Preferences service provides a means to store, retrieve, and manage user preferences. The service supports the definition of enterprise-wide preferences as well as preferences specific to an application or business domain. It also supports dynamic creation and modification of preference definitions, dynamic setting and modification of preference values, and governance of changes to preference domain definitions, preference definitions, and preference values.

  2. Shein and Fast Fashion E-Receipt Data | Consumer Transaction Data | Asia,...

    • datarade.ai
    .json, .xml, .csv
    Updated Jun 20, 2024
    + more versions
    Cite
    Measurable AI (2024). Shein and Fast Fashion E-Receipt Data | Consumer Transaction Data | Asia, EMEA, LATAM, MENA, India | Granular & Aggregate Data | 23+ Countries [Dataset]. https://datarade.ai/data-products/shein-and-fast-fashion-e-receipt-data-consumer-transaction-measurable-ai
    Explore at:
    Available download formats: .json, .xml, .csv
    Dataset updated
    Jun 20, 2024
    Dataset authored and provided by
    Measurable AI
    Area covered
    United States
    Description

    The Measurable AI Temu & Fast Fashion E-Receipt Dataset is a leading source of email receipts and transaction data, offering data collected directly from users via Proprietary Consumer Apps, with millions of opt-in users.

    We source our email receipt consumer data panel via two consumer apps which garner the express consent of our end-users (GDPR compliant). We then aggregate and anonymize all the transactional data to produce raw and aggregate datasets for our clients.

    Use Cases
    Our clients leverage our datasets to produce actionable consumer insights such as:
    • Market share analysis
    • User behavioral traits (e.g. retention rates)
    • Average order values
    • Promotional strategies used by the key players
    Several of our clients also use our datasets for forecasting and to better understand industry trends.

    Coverage
    • Asia (Japan, Thailand, Malaysia, Vietnam, Indonesia, Singapore, Hong Kong, Philippines)
    • EMEA (Spain, United Arab Emirates, Saudi Arabia, Qatar)
    • Latin America (Brazil, Mexico, Colombia, Argentina)

    Granular Data
    Itemized, high-definition data per transaction level, with metrics such as:
    • Order value
    • Items ordered
    • No. of orders per user
    • Delivery fee
    • Service fee
    • Promotions used
    • Geolocation data and more
    • Email ID (can work out user overlap with peers and loyalty)

    Aggregate Data
    • Weekly/monthly order volume
    • Revenue
    Delivered in aggregate form, with historical data dating back to 2018.
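    The granular-to-aggregate relationship described above can be sketched in a few lines; the records and field names below are invented stand-ins, since the actual data dictionary is available only on request.

```python
from collections import defaultdict
from datetime import date

# Toy roll-up of itemized transactions into weekly order volume and revenue,
# mirroring the "Aggregate Data" deliverable described above. Records and
# field names are invented for illustration.
transactions = [
    {"user": "u1", "order_value": 23.5, "date": date(2024, 6, 3)},
    {"user": "u2", "order_value": 11.0, "date": date(2024, 6, 5)},
    {"user": "u1", "order_value": 42.0, "date": date(2024, 6, 12)},
]

weekly_orders = defaultdict(int)    # (ISO year, ISO week) -> number of orders
weekly_revenue = defaultdict(float)  # (ISO year, ISO week) -> summed order value
for t in transactions:
    year, week, _ = t["date"].isocalendar()
    weekly_orders[(year, week)] += 1
    weekly_revenue[(year, week)] += t["order_value"]

print(dict(weekly_orders))  # {(2024, 23): 2, (2024, 24): 1}
```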

    Most of our clients are fast-growing Tech Companies, Financial Institutions, Buyside Firms, Market Research Agencies, Consultancies and Academia.

    Our dataset is GDPR compliant, contains no PII information and is aggregated & anonymized with user consent. Contact business@measurable.ai for a data dictionary and to find out our volume in each country.

  3. Bumble, Match, Tinder Dating App Data | Consumer Transaction Data | US, EU,...

    • datarade.ai
    .json, .xml, .csv
    Updated Oct 12, 2023
    Cite
    Measurable AI (2024). Bumble, Match, Tinder Dating App Data | Consumer Transaction Data | US, EU, Asia, EMEA, LATAM, MENA, India | Granular & Aggregate Data available [Dataset]. https://datarade.ai/data-products/bumble-match-tinder-dating-app-data-consumer-transaction-measurable-ai
    Explore at:
    Available download formats: .json, .xml, .csv
    Dataset updated
    Oct 12, 2023
    Dataset authored and provided by
    Measurable AI
    Area covered
    United States
    Description

    The Measurable AI Dating App Consumer Transaction Dataset is a leading source of in-app purchase data, offering data collected directly from users via Proprietary Consumer Apps, with millions of opt-in users.

    We source our in-app and email receipt consumer data panel via two consumer apps which garner the express consent of our end-users (GDPR compliant). We then aggregate and anonymize all the transactional data to produce raw and aggregate datasets for our clients.

    Use Cases
    Our clients leverage our datasets to produce actionable consumer insights such as:
    • Market share analysis
    • User behavioral traits (e.g. retention rates)
    • Average order values
    • User overlap between competitors
    • Promotional strategies used by the key players
    Several of our clients also use our datasets for forecasting and to better understand industry trends.

    Coverage
    • Asia
    • EMEA (Spain, United Arab Emirates)
    • USA
    • Europe

    Granular Data
    Itemized, high-definition data per transaction level, with metrics such as:
    • Order value
    • Features/subscription plans purchased
    • No. of orders per user
    • Promotions used
    • Geolocation data and more

    Aggregate Data
    • Weekly/monthly order volume
    • Revenue
    Delivered in aggregate form, with historical data dating back to 2018. All transactional e-receipts are sent from the app to users' registered accounts.

    Most of our clients are fast-growing Tech Companies, Financial Institutions, Buyside Firms, Market Research Agencies, Consultancies and Academia.

    Our dataset is GDPR compliant, contains no PII information and is aggregated & anonymized with user consent. Contact michelle@measurable.ai for a data dictionary and to find out our volume in each country.

  4. Rural Definitions

    • catalog.data.gov
    • gimi9.com
    • +1 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Economic Research Service, Department of Agriculture (2025). Rural Definitions [Dataset]. https://catalog.data.gov/dataset/rural-definitions
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Economic Research Service (http://www.ers.usda.gov/)
    Description

    Note: Updates to this data product are discontinued. Dozens of rural definitions are currently used by Federal and State agencies, researchers, and policymakers. The ERS Rural Definitions data product allows users to make comparisons among nine representative rural definitions. Methods of designating the urban periphery range from the use of municipal boundaries to definitions based on counties. Definitions based on municipal boundaries may classify as rural much of what would typically be considered suburban, while definitions that delineate the urban periphery by county may include extensive segments of a county that many would consider rural. We selected a representative set of nine alternative rural definitions and compare social and economic indicators from the 2000 decennial census across them, choosing socioeconomic indicators (population, education, poverty, etc.) that are commonly used to highlight differences between urban and rural areas.

  5. Industrial Energy End Use in the U.S

    • kaggle.com
    Updated Dec 14, 2022
    Cite
    The Devastator (2022). Industrial Energy End Use in the U.S [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-industrial-energy-end-use-in-the-u-s
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 14, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Industrial Energy End Use in the U.S

    Facility-Level Combustion Energy Data

    By US Open Data Portal, data.gov

    About this dataset

    This dataset contains in-depth facility-level information on industrial combustion energy use in the United States. It provides an essential resource for understanding consumption patterns across different sectors and industries, as reported by large emitters (>25,000 metric tons CO2e per year) under the U.S. EPA's Greenhouse Gas Reporting Program (GHGRP). Records have been calculated using EPA default emissions factors and contain data on fuel type, location (latitude, longitude), combustion unit type, and energy end use classified by manufacturing NAICS code. The dataset also draws on the 2010 Energy Information Administration Manufacturing Energy Consumption Survey (MECS) for insight into the thermal spectrum of low-temperature energy use. This information is critical for assessing industrial energy-consumption trends in manufacturing sectors and can serve as an informative baseline for efficiency or renewable-energy planning at these facilities. With this dataset you are a few clicks away from analyzing research questions about consumption levels across industries, the waste associated with unconstrained fossil-fuel burning, and its environmental impacts.


    How to use the dataset

    This dataset provides detailed information on industrial combustion energy end use in the United States. Knowing how certain industries use fuel can be valuable for those interested in reducing energy consumption and its associated environmental impacts.

    • To make the most of this dataset, first become familiar with what is included by looking at the columns and their definitions. Then explore areas of interest such as Fuel Type, Report Year, Primary NAICS Code, or Emissions Indicators. The more granular and specific the details you focus on, the stronger the analysis you can build and the sounder the conclusions you can draw.

    • Next, filter the data down by region or end-use type (such as direct related processes or indirect support activities). Segmenting further can reveal trends between fuel types used in different regions, or let you compare emissions indicators between processes within manufacturing industries. Insights found this way can inform decisions aimed at reducing energy consumption in both the public and private sectors.

    • If specific industry trends are not of interest, but general patterns among large emitters across regions are, it may help to group like data together and take averages over larger samples that better represent total production across an area or several states. This approach opens up possibilities for correlating economic productivity metrics with industrial energy use over time, which could point to where certain industries or regions are improving resource efficiency relative to less efficient ones.

    This dataset offers many opportunities for practical insights that go well beyond any single statistic, so happy digging!

    Research Ideas

    • Analyzing trends in combustion energy use by region across different industries.
    • Estimating the potential for transitioning to clean and renewable energy sources, given the magnitude of the current end uses reported in this data.
    • Creating an interactive web map application to visualize multiple industrial sites, including their energy sources and emissions data from this dataset combined with other sources (EPA's GHGRP, the MECS survey, etc.).

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    **License: [CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication](https://creativecommons...

  6. Open Data Portal Catalogue

    • open.canada.ca
    • datasets.ai
    • +1 more
    csv, json, jsonl, png +2
    Updated Jun 14, 2025
    Cite
    Treasury Board of Canada Secretariat (2025). Open Data Portal Catalogue [Dataset]. https://open.canada.ca/data/en/dataset/c4c5c7f1-bfa6-4ff6-b4a0-c164cb2060f7
    Explore at:
    Available download formats: csv, sqlite, json, png, jsonl, xlsx
    Dataset updated
    Jun 14, 2025
    Dataset provided by
    Treasury Board of Canada Secretariat (http://www.tbs-sct.gc.ca/)
    Treasury Board of Canada (https://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    The open data portal catalogue is a downloadable dataset containing key metadata for the datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool (external link); resources 2-8 are generated using the Flatterer (external link) utility.

    Description of resources:

    1. Dataset is a JSON Lines (external link) file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. It is heavily nested and recommended for users familiar with working with nested JSON.
    2. Catalogue is an XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata.
    3. datasets metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
    4. resources metadata contains the metadata for the resources contained within each dataset.
    5. resource views metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
    6. datastore fields metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore-enabled CSVs.
    7. data package fields contains a description of the fields available in each of the tables within the Catalogue, as well as a count of the records each table contains.
    8. data package entity relation diagram displays the title and format of each column, in each table in the Data Package, as an ERD diagram. The Data Package resource offers a text-based version.
    9. SQLite Database is a .db database, similar in structure to Catalogue, which can be queried with database or analytical software tools.
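    Resource 1 (the GZip-compressed JSON Lines file) can be streamed record by record without loading everything into memory. A minimal sketch, using an in-memory stand-in for the downloaded file since the actual filename depends on the portal download:

```python
import gzip
import io
import json

def read_catalogue(path_or_fileobj):
    """Yield one dataset/open-information record (a dict) per JSON line."""
    with gzip.open(path_or_fileobj, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)

# Build a tiny two-record stand-in for the real GZip'd JSONL download.
buf = io.BytesIO()
with gzip.open(buf, mode="wt", encoding="utf-8") as fh:
    fh.write(json.dumps({"id": "abc", "title": "Example dataset"}) + "\n")
    fh.write(json.dumps({"id": "def", "title": "Another dataset"}) + "\n")
buf.seek(0)

records = list(read_catalogue(buf))
print(len(records), records[0]["id"])  # 2 abc
```

    In practice, pass the path of the downloaded file to read_catalogue instead of the in-memory buffer.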

  7. Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov"

    • figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Laura Miron; Rafael Gonçalves; Mark A. Musen (2023). Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov" [Dataset]. http://doi.org/10.6084/m9.figshare.12743939.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    Laura Miron; Rafael Gonçalves; Mark A. Musen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This fileset provides supporting data and corpora for the empirical study described in: Laura Miron, Rafael S. Goncalves and Mark A. Musen. Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov.

    Description of files

    Original data files:
    • AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. The set contains 302,091 records downloaded on April 3, 2019.
    • public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate records in AllPublicXML.

    BioPortal API query results:
    • condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns={filename, condition, url, bioportal term, cuis, tuis}.
    • intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns={filename, intervention, url, bioportal term, cuis, tuis}.

    Data element definitions:
    • supplementary_table_1.xlsx maps element names, element types, and whether elements are required across the ClinicalTrials.gov data dictionaries, the ClinicalTrials.gov XML schema declaration for records (public.XSD), the Protocol Registration System (PRS), FDAAA801, and the WHO required data elements for clinical trial registrations. Column and value definitions:
      • CT.gov Data Dictionary Section: section heading for a group of data elements in the ClinicalTrials.gov data dictionary (https://prsinfo.clinicaltrials.gov/definitions.html)
      • CT.gov Data Dictionary Element Name: name of an element/field according to the ClinicalTrials.gov data dictionaries (https://prsinfo.clinicaltrials.gov/definitions.html and https://prsinfo.clinicaltrials.gov/expanded_access_definitions.html)
      • CT.gov Data Dictionary Element Type: "Data" if the element is a field for which the user provides a value; "Group Heading" if the element is a heading for several sub-fields and is not itself associated with a user-provided value
      • Required in CT.gov for Interventional Records: "Required" if the element is required for interventional records according to the data dictionary; "CR" if conditionally required; "Jan 2017" if required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule; "-" if the element is not applicable to interventional records (only observational or expanded access)
      • Required in CT.gov for Observational Records: same values as above, applied to observational records ("-" if the element is not applicable to observational records)
      • Required in CT.gov for Expanded Access Records: same values as above, applied to expanded access records ("-" if the element is not applicable to expanded access records)
      • CT.gov XSD Element Definition: abbreviated xpath to the corresponding element in the ClinicalTrials.gov XSD (public.XSD). The full xpath includes 'clinical_study/' as a prefix to every element. (There is a single top-level element called "clinical_study" for all other elements.)
      • Required in XSD?: "Yes" if the element is required according to public.XSD; "No" if optional; "-" if the element is not made public or included in the XSD
      • Type in XSD: "text" if the XSD type was "xs:string" or "textblock"; the name of the enum if the type was an enum; "integer" if the type was "xs:integer" or "xs:integer" extended with the "type" attribute; "struct" if the type was a struct defined in the XSD
      • PRS Element Name: name of the corresponding entry field in the PRS system
      • PRS Entry Type: entry type in the PRS system. This column contains some free-text explanations/observations
      • FDAAA801 Final Rule Field Name: name of the corresponding required field in the FDAAA801 Final Rule (https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission). This column contains many empty values where elements in ClinicalTrials.gov do not correspond to a field required by the FDA
      • WHO Field Name: name of the corresponding field required by the WHO Trial Registration Data Set (v 1.3.1) (https://prsinfo.clinicaltrials.gov/trainTrainer/WHO-ICMJE-ClinTrialsgov-Cross-Ref.pdf)

    Analytical results:
    • EC_human_review.csv contains the results of a manual review of a random sample of eligibility criteria from 400 CT.gov records. The table gives the filename, the criteria, and whether manual review determined the criteria to contain criteria for "multiple subgroups" of participants.
    • completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA801 and its Final Rule.
    • industry_completeness.xlsx contains percentages of interventional records missing required fields, broken up by the agency class of the trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other"), and before and after the effective date of the Final Rule.
    • location_completeness.xlsx contains percentages of interventional records missing required fields, broken up by whether the record lists at least one location in the United States or only international locations (excluding trials with no listed location), and before and after the effective date of the Final Rule.

    Intermediate results:
    • cache.zip contains pickle and csv files of pandas dataframes with values scraped from the XML records in AllPublicXML. Downloading these files greatly speeds up running the analysis steps from the jupyter notebooks in our github repository.
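    As a small illustration, the match CSVs described above can be tallied with the standard library alone; the two rows below are invented stand-ins for real records in condition_matches.csv.

```python
import csv
import io

# Count how many ClinicalTrials.gov records produced an exact BioPortal match
# for each condition string. The sample rows are invented; in practice, read
# the real condition_matches.csv instead of this in-memory stand-in.
sample = io.StringIO(
    "filename,condition,url,bioportal term,cuis,tuis\n"
    "NCT00000001.xml,asthma,http://example.org/a,Asthma,C0004096,T047\n"
    "NCT00000002.xml,asthma,http://example.org/b,Asthma,C0004096,T047\n"
)

condition_counts = {}
for row in csv.DictReader(sample):
    condition_counts[row["condition"]] = condition_counts.get(row["condition"], 0) + 1

print(condition_counts)  # {'asthma': 2}
```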

  8. Data from: Covid19Kerala.info-Data: A collective open dataset of COVID-19...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Sep 6, 2020
    Cite
    Hritwik N Edavalath (2020). Covid19Kerala.info-Data: A collective open dataset of COVID-19 outbreak in the south Indian state of Kerala [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3818096
    Explore at:
    Dataset updated
    Sep 6, 2020
    Dataset provided by
    Nikhil Narayanan
    Musfir Mohammed
    Sharadh Manian
    Kumar Sujith
    Sreekanth Chaliyeduth
    Neetha Nanoth Vellichirammal
    Akhil Balakrishnan
    Jijo Ulahannan
    Jeevan Uthaman
    Hritwik N Edavalath
    Sreehari Pillai
    Sindhu Joseph
    Prem Prabhakaran
    Unnikrishnan Sureshkumar
    Sooraj P Suresh
    E Rajeevan
    Shabeesh Balan
    Manoj Karingamadathil
    Nishad Thalhath
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Area covered
    South India, India, Kerala
    Description

    Covid19Kerala.info-Data is a consolidated multi-source open dataset of metadata from the COVID-19 outbreak in the Indian state of Kerala. It is created and maintained by volunteers of the ‘Collective for Open Data Distribution-Keralam’ (CODD-K), a nonprofit consortium of individuals formed for the distribution and longevity of open datasets. Covid19Kerala.info-Data covers a set of correlated temporal and spatial metadata of SARS-CoV-2 infections and prevention measures in Kerala. Static snapshots of this dataset are produced manually from a live database maintained as a set of publicly accessible Google sheets. This dataset is made available under the Open Data Commons Attribution License v1.0 (ODC-BY 1.0).

    Schema and data package

    A datapackage with the schema definition is accessible at https://codd-k.github.io/covid19kerala.info-data/datapackage.json. The datapackage and schema follow the Frictionless Data Data Package specification.
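    A Data Package descriptor is plain JSON, so its resources can be listed with the standard library alone; the descriptor below is an invented stand-in for the real datapackage.json published at the URL above.

```python
import json

# Minimal sketch: list resource names from a Frictionless Data Package
# descriptor. This descriptor is an invented stand-in for the real
# datapackage.json maintained by CODD-K.
descriptor = json.loads("""
{
  "name": "covid19kerala-info-data",
  "resources": [
    {"name": "cases", "path": "data/cases.csv", "format": "csv"},
    {"name": "deaths", "path": "data/deaths.csv", "format": "csv"}
  ]
}
""")

resource_names = [r["name"] for r in descriptor["resources"]]
print(resource_names)  # ['cases', 'deaths']
```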

    Temporal and Spatial Coverage

    This dataset covers COVID-19 outbreak and related data from the state of Kerala, India, from January 31, 2020 till the date of the publication of this snapshot. The dataset shall be maintained throughout the entirety of the COVID-19 outbreak.

    The spatial coverage of the data lies within the geographical boundaries of Kerala state, which includes its 14 administrative subdivisions. The state is further divided into Local Self Governing (LSG) bodies, and references to this spatial information are included on the appropriate data facets. Spatial information on regions outside Kerala is mentioned where available, but only as a reference to the possible origins of infection clusters or the movement of individuals.

    Longevity and Provenance

    The dataset snapshot releases are published in a designated GitHub repository maintained by the CODD-K team. Periodic snapshots from the live database are released at regular intervals. The GitHub commit logs for the repository serve as a record of provenance, and an archived repository will be maintained at the end of the project lifecycle to ensure the longevity of the dataset.

    Data Stewardship

    CODD-K expects all administrators, managers, and users of its datasets to manage, access, and utilize them in a manner consistent with the consortium’s need for security and confidentiality and with the relevant legal frameworks in all geographies, especially Kerala and India. While CODD-K acts as a responsible steward in maintaining this dataset and making it accessible, it disclaims all liability for any damages caused by inaccuracies in the dataset.

    License

    This dataset is made available by the CODD-K consortium under ODC-BY 1.0 license. The Open Data Commons Attribution License (ODC-By) v1.0 ensures that users of this dataset are free to copy, distribute and use the dataset to produce works and even to modify, transform and build upon the database, as long as they attribute the public use of the database or works produced from the same, as mentioned in the citation below.

    Disclaimer

    Covid19Kerala.info-Data is provided under the ODC-BY 1.0 license as-is. Though every attempt is made to ensure that the data is error-free and up to date, the CODD-K consortium does not bear any responsibility for inaccuracies in the dataset or for any losses, monetary or otherwise, that users of this dataset may incur.

  9. NADA-SynShapes: A synthetic shape benchmark for testing probabilistic deep...

    • zenodo.org
    text/x-python, zip
    Updated Apr 16, 2025
    Cite
    Giulio Del Corso; Volpini Federico; Claudia Caudai; Davide Moroni; Sara Colantonio (2025). NADA-SynShapes: A synthetic shape benchmark for testing probabilistic deep learning models [Dataset]. http://doi.org/10.5281/zenodo.15194187
    Explore at:
    Available download formats: zip, text/x-python
    Dataset updated
    Apr 16, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giulio Del Corso; Volpini Federico; Claudia Caudai; Davide Moroni; Sara Colantonio
    License

    Attribution-NonCommercial-NoDerivs 2.5 (CC BY-NC-ND 2.5): https://creativecommons.org/licenses/by-nc-nd/2.5/
    License information was derived automatically

    Time period covered
    Dec 18, 2024
    Description

    NADA (Not-A-Database) is an easy-to-use geometric shape data generator that allows users to define non-uniform multivariate parameter distributions to test novel methodologies. The full open-source package is provided at GIT:NA_DAtabase. See the accompanying Technical Report for details on how to use the package.

    This database includes 3 repositories:

    • NADA_Dis: Is the model able to correctly characterize/Disentangle a complex latent space?
      The repository contains 3x100,000 synthetic black and white images to test the ability of the models to correctly define a proper latent space (e.g., autoencoders) and disentangle it. The first 100,000 images contain 4 shapes and uniform parameter space distributions, while the other images have a more complex underlying distribution (truncated Gaussian and correlated marginal variables).

    • NADA_OOD: Does the model identify Out-Of-Distribution images?
      The repository contains 100,000 training images (4 different shapes with 3 possible colors located in the upper left corner of the canvas) and 6x100,000 increasingly different sets of images (changing the color class balance, reducing the radius of the shape, moving the shape to the lower left corner) providing increasingly challenging out-of-distribution images.
      This can help to test not only the capability of a model, but also methods that produce reliability estimates and should correctly classify OOD elements as "unreliable" as they are far from the original distributions.

    • NADA_AlEp: Does the model distinguish between different types (Aleatoric/Epistemic) of uncertainty?
      The repository contains 5x100,000 images with different types of noise/uncertainty:
      • NADA_AlEp_0_Clean: dataset free of noise, usable as a training set.
      • NADA_AlEp_1_White_Noise: epistemic white-noise dataset. Each image is perturbed with an amount of white noise randomly sampled from 0% to 90%.
      • NADA_AlEp_2_Deformation: dataset with epistemic deformation noise. Each image is deformed by a random amount uniformly sampled between 0% and 90%, where 0% corresponds to the original image and 100% is a full deformation to the circumscribing circle.
      • NADA_AlEp_3_Label: dataset with label noise. Formally, 20% of triangles of a given color are misclassified as a square with a random color (among blue, orange, and brown), and vice versa (squares to triangles). Label noise introduces aleatoric uncertainty because it is inherent in the data and cannot be reduced.
      • NADA_AlEp_4_Combined: combined dataset with all previous sources of uncertainty.

    Each image can be used for classification (shape/color) or regression (radius/area) tasks.

    All datasets can be modified and adapted to the user's research question using the included open source data generator.

  10. Data from: Slovenian Definition Extraction training dataset DF_NDF_wiki_slo...

    • live.european-language-grid.eu
    binary format
    Updated May 18, 2023
    Cite
    (2023). Slovenian Definition Extraction training dataset DF_NDF_wiki_slo 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/21587
    Explore at:
    Available download formats: binary format
    Dataset updated
    May 18, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The Slovenian definition extraction training dataset DF_NDF_wiki_slo contains 38613 sentences extracted from the Slovenian Wikipedia. The first sentence of a term's description on Wikipedia is considered a definition, and all other sentences are considered non-definitions.

    The corpus consists of the following files each containing one definition / non-definition sentence per line:

    1. Definitions: df_ndf_wiki_slo_Y.txt with 3251 definition sentences.
    2. Non-definitions: df_ndf_wiki_slo_N.txt with 14678 non-definition sentences which do not contain the term at the beginning of the sentence.
    3. Non-definitions: df_ndf_wiki_slo_N1.txt with 20684 non-definition sentences which may also contain the term at the beginning of the sentence.
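    A minimal sketch of loading these one-sentence-per-line files into labeled (sentence, label) pairs for training a definition-extraction classifier (the loader is ours; it assumes only the file layout described above):

```python
def load_sentences(path, label):
    """Read one sentence per line and attach a label (1 = definition, 0 = non-definition)."""
    with open(path, encoding="utf-8") as f:
        return [(line.strip(), label) for line in f if line.strip()]

# File names as listed above; skip any file that is not present locally.
data = []
for path, label in [("df_ndf_wiki_slo_Y.txt", 1),
                    ("df_ndf_wiki_slo_N.txt", 0),
                    ("df_ndf_wiki_slo_N1.txt", 0)]:
    try:
        data.extend(load_sentences(path, label))
    except FileNotFoundError:
        pass
```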

    The dataset is described in more detail in Fišer et al. 2010. If you use this resource, please cite:

    Fišer, D., Pollak, S., Vintar, Š. (2010). Learning to Mine Definitions from Slovene Structured and Unstructured Knowledge-Rich Resources. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). https://aclanthology.org/L10-1089/

    Reference to training Transformer-based definition extraction models using this dataset: Tran, T.H.H., Podpečan, V., Jemec Tomazin, M., Pollak, Senja (2023). Definition Extraction for Slovene: Patterns, Transformer Classifiers and ChatGPT. Proceedings of the ELEX 2023: Electronic lexicography in the 21st century. Invisible lexicography: everywhere lexical data is used without users realizing they make use of a “dictionary”.

    Related resources: Jemec Tomazin, M. et al. (2023). Slovenian Definition Extraction evaluation datasets RSDO-def 1.0, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1841

  11. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    figshare
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is open-source software and an object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code using command-line scripting and vectors. It has several built-in functions and libraries that are extensible, and it allows users to define their own (customized) functions specifying how the program should behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, intended to help inform and guide the work of R users and statisticians. It describes the different types of statistical data analysis and methods, and the best scenarios for using each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures, including a description of the conditions or assumptions necessary for performing the various statistical methods or tests, and how to interpret their results. The book also covers the different data formats and sources, and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples: from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of results for future use, and graphical visualizations and representations. Thus, a congruence of statistics and computer programming for research.

  12. User Model for Amazon

    • kaggle.com
    Updated May 21, 2020
    Cite
    Aditya6196 (2020). User Model for Amazon [Dataset]. https://www.kaggle.com/aditya6196/user-model-for-amazon/metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 21, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aditya6196
    Description

    DESCRIPTION

    The dataset provided contains movie reviews given by Amazon customers. Reviews were given between May 1996 and July 2014.

    Data Dictionary:
    • UserID: 4848 customers who provided ratings
    • Movie 1 to Movie 206: 206 movies rated by the 4848 distinct users

    Data Considerations:
    • Not every user has watched every movie, so not all movies are rated by all users; missing values are represented by NA.
    • Ratings are on a scale of -1 to 10, where -1 is the lowest rating and 10 is the best.

    Analysis Tasks:
    • Exploratory Data Analysis: Which movies have the most views/ratings? What is the average rating for each movie? Identify the top 5 movies with the maximum ratings and the top 5 movies with the smallest audience.
    • Recommendation Model: Some of the movies had not been watched and therefore are not rated by the users. Netflix would like to take this as an opportunity to build a machine learning recommendation algorithm that predicts the ratings for each user. Divide the data into training and test sets, build a recommendation model on the training data, and make predictions on the test data.
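    The exploratory questions above can be sketched with a toy user-by-movie matrix (illustrative data; column names mirror the described layout, with NaN standing in for NA):

```python
import numpy as np
import pandas as pd

# Toy frame: one row per user, one column per movie, NaN where unrated.
ratings = pd.DataFrame({
    "Movie1": [5, np.nan, 8, 10],
    "Movie2": [np.nan, np.nan, 3, np.nan],
    "Movie3": [7, 6, np.nan, 9],
}, index=["u1", "u2", "u3", "u4"])

views_per_movie = ratings.count()   # how many users rated each movie
avg_rating = ratings.mean()         # average rating per movie (NaN skipped)
top_by_views = views_per_movie.sort_values(ascending=False).head(5)
least_audience = views_per_movie.sort_values().head(5)
```

    On the real data the same calls answer "maximum views" and "average rating per movie" directly; the train/test split and recommendation model build on top of this matrix.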

  13. 🌟 Emoji Trends Dataset

    • kaggle.com
    Updated Jul 31, 2024
    Cite
    Waqar Ali (2024). 🌟 Emoji Trends Dataset [Dataset]. https://www.kaggle.com/datasets/waqi786/emoji-trends-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 31, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Waqar Ali
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset provides a detailed analysis of emoji usage across various social media platforms. It captures how different emojis are used in different contexts, reflecting emotions, trends, and user demographics.

    With emojis becoming a universal digital language, this dataset helps researchers, marketers, and data analysts explore how people express emotions online and identify patterns in social media communication.

    📌 Key Features:
    • 😊 Emoji Details:
      • Emoji 🎭: the specific emoji used in a post, comment, or message.
      • Context 💬: the meaning or emotion associated with the emoji (e.g., Happy, Love, Funny, Sad).
      • Platform 🌐: the social media platform where the emoji was used (e.g., Facebook, Instagram, Twitter).
    • 👤 User Demographics:
      • User Age 🎂: age of the user who posted the emoji (ranges from 13 to 65 years).
      • User Gender 🚻: gender of the user (Male/Female).
    • 📈 Additional Insights:
      • Emoji Popularity 🔥: frequency of each emoji's usage across platforms.
      • Trends Over Time 📅: how emoji usage changes based on trends or events.
      • Regional Usage Patterns 🌍: how different cultures and regions use emojis differently.

    📊 Use Cases & Applications:
    🔹 Understanding emoji trends across social media
    🔹 Analyzing emotional expression through digital communication
    🔹 Exploring demographic differences in emoji usage
    🔹 Identifying platform-specific emoji preferences
    🔹 Enhancing sentiment analysis models with emoji insights

    ⚠️ Important Note: This dataset is synthetically generated for educational and analytical purposes. It does not contain real user data but is designed to reflect real-world trends in emoji usage.
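    A quick sketch of the kind of analysis this dataset supports (toy records with the columns described above; all values are invented):

```python
import pandas as pd

# Toy records mirroring the dataset's columns (values are illustrative).
posts = pd.DataFrame([
    {"Emoji": "😊", "Context": "Happy", "Platform": "Twitter",   "User Age": 24, "User Gender": "Female"},
    {"Emoji": "😊", "Context": "Happy", "Platform": "Instagram", "User Age": 31, "User Gender": "Male"},
    {"Emoji": "❤️", "Context": "Love",  "Platform": "Instagram", "User Age": 19, "User Gender": "Female"},
    {"Emoji": "😂", "Context": "Funny", "Platform": "Twitter",   "User Age": 45, "User Gender": "Male"},
])

popularity = posts["Emoji"].value_counts()            # emoji popularity overall
per_platform = posts.groupby(["Platform", "Emoji"]).size()  # platform-specific usage
```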

  14. Research Data Framework (RDaF) Database

    • catalog.data.gov
    • gimi9.com
    • +1more
    Updated Mar 14, 2025
    Cite
    National Institute of Standards and Technology (2025). Research Data Framework (RDaF) Database [Dataset]. https://catalog.data.gov/dataset/research-data-framework-rdaf-database
    Explore at:
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The NIST RDaF is a map of the research data space that uses a lifecycle approach with six high-level lifecycle stages to organize key information concerning research data management (RDM) and research data dissemination. Through a community-driven and in-depth process, stakeholders identified topics and subtopics: programmatic and operational activities, concepts, and other important factors relevant to RDM. All elements of the RDaF framework foundation, the lifecycle stages and their associated topics and subtopics, are defined. Most subtopics have several informative references, which are resources such as guidelines, standards, and policies that assist stakeholders in addressing that subtopic. Further, the NIST RDaF team identified 14 Overarching Themes which are pervasive throughout the framework. The framework foundation enables organizations and individual researchers to use the RDaF for self-assessment of their RDM status. The RDaF includes sample "profiles" for various job functions or roles, each containing topics and subtopics that an individual in the given role is encouraged to consider in fulfilling their RDM responsibilities. Individual researchers and organizations involved in the research data lifecycle can tailor these profiles for their specific job function using a tool available on the RDaF website. The methodologies used to generate all features of the RDaF are described in detail in the publication NIST SP 1500-8. This database version of the NIST RDaF is designed so that users can readily navigate the various lifecycle stages, topics, subtopics, and overarching themes from numerous locations. In addition, unlike the published text version, links are included for the definitions of most topics and subtopics and for informative references for most subtopics. For more information on the database, please see the FAQ page.

  15. Authcode - Dataset

    • portalinvestigacion.um.es
    • ieee-dataport.org
    Updated 2020
    + more versions
    Cite
    Sánchez Sánchez, Pedro Miguel; Fernández Maimó, Lorenzo; Huertas Celdrán, Alberto; Martínez Pérez, Gregorio; Sánchez Sánchez, Pedro Miguel; Fernández Maimó, Lorenzo; Huertas Celdrán, Alberto; Martínez Pérez, Gregorio (2020). Authcode - Dataset [Dataset]. https://portalinvestigacion.um.es/documentos/668fc48eb9e7c03b01be0e33
    Explore at:
    Dataset updated
    2020
    Authors
    Sánchez Sánchez, Pedro Miguel; Fernández Maimó, Lorenzo; Huertas Celdrán, Alberto; Martínez Pérez, Gregorio; Sánchez Sánchez, Pedro Miguel; Fernández Maimó, Lorenzo; Huertas Celdrán, Alberto; Martínez Pérez, Gregorio
    Description

    Intending to cover the existing gap regarding behavioral datasets modelling interactions of users with individual and multiple devices in a Smart Office, to later authenticate them continuously, we publish the following collection of datasets, generated after five users interacted for 60 days with their personal computers and mobile devices. Below is a brief description of each dataset.
    • Dataset 1 (2.3 GB). Contains 92975 feature vectors (8096 features per vector) modelling the interactions of the five users with their personal computers. Each vector aggregates keyboard and mouse activity as well as application usage statistics. Originally this dataset had 24065 features, but after filtering constant features the number was reduced to 8096: every possible digraph (two-key combination) was counted during collection, and the many unusual digraphs the users never typed produced features constant at 0, which were deleted from the uploaded dataset. More information about the features' meaning can be found in the readme file.
    • Dataset 2 (8.9 MB). Contains 61918 feature vectors (15 features per vector) modelling the interactions of the five users with their mobile devices. Each vector aggregates application usage statistics. More information about the features' meaning can be found in the readme file.
    • Dataset 3 (28.9 MB). Contains 133590 feature vectors (42 features per vector) modelling the interactions of the five users with their mobile devices. Each vector aggregates gyroscope and accelerometer sensor data. More information about the features' meaning can be found in the readme file.
    • Dataset 4 (162.4 MB). Contains 145465 feature vectors (241 features per vector) modelling the interactions of the five users with both personal computers and mobile devices. Each vector aggregates the most relevant features of both devices. More information about the features' meaning can be found in the readme file.
    • Dataset 5 (878.7 KB). Composed of 7 datasets, each containing feature vectors aggregated over the active/inactive intervals of personal computers and mobile devices using time windows from 1h to 24h: 1h: 4074 vectors; 2h: 2149 vectors; 3h: 1470 vectors; 4h: 1133 vectors; 6h: 770 vectors; 12h: 440 vectors; 24h: 229 vectors.
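    The constant-feature filtering described for Dataset 1 (24065 features reduced to 8096 by dropping always-zero digraph counters) can be sketched as follows (a minimal illustration, not the authors' code):

```python
import numpy as np

def drop_constant_features(X):
    """Remove columns that take a single value across all rows
    (e.g. digraph counters that are always 0)."""
    keep = np.ptp(X, axis=0) != 0   # peak-to-peak of 0 means a constant column
    return X[:, keep], keep

# Toy matrix: the middle column is constant (all zeros) and gets dropped.
X = np.array([[1.0, 0.0, 3.0],
              [2.0, 0.0, 3.5],
              [4.0, 0.0, 1.0]])
X_filtered, kept = drop_constant_features(X)
```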

  16. Database Creation Description and Data Dictionaries

    • figshare.com
    txt
    Updated Aug 11, 2016
    Cite
    Jordan Kempker; John David Ike (2016). Database Creation Description and Data Dictionaries [Dataset]. http://doi.org/10.6084/m9.figshare.3569067.v3
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 11, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jordan Kempker; John David Ike
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    There are several Microsoft Word documents here detailing data creation methods, with various dictionaries describing the included and derived variables. The Database Creation Description is meant to walk a user through some of the steps detailed in the SAS code for this project. The alphabetical list of variables is intended for users, since this sometimes makes coding steps easier to copy and paste from the list instead of retyping. The NIS Data Dictionary contains a general dataset description as well as each variable's responses.

  17. Vocabulary of Interlinked Datasets

    • bioregistry.io
    Updated Aug 13, 2021
    + more versions
    Cite
    (2021). Vocabulary of Interlinked Datasets [Dataset]. https://bioregistry.io/void
    Explore at:
    Dataset updated
    Aug 13, 2021
    Description

    The Vocabulary of Interlinked Datasets (VoID) is an RDF Schema vocabulary for expressing metadata about RDF datasets. It is intended as a bridge between the publishers and users of RDF data, with applications ranging from data discovery to cataloging and archiving of datasets. This document provides a formal definition of the new RDF classes and properties introduced for VoID. It is a companion to the main specification document for VoID, Describing Linked Datasets with the VoID Vocabulary.
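    A minimal, hypothetical VoID description in Turtle may make the vocabulary concrete (the dataset IRI and URLs are invented; the `void:` and `dcterms:` terms are drawn from the respective vocabularies):

```turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Hypothetical dataset description using core VoID terms
<http://example.org/dataset/example> a void:Dataset ;
    dcterms:title "Example Dataset" ;
    void:sparqlEndpoint <http://example.org/sparql> ;
    void:dataDump <http://example.org/dumps/example.nt> ;
    void:triples 1000 .
```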

  18. LNWB Ch03 Data Processes

    • search.dataone.org
    • hydroshare.org
    • +1more
    Updated Apr 15, 2022
    + more versions
    Cite
    Christina Bandaragoda; Joanne Greenberg; Peter Gill; Bracken Capen; Mary Dumas (2022). LNWB Ch03 Data Processes [Dataset]. https://search.dataone.org/view/sha256%3A2a8103e6f0e432948dd223f69ee2ce60f9611139cdfae7b8dab0b800e6f2526f
    Explore at:
    Dataset updated
    Apr 15, 2022
    Dataset provided by
    Hydroshare
    Authors
    Christina Bandaragoda; Joanne Greenberg; Peter Gill; Bracken Capen; Mary Dumas
    Description

    Overview: The Lower Nooksack Water Budget Project involved assembling a wide range of existing data related to WRIA 1 and specifically the Lower Nooksack Subbasin, updating existing data sets and generating new data sets. This Data Management Plan provides an overview of the data sets, formats and collaboration environment that was used to develop the project. Use of a plan during development of the technical work products provided a forum for the data development and management to be conducted with transparent methods and processes. At project completion, the Data Management Plan provides an accessible archive of the data resources used and supporting information on the data storage, intended access, sharing and re-use guidelines.

    One goal of the Lower Nooksack Water Budget project is to make this “usable technical information” as accessible as possible across technical, policy and general public users. The project data, analyses and documents will be made available through the WRIA 1 Watershed Management Project website http://wria1project.org. This information is intended for use by the WRIA 1 Joint Board and partners working to achieve the adopted goals and priorities of the WRIA 1 Watershed Management Plan.

    Model outputs for the Lower Nooksack Water Budget are summarized by sub-watersheds (drainages) and point locations (nodes). In general, due to changes in land use over time and changes to available streamflow and climate data, the water budget for any watershed needs to be updated periodically. Further detailed information about data sources is provided in review packets developed for specific technical components including climate, streamflow and groundwater level, soils and land cover, and water use.

    Purpose: This project involves assembling a wide range of existing data related to the WRIA 1 and specifically the Lower Nooksack Subbasin, updating existing data sets and generating new data sets. Data will be used as input to various hydrologic, climatic and geomorphic components of the Topnet-Water Management (WM) model, but will also be available to support other modeling efforts in WRIA 1. Much of the data used as input to the Topnet model is publicly available and maintained by others, (i.e., USGS DEMs and streamflow data, SSURGO soils data, University of Washington gridded meteorological data). Pre-processing is performed to convert these existing data into a format that can be used as input to the Topnet model. Post-processing of Topnet model ASCII-text file outputs is subsequently combined with spatial data to generate GIS data that can be used to create maps and illustrations of the spatial distribution of water information. Other products generated during this project will include documentation of methods, input by WRIA 1 Joint Board Staff Team during review and comment periods, communication tools developed for public engagement and public comment on the project.

    In order to maintain an organized system of developing and distributing data, Lower Nooksack Water Budget project collaborators should be familiar with the standards for data management described in this document, and with the following issues related to generating and distributing data:
    1. Standards for metadata and data formats
    2. Plans for short-term storage and data management (i.e., file formats, local storage and backup procedures, and security)
    3. Legal and ethical issues (i.e., intellectual property, confidentiality of study participants)
    4. Access policies and provisions (i.e., how the data will be made available to others, any restrictions needed)
    5. Provisions for long-term archiving and preservation (i.e., establishment of a new data archive or utilization of an existing archive)
    6. Assigned data management responsibilities (i.e., persons responsible for ensuring data management, monitoring compliance with the Data Management Plan)

    This resource is a subset of the Lower Nooksack Water Budget (LNWB) Collection Resource.

  19. ckanext-customuserprivileges

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-customuserprivileges [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-customuserprivileges
    Explore at:
    Dataset updated
    Jun 4, 2025
    Description

    The ckanext-customuserprivileges extension enhances CKAN's dataset management capabilities by providing granular administrative control over datasets. It introduces an autocomplete field on datasets for naming the specific users who can administer a given dataset. This lets administrators override default CKAN permissions and assign specific users as administrators, both for unowned datasets and for datasets belonging to organizations/companies, adding a layer of permission customization on top of CKAN's existing user roles.

    Key Features:
    • Dataset Administrator Field: adds an autocomplete field to the dataset creation and editing forms, allowing the specification (by username) of users who can administer the dataset.
    • Unowned Dataset Control: when no administrators are specified on creation of an unowned dataset, only the creating user can manage it by default; when dataset administrators are specified, those users gain editing privileges.
    • Company Dataset Enhancement: for company datasets there is another layer, requiring administrators of the dataset to also be administrators or editors within the respective company.
    • Granular Permission Control: allows administrators to override CKAN's default permissions and assign specific users as administrators for individual datasets.

    Use Cases:
    • Shared Data Repository: an organization using CKAN for data sharing can allow specific team members (beyond the dataset creator) to manage particular datasets, regardless of their broader organization roles.
    • Dataset Delegation: enable a data-steward user to administer all datasets related to a particular field (e.g. environmental data, financial data) without being given rights to manage all datasets in the environment.

    Technical Integration: the extension integrates directly into CKAN's dataset creation and editing workflow by introducing a new field (which may need to be configured to appear). It also modifies the permission-checking logic to incorporate the specified dataset administrators when deciding whether a user may manage a dataset.

    Benefits & Impact: by implementing the ckanext-customuserprivileges extension, organizations can achieve fine-grained control over dataset administration in CKAN. This ensures that the appropriate users have the necessary editing privileges while maintaining overall data-governance policies.
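    The permission rules described above might be sketched as follows (a hypothetical illustration, not the extension's actual source; all names and data structures are ours):

```python
# Hypothetical sketch: a user may manage a dataset if they are the creator,
# or are listed as a dataset administrator -- and, for organization datasets,
# also hold an admin/editor role in that organization.
def can_manage(user, dataset, org_roles):
    if user == dataset.get("creator"):
        return True
    if user not in dataset.get("administrators", []):
        return False
    org = dataset.get("organization")
    if org is None:
        return True  # unowned dataset: listed administrators may edit
    return org_roles.get((user, org)) in ("admin", "editor")

dataset = {"creator": "alice", "administrators": ["bob"], "organization": "acme"}
roles = {("bob", "acme"): "editor"}
```

    With these toy records, `alice` may manage as creator, `bob` as a listed administrator who is also an editor in `acme`, and any other user is refused.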

  20. FoodPanda Food & Grocery Transaction Data | Email Receipt Data | Asia |...

    • datarade.ai
    .json, .xml, .csv
    Updated Oct 13, 2023
    Cite
    Measurable AI (2023). FoodPanda Food & Grocery Transaction Data | Email Receipt Data | Asia | Granular & Aggregate Data available [Dataset]. https://datarade.ai/data-products/foodpanda-food-grocery-transaction-data-email-receipt-dat-measurable-ai
    Explore at:
    Available download formats: .json, .xml, .csv
    Dataset updated
    Oct 13, 2023
    Dataset authored and provided by
    Measurable AI
    Area covered
    Singapore, Thailand, Philippines, Pakistan, Hong Kong, Malaysia, Taiwan
    Description

    The Measurable AI FoodPanda Food & Grocery Transaction dataset is a leading source of email receipts and transaction data, offering data collected directly from users via Proprietary Consumer Apps, with millions of opt-in users.

    We source our email receipt consumer data panel via two consumer apps which garner the express consent of our end-users (GDPR compliant). We then aggregate and anonymize all the transactional data to produce raw and aggregate datasets for our clients.

    Use Cases
    Our clients leverage our datasets to produce actionable consumer insights such as:
    • Market share analysis
    • User behavioral traits (e.g. retention rates)
    • Average order values
    • Promotional strategies used by the key players
    Several of our clients also use our datasets for forecasting and understanding industry trends better.

    Coverage - Asia (Hong Kong, Taiwan, Singapore, Thailand, Malaysia, Philippines, Pakistan)

    Granular Data
    Itemized, high-definition data at the per-transaction level, with metrics such as:
    • Order value
    • Items ordered
    • No. of orders per user
    • Delivery fee
    • Service fee
    • Promotions used
    • Geolocation data, and more

    Aggregate Data
    • Weekly/monthly order volume
    • Revenue
    Delivered in aggregate form, with historical data dating back to 2018. All the transactional e-receipts are sent from the FoodPanda food delivery app to users' registered accounts.
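    Rolling granular per-transaction records up into weekly aggregates like those described above can be sketched as follows (toy records; the field names are illustrative, not the dataset's actual schema):

```python
import pandas as pd

# Toy e-receipt records mirroring the granular metrics listed above.
orders = pd.DataFrame({
    "order_time": pd.to_datetime([
        "2023-10-02", "2023-10-03", "2023-10-09", "2023-10-10", "2023-10-11",
    ]),
    "order_value": [12.5, 8.0, 20.0, 15.5, 9.0],
})

# Weekly order volume ("size") and revenue ("sum") per calendar week.
weekly = (orders.set_index("order_time")
                .resample("W")["order_value"]
                .agg(["size", "sum"]))
```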

    Most of our clients are fast-growing Tech Companies, Financial Institutions, Buyside Firms, Market Research Agencies, Consultancies and Academia.

    Our dataset is GDPR compliant, contains no PII information and is aggregated & anonymized with user consent. Contact business@measurable.ai for a data dictionary and to find out our volume in each country.
