This dataset contains 1,024 60-second video clips of Bleeding Edge gameplay (75 GB). The data has already been processed into the following format: 300x180 videos sampled at 10 fps.
Dataset Structure
Data Files
testing_dataset_part1.zip & testing_dataset_part2.zip – contain all 1,024 60-second trajectories used for our evaluation.
4 examples from the dataset:
FB[…].npz – .npz file (described below). FB[…].mp4 – 60-second .mp4 video of the images from the .npz file. … See the full description on the dataset page: https://huggingface.co/datasets/microsoft/bleeding-edge-gameplay-sample.
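A minimal inspection sketch in Python (the file name below is a placeholder; the exact array keys inside each .npz are documented on the dataset page, so the snippet simply lists whatever arrays are present):

```python
import numpy as np

# Placeholder file name for one downloaded trajectory archive.
with np.load("FB_example_trajectory.npz", allow_pickle=True) as archive:
    # List every array stored in the archive along with its shape and dtype.
    for key in archive.files:
        print(key, archive[key].shape, archive[key].dtype)
```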
http://www.gnu.org/licenses/lgpl-3.0.html
On the official website, the dataset is available via SQL Server (localhost) and as CSVs, to be used with Power BI Desktop running on the Virtual Lab (virtual machine). The first two steps of importing the data were executed in the virtual lab, and the resulting Power BI tables were then copied to CSVs. Records were added up to the year 2022 as required.
This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop in order to follow the lab instructions in the training material on the official website. It is also useful if you want to work through the Power BI Desktop Sales Analysis example from Microsoft's PL-300 learning path.
Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created in the first two steps of importing data, as described in the PL-300 Microsoft Power BI Data Analyst exam lab.
The Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed using the scoring algorithms provided for these tools via SAS software programming. The Street Segment Assessment and CPAT do not have associated scoring algorithms and therefore no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note: title changed 9/4/2020 to reflect study name]

Resources in this dataset (recommended software for each file: Microsoft Excel, https://products.office.com/en-us/excel):
- Dataset One RALA PPA Data Dictionary (RALA PPA Data Dictionary.csv) – data dictionary for dataset one, collected using the RALA PPA tool.
- Dataset Two RALA TWA Data Dictionary (RALA TWA Data Dictionary.csv) – data dictionary for dataset two, collected using the RALA TWA tool.
- Dataset Three RALA SSA Data Dictionary (RALA SSA Data Dictionary.csv) – data dictionary for dataset three, collected using the RALA SSA tool.
- Dataset Four CPAT Data Dictionary (CPAT Data Dictionary.csv) – data dictionary for dataset four, collected using the CPAT.
- Dataset One RALA PPA (RALA PPA Data.csv) – data collected using the RALA PPA tool.
- Dataset Two RALA TWA (RALA TWA Data.csv) – data collected using the RALA TWA tool.
- Dataset Three RALA SSA (RALA SSA Data.csv) – data collected using the RALA SSA tool.
- Dataset Four CPAT (CPAT Data.csv) – data collected using the CPAT.
- Data Dictionary (DataDictionary_RALA_PPA_SSA_TWA_CPAT.csv) – a combined data dictionary covering all four dataset files in this set.
Example of a filtered Microsoft Excel spreadsheet for TaAMY2 single null mutant detection (selected data).
Open Database License (ODbL) v1.0 – https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset contains vectorized byte files taken from the original dataset of the Microsoft Malware Classification Challenge (BIG 2015) competition. The original dataset is described at http://arxiv.org/abs/1802.10135.
The original train and test datasets are ~18GB each; this random sample, extracted and vectorized, is just ~15MB in size. The byte files were vectorized using CountVectorizer. Note: the original dataset contains only 42 byte files for malware class 5 (Simda).
[Figure: balanced dataset pie chart – https://i.imgur.com/CFWhzYr.png]
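As a rough illustration of the vectorization step (a minimal sketch, not the exact pipeline used to produce this sample; the file names and token pattern are assumptions), byte files can be turned into count vectors with scikit-learn's CountVectorizer:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical byte files: each .bytes file from the original challenge is a
# plain-text dump of two-character hex byte tokens.
paths = ["sample1.bytes", "sample2.bytes"]
documents = [open(p, encoding="utf-8", errors="ignore").read() for p in paths]

# Treat every two-character token as a feature and count its occurrences.
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w\w\b")
X = vectorizer.fit_transform(documents)
print(X.shape, len(vectorizer.get_feature_names_out()))
```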
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example of how I use MS Excel's VLOOKUP() function to filter my data.
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Cloud to Street - Microsoft Flood Dataset (C2S-MS Floods) is a dataset of near-coincident Sentinel-1 and Sentinel-2 data paired with water labels from 18 global flood events. These labels are derived products of the MODIS sensor on board NASA's Aqua and Terra satellites, produced as part of the study "Satellite imaging reveals increased proportion of population exposed to floods," Nature (2021), doi: 10.1038/s41586-021-03695-w. In this collection, we keep the water label that represents the maximum observed flood extent during the time period of the event. For a detailed description of the methods used to generate these labels, please refer to the original paper.
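Once chips from the collection have been downloaded, a pairing like the following can be used to compare an image chip with its water label (a minimal sketch; the file names are hypothetical placeholders and the actual directory layout should be taken from the collection's catalog):

```python
import numpy as np
import rasterio

# Hypothetical file names for one chip: a Sentinel raster and the matching
# water-label raster from the same flood event.
with rasterio.open("sentinel_chip.tif") as src:
    image = src.read()          # (bands, height, width)

with rasterio.open("water_label.tif") as src:
    label = src.read(1)         # single-band maximum-extent water mask

# Fraction of pixels labeled as water in this chip (assuming water pixels == 1).
print("water fraction:", float(np.mean(label == 1)))
print("image shape:", image.shape)
```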
https://choosealicense.com/licenses/other/
Dataset Card for BingCoronavirusQuerySet
Dataset Summary
Please note that you can specify the start and end dates of the data. Available start and end dates are listed here: https://github.com/microsoft/BingCoronavirusQuerySet/tree/master/data/2020. For example: load_dataset("bing_coronavirus_query_set", queries_by="state", start_date="2020-09-01", end_date="2020-09-30")
You can also load the data by country by using queries_by="country".
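Putting the above together into a runnable sketch with the Hugging Face datasets library (the date range is only illustrative; check the GitHub repository above for the dates actually available):

```python
from datasets import load_dataset

# State-level queries for an illustrative date range.
dataset = load_dataset(
    "bing_coronavirus_query_set",
    queries_by="state",
    start_date="2020-09-01",
    end_date="2020-09-30",
)

# Use queries_by="country" instead to load country-level queries.
print(dataset)
```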
Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/bing_coronavirus_query_set.
In order to practice writing SQL queries in a semi-realistic database, I discovered and imported Microsoft's AdventureWorks sample database into Microsoft SQL Server Express. The Adventure Works [fictitious] company represents a bicycle manufacturer that sells bicycles and accessories to global markets. Queries were written for developing and testing a Tableau dashboard.
The dataset presented here represents a fraction of the entire manufacturing relational database. Tables within the dataset include product, purchasing, work order, and transaction data.
The full database sample can be found on the Microsoft SQL Docs website (https://learn.microsoft.com/en-us/sql/samples/) and on GitHub (https://github.com/microsoft/sql-server-samples).
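For illustration, a practice query of the kind described above can also be run from Python (a minimal sketch; the server name, database name, and restored AdventureWorks version are placeholders to adjust for your own installation):

```python
import pyodbc

# Connect to a local SQL Server Express instance hosting an AdventureWorks sample.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost\\SQLEXPRESS;"
    "DATABASE=AdventureWorks2019;"   # placeholder database name
    "Trusted_Connection=yes;"
)

# Example practice query: the ten most expensive products by list price.
cursor = conn.cursor()
cursor.execute(
    "SELECT TOP 10 Name, ListPrice "
    "FROM Production.Product "
    "ORDER BY ListPrice DESC;"
)
for name, list_price in cursor.fetchall():
    print(name, list_price)

conn.close()
```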
https://choosealicense.com/licenses/cdla-permissive-2.0/
This dataset is Version 2.0 of microsoft/FStarDataSet.
Primary Objective
This dataset's primary objective is to train and evaluate Proof-oriented Programming with AI (PoPAI, in short). Given a specification of a program and proof in F*, the objective of an AI model is to synthesize the implementation (see below for details about the usage of this dataset, including the input and output).
Data Format
Each example in this dataset is organized as a dictionary… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/FStarDataSet-V2.
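A minimal loading sketch with the Hugging Face datasets library (the split names and record fields are not reproduced here, so the snippet simply reports whatever the first example exposes):

```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub and inspect one example.
dataset = load_dataset("microsoft/FStarDataSet-V2")

split_name = next(iter(dataset))      # first available split
example = dataset[split_name][0]      # each example is a dictionary
print("splits:", list(dataset.keys()))
print("fields in one example:", list(example.keys()))
```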
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset created as part of the Master Thesis "Business Intelligence – Automation of Data Marts modeling and its data processing".
Lucerne University of Applied Sciences and Arts
Master of Science in Applied Information and Data Science (MScIDS)
Autumn Semester 2022
Change log Version 1.1:
The following SQL scripts were added:
Index | Type             | Name
1     | View             | pg.dictionary_table
2     | View             | pg.dictionary_column
3     | View             | pg.dictionary_relation
4     | View             | pg.accesslayer_table
5     | View             | pg.accesslayer_column
6     | View             | pg.accesslayer_relation
7     | View             | pg.accesslayer_fact_candidate
8     | Stored Procedure | pg.get_fact_candidate
9     | Stored Procedure | pg.get_dimension_candidate
10    | Stored Procedure | pg.get_columns
Scripts are based on Microsoft SQL Server 2017 and are compatible with a data warehouse built with Datavault Builder. The object scripts of the sample data warehouse itself are restricted and cannot be shared.
This is an example data source that can be used for predictive maintenance model building. It consists of the following data:
Telemetry Time Series Data (PdM_telemetry.csv): Hourly averages of voltage, rotation, pressure, and vibration collected from 100 machines for the year 2015.
Error (PdM_errors.csv): Errors encountered by the machines while in operating condition. Since these errors don't shut down the machines, they are not considered failures. The error dates and times are rounded to the closest hour, since the telemetry data is collected at an hourly rate.
Maintenance (PdM_maint.csv): If a component of a machine is replaced, that is captured as a record in this table. Components are replaced under two situations: 1. during a regular scheduled visit, the technician replaces it (proactive maintenance); 2. a component breaks down and the technician performs unscheduled maintenance to replace it (reactive maintenance). The latter is considered a failure, and the corresponding data is captured under Failures. Maintenance data has both 2014 and 2015 records. This data is rounded to the closest hour since the telemetry data is collected at an hourly rate.
Failures (PdM_failures.csv): Each record represents replacement of a component due to failure. This data is a subset of Maintenance data. This data is rounded to the closest hour since the telemetry data is collected at an hourly rate.
Metadata of Machines (PdM_Machines.csv): Model type & age of the Machines.
This dataset was available as part of Azure AI Notebooks for Predictive Maintenance, but as of 15 October 2020 the notebook (link) is no longer available. However, the data can still be downloaded using the following URLs:
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_telemetry.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_errors.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_maint.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_failures.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_machines.csv
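A minimal sketch for pulling the files directly from those URLs with pandas (assuming the blobs remain publicly accessible):

```python
import pandas as pd

base_url = "https://azuremlsampleexperiments.blob.core.windows.net/datasets/"
files = ["PdM_telemetry.csv", "PdM_errors.csv", "PdM_maint.csv",
         "PdM_failures.csv", "PdM_machines.csv"]

# pandas can read CSVs straight from HTTP(S) URLs.
tables = {name: pd.read_csv(base_url + name) for name in files}

for name, frame in tables.items():
    print(name, frame.shape)
```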
Try to use this data to build Machine Learning models related to Predictive Maintenance.
The Delta Produce Sources Study was an observational study designed to measure and compare food environments of farmers markets (n=3) and grocery stores (n=12) in 5 rural towns located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys from June 2019 to March 2020 using a modified version of the Nutrition Environment Measures Survey (NEMS) Farmers Market Audit tool. The tool was modified to collect information pertaining to the source of fresh produce and for use with both farmers markets and grocery stores. Availability, source, quality, and price information were collected and compared between farmers markets and grocery stores for 13 fresh fruits and 32 fresh vegetables via SAS software programming. Because the towns were not randomly selected and the sample sizes are relatively small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi.

Resources in this dataset (recommended software for each file: Microsoft Excel, https://www.microsoft.com/en-us/microsoft-365/excel):
- Delta Produce Sources Study dataset (DPS Data Public.csv) – contains variables corresponding to availability, source (country, state, and town if the country is the United States), quality, and price (by weight or volume) of 13 fresh fruits and 32 fresh vegetables sold in farmers markets and grocery stores located in 5 Lower Mississippi Delta towns.
- Delta Produce Sources Study data dictionary (DPS Data Dictionary Public.csv) – the data dictionary corresponding to the Delta Produce Sources Study dataset.
T-Drive trajectory data sample. https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample/, 2011.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Warning: Large file size (over 1GB). Each monthly data set is large (over 4 million rows), but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or equivalent on Mac OS X). It is then possible to select the required rows of data and copy and paste the information into another software application, such as a spreadsheet. Alternatively, add-ons to existing software, such as the Microsoft PowerPivot add-on for Excel, can be used to handle larger data sets. The Microsoft PowerPivot add-on for Excel is available from Microsoft: http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx

Once PowerPivot has been installed, follow the instructions below to load the large files. Note that it may take at least 20 to 30 minutes to load one monthly file.
1. Start Excel as normal.
2. Click on the PowerPivot tab.
3. Click on the PowerPivot Window icon (top left).
4. In the PowerPivot Window, click on the "From Other Sources" icon.
5. In the Table Import Wizard, scroll to the bottom and select Text File.
6. Browse to the file you want to open and choose the file extension you require, e.g. CSV.
Once the data has been imported you can view it in a spreadsheet.

What does the data cover? General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record will only be produced when this has occurred, and there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):
- the total number of items prescribed and dispensed
- the total net ingredient cost
- the total actual cost
- the total quantity
The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file - linked to the first by the practice code - provides further detail in relation to the practice. Presentations are identified only by their BNF code, so an additional data file - linked to the first by the BNF code - provides the chemical name for that presentation.
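For working with a monthly file programmatically rather than through PowerPivot, one option is to stream it in chunks (a minimal sketch; the file name is a placeholder and no particular column names are assumed):

```python
import pandas as pd

# Stream a large monthly prescribing extract without loading it all into memory.
# "prescribing_monthly.csv" is a placeholder for the downloaded file name.
total_rows = 0
for chunk in pd.read_csv("prescribing_monthly.csv", chunksize=500_000):
    # Filter or aggregate each chunk here (e.g. by practice code or BNF code).
    total_rows += len(chunk)

print("rows processed:", total_rows)
```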
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic) that are represented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, up to 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected. Arithmetic averages were calculated, along with the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value, and number of employees. The arithmetic mean values of these indicators across all countries in the sample were then computed, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data are collected in a single Microsoft Excel table.

The dataset is a unique database that combines COVID-19 statistics with entrepreneurship statistics, and it can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, adding and/or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data but as an analytical tool for automating research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data but also charts that provide data visualization.

The dataset contains both actual and forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables to obtain automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and after the second wave of the pandemic to check the reliability of the earlier forecasts and conduct a plan-versus-actual analysis. The dataset contains not only the initial and predicted numerical values of the studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of the pandemic and COVID-19 crisis for international entrepreneurship.
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Four tables and 23 figures of this paper:
- Table 1 shows the concept space comparison of existing taxonomies.
- Table 2 presents Hearst pattern examples.
- Table 3 shows the labeling guideline for conceptualization.
- Table 4 presents the precision of short text understanding.
- Figure 1 shows the framework overview.
- Figure 2 is local taxonomy construction.
- Figure 3 shows horizontal merging.
- Figure 4 shows vertical merging: single sense alignment.
- Figure 5 shows vertical merging: multiple sense alignment.
- Figure 6 is a subgraph of the heterogeneous semantic network around "watch".
- Figure 7 is the compression procedure of the typed-term co-occurrence network.
- Figure 8 presents an example of short text understanding.
- Figure 9 presents examples of the Chain model and the Pairwise model.
- Figure 10 is a snapshot of the Probase browser.
- Figure 11 is a snapshot of single instance conceptualization.
- Figure 12 is a snapshot of context-aware single instance conceptualization.
- Figure 13 shows an example of short text conceptualization.
- Figure 14 is the framework of topic search.
- Figure 15 is a snapshot of the Web tables.
- Figure 16 shows a query recommendation snapshot.
- Figure 17 shows the correlation of CTR with ads relevance score.
- Figure 18 presents the distribution of concepts in Microsoft Concept Graph.
- Figure 19 shows concept coverage of different taxonomies.
- Figure 20 shows precision of extracted isA pairs on 40 concepts.
- Figure 21 shows precision of isA pairs after each iteration.
- Figure 22 shows the number of discovered concepts and isA pairs after each iteration.
- Figure 23 shows the precision and nDCG comparison.
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue.

Results: Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (http://www.github.com/pstew/escape_excel), including a command line based Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web server, and a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and a simple non-Galaxy web server (http://apostl.moffitt.org:8000/).

Conclusions: Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.
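To illustrate the general idea (a minimal sketch of one common escaping convention, not the Escape Excel tool itself; the sample values and the choice to wrap risky fields as ="..." formulas are assumptions here):

```python
import csv
import re

# Sample values that Excel is prone to converting: a gene symbol that looks like
# a date, an identifier with leading zeroes, and a long numeric identifier.
rows = [["gene", "id"], ["SEPT2", "000123"], ["MARCH1", "1234567890123"]]

risky = re.compile(r"^(\d+|[A-Za-z]{3,5}\d+)$")

def escape(value: str) -> str:
    # Wrapping a value as ="value" makes Excel keep it as literal text.
    return f'="{value}"' if risky.match(value) else value

with open("escaped.tsv", "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    for row in rows:
        writer.writerow([escape(cell) for cell in row])
```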
The Delta Food Outlets Study was an observational study designed to assess the nutritional environments of 5 towns located in the Lower Mississippi Delta region of Mississippi. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns in which Delta Healthy Sprouts participants resided and that contained at least one convenience (corner) store, grocery store, or gas station. Data were collected via electronic surveys between March 2016 and September 2018 using the Nutrition Environment Measures Survey (NEMS) tools. Survey scores for the NEMS Corner Store, NEMS Grocery Store, and NEMS Restaurant were computed using modified scoring algorithms provided for these tools via SAS software programming. Because the towns were not randomly selected and the sample sizes are relatively small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Dataset one (NEMS-C) contains data collected with the NEMS Corner (convenience) Store tool. Dataset two (NEMS-G) contains data collected with the NEMS Grocery Store tool. Dataset three (NEMS-R) contains data collected with the NEMS Restaurant tool.

Resources in this dataset (recommended software for each file: Microsoft Excel, https://products.office.com/en-us/excel):
- Delta Food Outlets Data Dictionary (DFO_DataDictionary_Public.csv) – the data dictionary for all 3 datasets that are part of the Delta Food Outlets Study.
- Dataset One NEMS-C (NEMS-C Data.csv) – data collected with the Nutrition Environment Measures Survey (NEMS) tool for convenience stores.
- Dataset Two NEMS-G (NEMS-G Data.csv) – data collected with the NEMS tool for grocery stores.
- Dataset Three NEMS-R (NEMS-R Data.csv) – data collected with the NEMS tool for restaurants.
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.