This dataset contains 1,024 60-second video clips of Bleeding Edge gameplay (75 GB). The data has already been processed into the following format: 300x180 videos sampled at 10 fps.
Dataset Structure
Data Files
testing_dataset_part1.zip & testing_dataset_part2.zip – contain all 1,024 60-second trajectories used for our evaluation.
4 examples from the dataset:
FB[…].npz – .npz file (described below). FB[…].mp4 – 60-second .mp4 video of the images from the .npz file. … See the full description on the dataset page: https://huggingface.co/datasets/microsoft/bleeding-edge-gameplay-sample.
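A minimal inspection sketch in Python (the file name below is a placeholder; the exact array keys inside each .npz are documented on the dataset page, so the snippet simply lists whatever arrays are present):

```python
import numpy as np

# Placeholder file name for one downloaded trajectory archive.
with np.load("FB_example_trajectory.npz", allow_pickle=True) as archive:
    # List every array stored in the archive along with its shape and dtype.
    for key in archive.files:
        print(key, archive[key].shape, archive[key].dtype)
```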
http://www.gnu.org/licenses/lgpl-3.0.html
On the official website, the dataset is available via SQL Server (localhost) and as CSVs, to be used with Power BI Desktop running on the Virtual Lab (virtual machine). The first two steps of importing the data were executed in the virtual lab, and the resulting Power BI tables were then copied to CSVs. Records were added up to the year 2022 as required.
This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop in order to follow the lab instructions in the training material on the official website. It is also useful if you want to work through the Power BI Desktop Sales Analysis example from Microsoft's PL-300 learning path.
Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created in the first two steps of importing data, as described in the PL-300 Microsoft Power BI Data Analyst exam lab.
The Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed using the scoring algorithms provided for these tools via SAS software programming. The Street Segment Assessment and CPAT do not have associated scoring algorithms and therefore no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note: title changed 9/4/2020 to reflect study name]

Resources in this dataset (recommended software for each file: Microsoft Excel, https://products.office.com/en-us/excel):
- Dataset One RALA PPA Data Dictionary (RALA PPA Data Dictionary.csv) – data dictionary for dataset one, collected using the RALA PPA tool.
- Dataset Two RALA TWA Data Dictionary (RALA TWA Data Dictionary.csv) – data dictionary for dataset two, collected using the RALA TWA tool.
- Dataset Three RALA SSA Data Dictionary (RALA SSA Data Dictionary.csv) – data dictionary for dataset three, collected using the RALA SSA tool.
- Dataset Four CPAT Data Dictionary (CPAT Data Dictionary.csv) – data dictionary for dataset four, collected using the CPAT.
- Dataset One RALA PPA (RALA PPA Data.csv) – data collected using the RALA PPA tool.
- Dataset Two RALA TWA (RALA TWA Data.csv) – data collected using the RALA TWA tool.
- Dataset Three RALA SSA (RALA SSA Data.csv) – data collected using the RALA SSA tool.
- Dataset Four CPAT (CPAT Data.csv) – data collected using the CPAT.
- Data Dictionary (DataDictionary_RALA_PPA_SSA_TWA_CPAT.csv) – a combined data dictionary covering all four dataset files in this set.
Example of a filtered Microsoft Excel spreadsheet for TaAMY2 single null mutant detection (selected data).
Open Database License (ODbL) v1.0 – https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset contains vectorized byte files taken from the original dataset of the Microsoft Malware Classification Challenge (BIG 2015) competition. The original dataset is described at http://arxiv.org/abs/1802.10135.
The original train and test datasets are ~18GB each; this random sample, extracted and vectorized, is just ~15MB in size. The byte files were vectorized using CountVectorizer. Note: the original dataset contains only 42 byte files for malware class 5 (Simda).
[Figure: balanced dataset pie chart – https://i.imgur.com/CFWhzYr.png]
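As a rough illustration of the vectorization step (a minimal sketch, not the exact pipeline used to produce this sample; the file names and token pattern are assumptions), byte files can be turned into count vectors with scikit-learn's CountVectorizer:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical byte files: each .bytes file from the original challenge is a
# plain-text dump of two-character hex byte tokens.
paths = ["sample1.bytes", "sample2.bytes"]
documents = [open(p, encoding="utf-8", errors="ignore").read() for p in paths]

# Treat every two-character token as a feature and count its occurrences.
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w\w\b")
X = vectorizer.fit_transform(documents)
print(X.shape, len(vectorizer.get_feature_names_out()))
```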
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example of how I use MS Excel's VLOOKUP() function to filter my data.
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Cloud to Street - Microsoft Flood Dataset (C2S-MS Floods) is a dataset of near-coincident Sentinel-1 and Sentinel-2 data paired with water labels from 18 global flood events. These labels are derived products of the MODIS sensor on board NASA's Aqua and Terra satellites, produced as part of the study "Satellite imaging reveals increased proportion of population exposed to floods," Nature (2021), doi: 10.1038/s41586-021-03695-w. In this collection, we keep the water label that represents the maximum observed flood extent during the time period of the event. For a detailed description of the methods used to generate these labels, please refer to the original paper.
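Once chips from the collection have been downloaded, a pairing like the following can be used to compare an image chip with its water label (a minimal sketch; the file names are hypothetical placeholders and the actual directory layout should be taken from the collection's catalog):

```python
import numpy as np
import rasterio

# Hypothetical file names for one chip: a Sentinel raster and the matching
# water-label raster from the same flood event.
with rasterio.open("sentinel_chip.tif") as src:
    image = src.read()          # (bands, height, width)

with rasterio.open("water_label.tif") as src:
    label = src.read(1)         # single-band maximum-extent water mask

# Fraction of pixels labeled as water in this chip (assuming water pixels == 1).
print("water fraction:", float(np.mean(label == 1)))
print("image shape:", image.shape)
```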
https://choosealicense.com/licenses/other/
Dataset Card for BingCoronavirusQuerySet
Dataset Summary
Please note that you can specify the start and end dates of the data. Available start and end dates are listed here: https://github.com/microsoft/BingCoronavirusQuerySet/tree/master/data/2020. For example: load_dataset("bing_coronavirus_query_set", queries_by="state", start_date="2020-09-01", end_date="2020-09-30")
You can also load the data by country by using queries_by="country".
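Putting the above together into a runnable sketch with the Hugging Face datasets library (the date range is only illustrative; check the GitHub repository above for the dates actually available):

```python
from datasets import load_dataset

# State-level queries for an illustrative date range.
dataset = load_dataset(
    "bing_coronavirus_query_set",
    queries_by="state",
    start_date="2020-09-01",
    end_date="2020-09-30",
)

# Use queries_by="country" instead to load country-level queries.
print(dataset)
```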
Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/bing_coronavirus_query_set.
In order to practice writing SQL queries in a semi-realistic database, I discovered and imported Microsoft's AdventureWorks sample database into Microsoft SQL Server Express. The Adventure Works [fictitious] company represents a bicycle manufacturer that sells bicycles and accessories to global markets. Queries were written for developing and testing a Tableau dashboard.
The dataset presented here represents a fraction of the entire manufacturing relational database. Tables within the dataset include product, purchasing, work order, and transaction data.
The full database sample can be found on the Microsoft SQL Docs website (https://learn.microsoft.com/en-us/sql/samples/) and on GitHub (https://github.com/microsoft/sql-server-samples).
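For illustration, a practice query of the kind described above can also be run from Python (a minimal sketch; the server name, database name, and restored AdventureWorks version are placeholders to adjust for your own installation):

```python
import pyodbc

# Connect to a local SQL Server Express instance hosting an AdventureWorks sample.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost\\SQLEXPRESS;"
    "DATABASE=AdventureWorks2019;"   # placeholder database name
    "Trusted_Connection=yes;"
)

# Example practice query: the ten most expensive products by list price.
cursor = conn.cursor()
cursor.execute(
    "SELECT TOP 10 Name, ListPrice "
    "FROM Production.Product "
    "ORDER BY ListPrice DESC;"
)
for name, list_price in cursor.fetchall():
    print(name, list_price)

conn.close()
```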
https://choosealicense.com/licenses/cdla-permissive-2.0/
This dataset is Version 2.0 of microsoft/FStarDataSet.
Primary Objective
This dataset's primary objective is to train and evaluate Proof-oriented Programming with AI (PoPAI, in short). Given a specification of a program and proof in F*, the objective of an AI model is to synthesize the implementation (see below for details about the usage of this dataset, including the input and output).
Data Format
Each example in this dataset is organized as a dictionary… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/FStarDataSet-V2.
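A minimal loading sketch with the Hugging Face datasets library (the split names and record fields are not reproduced here, so the snippet simply reports whatever the first example exposes):

```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub and inspect one example.
dataset = load_dataset("microsoft/FStarDataSet-V2")

split_name = next(iter(dataset))      # first available split
example = dataset[split_name][0]      # each example is a dictionary
print("splits:", list(dataset.keys()))
print("fields in one example:", list(example.keys()))
```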
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset created as part of the Master Thesis "Business Intelligence – Automation of Data Marts modeling and its data processing".
Lucerne University of Applied Sciences and Arts
Master of Science in Applied Information and Data Science (MScIDS)
Autumn Semester 2022
Change log Version 1.1:
The following SQL scripts were added:
Index | Type             | Name
1     | View             | pg.dictionary_table
2     | View             | pg.dictionary_column
3     | View             | pg.dictionary_relation
4     | View             | pg.accesslayer_table
5     | View             | pg.accesslayer_column
6     | View             | pg.accesslayer_relation
7     | View             | pg.accesslayer_fact_candidate
8     | Stored Procedure | pg.get_fact_candidate
9     | Stored Procedure | pg.get_dimension_candidate
10    | Stored Procedure | pg.get_columns
Scripts are based on Microsoft SQL Server 2017 and are compatible with a data warehouse built with Datavault Builder. The object scripts of the sample data warehouse itself are restricted and cannot be shared.
This is an example data source that can be used for predictive maintenance model building. It consists of the following data:
Telemetry Time Series Data (PdM_telemetry.csv): Hourly averages of voltage, rotation, pressure, and vibration collected from 100 machines for the year 2015.
Error (PdM_errors.csv): Errors encountered by the machines while in operating condition. Since these errors don't shut down the machines, they are not considered failures. The error dates and times are rounded to the closest hour, since the telemetry data is collected at an hourly rate.
Maintenance (PdM_maint.csv): If a component of a machine is replaced, that is captured as a record in this table. Components are replaced under two situations: 1. during a regular scheduled visit, the technician replaces it (proactive maintenance); 2. a component breaks down and the technician performs unscheduled maintenance to replace it (reactive maintenance). The latter is considered a failure, and the corresponding data is captured under Failures. Maintenance data has both 2014 and 2015 records. This data is rounded to the closest hour since the telemetry data is collected at an hourly rate.
Failures (PdM_failures.csv): Each record represents replacement of a component due to failure. This data is a subset of Maintenance data. This data is rounded to the closest hour since the telemetry data is collected at an hourly rate.
Metadata of Machines (PdM_Machines.csv): Model type & age of the Machines.
This dataset was available as part of Azure AI Notebooks for Predictive Maintenance, but as of 15 October 2020 the notebook (link) is no longer available. However, the data can still be downloaded using the following URLs:
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_telemetry.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_errors.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_maint.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_failures.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_machines.csv
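A minimal sketch for pulling the files directly from those URLs with pandas (assuming the blobs remain publicly accessible):

```python
import pandas as pd

base_url = "https://azuremlsampleexperiments.blob.core.windows.net/datasets/"
files = ["PdM_telemetry.csv", "PdM_errors.csv", "PdM_maint.csv",
         "PdM_failures.csv", "PdM_machines.csv"]

# pandas can read CSVs straight from HTTP(S) URLs.
tables = {name: pd.read_csv(base_url + name) for name in files}

for name, frame in tables.items():
    print(name, frame.shape)
```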
Try to use this data to build Machine Learning models related to Predictive Maintenance.
The Delta Produce Sources Study was an observational study designed to measure and compare food environments of farmers markets (n=3) and grocery stores (n=12) in 5 rural towns located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys from June 2019 to March 2020 using a modified version of the Nutrition Environment Measures Survey (NEMS) Farmers Market Audit tool. The tool was modified to collect information pertaining to the source of fresh produce and for use with both farmers markets and grocery stores. Availability, source, quality, and price information were collected and compared between farmers markets and grocery stores for 13 fresh fruits and 32 fresh vegetables via SAS software programming. Because the towns were not randomly selected and the sample sizes are relatively small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi.

Resources in this dataset (recommended software for each file: Microsoft Excel, https://www.microsoft.com/en-us/microsoft-365/excel):
- Delta Produce Sources Study dataset (DPS Data Public.csv) – contains variables corresponding to availability, source (country, state, and town if the country is the United States), quality, and price (by weight or volume) of 13 fresh fruits and 32 fresh vegetables sold in farmers markets and grocery stores located in 5 Lower Mississippi Delta towns.
- Delta Produce Sources Study data dictionary (DPS Data Dictionary Public.csv) – the data dictionary corresponding to the Delta Produce Sources Study dataset.
T-Drive trajectory data sample. https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample/, 2011.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Warning: Large file size (over 1GB). Each monthly data set is large (over 4 million rows), but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or equivalent on Mac OS X). It is then possible to select the required rows of data and copy and paste the information into another software application, such as a spreadsheet. Alternatively, add-ons to existing software, such as the Microsoft PowerPivot add-on for Excel, can be used to handle larger data sets. The Microsoft PowerPivot add-on for Excel is available from Microsoft: http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx

Once PowerPivot has been installed, follow the instructions below to load the large files. Note that it may take at least 20 to 30 minutes to load one monthly file.
1. Start Excel as normal.
2. Click on the PowerPivot tab.
3. Click on the PowerPivot Window icon (top left).
4. In the PowerPivot Window, click on the "From Other Sources" icon.
5. In the Table Import Wizard, scroll to the bottom and select Text File.
6. Browse to the file you want to open and choose the file extension you require, e.g. CSV.
Once the data has been imported you can view it in a spreadsheet.

What does the data cover? General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record will only be produced when this has occurred, and there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):
- the total number of items prescribed and dispensed
- the total net ingredient cost
- the total actual cost
- the total quantity
The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file - linked to the first by the practice code - provides further detail in relation to the practice. Presentations are identified only by their BNF code, so an additional data file - linked to the first by the BNF code - provides the chemical name for that presentation.
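For working with a monthly file programmatically rather than through PowerPivot, one option is to stream it in chunks (a minimal sketch; the file name is a placeholder and no particular column names are assumed):

```python
import pandas as pd

# Stream a large monthly prescribing extract without loading it all into memory.
# "prescribing_monthly.csv" is a placeholder for the downloaded file name.
total_rows = 0
for chunk in pd.read_csv("prescribing_monthly.csv", chunksize=500_000):
    # Filter or aggregate each chunk here (e.g. by practice code or BNF code).
    total_rows += len(chunk)

print("rows processed:", total_rows)
```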
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic) that are represented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, up to 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected. Arithmetic averages were calculated, along with the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value, and number of employees. The arithmetic mean values of these indicators across all countries in the sample were then computed, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data are collected in a single Microsoft Excel table.

The dataset is a unique database that combines COVID-19 statistics with entrepreneurship statistics, and it can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, adding and/or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data but as an analytical tool for automating research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data but also charts that provide data visualization.

The dataset contains both actual and forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables to obtain automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and after the second wave of the pandemic to check the reliability of the earlier forecasts and conduct a plan-versus-actual analysis. The dataset contains not only the initial and predicted numerical values of the studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of the pandemic and COVID-19 crisis for international entrepreneurship.
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Four tables and 23 figures of this paper:
- Table 1 shows the concept space comparison of existing taxonomies.
- Table 2 presents Hearst pattern examples.
- Table 3 shows the labeling guideline for conceptualization.
- Table 4 presents the precision of short text understanding.
- Figure 1 shows the framework overview.
- Figure 2 is local taxonomy construction.
- Figure 3 shows horizontal merging.
- Figure 4 shows vertical merging: single sense alignment.
- Figure 5 shows vertical merging: multiple sense alignment.
- Figure 6 is a subgraph of the heterogeneous semantic network around "watch".
- Figure 7 is the compression procedure of the typed-term co-occurrence network.
- Figure 8 presents an example of short text understanding.
- Figure 9 presents examples of the Chain model and the Pairwise model.
- Figure 10 is a snapshot of the Probase browser.
- Figure 11 is a snapshot of single instance conceptualization.
- Figure 12 is a snapshot of context-aware single instance conceptualization.
- Figure 13 shows an example of short text conceptualization.
- Figure 14 is the framework of topic search.
- Figure 15 is a snapshot of the Web tables.
- Figure 16 shows a query recommendation snapshot.
- Figure 17 shows the correlation of CTR with ads relevance score.
- Figure 18 presents the distribution of concepts in Microsoft Concept Graph.
- Figure 19 shows concept coverage of different taxonomies.
- Figure 20 shows precision of extracted isA pairs on 40 concepts.
- Figure 21 shows precision of isA pairs after each iteration.
- Figure 22 shows the number of discovered concepts and isA pairs after each iteration.
- Figure 23 shows the precision and nDCG comparison.
Attribution 4.0 (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue.

Results: Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (http://www.github.com/pstew/escape_excel), including a command line based Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web server, and a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and a simple non-Galaxy web server (http://apostl.moffitt.org:8000/).

Conclusions: Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.
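To illustrate the general idea (a minimal sketch of one common escaping convention, not the Escape Excel tool itself; the sample values and the choice to wrap risky fields as ="..." formulas are assumptions here):

```python
import csv
import re

# Sample values that Excel is prone to converting: a gene symbol that looks like
# a date, an identifier with leading zeroes, and a long numeric identifier.
rows = [["gene", "id"], ["SEPT2", "000123"], ["MARCH1", "1234567890123"]]

risky = re.compile(r"^(\d+|[A-Za-z]{3,5}\d+)$")

def escape(value: str) -> str:
    # Wrapping a value as ="value" makes Excel keep it as literal text.
    return f'="{value}"' if risky.match(value) else value

with open("escaped.tsv", "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    for row in rows:
        writer.writerow([escape(cell) for cell in row])
```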
The Delta Food Outlets Study was an observational study designed to assess the nutritional environments of 5 towns located in the Lower Mississippi Delta region of Mississippi. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns in which Delta Healthy Sprouts participants resided and that contained at least one convenience (corner) store, grocery store, or gas station. Data were collected via electronic surveys between March 2016 and September 2018 using the Nutrition Environment Measures Survey (NEMS) tools. Survey scores for the NEMS Corner Store, NEMS Grocery Store, and NEMS Restaurant were computed using modified scoring algorithms provided for these tools via SAS software programming. Because the towns were not randomly selected and the sample sizes are relatively small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Dataset one (NEMS-C) contains data collected with the NEMS Corner (convenience) Store tool. Dataset two (NEMS-G) contains data collected with the NEMS Grocery Store tool. Dataset three (NEMS-R) contains data collected with the NEMS Restaurant tool.

Resources in this dataset (recommended software for each file: Microsoft Excel, https://products.office.com/en-us/excel):
- Delta Food Outlets Data Dictionary (DFO_DataDictionary_Public.csv) – the data dictionary for all 3 datasets that are part of the Delta Food Outlets Study.
- Dataset One NEMS-C (NEMS-C Data.csv) – data collected with the Nutrition Environment Measures Survey (NEMS) tool for convenience stores.
- Dataset Two NEMS-G (NEMS-G Data.csv) – data collected with the NEMS tool for grocery stores.
- Dataset Three NEMS-R (NEMS-R Data.csv) – data collected with the NEMS tool for restaurants.
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.