29 datasets found

Country, domain, employees, industry and sector of companies called...
workwithdata.com
Updated Feb 1, 2024
Work With Data (2024). Country, domain, employees, industry and sector of companies called Microsoft [Dataset]. https://www.workwithdata.com/dataset?entity=companies&col=country,industry,domain,sector,company,employees&f=1&fcol0=company&fop0=includes&fval0=Microsoft
Dataset updated
Feb 1, 2024
Dataset authored and provided by
Work With Data
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset is about companies and is filtered to those whose name includes "Microsoft". It has 6 columns: company, country, domain, employees, industry, and sector. The data is ordered by revenue (descending).
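The filter and ordering described above can be sketched in a few lines of Python. This is a minimal illustration that assumes each record is a dict with a `company` field and an assumed numeric `revenue` field (revenue is used for ordering but is not one of the listed columns), and that "includes" means a case-sensitive substring match. The row values are placeholders, not data from the dataset.

```python
from typing import Any


def filter_companies(rows: list[dict[str, Any]], needle: str = "Microsoft") -> list[dict[str, Any]]:
    """Keep rows whose company name contains `needle`, ordered by revenue, highest first."""
    matches = [row for row in rows if needle in row["company"]]
    return sorted(matches, key=lambda row: row["revenue"], reverse=True)


# Placeholder rows for illustration only; "revenue" is an assumed field.
rows = [
    {"company": "Microsoft Subsidiary A", "country": "X", "revenue": 10.0},
    {"company": "Unrelated Corp", "country": "Y", "revenue": 99.0},
    {"company": "Microsoft Subsidiary B", "country": "Z", "revenue": 25.0},
]

print([row["company"] for row in filter_companies(rows)])
# ['Microsoft Subsidiary B', 'Microsoft Subsidiary A']
```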
Microsoft Teams: number of daily active users 2019-2024
statista.com
flwrdeptvarieties.store
Updated Jan 2, 2025
Microsoft Teams: number of daily active users 2019-2024 [Dataset]. https://www.statista.com/statistics/1033742/worldwide-microsoft-teams-daily-and-monthly-users/
Dataset updated
Jan 2, 2025
Dataset authored and provided by
Statista, http://statista.com/
Area covered
Worldwide
Description
The number of daily active users of Microsoft Teams has remained roughly unchanged over the past year, at around 320 million. Due to the impact of the coronavirus (COVID-19) outbreak and the growing practices of social distancing and working from home, Microsoft saw dramatic increases in the daily use of its communication and collaboration platform within a short period of time. Microsoft Teams is part of Microsoft 365, a set of collaboration apps and services launched in July 2017.

Increased data consumption from “staying at home”

Average daily in-home data usage in the United States increased significantly during the coronavirus (COVID-19) outbreak in March 2020. Compared with the same number of days in March 2019, the daily average in-home data usage increased by 4.4 gigabytes in March 2020, a rise of roughly 40 percent. Data consumption from gaming consoles and smartphones increased the most, although increases can be observed across nearly all device categories. Social media platforms and video and conference call platforms were the technology services used most during the outbreak in the U.S.
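The two in-home usage figures together imply an approximate baseline. Writing U_2019 and U_2020 for the average daily in-home usage in March 2019 and March 2020 (our notation, not the source's), and taking "roughly 40 percent" at face value, the implied values are only as precise as that rounded percentage:

```latex
\Delta U = U_{2020} - U_{2019} = 4.4\ \text{GB}, \qquad \frac{\Delta U}{U_{2019}} \approx 0.40
\;\Longrightarrow\; U_{2019} \approx \frac{4.4}{0.40} = 11\ \text{GB}, \qquad U_{2020} \approx 11 + 4.4 = 15.4\ \text{GB}
```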
orca-math-word-problems-200k
huggingface.co
Updated Mar 4, 2024
Microsoft (2024). orca-math-word-problems-200k [Dataset]. https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k
Dataset updated
Mar 4, 2024
Dataset authored and provided by
Microsoft, http://microsoft.com/
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card

This dataset contains ~200K grade school math word problems. All the answers in this dataset were generated using Azure GPT4-Turbo. Please refer to Orca-Math: Unlocking the potential of SLMs in Grade School Math for details about the dataset construction.

Dataset Sources

Repository: microsoft/orca-math-word-problems-200k
Paper: Orca-Math: Unlocking the potential of SLMs in Grade School Math

Direct Use

This dataset has been… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k.
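When scoring model outputs against this dataset, one often needs the final numeric answer from a free-text solution. The sketch below assumes records expose `question` and `answer` text fields, and uses a hypothetical heuristic (`last_number`) that treats the last number in the solution as the final answer. It will misfire on solutions that end with units containing digits, dates, or restated intermediate values.

```python
import re
from typing import Optional

# Integers or decimals, optionally negative, allowing thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(answer_text: str) -> Optional[float]:
    """Return the last number in a free-text solution, or None if there is none.

    Heuristic only: assumes the final answer is the last number stated.
    """
    matches = _NUMBER.findall(answer_text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


# Made-up record in the assumed question/answer shape.
record = {
    "question": "Tom has 3 apples and buys 4 more. How many does he have?",
    "answer": "Tom starts with 3 apples and buys 4, so he has 3 + 4 = 7 apples.",
}
print(last_number(record["answer"]))  # 7.0
```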
MIND Dataset
paperswithcode.com
Updated Apr 19, 2021
MIND Dataset [Dataset]. https://paperswithcode.com/dataset/mind
Dataset updated
Apr 19, 2021
Authors
Fangzhao Wu; Ying Qiao; Jiun-Hung Chen; Chuhan Wu; Tao Qi; Jianxun Lian; Danyang Liu; Xing Xie; Jianfeng Gao; Winnie Wu; Ming Zhou
Description
MIcrosoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It was collected from anonymized behavior logs of the Microsoft News website. The mission of MIND is to serve as a benchmark dataset for news recommendation and to facilitate research in news recommendation and recommender systems.

MIND contains about 160k English news articles and more than 15 million impression logs generated by 1 million users. Every news article contains rich textual content including title, abstract, body, category and entities. Each impression log contains the click events, non-clicked events and the user's historical news click behaviors before this impression. To protect user privacy, each user was de-linked from the production system when securely hashed into an anonymized ID.
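Each impression log pairs a user's click history with the articles shown in that impression. The sketch below parses one labeled record, assuming the tab-separated `behaviors.tsv` layout of the public MIND release as we understand it: impression ID, user ID, time, space-separated history, and space-separated `NewsID-label` pairs where `1` means clicked. The IDs in the sample line are made up, and the field layout should be checked against the official MIND documentation.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    impression_id: str
    user_id: str
    time: str
    history: list[str]  # news IDs clicked before this impression
    clicked: list[str]
    non_clicked: list[str]


def parse_behavior_line(line: str) -> Impression:
    """Parse one labeled impression record (assumed five tab-separated fields)."""
    impression_id, user_id, time, history, impressions = line.rstrip("\n").split("\t")
    clicked: list[str] = []
    non_clicked: list[str] = []
    for item in impressions.split():
        news_id, label = item.rsplit("-", 1)  # e.g. "N55689-1" -> ("N55689", "1")
        (clicked if label == "1" else non_clicked).append(news_id)
    return Impression(impression_id, user_id, time, history.split(), clicked, non_clicked)


line = "1\tU13740\t11/11/2019 9:05:58 AM\tN55189 N42782\tN55689-1 N35729-0"
imp = parse_behavior_line(line)
print(imp.clicked, imp.non_clicked)  # ['N55689'] ['N35729']
```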
MS COCO Dataset
paperswithcode.com
Updated Apr 15, 2024
Tsung-Yi Lin; Michael Maire; Serge Belongie; Lubomir Bourdev; Ross Girshick; James Hays; Pietro Perona; Deva Ramanan; C. Lawrence Zitnick; Piotr Dollár, MS COCO Dataset [Dataset]. https://paperswithcode.com/dataset/coco
Dataset updated
Apr 15, 2024
Authors
Tsung-Yi Lin; Michael Maire; Serge Belongie; Lubomir Bourdev; Ross Girshick; James Hays; Pietro Perona; Deva Ramanan; C. Lawrence Zitnick; Piotr Dollár
Description
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.

Splits: The first version of the MS COCO dataset was released in 2014. It contains 164K images split into training (83K), validation (41K) and test (41K) sets. In 2015, an additional test set of 81K images was released, including all the previous test images and 40K new images.

Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.

Annotations: The dataset has annotations for:

- object detection: bounding boxes and per-instance segmentation masks with 80 object categories;
- captioning: natural language descriptions of the images (see MS COCO Captions);
- keypoint detection: more than 200,000 images and 250,000 person instances labeled with keypoints (17 possible keypoints, such as left eye, nose, right hip, right ankle);
- stuff image segmentation: per-pixel segmentation masks with 91 stuff categories, such as grass, wall, sky (see MS COCO Stuff);
- panoptic: full scene segmentation, with 80 thing categories (such as person, bicycle, elephant) and a subset of 91 stuff categories (grass, sky, road);
- dense pose: more than 39,000 images and 56,000 person instances labeled with DensePose annotations; each labeled person is annotated with an instance id and a mapping between the image pixels that belong to that person's body and a template 3D model.

The annotations are publicly available only for training and validation images.
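COCO object-detection annotations store each bounding box as `[x, y, width, height]` in pixels, with `(x, y)` the top-left corner. The minimal sketch below converts that to corner form and computes intersection over union (IoU), the overlap measure used in detection evaluation. The function names are our own and are not part of the COCO API.

```python
def coco_to_corners(bbox: list[float]) -> tuple[float, float, float, float]:
    """Convert a COCO [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


def iou(box_a: list[float], box_b: list[float]) -> float:
    """Intersection over union of two COCO-format boxes."""
    ax1, ay1, ax2, ay2 = coco_to_corners(box_a)
    bx1, by1, bx2, by2 = coco_to_corners(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - intersection
    return intersection / union if union > 0 else 0.0


print(coco_to_corners([5, 5, 10, 10]))  # (5, 5, 15, 15)
print(round(iou([0, 0, 10, 10], [5, 5, 10, 10]), 4))  # 0.1429
```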
A dataset to assess Microsoft Copilot Answers in the Context of Swiss,...
data.niaid.nih.gov
recerca.uoc.edu
Updated Jan 16, 2024
Kaltenbrunner, Andreas (2024). A dataset to assess Microsoft Copilot Answers in the Context of Swiss, Bavarian and Hesse Elections. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10517696
Dataset updated
Jan 16, 2024
Dataset provided by
Kaltenbrunner, Andreas
Romano, Salvatore
Angius, Riccardo
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Switzerland
Description
This readme file was generated on 2024-01-15 by Salvatore Romano

GENERAL INFORMATION

Title of Dataset: A dataset to assess Microsoft Copilot Answers in the Context of Swiss, Bavarian and Hesse Elections.

Author/Principal Investigator Information
Name: Salvatore Romano
ORCID: 0000-0003-0856-4989
Institution: Universitat Oberta de Catalunya, AID4So.
Address: Rambla del Poblenou, 154. 08018 Barcelona.
Email: salvatore@aiforensics.org

Author/Associate or Co-investigator Information
Name: Riccardo Angius
ORCID: 0000-0003-0291-3332
Institution: Ai Forensics
Address: Paris, France.
Email: riccardo@aiforensics.org

Date of data collection: from 2023-09-21 to 2023-10-02.

Geographic location of data collection: Switzerland and Germany.

Information about funding sources that supported the collection of the data: The data collection and analysis were supported by AlgorithmWatch's DataSkop project (dataskop.net), funded by Germany’s Federal Ministry of Education and Research (BMBF) as part of the program “Mensch-Technik-Interaktion” (human-technology interaction). In Switzerland, the investigation was realized with the support of Stiftung Mercator Schweiz. AI Forensics' contribution was supported in part by the Open Society Foundations. AI Forensics' data collection infrastructure is supported by the Bright Initiative.

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: This publication is licensed under a Creative Commons Attribution 4.0 International License: https://creativecommons.org/licenses/by/4.0/deed.en

Links to publications that cite or use the data: https://aiforensics.org//uploads/AIF_AW_Bing_Chat_Elections_Report_ca7200fe8d.pdf

Links to other publicly accessible locations of the data: NA

Links/relationships to ancillary data sets: NA

Was data derived from another source? NA
If yes, list source(s):

Recommended citation for this dataset: S. Romano, R. Angius, N. Kerby, P. Bouchaud, J. Amidei, A. Kaltenbrunner. 2024. A dataset to assess Microsoft Copilot Answers in the Context of Swiss, Bavarian and Hesse Elections. https://aiforensics.org//uploads/AIF_AW_Bing_Chat_Elections_Report_ca7200fe8d.pdf

DATA & FILE OVERVIEW

File List: Microsof-Copilot-Answers_in-Swiss-Bavarian-Hess-Elections.csv
This is the only data file for this research. It includes rows with prompts and responses from Microsoft Copilot, along with associated metadata for each entry.

Relationship between files, if important: NA

Additional related data collected that was not included in the current data package: NA

Are there multiple versions of the dataset? NA
If yes, name of file(s) that was updated: Why was the file updated? When was the file updated?

METHODOLOGICAL INFORMATION

Description of methods used for collection/generation of data: In our algorithmic auditing research, we adopted a sock-puppet audit methodology (Sandvig et al., 2014). This method aligns with the growing interdisciplinary focus on algorithm audits, which prioritize fairness, accountability, and transparency to uncover biases in algorithmic systems (Bandy, 2021). Sock-puppet auditing offers a fully controlled environment for understanding the behavior of the system.

Every sample was collected by launching a new browser instance connected to the internet via a network of VPNs and residential IPs based in Switzerland and Germany, then accessing Microsoft Copilot through its official URL. Each time, the Language and Country/Region settings were set to match those of potential voters from the respective regions (English, German, French, or Italian, and Switzerland or Germany). We did not simulate any form of user history or additional personalization. Importantly, Microsoft Copilot's default settings remained unchanged, so all interactions occurred with the "Conversation Style" set to "Balanced".
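The sampling protocol above amounts to a small, fixed grid of session settings, each run in a fresh browser with no history. A hypothetical sketch of that configuration follows. The `SessionConfig` structure is our own, and because the description does not state which language/country pairs were actually sampled, it enumerates every combination; a real audit would restrict it to the pairs used.

```python
from dataclasses import dataclass
from itertools import product

# Settings named in the methodology; the actual pairings are not specified.
LANGUAGES = ("English", "German", "French", "Italian")
COUNTRIES = ("Switzerland", "Germany")


@dataclass(frozen=True)
class SessionConfig:
    """Settings for one fresh, non-personalized browser session (hypothetical structure)."""
    language: str
    country: str
    conversation_style: str = "Balanced"  # Copilot default, left unchanged
    simulate_history: bool = False  # no user history or personalization


def session_configs() -> list[SessionConfig]:
    """Every language/country combination, each with default style and no history."""
    return [SessionConfig(language, country) for language, country in product(LANGUAGES, COUNTRIES)]


for config in session_configs()[:2]:
    print(config)
```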

Sandvig, C.; Hamilton, K.; Karahalios, K.; and Langbort, C. 2014. Auditing algorithms: Research methods for detecting discrimination on internet platforms. Data and Discrimination: Converting Critical Concerns into Productive Inquiry, 22(2014): 4349–4357.

Bandy, J. 2021. Problematic machine behavior: A systematic literature review of algorithm audits. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1): 1–34.

Methods for processing the data: The process involved analyzing the HTML code of the web pages that were accessed. During this examination, key metadata were identified and extracted from the HTML structure. Once this information was successfully extracted, the rest of the HTML page, which primarily consisted of code and elements not pertinent to the needed information, was discarded. This approach ensured that only the most relevant and useful data was retained, while all unnecessary and extraneous HTML components were efficiently removed, streamlining the data collection and analysis process.
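The processing step, keeping a few metadata fields from each captured page and discarding the rest of the HTML, can be sketched with Python's standard-library `html.parser`. Which elements the authors actually extracted is not specified; the `<meta>` name and link handling below are hypothetical stand-ins.

```python
from html.parser import HTMLParser
from typing import Optional


class MetadataExtractor(HTMLParser):
    """Keep <meta name=... content=...> pairs and link targets; ignore everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name"):
            self.meta[attributes["name"]] = attributes.get("content") or ""
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])


def extract(html: str) -> dict:
    """Return only the extracted fields; scripts, styles and markup are discarded."""
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    return {"meta": parser.meta, "links": parser.links}


page = (
    '<html><head><meta name="conversation-id" content="abc123"></head>'
    '<body><p>Answer <a href="https://example.org/source">[1]</a></p>'
    "<script>var x = 1;</script></body></html>"
)
print(extract(page))
# {'meta': {'conversation-id': 'abc123'}, 'links': ['https://example.org/source']}
```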

Instrument- or software-specific information needed to interpret the data: NA

Standards and calibration information, if appropriate: NA

Environmental/experimental conditions: NA

Describe any quality-assurance procedures performed on the data: NA

People involved with sample collection, processing, analysis and/or submission: Salvatore Romano, Riccardo Angius, Natalie Kerby, Paul Bouchaud, Jacopo Amidei, Andreas Kaltenbrunner.

DATA-SPECIFIC INFORMATION FOR: Microsof-Copilot-Answers_in-Swiss-Bavarian-Hess-Elections.csv

Number of variables: 33

Number of cases/rows: 5562

Variable List:
- prompt (object): Text of the prompt.
- answer (object): Text of the answer.
- country (object): Country information.
- language (object): Language of the text.
- input_conversation_id (object): Identifier for the conversation.
- conversation_group_ids (object): Group IDs for the conversation.
- conversation_group_names (object): Group names for the conversation.
- experiment_id (object): Identifier for the experiment group.
- experiment_name (object): Name of the experiment group.
- begin (object): Start time.
- end (object): End time.
- datetime (int64): Datetime stamp.
- week (int64): Week number.
- attributions (object): Link quoted in the text.
- attribution_links (object): Links for attributions.
- search_query (object): Search query used by the chatbot.
- unlabelled (int64): Unlabelled flag.
- exploratory_sample (int64): Exploratory sample flag.
- very_relevant (int64): Very relevant flag.
- needs_review (int64): Needs review flag.
- misleading_factual_error (int64): Misleading factual error flag.
- nonsense_factual_error (int64): Nonsense factual error flag.
- rejects_question_framing (int64): Rejects question framing flag.
- deflection (int64): Deflection flag.
- shield (int64): Shield flag.
- wrong_answer_language (int64): Wrong answer language flag.
- political_imbalance (int64): Political imbalance flag.
- refusal (int64): Refusal flag.
- factual_error (int64): Factual error flag.
- evasion (int64): Evasion flag.
- absolutely_accurate (int64): Absolutely accurate flag.
- macrocategory (object): Macro-category of the content.

Missing data codes: NA

Specialized formats or other abbreviations used: NA
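The variables from `unlabelled` to `absolutely_accurate` are integer annotation flags, so per-group rates are a natural first summary. The sketch below assumes the flags are stored as 0/1 integers, as their `int64` types suggest, and computes the share of flagged answers per language with the standard `csv` module. The inline sample is made up.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable


def flag_rate_by(rows: Iterable[dict[str, str]], group_col: str, flag_col: str) -> dict[str, float]:
    """Share of rows with flag_col == 1 within each value of group_col."""
    totals: dict[str, int] = defaultdict(int)
    flagged: dict[str, int] = defaultdict(int)
    for row in rows:
        key = row[group_col]
        totals[key] += 1
        flagged[key] += int(row[flag_col])  # flags assumed to be 0/1 integers
    return {key: flagged[key] / totals[key] for key in totals}


# Tiny inline sample standing in for the CSV; values are made up.
sample = "language,factual_error\nGerman,1\nGerman,0\nFrench,0\n"
rates = flag_rate_by(csv.DictReader(io.StringIO(sample)), "language", "factual_error")
print(rates)  # {'German': 0.5, 'French': 0.0}
```

For the real file, the same function can be given `csv.DictReader(f)` with `f` opened via `open("Microsof-Copilot-Answers_in-Swiss-Bavarian-Hess-Elections.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.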
Data from: Assessing research data deposits and usage statistics within...
databank.illinois.edu
Updated Dec 19, 2017
Christie Wiley (2017). Data from: Assessing research data deposits and usage statistics within IDEALS [Dataset]. http://doi.org/10.13012/B2IDB-1235375_V1
Unique identifier
https://doi.org/10.13012/B2IDB-1235375_V1
Dataset updated
Dec 19, 2017
Authors
Christie Wiley
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Objectives: This study follows up on previous work that began examining data deposited in an institutional repository. It extends the earlier study by answering the following research questions: (1) What is the file composition of datasets ingested into the University of Illinois at Urbana-Champaign campus repository? Are datasets more likely to be single-file or multiple-file items? (2) What usage data are associated with these datasets? Which items are most popular? Methods: The dataset records collected in this study were identified by filtering item types categorized as "data" or "dataset" using the advanced search function in IDEALS. Search results were collected in an Excel spreadsheet, including the Handle identifier, date ingested, file formats, composition code, and the download count from the item's statistics report. The Handle identifier is the dataset record's persistent identifier. Composition codes categorize items as single-file or multiple-file deposits. Date available is the date the dataset record was published in the campus repository. Download statistics were collected via a website link for each dataset record and indicate the number of times the record has been downloaded. The collected data were then used to evaluate datasets deposited into IDEALS. Results: A total of 522 datasets were identified for analysis, covering the period between January 2007 and August 2016. This study revealed two influxes, one in 2008-2009 and one in 2014. During the first, a large number of PDFs were deposited by the Illinois Department of Agriculture, whereas in 2014, Microsoft Excel files were deposited by the Rare Books and Manuscript Library. Single-file datasets clearly dominate the deposits in the campus repository.
The total download count for all datasets was 139,663, and the average number of downloads per month per file across all datasets was 3.2. Conclusion: Academic librarians, repository managers, and research data services staff can use the results presented here to anticipate the nature of research data that may be deposited within institutional repositories. With increased awareness, content recruitment, and improvements, institutional repositories (IRs) can provide a viable cyberinfrastructure for researchers to deposit data, but much can be learned from the data already deposited. Awareness of trends can help librarians facilitate discussions with researchers about research data deposits as well as better tailor their services to address short-term and long-term research needs.
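For a rough sense of scale, the two reported totals give the mean cumulative downloads per dataset over the whole study period. This is a simpler figure than the per-month, per-file average of 3.2 reported above and should not be confused with it:

```latex
\bar{d} = \frac{D_{\text{total}}}{N} = \frac{139{,}663}{522} \approx 267.6\ \text{downloads per dataset}
```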
Dataset of development of business during the COVID-19 crisis
data.mendeley.com
narcis.nl
Updated Nov 9, 2020
Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
Unique identifier
https://doi.org/10.17632/9vvrd34f8t.1
Dataset updated
Nov 9, 2020
Authors
Tatiana N. Litvinova
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic); those represented in the Global 500 ranking for 2020 are the USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rankings for 2020 and 2019 were selected separately. Arithmetic averages were calculated for indicators such as enterprise profitability, ranking position (competitiveness), asset value and number of employees, along with the change (increase) in each. The arithmetic mean values of these indicators across all sample countries were then found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020, on the eve of the second wave of the pandemic. The data are collected in a general Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. It is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the dataset stores formulas rather than ready-made numbers, adding or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that visualize the data. It contains not only actual, but also forecast data on COVID-19 morbidity and mortality for the period of the second wave of the pandemic in 2020.
The forecasts are presented as a normal distribution of predicted values together with the probability of their occurrence in practice. This allows a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables to obtain automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values observed during and after the second wave of the pandemic to check the reliability of the earlier forecasts and conduct a plan-versus-actual analysis. The dataset contains not only the numerical initial and predicted values of the studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks that the pandemic and COVID-19 crisis pose to international entrepreneurship.
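The forecast mechanism, a normal distribution of predicted values with associated probabilities, can be illustrated in a few lines. This sketch uses Python's `statistics.NormalDist` with a made-up mean and standard deviation (not the dataset's figures) to compute the probability of exceeding several morbidity thresholds, the kind of input a scenario table would take. The dataset itself implements its forecasts as Excel formulas.

```python
from statistics import NormalDist

# Hypothetical forecast of daily new cases; mean and spread are illustrative only.
forecast = NormalDist(mu=50_000, sigma=8_000)


def prob_exceeds(dist: NormalDist, threshold: float) -> float:
    """Probability that the forecast variable exceeds `threshold`."""
    return 1.0 - dist.cdf(threshold)


# Scenario table: probability of exceeding several thresholds.
for threshold in (40_000, 50_000, 60_000):
    print(threshold, round(prob_exceeds(forecast, threshold), 3))

# Level exceeded with only 5% probability (a pessimistic scenario).
print(round(forecast.inv_cdf(0.95)))
```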
Microsoft Coco Dataset
universe.roboflow.com
zip
Updated Mar 23, 2025
Microsoft (2025). Microsoft Coco Dataset [Dataset]. https://universe.roboflow.com/microsoft/coco/model/3
Available download formats: zip
Dataset updated
Mar 23, 2025
Dataset authored and provided by
Microsoft, http://microsoft.com/
Variables measured
Object Bounding Boxes
Description
Microsoft Common Objects in Context (COCO) Dataset

The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations including object categories, keypoints, and more, which makes it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which provides a standard by which architectures can be compared.

While COCO is often touted to comprise over 300k images, it's pivotal to understand that this number includes diverse formats like keypoints, among others. Specifically, the labeled dataset for object detection stands at 123,272 images.

The full labeled object detection dataset is made available here, so researchers have access to the most comprehensive data for their experiments. However, COCO has not released its test set annotations, so the test data has no labels and is not included in this dataset.
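The distinction between all images and images with object-detection labels can be made concrete with a COCO-format annotation file, whose `images` entries carry an `id` and whose `annotations` entries point back to it via `image_id`. The sketch below, using a made-up miniature file, keeps only images that have at least one annotation. Whether the 123,272 figure above was derived this way is not stated, so treat it as an illustration only.

```python
import json


def labeled_image_ids(coco: dict) -> set[int]:
    """IDs of images that have at least one annotation."""
    annotated = {ann["image_id"] for ann in coco["annotations"]}
    return {img["id"] for img in coco["images"] if img["id"] in annotated}


# Made-up miniature annotation file; image 2 has no labels.
coco = json.loads(
    '{"images": [{"id": 1}, {"id": 2}, {"id": 3}],'
    ' "annotations": ['
    '{"id": 10, "image_id": 1, "category_id": 18, "bbox": [0, 0, 5, 5]},'
    '{"id": 11, "image_id": 3, "category_id": 1, "bbox": [2, 2, 4, 4]}]}'
)
print(sorted(labeled_image_ids(coco)))  # [1, 3]
```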

The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:

An introduction to the COCO dataset

Weird images in COCO, and what that tells us about the utility and limits of COCO
Number of Office 365 enterprise subscribers worldwide 2025, by country
statista.com
Updated Feb 27, 2025
Statista (2025). Number of Office 365 enterprise subscribers worldwide 2025, by country [Dataset]. https://www.statista.com/statistics/983321/worldwide-office-365-user-numbers-by-country/
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Statista, http://statista.com/
Area covered
World
Description
Microsoft 365 is used by over two million companies worldwide, with over one million customers in the United States alone using the office suite software. Office 365 is the brand name previously used by Microsoft for a group of software applications providing productivity-related services to its subscribers. Office 365 applications include Outlook, OneDrive, Word, Excel, PowerPoint, OneNote, SharePoint and Microsoft Teams. The consumer and small business plans of Office 365 were renamed Microsoft 365 on 21 April 2020.

Global office suite market share

An office suite is a collection of software applications (word processing, spreadsheets, databases, etc.) designed to be used for tasks within an organization. The worldwide office suite market is led by Google’s G Suite and Microsoft’s Office 365, with G Suite controlling around 45 percent of the global market and Office 365 holding around 26 percent. This trend is similar across most regions.
Database – all data for all years
open.canada.ca
ouvert.canada.ca
doc, html, png, zip
Updated Nov 28, 2024
Environment and Climate Change Canada (2024). Database – all data for all years [Dataset]. https://open.canada.ca/data/en/dataset/06022cc0-a31e-4b4c-850d-d4dccda5f3ac
Available download formats: html, doc, png, zip
Dataset updated
Nov 28, 2024
Dataset provided by
Environment and Climate Change Canada
License
Open Government Licence - Canada 2.0, https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Time period covered
Jan 1, 1993 - Dec 31, 2023
Description
The National Pollutant Release Inventory (NPRI) is Canada's public inventory of pollutant releases (to air, water and land), disposals and transfers for recycling. This database contains the full NPRI dataset from 1993 to the current reporting year. To help you navigate, a Microsoft Word file provides information on the database’s structure and schema. The database is available in Microsoft Access format (accdb). The data are in normalized or “list” format and are optimized for pivot table analyses. The data are also available in CSV format: https://open.canada.ca/data/en/dataset/40e01423-7728-429c-ac9d-2954385ccdfb.

Please consult the following resources to enhance your analysis:
- Guide on using and Interpreting NPRI Data: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/using-interpreting-data.html
- Access additional data from the NPRI, including datasets and mapping products: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/exploredata.html

Supplemental Information

This data is also available in non-proprietary CSV format on the Bulk Data page: http://open.canada.ca/data/en/dataset/40e01423-7728-429c-ac9d-2954385ccdfb. These files contain data from 1993 to the latest reporting year available. These datasets are in normalized or ‘list’ format and are optimized for pivot table analyses.

Supporting Projects: National Pollutant Release Inventory (NPRI)
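Data in normalized ("list") format has one row per observation, which a pivot table then aggregates into a grid. The sketch below shows the idea in plain Python. The column names `substance`, `year`, and `quantity_tonnes` and the values are hypothetical and do not reflect the actual NPRI schema, which is documented in the accompanying Word file.

```python
from collections import defaultdict


def pivot_sum(rows: list[dict], row_key: str, col_key: str, value_key: str) -> dict:
    """Sum value_key for each (row_key, col_key) pair, like a spreadsheet pivot table."""
    table: dict = defaultdict(lambda: defaultdict(float))
    for record in rows:
        table[record[row_key]][record[col_key]] += record[value_key]
    return {key: dict(columns) for key, columns in table.items()}


# Hypothetical long-format records: one row per reported quantity.
records = [
    {"substance": "Substance A", "year": 2022, "quantity_tonnes": 1.5},
    {"substance": "Substance A", "year": 2022, "quantity_tonnes": 0.5},
    {"substance": "Substance A", "year": 2023, "quantity_tonnes": 2.0},
    {"substance": "Substance B", "year": 2023, "quantity_tonnes": 4.0},
]
print(pivot_sum(records, "substance", "year", "quantity_tonnes"))
# {'Substance A': {2022: 2.0, 2023: 2.0}, 'Substance B': {2023: 4.0}}
```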
Highway-Runoff Database (HRDB) Version 1.1.0
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
U.S. Geological Survey (2024). Highway-Runoff Database (HRDB) Version 1.1.0 [Dataset]. https://catalog.data.gov/dataset/highway-runoff-database-hrdb-version-1-1-0
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey, http://www.usgs.gov/
Description
The Highway-Runoff Database (HRDB) was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration (FHWA) to provide planning-level information for decision makers, planners, and highway engineers to assess and mitigate possible adverse effects of highway runoff on the Nation’s receiving waters. The HRDB was assembled by using a Microsoft Access database application to facilitate use of the data and to calculate runoff-quality statistics with methods that properly handle censored-concentration data. This data release provides highway-runoff data, including information about monitoring sites, precipitation, runoff, and event-mean concentrations of water-quality constituents. The dataset was compiled from 37 studies as documented in 113 scientific or technical reports. The dataset includes data from 242 highway sites across the country. It includes data from 6,837 storm events with dates ranging from April 1975 to November 2017. Therefore, these data span more than 40 years; vehicle emissions and background sources of highway-runoff constituents have changed markedly during this time. For example, some of the early data is affected by use of leaded gasoline, phosphorus-based detergents, and industrial atmospheric deposition. The dataset includes 106,441 concentration values with data for 414 different water-quality constituents. This dataset was assembled from various sources and the original data was collected and analyzed by using various protocols. Where possible the USGS worked with State departments of transportation and the original researchers to obtain, document, and verify the data that was included in the HRDB. This new version (1.1.0) of the database contains software updates to provide data-quality information within the Graphical User Interface (GUI), calculate statistics for multiple sites in batch mode, and output additional statistics. 
However, inclusion in this dataset does not constitute endorsement by the USGS or the FHWA. People who use this data are responsible for ensuring that the data are complete and correct and that it is suitable for their intended purposes.
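An event-mean concentration (EMC) is conventionally the flow-weighted average concentration over a runoff event: total constituent load divided by total runoff volume. The sketch below computes it from discrete samples, assuming each sample represents a known time interval. It does not handle censored (below-detection-limit) values, which the HRDB treats with dedicated statistical methods, and the sample numbers are made up.

```python
def event_mean_concentration(concentrations: list[float], flows: list[float], durations: list[float]) -> float:
    """Flow-weighted mean concentration: sum(C_i * Q_i * dt_i) / sum(Q_i * dt_i)."""
    if not (len(concentrations) == len(flows) == len(durations)):
        raise ValueError("inputs must have equal length")
    volumes = [q * dt for q, dt in zip(flows, durations)]  # runoff volume per interval
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total runoff volume must be positive")
    load = sum(c * v for c, v in zip(concentrations, volumes))  # constituent load
    return load / total_volume


# Made-up event: two samples at 100 and 50 mg/L, the second during three times the flow.
print(event_mean_concentration([100.0, 50.0], [1.0, 3.0], [10.0, 10.0]))  # 62.5
```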
cats_vs_dogs
huggingface.co
tensorflow.org
Updated Nov 26, 2021
Microsoft (2021). cats_vs_dogs [Dataset]. https://huggingface.co/datasets/microsoft/cats_vs_dogs
Dataset updated
Nov 26, 2021
Dataset authored and provided by
Microsoft, http://microsoft.com/
License
Unknown, https://choosealicense.com/licenses/unknown/
Description
Dataset Card for Cats Vs. Dogs

Dataset Summary

A large set of images of cats and dogs; 1738 corrupted images have been dropped. This dataset is part of a now-closed Kaggle competition and represents a subset of the so-called Asirra dataset. From the competition page:

The Asirra data set

Web services are often protected with a challenge that's supposed to be easy for people to solve, but difficult for computers. Such a challenge is often called a CAPTCHA… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/cats_vs_dogs.
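The dataset card does not say how the 1738 corrupted images were detected. One heuristic, which we believe the TensorFlow Datasets builder for this dataset uses but which should be treated as an assumption, is to drop any file whose first 10 bytes lack the `JFIF` marker. A minimal sketch, with hypothetical helper names:

```python
from pathlib import Path


def looks_like_jfif(header: bytes) -> bool:
    """True if the JFIF marker appears within the first 10 bytes of the file."""
    return b"JFIF" in header[:10]


def split_valid_images(paths: list[Path]) -> tuple[list[Path], list[Path]]:
    """Partition image files into (kept, dropped) using the JFIF header check."""
    kept: list[Path] = []
    dropped: list[Path] = []
    for path in paths:
        with path.open("rb") as handle:
            header = handle.read(10)
        (kept if looks_like_jfif(header) else dropped).append(path)
    return kept, dropped


# A typical JFIF header: SOI marker, APP0 marker, segment length, then "JFIF\0".
print(looks_like_jfif(bytes.fromhex("ffd8ffe000104a46494600")))  # True
print(looks_like_jfif(b"\x00" * 10))  # False
```

The check is cheap because it reads only a few bytes per file, but it will also drop valid JPEGs that use a different header (for example, Exif-only files), so it is a filter for this dataset's known corruption rather than a general JPEG validator.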
the global data warehouse as a service market was USD 4,874.9 million in...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Cognitive Market Research, the global data warehouse as a service market was USD 4,874.9 million in 2022! [Dataset]. https://www.cognitivemarketresearch.com/data-warehouse-as-a-service-market-report
Available download formats: pdf, excel, csv, ppt
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research, the global data warehouse as a service market was USD 4,874.9 million in 2022 and will grow at a compound annual growth rate (CAGR) of 23.5% from 2023 to 2030.

How are the Key Drivers Affecting the Data Warehouse as a Service Market?

Rising Demand for High Speed And Low Latency Analytics is Driving the Data Warehouse as a Service Market

The rising demand for high-speed and low-latency analytics propels the Data Warehouse as a Service (DWaaS) Market. Businesses require real-time insights from vast datasets to make agile decisions. DWaaS platforms can process and analyze data rapidly, enabling quicker response times.

In May 2021, WPP announced a partnership with Microsoft to transform content production through a new Cloud Studio.

(Source: http://news.microsoft.com/2021/05/05/wpp-and-microsoft-to-creatively-transform-content-production-through-new-cloud-studio-partnership/)

With the need to extract actionable insights swiftly, DWaaS solutions cater to this demand, enhancing operational efficiency, improving decision-making, and bolstering organizations' competitiveness in the rapidly evolving digital landscape.

The Factors Restraining the Growth of the Data Warehouse as a Service Market

Data Security Concerns are Restraining the Data Warehouse as a Service Market

Data security concerns constrain the Data Warehouse as a Service (DWaaS) Market. Organizations hesitate to migrate sensitive data to cloud-based solutions due to potential breaches, unauthorized access, and compliance risks. Ensuring robust encryption, authentication, and compliance with data protection regulations is challenging. Building trust in cloud-based storage and analytics security is crucial for wider DWaaS adoption as businesses prioritize safeguarding their valuable data assets.

Impact of the COVID-19 Pandemic on the Data Warehouse as a Service Market:

COVID-19 significantly disrupted the Data Warehouse as a Service (DWaaS) market. The pandemic's remote work requirements accelerated the demand for cloud-based data solutions. Organizations sought scalable and accessible DWaaS to accommodate changing data needs. Simultaneously, economic uncertainties led some businesses to delay or reconsider investments. The DWaaS landscape responded with increased emphasis on flexibility, remote accessibility, cost optimization, and robust security measures to address the evolving challenges posed by the pandemic.

Introduction of Data Warehouse as a Service:

The Data Warehouse as a Service (DWaaS) market is growing due to businesses' increasing need for scalable and cost-effective data management solutions. DWaaS offers the flexibility to handle large and diverse data sets, enabling data-driven decision-making. The cloud-based nature of DWaaS streamlines implementation, reduces infrastructure costs, and ensures easy accessibility, contributing to its rapid adoption and market expansion.

In February 2021, AWS updated the Amazon Redshift Query Editor to support clusters with enhanced VPC routing and all node types, and extended the query time-out limit from 10 minutes to 24 hours for queries with longer execution times.

(Source: http://aws.amazon.com/about-aws/whats-new/2021/02/amazon-redshift-query-editor-supports-clusters-with-enhanced-vpc-routing-query-run-times-node-types/)
Transparent Data Encryption – Solution for Security of Database Contents
figshare.com
sindex.sdl.edu.sa
pdf
Updated Jun 2, 2023
Cite
Riyazuddin Qureshi (2023). Transparent Data Encryption – Solution for Security of Database Contents [Dataset]. http://doi.org/10.6084/m9.figshare.1517810.v1
Available download formats
pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.1517810.v1
Dataset updated
Jun 2, 2023
Dataset provided by
figshare
Authors
Riyazuddin Qureshi
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract— The present study deals with Transparent Data Encryption, a technology used to solve data security problems. Transparent Data Encryption means encrypting databases on hard disk and on any backup media. Today's global business environment presents numerous security threats and compliance challenges. To protect against data theft and fraud, we require security solutions that are transparent by design. Transparent Data Encryption provides transparent, standards-based security that protects data on the network, on disk and on backup media. It provides easy and effective protection of stored data by transparently encrypting it. Transparent Data Encryption can be used to provide high levels of security to columns, tables and tablespaces, that is, to database files stored on hard drives, floppy disks or CDs, and to other information that requires protection. It is the technology used by Microsoft SQL Server 2008 to encrypt database contents. The term encryption means encoding a piece of information in such a way that it can only be decoded, read and understood by the people for whom it is intended. The study deals with ways to create a master key, create a certificate protected by the master key, create a database encryption key protected by the certificate, and set the database to use encryption in Microsoft SQL Server 2008.
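The four steps listed in the abstract map onto a short T-SQL sequence. Below is a minimal Python sketch, under stated assumptions, that only assembles those statements as strings and executes nothing: the database name, certificate name, password and certificate subject are hypothetical placeholders, and AES_256 is one of several algorithms SQL Server accepts for the database encryption key, chosen here for illustration. The generated statements would need to run in order under an account with sufficient permissions, and the certificate and its private key should be backed up, because an encrypted database cannot be restored on another server without them.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Single-quote a T-SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def tde_setup_script(database: str, cert_name: str, master_key_password: str) -> List[str]:
    """Build (but do not run) the T-SQL statements that enable TDE on one database.

    All names and the password are hypothetical inputs supplied by the caller.
    """
    db = quote_ident(database)
    cert = quote_ident(cert_name)
    return [
        "USE master;",
        # Step 1: master key in the master database, protected by a password.
        f"CREATE MASTER KEY ENCRYPTION BY PASSWORD = {quote_literal(master_key_password)};",
        # Step 2: server certificate, protected by that master key.
        f"CREATE CERTIFICATE {cert} WITH SUBJECT = 'TDE certificate';",
        # Step 3: database encryption key in the user database, protected by the certificate.
        f"USE {db};",
        "CREATE DATABASE ENCRYPTION KEY WITH ALGORITHM = AES_256 "
        f"ENCRYPTION BY SERVER CERTIFICATE {cert};",
        # Step 4: switch the database to encrypted storage.
        f"ALTER DATABASE {db} SET ENCRYPTION ON;",
    ]


if __name__ == "__main__":
    for statement in tde_setup_script("SalesDb", "TdeServerCert", "Use-a-strong-password!"):
        print(statement)
```

Generating the script rather than executing it keeps the sketch free of any database driver; the quoting helpers guard against names or passwords that contain brackets or quotes.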
Impact and Risk Analysis Database Documentation
cloud.csiss.gmu.edu
researchdata.edu.au
zip
Updated Dec 13, 2019
Cite
Australia (2019). Impact and Risk Analysis Database Documentation [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/05e851cf-57a5-4127-948a-1b41732d538c
Available download formats
zip (3577368)
Dataset updated
Dec 13, 2019
Dataset provided by
Australia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

Four documents describe the specifications, methods and scripts of the Impact and Risk Analysis Databases developed for the Bioregional Assessments Programme. They are:

Bioregional Assessment Impact and Risk Databases Installation Advice (IMIA Database Installation Advice v1.docx).

Naming Convention of the Bioregional Assessment Impact and Risk Databases (IMIA Project Naming Convention v39.docx).

Data treatments for the Bioregional Assessment Impact and Risk Databases (IMIA Project Data Treatments v02.docx).

Quality Assurance of the Bioregional Assessment Impact and Risk Databases (IMIA Project Quality Assurance Protocol v17.docx).

This dataset also includes the Materialised View Information Manager (MatInfoManager.zip). This Microsoft Access database is used to manage the overlay definitions of materialized views of the Impact and Risk Analysis Databases. For more information about this tool, refer to the Data Treatments document.

The documentation supports all five Impact and Risk Analysis Databases developed for the assessment areas:

Maranoa-Balonne-Condamine: http://data.bioregionalassessments.gov.au/dataset/69075f3e-67ba-405b-8640-96e6cb2a189a

Gloucester: http://data.bioregionalassessments.gov.au/dataset/d78c474c-5177-42c2-873c-64c7fe2b178c

Hunter: http://data.bioregionalassessments.gov.au/dataset/7c170d60-ff09-4982-bd89-dd3998a88a47

Namoi: http://data.bioregionalassessments.gov.au/dataset/1549c88d-927b-4cb5-b531-1d584d59be58

Galilee: http://data.bioregionalassessments.gov.au/dataset/3dbb5380-2956-4f40-a535-cbdcda129045

Purpose

These documents describe end-to-end treatments of scientific data for the Impact and Risk Analysis Databases, developed and published by the Bioregional Assessment Programme. The applied approach to data quality assurance is also described. These documents are intended for people with advanced knowledge of geospatial analysis and database administration who seek to understand, restore or utilise the Analysis Databases and their underlying methods of analysis.

Dataset History

The Impact and Risk Analysis Database Documentation was created for and by the Information Modelling and Impact Assessment Project (IMIA Project).

Dataset Citation

Bioregional Assessment Programme (2018) Impact and Risk Analysis Database Documentation. Bioregional Assessment Source Dataset. Viewed 12 December 2018, http://data.bioregionalassessments.gov.au/dataset/05e851cf-57a5-4127-948a-1b41732d538c.
Global export data of Microsoft
volza.com
csv
Updated Feb 17, 2025
Cite
Volza.LLC (2025). Global export data of Microsoft [Dataset]. https://www.volza.com/exports-united-states/united-states-export-data-of-microsoft
Available download formats
csv
Dataset updated
Feb 17, 2025
Dataset provided by
Volza.LLC
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Count of exporters, Sum of export value, 2014-01-01/2021-09-30, Count of export shipments
Description
10,545 global export shipment records of Microsoft, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.
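The "Variables measured" list above reads like three aggregates over a 2014-01-01 to 2021-09-30 window. A minimal Python sketch of how such aggregates could be derived from raw shipment records follows; the `Shipment` fields, the inclusive date window, and treating "count of exporters" as a count of distinct exporter names are all assumptions, since the listing does not describe the underlying schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Union


@dataclass(frozen=True)
class Shipment:
    # Hypothetical record layout; the provider's real schema is not described in the listing.
    exporter: str
    shipped_on: date
    value_usd: float


def summarize(
    shipments: Iterable[Shipment],
    start: date = date(2014, 1, 1),
    end: date = date(2021, 9, 30),
) -> Dict[str, Union[int, float]]:
    """Compute the three listed measures over shipments dated within [start, end]."""
    in_window = [s for s in shipments if start <= s.shipped_on <= end]
    return {
        # Distinct exporters that shipped at least once inside the window.
        "count_of_exporters": len({s.exporter for s in in_window}),
        # Total declared value of in-window shipments.
        "sum_of_export_value": sum(s.value_usd for s in in_window),
        # Number of in-window shipment records.
        "count_of_export_shipments": len(in_window),
    }
```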
The Business Intelligence Tools Market size was USD 16.9 Million in 2023
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated Jan 17, 2024
Cite
Cognitive Market Research (2024). The Business Intelligence Tools Market size was USD 16.9 Million in 2023 [Dataset]. https://www.cognitivemarketresearch.com/business-intelligence-tools-market-report
Available download formats
pdf, excel, csv, ppt
Dataset updated
Jan 17, 2024
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research, the global Business Intelligence market size was USD 16.9 million in 2023 and will expand at a compound annual growth rate (CAGR) of 9.50% from 2023 to 2030.
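A compound annual growth rate applies the same fractional increase every year, so a base value V0 grows to V0 × (1 + r)^n after n years. The sketch below applies that formula to the figures quoted above; treating 2023 as the base year and compounding once per year for the seven years to 2030 is an assumption about how the report defines its forecast window, and the function names are illustrative.

```python
def project_value(base_value: float, cagr: float, years: int) -> float:
    """Compound base_value forward at a constant annual growth rate for `years` years."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return base_value * (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Invert project_value: the constant annual rate linking two values."""
    if years <= 0 or start_value <= 0:
        raise ValueError("need a positive start value and at least one year")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures as quoted above. Treating 2023 as the base year and compounding
# once per year for the seven years to 2030 is an assumption.
BASE_2023 = 16.9  # USD million, as stated in the report summary
CAGR = 0.095
YEARS = 2030 - 2023

projected_2030 = project_value(BASE_2023, CAGR, YEARS)
print(f"Implied 2030 market size: USD {projected_2030:.1f} million")
```

With these inputs the sketch prints an implied 2030 size of about USD 31.9 million. The report summary does not state a 2030 figure, so this is only arithmetic on its quoted numbers, not a value taken from the report.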

The demand for Business Intelligence tools is rising due to increasing data complexity and a growing focus on data-driven decision-making. The business intelligence platform category held the highest revenue share of the Business Intelligence market in 2023. North America will continue to lead the Business Intelligence market, whereas the Asia-Pacific market will experience the most substantial growth until 2030.

Growing Emphasis on Data-Driven Decision-Making to Provide Viable Market Output

In the Business Intelligence Tools market, the increasing recognition of the strategic importance of data-driven decision-making serves as a primary driver. Organizations across various industries are realizing the transformative power of insights derived from BI tools. As the volume of data generated continues to soar, businesses seek sophisticated tools that can efficiently analyze and interpret this information. The ability of BI tools to convert raw data into actionable insights empowers decision-makers to formulate informed strategies, enhance operational efficiency, and gain a competitive edge in a data-centric business landscape.

In June 2020, SAS and Microsoft established a comprehensive technology and go-to-market strategic alliance. As part of the collaboration, SAS's industry solutions and analytical products will be moved to Microsoft Azure, SAS Cloud's preferred cloud provider.

(Source: news.microsoft.com/2020/06/15/sas-and-microsoft-partner-to-further-shape-the-future-of-analytics-and-ai/)

Rise in Adoption of Advanced Analytics and Artificial Intelligence to Propel Market Growth

Another significant driver in the Business Intelligence Tools market is the escalating adoption of advanced analytics and artificial intelligence (AI) capabilities. Modern BI tools are incorporating AI-driven functionalities such as machine learning algorithms, natural language processing, and predictive analytics. These technologies enable users to uncover deeper insights, identify patterns, and predict future trends. The integration of AI not only enhances the analytical capabilities of BI tools but also automates processes, reducing manual efforts and improving the overall efficiency of data analysis. This trend aligns with the industry's pursuit of more intelligent and automated BI solutions to derive maximum value from data assets.

In March 2020, IBM used IBM Cognos Analytics to build a new, dynamic dashboard that displays the global spread of COVID-19. The COVID-19 data shown in this dashboard is provided by the World Health Organization (WHO) and by state and municipal governments.

(Source: www.ibm.com/blog/creating-trusted-covid-19-data-for-communities/)

Market Dynamics of the Business Intelligence Tools Market

Data Security and Privacy Concerns to Restrict Market Growth

One of the key restraints in the Business Intelligence Tools market revolves around persistent concerns regarding data security and privacy. As organizations increasingly rely on BI tools to process and analyze sensitive business information, the risk of data breaches and unauthorized access becomes a prominent challenge. Heightened awareness of regulatory requirements, such as GDPR, has intensified the focus on protecting sensitive data. Businesses face the challenge of implementing robust security measures within BI tools to ensure compliance with regulations and safeguard against potential data vulnerabilities, thereby slowing down the adoption pace.

Impact of COVID-19 on the Business Intelligence market

The COVID-19 pandemic has had a profound impact on the Business Intelligence (BI) market. As organizations grappled with unprecedented disruptions, the need for timely and accurate insights became paramount. The pandemic accelerated the adoption of BI tools as businesses sought to navigate uncertainties and make data-driven decisions. Remote work became a norm, prompting increased demand for BI solutions that support virtual collaboration and enable users to access analytics from anywhere. Moreover, there w...
Product Review Datasets for User Sentiment Analysis
datarade.ai
Updated Sep 28, 2018
Cite
Product Review Datasets for User Sentiment Analysis [Dataset]. https://datarade.ai/data-products/product-review-datasets-for-user-sentiment-analysis-oxylabs
Available download formats
.json, .xml, .csv, .xls
Dataset updated
Sep 28, 2018
Dataset authored and provided by
Oxylabs
Area covered
Barbados, Antigua and Barbuda, Egypt, Libya, Canada, Sudan, Italy, Hong Kong, South Africa, Argentina
Description
Product Review Datasets: Uncover user sentiment

Harness the power of Product Review Datasets to gain a deep understanding of user sentiment and insights. These datasets are designed to elevate your brand and product feature analysis, help you evaluate your competitive stance, and assess investment risks.

Data sources:

Trustpilot: datasets encompassing general consumer reviews and ratings across various businesses, products, and services.

Leave the data collection challenges to us and dive straight into market insights with clean, structured, and actionable data, including:

Product name;

Product category;

Number of ratings;

Ratings average;

Review title;

Review body.
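The field list above maps onto a flat record per review. The following Python sketch parses a JSON delivery into typed records and computes a rating-count-weighted average per category; the JSON key names and the assumption that each record repeats its product-level fields alongside one review are hypothetical, since the listing does not publish its schema.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ProductReview:
    # Hypothetical JSON keys mirroring the field list above.
    product_name: str
    product_category: str
    number_of_ratings: int
    ratings_average: float
    review_title: str
    review_body: str


def parse_reviews(raw_json: str) -> List[ProductReview]:
    """Parse a JSON array of review objects into typed records."""
    return [ProductReview(**record) for record in json.loads(raw_json)]


def category_rating(reviews: List[ProductReview], category: str) -> Optional[float]:
    """Rating-count-weighted mean of ratings_average across distinct products in a category."""
    # Several reviews can share a product; keep one (average, count) pair per product name.
    products: Dict[str, Tuple[float, int]] = {}
    for r in reviews:
        if r.product_category == category:
            products[r.product_name] = (r.ratings_average, r.number_of_ratings)
    total = sum(count for _, count in products.values())
    if total == 0:
        return None  # no ratings in this category
    return sum(avg * count for avg, count in products.values()) / total
```

Weighting by rating count keeps a product with a handful of ratings from pulling the category figure as strongly as one with thousands.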

Choose from multiple data delivery options to suit your needs:

Receive data in easy-to-read formats like spreadsheets or structured JSON files.

Select your preferred data storage solutions, including SFTP, Webhooks, Google Cloud Storage, AWS S3, and Microsoft Azure Storage.

Tailor data delivery frequencies, whether on-demand or per your agreed schedule.

Why choose Oxylabs?

Fresh and accurate data: Access organized, structured, and comprehensive data collected by our leading web scraping professionals.

Time and resource savings: Concentrate on your core business goals while we efficiently handle the data extraction process at an affordable cost.

Adaptable solutions: Share your specific data requirements, and we'll craft a customized data collection approach to meet your objectives.

Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA standards.

Pricing Options:

Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely with you to understand the nature of your business and its daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Join the ranks of satisfied customers who appreciate our meticulous attention to detail and personalized support. Experience the power of Product Review Datasets today to uncover valuable insights and enhance decision-making.
Data from: Composition of Foods Raw, Processed, Prepared USDA National...
gimi9.com
s.cnmilf.com
Updated Jan 12, 2024
Cite
(2024). Composition of Foods Raw, Processed, Prepared USDA National Nutrient Database for Standard Reference, Release 28 [Dataset]. https://www.gimi9.com/dataset/data-gov_d8d4187967fe985eb5cff53b50fec7c3345687ce/
Dataset updated
Jan 12, 2024
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
[Note: Integrated as part of FoodData Central, April 2019.] The database consists of several sets of data: food descriptions, nutrients, weights and measures, footnotes, and sources of data. The Nutrient Data file contains mean nutrient values per 100 g of the edible portion of food, along with fields to further describe the mean value. Information is provided on household measures for food items. Weights are given for edible material without refuse. Footnotes are provided for a few items where information about food description, weights and measures, or nutrient values could not be accommodated in existing fields. Data have been compiled from published and unpublished sources. Published data sources include the scientific literature. Unpublished data include those obtained from the food industry, other government agencies, and research conducted under contracts initiated by USDA's Agricultural Research Service (ARS). Updated data have been published electronically on the USDA Nutrient Data Laboratory (NDL) web site since 1992. Standard Reference (SR) 28 includes composition data for all the food groups and nutrients published in the 21 volumes of "Agriculture Handbook 8" (US Department of Agriculture 1976-92), and its four supplements (US Department of Agriculture 1990-93), which superseded the 1963 edition (Watt and Merrill, 1963). SR28 supersedes all previous releases, including the printed versions, in the event of any differences.

Attribution for photos: Photo 1: k7246-9, copyright-free public domain photo by Scott Bauer. Photo 2: k8234-2, copyright-free public domain photo by Scott Bauer.

Resources in this dataset:

Resource Title: READ ME - Documentation and User Guide - Composition of Foods Raw, Processed, Prepared - USDA National Nutrient Database for Standard Reference, Release 28.
File Name: sr28_doc.pdf
Resource Software Recommended: Adobe Acrobat Reader, url: http://www.adobe.com/prodindex/acrobat/readstep.html

Resource Title: ASCII (6.0Mb; ISO/IEC 8859-1).
File Name: sr28asc.zip
Resource Description: Delimited file suitable for importing into many programs. The tables are organized in a relational format and can be used with a relational database management system (RDBMS), which will allow you to form your own queries and generate custom reports.

Resource Title: ACCESS (25.2Mb).
File Name: sr28db.zip
Resource Description: This file contains the SR28 data imported into a Microsoft Access (2007 or later) database. It includes relationships between files and a few sample queries and reports.

Resource Title: ASCII (Abbreviated; 1.1Mb; ISO/IEC 8859-1).
File Name: sr28abbr.zip
Resource Description: Delimited file suitable for importing into many programs. This file contains data for all food items in SR28, but not all nutrient values: starch, fluoride, betaine, vitamin D2 and D3, added vitamin E, added vitamin B12, alcohol, caffeine, theobromine, phytosterols, individual amino acids, individual fatty acids, and individual sugars are not included. These data are presented per 100 grams, edible portion. Up to two household measures are also provided, allowing the user to calculate the values per household measure, if desired.

Resource Title: Excel (Abbreviated; 2.9Mb).
File Name: sr28abxl.zip
Resource Description: For use with Microsoft Excel (2007 or later), but can also be used by many other spreadsheet programs. This file contains data for all food items in SR28, but not all nutrient values: starch, fluoride, betaine, vitamin D2 and D3, added vitamin E, added vitamin B12, alcohol, caffeine, theobromine, phytosterols, individual amino acids, individual fatty acids, and individual sugars are not included. These data are presented per 100 grams, edible portion. Up to two household measures are also provided, allowing the user to calculate the values per household measure, if desired.
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/

Resource Title: ASCII (Update Files; 1.1Mb; ISO/IEC 8859-1).
File Name: sr28upd.zip
Resource Description: Update Files. Contains updates for those users who have loaded Release 27 into their own programs and wish to do their own updates. These files contain the updates between SR27 and SR28. Delimited file suitable for import into many programs.
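Because the nutrient values are given per 100 g of edible portion, the value for a household measure is the per-100 g value multiplied by the measure's gram weight and divided by 100. The Python sketch below shows that scaling together with a reader for the delimited ASCII files. The caret field separator and tilde text qualifier are assumed from the usual SR ASCII layout and should be checked against sr28_doc.pdf; the function names and sample rows are hypothetical.

```python
import csv
import io
from typing import List


def parse_sr_text(text: str) -> List[List[str]]:
    """Split SR-style ASCII text, assuming '^' separates fields and '~' wraps text fields."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="^", quotechar="~")
    return [row for row in reader]


def read_sr_file(path: str) -> List[List[str]]:
    """Read one delimited file; the listing describes the ASCII files as ISO/IEC 8859-1 (Latin-1)."""
    with open(path, encoding="latin-1", newline="") as handle:
        return parse_sr_text(handle.read())


def per_household_measure(value_per_100g: float, gram_weight: float) -> float:
    """Scale a per-100 g nutrient value to a household measure weighing gram_weight grams."""
    return value_per_100g * gram_weight / 100.0
```

Empty fields come back as empty strings, so callers still need to decide how to treat missing nutrient values before converting to numbers.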
