53 datasets found

US Broadband Usage Across Counties
kaggle.com
Updated Jan 6, 2023
Cite
The Devastator (2023). US Broadband Usage Across Counties [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-broadband-usage-across-counties-and-zip-codes
Dataset updated
Jan 6, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
Area covered
United States
Description
US Broadband Usage Across Counties

Utilizing Microsoft's Data to Estimate Access

By Amber Thomas [source]

About this dataset

This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.
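The throughput estimate described above can be illustrated with a minimal sketch. It assumes throughput is simply payload size divided by download time; the function name and units are hypothetical and do not reflect Microsoft's actual pipeline.

```python
def throughput_mbps(package_bytes: int, download_seconds: float) -> float:
    """Estimate throughput in megabits per second from one download.

    Assumption: throughput = payload size / total download time.
    """
    if download_seconds <= 0:
        raise ValueError("download_seconds must be positive")
    megabits = package_bytes * 8 / 1_000_000  # bytes -> megabits
    return megabits / download_seconds


# A 25 MB package downloaded in 8 seconds sits exactly at the 25 Mbps mark.
print(throughput_mbps(25_000_000, 8.0))
```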

According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This dataset aims to address the gap between estimated availability and actual use by providing more accurate usage numbers downscaled to the county and zip code levels. Who gets counted as having access is vastly important: it determines who is included in public funding opportunities dedicated to closing the digital divide. The implications can be huge: millions around the country could remain invisible if these numbers aren't accurately reported or used properly in decision-making.

This dataset includes aggregated information about locations with fewer than 20 devices for increased accuracy when estimating broadband usage in the United States, allowing others to use it to develop solutions that improve internet access or to accurately label problem areas where no real or reliable connectivity exists in communities large and small throughout the US mainland. Please review the license terms before using these data so that you adhere to the stipulations of Microsoft's Open Use of Data Agreement v1.0, whether for professional or educational purposes.


How to use the dataset

This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.

The dataset contains six columns:
- County: The name of the county for which usage statistics are provided.
- Zip Code (5-Digit): The 5-digit zip code from which usage data was collected within that county or metropolitan area/micro area/division within states, as reported by the US Census Bureau in 2018[2].
- Population (Households): Estimated number of households defined according to [3] based on data from the US Census Bureau American Community Survey's 5 Year Estimates[4].
- Average Throughput (Mbps): Average Mbps download speed derived from a combination of data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc.[5]
- Percent Fast (> 25 Mbps): Percentage of machines with throughput greater than 25 Mbps, calculated using [6].
- Percent Slow (< 3 Mbps): Percentage of machines with throughput less than 3 Mbps, calculated using [7].
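As a rough illustration of the two percentage columns, the sketch below computes the share of devices above 25 Mbps and below 3 Mbps from a list of per-device throughput values. The function name and the input list are hypothetical; the listing does not publish per-device data or the exact calculation.

```python
FAST_MBPS = 25.0  # threshold from the "Percent Fast" column
SLOW_MBPS = 3.0   # threshold from the "Percent Slow" column


def speed_shares(throughputs_mbps):
    """Return (percent_fast, percent_slow) for a list of device throughputs."""
    total = len(throughputs_mbps)
    if total == 0:
        raise ValueError("need at least one throughput value")
    fast = sum(1 for t in throughputs_mbps if t > FAST_MBPS)
    slow = sum(1 for t in throughputs_mbps if t < SLOW_MBPS)
    return 100 * fast / total, 100 * slow / total


# Hypothetical sample of four devices in one zip code.
percent_fast, percent_slow = speed_shares([1.0, 2.5, 10.0, 30.0])
print(percent_fast, percent_slow)
```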

Research Ideas

Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.

Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.

Establishing public-private partnerships. Local governments and telecom providers need better data about gaps in service coverage and usage levels in order to decide on investments in new infrastructure buildouts that improve connectivity for rural communities.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: broadband_data_2020October.csv

Microsoft Geolife GPS Trajectory Dataset
kaggle.com
Updated Jun 27, 2022
Cite
Möbius (2022). Microsoft Geolife GPS Trajectory Dataset [Dataset]. https://www.kaggle.com/datasets/arashnic/microsoft-geolife-gps-trajectory-dataset
Dataset updated
Jun 27, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Möbius
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This GPS trajectory dataset was collected in the (Microsoft Research) Geolife project by 178 users over a period of more than four years (from April 2007 to October 2011). A GPS trajectory in this dataset is represented by a sequence of time-stamped points, each of which contains latitude, longitude and altitude information. The dataset contains 17,621 trajectories with a total distance of 1,251,654 kilometers and a total duration of 48,203 hours. These trajectories were recorded by different GPS loggers and GPS phones, and have a variety of sampling rates. 91 percent of the trajectories are logged in a dense representation, e.g. every 1~5 seconds or every 5~10 meters per point.

Content

This dataset recorded a broad range of users’ outdoor movements, including not only life routines like going home and going to work but also entertainment and sports activities, such as shopping, sightseeing, dining, hiking, and cycling.

Data Format

- Trajectory file: Every single folder of this dataset stores a user’s GPS log files, which were converted to PLT format. Each PLT file contains a single trajectory and is named by its starting time. To avoid potential confusion of time zone, we use GMT in the date/time property of each point, which is different from our previous release.

- PLT format: Lines 1…6 are useless in this dataset, and can be ignored. Points are described in the following lines, one per line.
  Field 1: Latitude in decimal degrees.
  Field 2: Longitude in decimal degrees.
  Field 3: All set to 0 for this dataset.
  Field 4: Altitude in feet (-777 if not valid).
  Field 5: Date - number of days (with fractional part) that have passed since 12/30/1899.
  Field 6: Date as a string.
  Field 7: Time as a string.
  Note that field 5 and fields 6 & 7 represent the same date/time in this dataset. You may use either of them.
  Example:
  39.906631,116.385564,0,492,40097.5864583333,2009-10-11,14:04:30
  39.906554,116.385625,0,492,40097.5865162037,2009-10-11,14:04:35

- Transportation mode labels: Possible transportation modes are: walk, bike, bus, car, subway, train, airplane, boat, run and motorcycle. Again, we have converted the date/time of all labels to GMT, even though most of them were created in China.
  Example:
  Start Time             End Time               Transportation Mode
  2008/04/02 11:24:21    2008/04/02 11:50:45    bus
  2008/04/03 01:07:03    2008/04/03 11:31:55    train
  2008/04/03 11:32:24    2008/04/03 11:46:14    walk
  2008/04/03 11:47:14    2008/04/03 11:55:07    car
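The PLT layout described above can be read with a few lines of standard-library Python. This is a minimal sketch, not an official loader: the names `TrackPoint`, `parse_plt_point`, `parse_plt_lines` and `serial_days_to_datetime` are hypothetical, and the code assumes comma-separated fields exactly as in the example lines.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple, Optional

HEADER_LINES = 6          # lines 1-6 carry no useful information
INVALID_ALTITUDE = -777   # marker for a missing altitude (field 4)
SERIAL_EPOCH = datetime(1899, 12, 30, tzinfo=timezone.utc)  # origin of field 5


class TrackPoint(NamedTuple):
    latitude: float
    longitude: float
    altitude_ft: Optional[float]  # None when the source value is -777
    timestamp: datetime           # built from fields 6 and 7 (GMT)
    serial_days: float            # field 5, days since 12/30/1899


def parse_plt_point(line: str) -> TrackPoint:
    """Parse one point line; field 3 is always 0 and is ignored."""
    lat, lon, _unused, alt, days, date_str, time_str = line.strip().split(",")
    altitude = float(alt)
    timestamp = datetime.strptime(
        f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return TrackPoint(
        float(lat),
        float(lon),
        None if altitude == INVALID_ALTITUDE else altitude,
        timestamp,
        float(days),
    )


def parse_plt_lines(lines: List[str]) -> List[TrackPoint]:
    """Skip the six header lines and parse every non-empty point line."""
    return [parse_plt_point(l) for l in lines[HEADER_LINES:] if l.strip()]


def serial_days_to_datetime(days: float) -> datetime:
    """Convert field 5 to a GMT datetime (sub-second rounding may occur)."""
    return SERIAL_EPOCH + timedelta(days=days)


point = parse_plt_point(
    "39.906631,116.385564,0,492,40097.5864583333,2009-10-11,14:04:30"
)
print(point.timestamp, serial_days_to_datetime(point.serial_days))
```

Because field 5 is a floating-point day count, the converted value can differ from fields 6 and 7 by a fraction of a second, so comparing the two with a small tolerance is safer than testing for equality.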

First, you can regard the label of both taxi and car as driving although we set them with different labels for future usage. Second, a user could label the transportation mode of a light rail as train while others may use subway as the label. Actually, no trajectory can be recorded in an underground subway system since a GPS logger cannot receive any signal there. In Beijing, the light rails and subway systems are seamlessly connected, e.g., line 13 (a light rail) is connected with line 10 and line 2, which are subway systems. Sometimes, a line (like line 5) is comprised of partial subways and partial light rails. So, users may have a variety of understanding in their transportation modes. You can differentiate the real train trajectories (connecting two cities) from the light rail trajectory (generating in a city) according to their distances. Or, just treat them the same.
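The distance-based distinction suggested above could be sketched as follows. The haversine formula gives the great-circle distance between consecutive points; the 50 km cutoff (`INTERCITY_KM`) and the function names are purely hypothetical, since the dataset description does not specify a threshold.

```python
import math

EARTH_RADIUS_KM = 6371.0
INTERCITY_KM = 50.0  # hypothetical cutoff, not taken from the dataset


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def path_length_km(points):
    """Sum of distances between consecutive (lat, lon) points."""
    return sum(haversine_km(*a, *b) for a, b in zip(points, points[1:]))


def rail_kind(points):
    """Label a 'train' trajectory as intercity train or in-city light rail."""
    return "train" if path_length_km(points) >= INTERCITY_KM else "light rail"


# A short hop inside Beijing versus a Beijing-to-Shanghai trip.
print(rail_kind([(39.90, 116.38), (39.91, 116.39)]))
print(rail_kind([(39.90, 116.40), (31.20, 121.50)]))
```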

More: User Guide: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/User20Guide-1.2.pdf

Citation

Please cite the following papers when using this GPS dataset.

[1] Yu Zheng, Lizhu Zhang, Xing Xie, Wei-Ying Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of International conference on World Wide Web (WWW 2009), Madrid Spain. ACM Press: 791-800.

[2] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, Wei-Ying Ma. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321.

[3] Yu Zheng, Xing Xie, Wei-Ying Ma. GeoLife: A Collaborative Social Networking Service among User, location and trajectory. Invited paper, in IEEE Data Engineering Bulletin. 33, 2, 2010, pp. 32-40.

Inspiration

This trajectory dataset can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation.
Data from: Coarse datasets for the 2002-2010 Tsimane' Amazonian Panel...
scholarworks.brandeis.edu
docx, pdf, xls
Updated Mar 15, 2022
Cite
Ricardo Godoy; William R. Leonard; Victoria Reyes-Garcia; Tomas Huanca (2022). Coarse datasets for the 2002-2010 Tsimane' Amazonian Panel Study(TAPS) - Introduction and authorization [Dataset]. https://scholarworks.brandeis.edu/esploro/outputs/dataset/Coarse-datasets-for-the-2002-2010-Tsimane/9924097301801921
Available download formats: xls (1472000 bytes), pdf (140365 bytes), docx (32618 bytes)
Dataset updated
Mar 15, 2022
Authors
Ricardo Godoy; William R. Leonard; Victoria Reyes-Garcia; Tomas Huanca
Time period covered
Mar 2022
Measurement technique
See Chapter 4 of "Too little, too late" for general methods, and different chapters for methods on different topics
Description
Introduction. This document provides an overview of an archive composed of four sections.
[1] An introduction (this document) which describes the scope of the project
[2] Yearly folder, from 2002 until 2010, of the coarse Microsoft Access datasets + the surveys used to collect information for each year. The word coarse does not mean the information in the Microsoft Access dataset was not corrected for mistakes; it was, but some mistakes and inconsistencies remain, such as with data on age or education. Furthermore, the coarse dataset provides disaggregated information for selected topics, which appear in summary statistics in the clean dataset. For example, in the coarse dataset one can find the different illnesses afflicting a person during the past 14 days whereas in the clean dataset only the total number of illnesses appears.
[3] A letter from the Gran Consejo Tsimane’ authorizing the public use of de-identified data collected in our studies among Tsimane’.
[4] A Microsoft Excel document with the unique identification number for each person in the panel study.

Background. During 2002-2010, a team of international researchers, surveyors, and translators gathered longitudinal (panel) data on the demography, economy, social relations, health, nutritional status, local ecological knowledge, and emotions of about 1400 native Amazonians known as Tsimane’ who lived in thirteen villages near and far from towns in the department of Beni in the Bolivian Amazon. A report titled “Too little, too late” summarizes selected findings from the study and is available to the public at the electronic library of Brandeis University:
https://scholarworks.brandeis.edu/permalink/01BRAND_INST/1bo2f6t/alma9923926194001921

A copy of the clean, merged, and appended Stata (V17) dataset is available to the public at the following two web addresses:
[a] Brandeis University:
https://scholarworks.brandeis.edu/permalink/01BRAND_INST/1bo2f6t/alma9923926193901921
[b] Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan (only available to users affiliated with institutions belonging to ICPSR)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/37671/utilization

Chapter 4 of the report “Too little, too late” mentioned above describes the motivation and history of the study, the difference between the coarse and clean datasets, and topics which can be examined only with coarse data.

Aims. The aims of this archive are to:
· Make available in Microsoft Access the coarse de-identified dataset [1] for each of the seven yearly surveys (2004-2010) and [2] one Access dataset based on quarterly surveys done during 2002 and 2003. Together, these two datasets form one longitudinal dataset of individuals, households, and villages.
· Provide guidance on how to link files within and across years, and
· Make available a Microsoft Excel file with a unique identification number to link individuals across years
The datasets in the archive.
· Eight Microsoft Access datasets with data on a wide range of variables. Except for the Access file for 2002-2003, all the other information in each of the other Access files refers to one year. Within any Access dataset, users will find two types of files:
o Thematic files. The name of a thematic file contains the prefix tbl (e.g., 29_tbl_Demography or tbl_29_Demography). The file name (sometimes in Spanish, sometimes in English) indicates the content of the file. For example, in the Access dataset for one year, the micro file tbl_30_Ventas has all the information on sales for that year. Within each micro file, columns contain information on a variable and the name of the column indicates the content of the variable. For instance, the column heading item in the Sales file would indicate the type of good sold. The exac…
Windows and Doors Extraction
hub.arcgis.com
Updated Nov 10, 2020
Cite
Esri (2020). Windows and Doors Extraction [Dataset]. https://hub.arcgis.com/content/8c0078cc7e314e31b20001d94daace5e
Dataset updated
Nov 10, 2020
Dataset authored and provided by
Esri (http://esri.com/)
Description
This deep learning model is used for extracting windows and doors in textured building data displayed in 3D views. Manually digitizing windows/doors from 3D building data can be a slow process. This model automates the extraction of these objects from a 3D view and can help in speeding up 3D editing and analysis workflows. Using this model, existing building data can be enhanced with additional information on location, size and orientation of windows and doors. The extracted windows and doors can be further used to perform 3D visibility analysis using existing 3D geoprocessing tools in ArcGIS.

This model can be useful in many industries and workflows. National Government and state-level law enforcement could use this model in security analysis scenarios. Local governments could use windows and door locations to help with tax assessments with CAMA (computer aided mass appraisal) plus impact-studies for urban planning. Public safety users might be interested in regards to physical or visual access to restricted areas, or the ability to build evacuation plans. The commercial sector, with everyone from real-estate agents to advertisers to office/interior designers, would be able to benefit from knowing where windows and doors are located. Even utilities, especially mobile phone providers, could take advantage of knowing window sizes and positions. To be clear, this model doesn't solve these problems, but it does allow users to extract and collate some of the data they will need to do it.

Using the model

This model is generic and is expected to work well with a variety of building styles and shapes. To use this model, you need to install supported deep learning frameworks packages first. See Install deep learning frameworks for ArcGIS for more information. The model can be used with the Interactive Object Detection tool.

A blog on the ArcGIS Pro tool that leverages this model is published on Esri Blogs. We've also published steps on how to retrain this model further using your data.

Input

The model is expected to work with any textured building data displayed in 3D views. Example data sources include textured multipatches, 3D object scene layers, and integrated mesh layers.

Output

Feature class with polygons representing the detected windows and doors in the input imagery.

Model architecture

The model uses the FasterRCNN model architecture implemented using ArcGIS API for Python.

Training data

This model was trained using images from the Open Images Dataset.

Sample results

Below are sample results of the windows detected with this model in ArcGIS Pro using the Interactive Object Detection tool, which outputs the detected objects as a symbolized point feature class with size and orientation attributes.
Microsoft Access Users Email List
prospectwallet.com
Updated Jul 25, 2025
Cite
Prospect Wallet: B2B Mailing & Email lists | Direct Mail Marketing (2025). Microsoft Access Users Email List [Dataset]. https://www.prospectwallet.com/technology/microsoft-access-users-email-list/
Dataset updated
Jul 25, 2025
Dataset authored and provided by
Prospect Wallet: B2B Mailing & Email lists | Direct Mail Marketing
Description
Leading companies such as Toyota USA and Exxon Mobil utilize Microsoft Access, which controls 20.57% of the worldwide database market, for effective data management.

Businesses may take advantage of the growth of this platform by reaching the right decision-makers from companies using Microsoft Access. To help you interact with software developers, data architects, and business leaders, our verified Microsoft
Microsoft Teams: number of daily active users 2019-2024
statista.com
Updated Jun 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Microsoft Teams: number of daily active users 2019-2024 [Dataset]. https://www.statista.com/statistics/1033742/worldwide-microsoft-teams-daily-and-monthly-users/
Explore at:
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
The number of daily active users of Microsoft Teams has stayed the same in the past year, around *** million. Due to the impact of the coronavirus (COVID-19) outbreak and the growing practices of social distancing and working from home, Microsoft has seen dramatic increases in the daily use of its communication and collaboration platform within a short period of time. Microsoft Teams is part of Microsoft 365, a set of collaboration apps and services launched in *********.

Increased data consumption from “staying at home”

The average daily in-home data usage in the United States has increased significantly during the coronavirus (COVID-19) outbreak in **********. Compared to the same number of days in **********, the daily average in-home data usage increased by a total of *** gigabytes in **********, a roughly ** percent increase. Data consumption from the usage of gaming consoles and smartphones increased the most, although the increases can be observed across nearly all device categories. Social media platforms and video and conference call platforms are the technology services used the most during the outbreak in the U.S.
The Enhanced Microsoft Academic Knowledge Graph
datacatalogue.sodanet.gr
Updated Apr 30, 2024
Cite
Κατάλογος Δεδομένων SoDaNet (2024). The Enhanced Microsoft Academic Knowledge Graph [Dataset]. http://doi.org/10.17903/FK2/TZWQPD
Unique identifier
https://doi.org/10.17903/FK2/TZWQPD
Dataset updated
Apr 30, 2024
Dataset provided by
Κατάλογος Δεδομένων SoDaNet
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1800 - Dec 31, 2021
Area covered
Worldwide
Dataset funded by
European Commission
Description
The Enhanced Microsoft Academic Knowledge Graph (EMAKG) is a large dataset of scientific publications and related entities, including authors, institutions, journals, conferences, and fields of study. The proposed dataset originates from the Microsoft Academic Knowledge Graph (MAKG), one of the most extensive freely available knowledge graphs of scholarly data. To build the dataset, we first assessed the limitations of the current MAKG. Then, based on these, several methods were designed to enhance the data and broaden the range of use case scenarios, particularly in mobility and network analysis.

EMAKG provides two main advantages:
- It has improved usability, facilitating access for non-expert users.
- It includes an increased number of types of information obtained by integrating various datasets and sources, which help expand the application domains. For instance, geographical information could help mobility and migration research.

The knowledge graph completeness is improved by retrieving and merging information on publications and other entities no longer available in the latest version of MAKG. Furthermore, geographical and collaboration network details are employed to provide data on authors as well as their annual locations and career nationalities, together with worldwide yearly stocks and flows.

Among others, the dataset also includes:
- fields of study (and publications) labelled by their discipline(s);
- abstracts and linguistic features, i.e., standard language codes, tokens, and types;
- entities’ general information, e.g., date of foundation and type of institutions; and
- academia related metrics, i.e., h-index.

The resulting dataset maintains all the characteristics of the parent datasets and includes a set of additional subsets and data that can be used for new case studies relating to network analysis, knowledge exchange, linguistics, computational linguistics, and mobility and human migration, among others.

Data from: Login Data Set for Risk-Based Authentication

zenodo.org
data.niaid.nih.gov

zip

Updated Jun 30, 2022

Cite

Stephan Wiefling; Paul René Jørgensen; Sigurd Thunem; Luigi Lo Iacono (2022). Login Data Set for Risk-Based Authentication [Dataset]. http://doi.org/10.5281/zenodo.6782156

Available download formats: zip

Unique identifier

https://doi.org/10.5281/zenodo.6782156

Dataset updated

Jun 30, 2022

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Stephan Wiefling; Paul René Jørgensen; Sigurd Thunem; Luigi Lo Iacono

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Login Data Set for Risk-Based Authentication

Synthesized login feature data of >33M login attempts and >3.3M users on a large-scale online service in Norway. Original data collected between February 2020 and February 2021.

This data set aims to foster research and development for Risk-Based Authentication (RBA) systems. The data was synthesized from the real-world login behavior of more than 3.3M users at a large-scale single sign-on (SSO) online service in Norway.

The users used this SSO to access sensitive data provided by the online service, e.g., cloud storage and billing information. We used this data set to study how the Freeman et al. (2016) RBA model behaves on a large-scale online service in the real world (see Publication). The synthesized data set can reproduce the results obtained on the original data set (see Study Reproduction). Beyond that, you can use this data set to evaluate and improve RBA algorithms under real-world conditions.

WARNING: The feature values are plausible, but still totally artificial. Therefore, you should NOT use this data set in productive systems, e.g., intrusion detection systems.

Overview

The data set contains the following features related to each login attempt on the SSO:

Feature	Data Type	Description	Range or Example
IP Address	String	IP address belonging to the login attempt	0.0.0.0 - 255.255.255.255
Country	String	Country derived from the IP address	US
Region	String	Region derived from the IP address	New York
City	String	City derived from the IP address	Rochester
ASN	Integer	Autonomous system number derived from the IP address	0 - 600000
User Agent String	String	User agent string submitted by the client	Mozilla/5.0 (Windows NT 10.0; Win64; ...
OS Name and Version	String	Operating system name and version derived from the user agent string	Windows 10
Browser Name and Version	String	Browser name and version derived from the user agent string	Chrome 70.0.3538
Device Type	String	Device type derived from the user agent string	(`mobile`, `desktop`, `tablet`, `bot`, `unknown`)¹
User ID	Integer	Identification number related to the affected user account	[Random pseudonym]
Login Timestamp	Integer	Timestamp related to the login attempt	[64 Bit timestamp]
Round-Trip Time (RTT) [ms]	Integer	Server-side measured latency between client and server	1 - 8600000
Login Successful	Boolean	`True`: Login was successful, `False`: Login failed	(`true`, `false`)
Is Attack IP	Boolean	IP address was found in known attacker data set	(`true`, `false`)
Is Account Takeover	Boolean	Login attempt was identified as account takeover by incident response team of the online service	(`true`, `false`)

Data Creation

As the data set targets RBA systems, especially the Freeman et al. (2016) model, the statistical feature probabilities between all users, globally and locally, are identical for the categorical data. All the other data was randomly generated while maintaining logical relations and temporal order between the features.

The timestamps, however, are not identical and contain randomness. The feature values related to IP address and user agent string were randomly generated from publicly available data, so they were very likely not present in the real data set. The RTTs resemble real values but were randomly assigned among users per geolocation. Therefore, the RTT entries were probably in other positions in the original data set.

The country was randomly assigned per unique feature value. Based on that, we randomly assigned an ASN related to the country, and generated the IP addresses for this ASN. The cities and regions were derived from the generated IP addresses for privacy reasons and do not reflect the real logical relations from the original data set.
The device types are identical to the real data set. Based on that, we randomly assigned the OS, and based on the OS the browser information. From this information, we randomly generated the user agent string. Therefore, all the logical relations regarding the user agent are identical to those in the real data set.
The RTT was randomly drawn from the login success status and synthesized geolocation data. We did this to ensure that the RTTs are realistic ones.

Regarding the Data Values

Due to unresolvable conflicts during the data creation, we had to assign some unrealistic IP addresses and ASNs that are not present in the real world. Nevertheless, these do not have any effects on the risk scores generated by the Freeman et al. (2016) model.

You can recognize them by the following values:

ASNs with values >= 500,000
IP addresses in the range 10.0.0.0 - 10.255.255.255 (10.0.0.0/8 CIDR range)
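The two markers listed above can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the function name `is_unrealistic_entry` and the idea of passing the IP address and ASN as separate arguments are assumptions, not part of the data set's tooling.

```python
import ipaddress

UNREALISTIC_ASN_MIN = 500_000                          # ASNs >= 500,000
UNREALISTIC_IP_BLOCK = ipaddress.ip_network("10.0.0.0/8")  # 10.0.0.0 - 10.255.255.255


def is_unrealistic_entry(ip_address: str, asn: int) -> bool:
    """True if the entry uses a placeholder ASN or an address in 10.0.0.0/8."""
    in_block = ipaddress.ip_address(ip_address) in UNREALISTIC_IP_BLOCK
    return asn >= UNREALISTIC_ASN_MIN or in_block


# Hypothetical rows: (IP address, ASN)
rows = [("10.1.2.3", 1234), ("8.8.8.8", 500000), ("8.8.8.8", 15169)]
for ip, asn in rows:
    print(ip, asn, is_unrealistic_entry(ip, asn))
```

Filtering on these markers is optional: according to the description, the placeholder values do not affect risk scores from the Freeman et al. (2016) model.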

Study Reproduction

Based on our evaluation, this data set can reproduce our study results regarding the RBA behavior of an RBA model using the IP address (IP address, country, and ASN) and user agent string (Full string, OS name and version, browser name and version, device type) as features.

The calculated RTT significances for countries and regions inside Norway are not identical using this data set, but have similar tendencies. The same is true for the Median RTTs per country. This is due to the fact that the available number of entries per country, region, and city changed with the data creation procedure. However, the RTTs still reflect the real-world distributions of different geolocations by city.

See RESULTS.md for more details.

Ethics

By using the SSO service, the users agreed to the data collection and evaluation for research purposes. For study reproduction and fostering RBA research, we agreed with the data owner to create a synthesized data set that does not allow re-identification of customers.

The synthesized data set does not contain any sensitive data values, as the IP addresses, browser identifiers, login timestamps, and RTTs were randomly generated and assigned.

Publication

You can find more details on our conducted study in the following journal article:

Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service (2022)
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono.
ACM Transactions on Privacy and Security

Bibtex

@article{Wiefling_Pump_2022,
 author = {Wiefling, Stephan and Jørgensen, Paul René and Thunem, Sigurd and Lo Iacono, Luigi},
 title = {Pump {Up} {Password} {Security}! {Evaluating} and {Enhancing} {Risk}-{Based} {Authentication} on a {Real}-{World} {Large}-{Scale} {Online} {Service}},
 journal = {{ACM} {Transactions} on {Privacy} and {Security}},
 doi = {10.1145/3546069},
 publisher = {ACM},
 year  = {2022}
}

License

This data set and the contents of this repository are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. See the LICENSE file for details. If the data set is used within a publication, the following journal article has to be cited as the source of the data set:

Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono: Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service. In: ACM Transactions on Privacy and Security (2022). doi: 10.1145/3546069

¹ A few (invalid) user agent strings from the original data set could not be parsed, so their device type is empty. Perhaps this parse error is useful information for your studies, so we kept these 1526 entries.

Windows Instance Segmentation Dataset
universe.roboflow.com
zip
Updated May 3, 2023
Cite
Roboflow Universe Projects (2023). Windows Instance Segmentation Dataset [Dataset]. https://universe.roboflow.com/roboflow-universe-projects/windows-instance-segmentation/model/2
Available download formats: zip
Dataset updated
May 3, 2023
Dataset provided by
Roboflow (https://roboflow.com/)
Authors
Roboflow Universe Projects
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Windows Polygons
Description
Here are a few use cases for this project:

Smart Building Design and Analysis: Architects and engineers could use the Windows Instance Segmentation model to automatically analyze building facades in images and identify the distribution, sizes, and styles of windows. This information can be used to improve building designs for daylighting, ventilation, and aesthetic purposes.

Real Estate Appraisal and Listing: Real estate professionals can use the model to analyze property photos, automatically identifying and categorizing windows to create more detailed and accurate property listings. Potential buyers and renters can then use this information for better search results and understanding of architectural features.

Energy Efficiency Analysis: Energy consultants and researchers can utilize the Windows Instance Segmentation model to analyze the prevalence of different window styles and their impact on building energy efficiency. This can help in developing more sustainable building designs and energy retrofit strategies.

Urban Planning and Cityscape Analysis: Urban planners and city officials can make use of this model to assess the distribution of windows in urban environments, understanding how they contribute to the overall aesthetic and livability of neighborhoods. This information can guide zoning regulations and future development projects to create more visually appealing and functional cities.

Augmented Reality (AR) Applications: Developers of AR applications, particularly those focused on architecture and interior design, can integrate the Windows Instance Segmentation model to recognize windows in real-world environments. This can enable users to visualize new window styles, treatments, or decorations, helping them make better-informed design decisions.
Microsoft Ranking dataset - Dataset - LDM
service.tib.eu
Updated Jan 3, 2025
Cite
(2025). Microsoft Ranking dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/microsoft-ranking-dataset
Dataset updated
Jan 3, 2025
Description
The dataset contains relevance scores for websites recommended to different users and comprises 30,000 user-website pairs. For a user i and website j, the data contains a 136-dimensional feature vector u_i^j, which consists of user i’s attributes corresponding to website j, such as length of stay or number of clicks on the website. Furthermore, for each user-website pair, the dataset also contains a relevance score, i.e. how relevant the website was to the user.
MODIS/Terra Land Surface Temperature/3-Band Emissivity Daily L3 Global 1km...
data.staging.idas-ds1.appdat.jsc.nasa.gov
data.nasa.gov
Updated Aug 4, 2025
+ more versions
Cite
nasa.gov (2025). MODIS/Terra Land Surface Temperature/3-Band Emissivity Daily L3 Global 1km SIN Grid Night V006 - Dataset - NASA Open Data Portal [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/modis-terra-land-surface-temperature-3-band-emissivity-daily-l3-global-1km-sin-grid-night-
Dataset updated
Aug 4, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
The MOD21A1N Version 6 data product was decommissioned on July 31, 2023. Users are encouraged to use the MOD21A1N Version 6.1 data product.

A new suite of Moderate Resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature and Emissivity (LST&E) products are available in Collection 6. The MOD21 Land Surface Temperature (LST) algorithm differs from the algorithm of the MOD11 LST products, in that the MOD21 algorithm is based on the ASTER Temperature/Emissivity Separation (TES) technique, whereas the MOD11 uses the split-window technique. The MOD21 TES algorithm uses a physics-based algorithm to dynamically retrieve both the LST and spectral emissivity simultaneously from the MODIS thermal infrared bands 29, 31, and 32. The TES algorithm is combined with an improved Water Vapor Scaling (WVS) atmospheric correction scheme to stabilize the retrieval during very warm and humid conditions. The MOD21A1N dataset is produced daily from nighttime Level 2 Gridded (L2G) intermediate LST products. The L2G process maps the daily MOD21 swath granules onto a sinusoidal MODIS grid and stores all observations falling over a gridded cell for a given day. The MOD21A1 algorithm sorts through these observations for each cell and estimates the final LST value as an average from all observations that are cloud free and have good LST&E accuracies. The nighttime average is weighted by the observation coverage for that cell. Only observations having an observation coverage greater than a 15% threshold are considered. The MOD21A1N product contains seven Science Datasets (SDS), which include the calculated LST as well as quality control, the three emissivity bands, view zenith angle, and time of observation. MOD21A1N products are available two months after acquisition due to latency of data inputs.

Additional details regarding the methodology used to create this Level 3 (L3) product are available in the Algorithm Theoretical Basis Document (ATBD).

Known Issues

* Forward processing of Terra MODIS LST&E Version 6 data products was discontinued on December 31, 2005. Users are encouraged to use the MOD21A1N Version 6.1 data product. Users of MODIS LST products may notice an increase in occurrences of extreme high temperature outliers in the unfiltered MxD21 Version 6 and 6.1 products compared to the heritage MxD11 LST products. This can occur especially over desert regions like the Sahara where undetected cloud and dust can negatively impact both the MxD21 and MxD11 retrieval algorithms.
* In the MxD11 LST products, these contaminated pixels are flagged in the algorithm and set to fill values in the output products based on differences in the band 32 and band 31 radiances used in the generalized split window algorithm. In the MxD21 LST products, values for the contaminated pixels are retained in the output products (and may result in overestimated temperatures), and users need to apply Quality Control (QC) filtering and other error analyses for filtering out bad values. High temperature outlier thresholds are not employed in MxD21 since it would potentially remove naturally occurring hot surface targets such as fires and lava flows. High atmospheric aerosol optical depth (AOD) caused by vast dust outbreaks in the Sahara and other deserts highlighted in the example documentation are the primary reason for high outlier surface temperature values (and corresponding low emissivity values) in the MxD21 LST products. Future versions of the MxD21 product will include a dust flag from the MODIS aerosol product and/or brightness temperature look up tables to filter out contaminated dust pixels. It should be noted that in the MxD11B day/night algorithm products, more advanced cloud filtering is employed in the multi-day products based on a temporal analysis of historical LST over cloudy areas. This may result in more stringent filtering of dust contaminated pixels in these products.
* In order to mitigate the impact of dust in the MxD21 V6 and 6.1 products, the science team recommends using a combination of the existing QC bits, emissivity values, and estimated product errors, to confidently remove bad pixels from analysis. For more details, refer to this dust and cloud contamination example documentation.

For complete information about known issues please refer to the MODIS/VIIRS Land Quality Assessment website.

Improvements/Changes from Previous Versions

New product for MODIS Version 6.
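The daily compositing rule described above (a coverage-weighted average over cloud-free, good-quality observations whose coverage exceeds 15%) can be illustrated with a small sketch. This is not the MOD21A1 production code: the function name `composite_lst`, the tuple layout, and the use of a fraction (0.15) for the threshold are assumptions made for illustration only.

```python
import math

def composite_lst(observations, min_coverage=0.15):
    """Coverage-weighted mean LST for one grid cell.

    observations: iterable of (lst_kelvin, coverage_fraction, is_good) tuples,
    where is_good stands in for "cloud free with good LST&E accuracy".
    This layout is hypothetical and not taken from the product files.
    """
    # Keep only good observations whose coverage exceeds the threshold.
    kept = [(lst, cov) for lst, cov, ok in observations if ok and cov > min_coverage]
    if not kept:
        return math.nan  # no usable observation for this cell
    total_weight = sum(cov for _, cov in kept)
    return sum(lst * cov for lst, cov in kept) / total_weight

cell = [
    (290.0, 0.50, True),
    (300.0, 0.50, True),
    (310.0, 0.10, True),   # dropped: coverage not above 15%
    (350.0, 0.90, False),  # dropped: failed the quality screen
]
value = composite_lst(cell)
```

Only the first two observations contribute here, each with equal weight.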
Data synchronizator of Where2test pipeline - Dataset - B2FIND
b2find.eudat.eu
Updated Oct 30, 2023
Cite
(2023). Data synchronizator of Where2test pipeline - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/edcad24f-c5bd-5c3d-a460-467df9af3cec
Dataset updated
Oct 30, 2023
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Software to synchronise the data between various data sources and the casus database server. Unix users should use MigrateWhere2test_0.7Unix.zip and Windows users should use MigrateWhere2test_0.7Win.zip. To use the scripts, follow these instructions:

Windows
Create the PostgreSQL database and set the port to 5432.
Create the folder C:\Workspaces and unzip the unix file.
Create the folder com.com.casus.env.where2test.migration\COM_CASUS_WHERE2TEST_MIGRATION in workspaces, then unzip the source file inside COM_CASUS_WHERE2TEST_MIGRATION.
Set run Develop and run the .bat file in the folder MigrateWhere2test_0.7Unix to run in localhost.

Unix
Create PostgreSQL with port 32771.
Create the folder /home/wildan/Workspaces and unzip the unix file.
Open the file MigrateWhere2test/MigrateWhere2test_run.sh and change the mode from "Default" to "Production".
Create the folder com.com.casus.env.where2test.migration.unix/COM_CASUS_WHERE2TEST_MIGRATION in workspaces, then unzip the source file inside COM_CASUS_WHERE2TEST_MIGRATION.
Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation...
catalog.data.gov
data.bts.gov
+3 more
Updated Dec 7, 2023
Cite
Federal Highway Administration (2023). Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs: San Mateo Testbed Analysis Plan [supporting datasets] [Dataset]. https://catalog.data.gov/dataset/analysis-modeling-and-simulation-ams-testbed-development-and-evaluation-to-support-dynamic-9521f
Dataset updated
Dec 7, 2023
Dataset provided by
Federal Highway Administration (https://highways.dot.gov/)
Description
This zip file contains files of data to support FHWA-JPO-16-370, Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs - San Mateo Testbed Analysis Plan: Final Report. Zip size is 1.5 GB. The files have been uploaded as-is; no further documentation was supplied by NTL. All located .docx files were copied to .pdf document files, which are an archival format. These .pdfs were then added to the zip file alongside the original .docx files. The attached zip files can be unzipped using any zip compression/decompression software. This zip file contains files in the following formats: .pdf document files which can be read using any pdf reader; .docx document files which may be opened with Microsoft Word or some other open source document editors; .xlsx spreadsheet files which may be opened with Microsoft Excel or some other open source spreadsheet editors; .syn files are a proprietary file format for signal timing plans which are provided in the Synchro Model given as “El Camino Real Synchro.syn” and can be opened using Trafficware Synchro, which may require users to purchase a license or software (for more information go to http://www.trafficware.com/); .csv data files, an open format, which may be opened with any text editor or in many spreadsheet applications; .db generic database files, often associated with thumbnail images in the Windows operating environment; .rbc files, which are scripts written in Rembo-C, which can be opened in a text editor, but require a server with Rembo installed to run the scripts; .vap audio files which will require special audio editing software to manipulate; .dll dynamically linked files for Windows program operations; .layx, a file type on which we could not locate reliable information; and .inpx files, a file type on which we could not locate reliable information [software requirements].
These files were last accessed in 2017. Data will be preserved as is.
RecoReact
huggingface.co
Updated Sep 19, 2025
Cite
Microsoft (2025). RecoReact [Dataset]. https://huggingface.co/datasets/microsoft/RecoReact
Dataset updated
Sep 19, 2025
Dataset authored and provided by
Microsoft (http://microsoft.com/)
License
https://choosealicense.com/licenses/cdla-permissive-2.0/
Description
Overview

RecoReact is a novel dataset of multi-turn interactions between real users and a (random) AI assistant, providing a unique set of feedback signals, including multi-turn natural language requests, structured item selections, item ratings, and user profiles. The dataset spans three different domains: news articles, travel destinations, and meal planning.

Dataset Details

This dataset contains 1785 interactions from 595 users (with a median completion time of about… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/RecoReact.
Fundamental Data Record for Atmospheric Composition [ATMOS_L1B]
earth.esa.int
Updated Jul 1, 2024
+ more versions
Cite
European Space Agency (2024). Fundamental Data Record for Atmospheric Composition [ATMOS_L1B] [Dataset]. https://earth.esa.int/eogateway/catalog/fdr-for-atmospheric-composition
Dataset updated
Jul 1, 2024
Dataset authored and provided by
European Space Agency (http://www.esa.int/)
License
https://earth.esa.int/eogateway/documents/20142/1564626/Terms-and-Conditions-for-the-use-of-ESA-Data.pdf
Time period covered
Jun 28, 1995 - Apr 7, 2012
Description
The Fundamental Data Record (FDR) for Atmospheric Composition UVN v.1.0 dataset is a cross-instrument Level-1 product [ATMOS_L1B] generated in 2023 and resulting from the ESA FDR4ATMOS project. The FDR contains selected Earth Observation Level 1b parameters (irradiance/reflectance) from the nadir-looking measurements of the ERS-2 GOME and Envisat SCIAMACHY missions for the period ranging from 1995 to 2012. The data record offers harmonised cross-calibrated spectra with focus on spectral windows in the Ultraviolet-Visible-Near Infrared regions for the retrieval of critical atmospheric constituents like ozone (O3), sulphur dioxide (SO2), nitrogen dioxide (NO2) column densities, alongside cloud parameters. The FDR4ATMOS products should be regarded as experimental due to the innovative approach and the current use of a limited-sized test dataset to investigate the impact of harmonization on the Level 2 target species, specifically SO2, O3 and NO2. Presently, this analysis is being carried out within follow-on activities. The FDR4ATMOS V1 is currently being extended to include the MetOp GOME-2 series.

Product format

In many respects, the FDR product has improved compared to the existing individual mission datasets:
GOME solar irradiances are harmonised using a validated SCIAMACHY solar reference spectrum, solving the problem of the fast-changing etalon present in the original GOME Level 1b data;
Reflectances for both GOME and SCIAMACHY are provided in the FDR product. GOME reflectances are harmonised to degradation-corrected SCIAMACHY values, using collocated data from the CEOS PIC sites;
SCIAMACHY data are scaled to the lowest integration time within the spectral band using high-frequency PMD measurements from the same wavelength range. This simplifies the use of the SCIAMACHY spectra which were split in a complex cluster structure (with own integration time) in the original Level 1b data;
The harmonization process applied mitigates the viewing angle dependency observed in the UV spectral region for GOME data;
Uncertainties are provided.

Each FDR product provides, within the same file, irradiance/reflectance data for UV-VIS-NIR spectral regions across all orbits on a single day, including therein information from the individual ERS-2 GOME and Envisat SCIAMACHY measurements. FDR has been generated in two formats: Level 1A and Level 1B targeting expert users and nominal applications respectively. The Level 1A [ATMOS_L1A] data include additional parameters such as harmonisation factors, PMD, and polarisation data extracted from the original mission Level 1 products. The ATMOS_L1A dataset is not part of the nominal dissemination to users. In case of specific requirements, please contact EOHelp. Please refer to the README file for essential guidance before using the data. All the new products are conveniently formatted in NetCDF. Free standard tools, such as Panoply, can be used to read NetCDF data. Panoply is sourced and updated by external entities. For further details, please consult our Terms and Conditions page.

Uncertainty characterisation

One of the main aspects of the project was the characterization of Level 1 uncertainties for both instruments, based on metrological best practices. The following documents are provided:
General guidance on a metrological approach to Fundamental Data Records (FDR)
Uncertainty Characterisation document
Effect tables
NetCDF files containing example uncertainty propagation analysis and spectral error correlation matrices for SCIAMACHY (Atlantic and Mauretania scene for 2003 and 2010) and GOME (Atlantic scene for 2003): reflectance_uncertainty_example_FDR4ATMOS_GOME.nc, reflectance_uncertainty_example_FDR4ATMOS_SCIA.nc

Known Issues

Non-monotonous wavelength axis for SCIAMACHY in FDR data version 1.0

In the SCIAMACHY OBSERVATION group of the atmospheric FDR v1.0 dataset (DOI: 10.5270/ESA-852456e), the wavelength axis (lambda variable) is not monotonically increasing. This issue affects all spectral channels (UV, VIS, NIR) in the SCIAMACHY group, while GOME OBSERVATION data remain unaffected. The root cause of the issue lies in the incorrect indexing of the lambda variable during the NetCDF writing process. Notably, the wavelength values themselves are calculated correctly within the processing chain.

Temporary Workaround

The wavelength axis is correct in the first record of each product. As a workaround, users can extract the wavelength axis from the first record and apply it to all subsequent measurements within the same product. The first record can be retrieved by setting the first two indices (time and scanline) to 0 (assuming counting of array indices starts at 0). Note that this process must be repeated separately for each spectral range (UV, VIS, NIR) and every daily product. Since the wavelength axis of SCIAMACHY is highly stable over time, using the first record introduces no expected impact on retrieval results.

Python pseudo-code example: lambda_...
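Since the original pseudo-code is cut off in this listing, the following is an independent, minimal sketch of the workaround as described in the text, not a reconstruction of the provider's example. It assumes the wavelength variable has already been read into a nested list indexed as [time][scanline][spectral pixel]; the function name `apply_first_record_axis` and that layout are hypothetical, and reading the NetCDF file itself is left out.

```python
def apply_first_record_axis(lam):
    """Replace every wavelength axis with the one from record [0][0].

    lam: nested list indexed [time][scanline][pixel] (assumed layout).
    Returns a new nested list of the same shape; the input is not modified.
    """
    reference = list(lam[0][0])  # first record: time index 0, scanline index 0
    return [[list(reference) for _ in scanlines] for scanlines in lam]

# Toy data: only the first record is monotonically increasing.
lam_raw = [
    [[300.0, 301.0, 302.0], [302.0, 300.0, 301.0]],
    [[301.0, 300.0, 302.0], [300.0, 302.0, 301.0]],
]
lam_fixed = apply_first_record_axis(lam_raw)
```

As the text notes, the same step would have to be repeated for each spectral range (UV, VIS, NIR) and for every daily product.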
Mental Health Services Monthly Statistics
digital.nhs.uk
Updated Mar 15, 2020
+ more versions
Cite
(2020). Mental Health Services Monthly Statistics [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-services-monthly-statistics
Dataset updated
Mar 15, 2020
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Time period covered
Dec 1, 2019 - Mar 31, 2020
Description
This publication provides the most timely picture available of people using NHS funded secondary mental health, learning disabilities and autism services in England. These are experimental statistics which are undergoing development and evaluation. This information will be of use to people needing access to information quickly for operational decision making and other purposes. More detailed information on the quality and completeness of these statistics is made available later in our Mental Health Bulletin: Annual Report publication series. The Data Collection Board (DCB) has now approved the decommissioning of the interim collection of Early Intervention in Psychosis (EIP) waiting times information, known as NHS England Unify Collection within this publication. Waiting times for EIP for October 2019 activity onwards are now monitored using data from the Mental Health Services Data Set (MHSDS). From April 2020 NHS Digital is implementing a multiple submission window model for MHSDS which will enable the resubmission of data throughout the financial year. Following the implementation of the multiple submission window model providers will optionally be able to submit/resubmit data for each month of 2019-20 from April 2020 to 21 May 2020. The opportunity to resubmit data for each month of 2019-20 will impact on the statistics already published for the 2019-20 year. It is likely that the statistics for each month will be republished; however the publication method is as yet undecided and will be proportionate to the changes; further details will be communicated closer to the time. Please be aware of the potential impact of the multiple submission window model on previously published data and use these statistics with reference to it. Further information can be found on the NHS Digital Multiple submission window model for MHSDS webpage linked below. The Provisional March data file has been removed as this is now superseded by the published Performance March data. 
NHS Digital apologises for any inconvenience caused. From April 2020 onwards, NHS Digital has been implementing a multiple submission window model (MSWM) for MHSDS. This allows providers to retrospectively submit data for a specific reporting period once the initial provisional and performance submission windows have closed. For a limited time, providers were given the opportunity to submit revised monthly data for all months within 2019/20 using the MSWM. As of January 2021, NHS Digital has now released revised 'End of Year' versions of the main monthly csv files for each month between April 2019 and February 2020 which reflect these revised 2019/20 MSWM submissions that occurred after 'Final' monthly data had already been published. Both the 'Final' and 'End of Year' versions of the main monthly csv files are available to download under 'Resources'. The key facts corresponding to both versions are also presented below.
The ORBIT (Object Recognition for Blind Image Training)-India Dataset
data.niaid.nih.gov
zenodo.org
Updated Jul 2, 2024
Cite
Massiceti, Daniela (2024). The ORBIT (Object Recognition for Blind Image Training)-India Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11394528
Dataset updated
Jul 2, 2024
Dataset provided by
Jones, Matt
Grayson, Martin
Morrison, Cecily
Pearson, Jennifer
Massiceti, Daniela
Robinson, Simon
India, Gesu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
India
Description
The ORBIT (Object Recognition for Blind Image Training)-India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.

Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.

The image dataset is stored in the ‘Dataset’ folder, organized by folders assigned to each data collector (P1, P2, ...P12) who collected them. Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside an ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is True if the object is not present in the image, and the ‘pii_present_issue’ key is True if personally identifiable information (PII) is present in the image. Note that all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; therefore, an unscaled version of the dataset will follow soon.
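The annotation layout described above lends itself to a short filtering example. The sketch below is illustrative only: the helper names `usable_frames` and `parse_frame_name` are hypothetical, the second frame's flag values are invented for the demonstration, and splitting on "--" assumes object labels never contain that separator.

```python
import json

def usable_frames(annotation_text):
    """Return frame names where the object is present and no PII was flagged."""
    frames = json.loads(annotation_text)
    return sorted(
        name
        for name, flags in frames.items()
        if not flags["object_not_present_issue"] and not flags["pii_present_issue"]
    )

def parse_frame_name(name):
    """Split a frame file name into its '--'-separated parts."""
    stem = name.rsplit(".", 1)[0]  # drop the .jpeg extension
    collector, label, surface, video, frame = stem.split("--")
    return {
        "collector": collector,
        "object": label,
        "surface": surface,
        "video": video,
        "frame": int(frame),
    }

# Two frames in the documented format; the flags on the second are made up.
sample = json.dumps({
    "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg":
        {"object_not_present_issue": False, "pii_present_issue": False},
    "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg":
        {"object_not_present_issue": True, "pii_present_issue": False},
})
kept = usable_frames(sample)
info = parse_frame_name(kept[0])
```

In practice the JSON text would come from one file per video in the ‘Annotations’ folder.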

This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.

REFERENCES:

[1] Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597

[2] microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset

[3] Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641

Used Cars Sales Listings Dataset 2025

kaggle.com

Updated Aug 12, 2025

Cite

Pratyush Puri (2025). Used Cars Sales Listings Dataset 2025 [Dataset]. https://www.kaggle.com/datasets/pratyushpuri/used-car-sales-listings-dataset-2025

Explore at:

Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Aug 12, 2025

Dataset provided by

Kaggle

Authors

Pratyush Puri

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Luxury Cosmetics Pop‑Up Events Dataset

A comprehensive, real-world–anchored synthetic dataset capturing 2,133 luxury beauty pop-up events across global retail hotspots. It focuses on limited-edition product drops, experiential formats, and performance KPIs—especially footfall and sell‑through. The data is designed for analytics use cases such as demand forecasting, footfall modeling, merchandising optimization, pricing analysis, and market expansion studies across regions and venue types.

What this dataset contains

2,133 events from global hubs across North America, Europe, Middle East, Asia‑Pacific, and Latin America
Luxury/premium cosmetics brands and their limited‑release SKUs
Event formats and retail venue archetypes typical of pop‑up retail
Time windows and lease lengths aligned with short‑term pop‑up activations
Core commercial KPIs: price, units sold, sell‑through percentage
Footfall KPI: average daily footfall modeled by location/format/marketing intensity

Ideal use cases

Pop‑up ROI and performance benchmarking by brand, city, venue type, and format
Footfall prediction and location strategy (high‑street vs mall vs airport vs districts)
Limited-edition launch analytics: pricing vs sell‑through dynamics
Event planning: lease length and timing windows vs outcomes
Territory planning: region and city segmentation performance
Portfolio dashboards: cross‑brand comparisons and trend reporting

File formats

CSV, JSON, XLSX, and SQLite (table: popups)

Target users

Retail strategy and analytics teams
Growth, trade marketing, and brand managers
Data scientists building forecasting and optimization models
BI developers building dashboards for pop‑up performance

Column Dictionary

Column	Type	Example	Description
event_id	string	POP100282	Unique identifier for each pop‑up event.
brand	string	Charlotte Tilbury	Luxury/premium cosmetics brand running the pop‑up.
region	string	North America	Macro market region (North America, Europe, Middle East, Asia‑Pacific, Latin America).
city	string	Miami	City of the event; occasionally null to simulate real‑world data gaps.
location_type	string	Art/Design District	Venue archetype: High‑Street, Luxury Mall, Dept Store Atrium, Airport Duty‑Free, Art/Design District.
event_type	string	Flash Event	Pop‑up format: Standalone, Shop‑in‑Shop, Mobile Truck, Flash Event, Mall Kiosk.
start_date	date	2024-02-25	Event start date.
end_date	date	2024-03-02	Event end date; can be null (e.g., ongoing/TBC) to reflect operational uncertainty.
lease_length_days	integer	6	Duration of the activation (days), aligned with short‑term pop‑up leases.
sku	string	LE-UQYNQA1A	Limited‑release product code tied to the event/dataset scope.
product_name	string	Charlotte Tilbury Glow Mascara	Branded product listing (luxury‑oriented descriptors + category).
price_usd	float	62.21	Ticket price (USD) aligned with luxury cosmetics price bands by category.
avg_daily_footfall	integer	1107	Estimated average daily visitors based on venue, format, and activation intensity.
units_sold	integer	3056	Total units sold during the event window; capped by allocation dynamics.
sell_through_pct	float	98.9	Share of allocated inventory sold (%), proxy for demand strength and launch success.
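Two of the columns above can be cross-checked against others, which may be a useful sanity test when loading the data. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that lease_length_days is the day difference between start_date and end_date and that sell_through_pct is units_sold divided by allocated inventory, neither of which the column dictionary states explicitly.

```python
from datetime import date

def lease_length_days(start_date, end_date):
    """Day difference between two ISO dates; None when end_date is missing."""
    if end_date is None:  # end_date may be null (e.g. ongoing events)
        return None
    return (date.fromisoformat(end_date) - date.fromisoformat(start_date)).days

def implied_allocation(units_sold, sell_through_pct):
    """Allocated inventory implied by units sold and sell-through percentage."""
    return units_sold / (sell_through_pct / 100.0)

# Values from the example column of the dictionary above.
days = lease_length_days("2024-02-25", "2024-03-02")
allocation = implied_allocation(3056, 98.9)
```

For the example row the date difference matches the listed lease_length_days value of 6; whether that holds for every row would need to be checked against the data.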

Data quality notes

City and end_date contain a small proportion of nulls to reflect real‑world reporting gaps (e.g., ongoing events).
avg_daily_footfall varies by locati...

Detachable Tablet Market Analysis, Size, and Forecast 2025-2029: APAC...
technavio.com
pdf
Updated Apr 11, 2025
Cite
Technavio (2025). Detachable Tablet Market Analysis, Size, and Forecast 2025-2029: APAC (Australia, China, India, Japan, South Korea), North America (US and Canada), Europe (France, Germany, UK), South America , and Middle East and Africa [Dataset]. https://www.technavio.com/report/detachable-tablet-market-industry-analysis
Available download formats: pdf
Dataset updated
Apr 11, 2025
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-notice
Time period covered
2025 - 2029
Area covered
Canada, United States
Description
Detachable Tablet Market Size 2025-2029

The detachable tablet market size is forecast to increase by USD 3.65 billion, at a CAGR of 4.6% between 2024 and 2029.
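
To make the relationship between the two quoted figures concrete, the sketch below backs out the base-year market size implied by a USD 3.65 billion increase at a 4.6% CAGR. It assumes growth compounds annually over the five years from 2024 to 2029; the implied sizes are derived estimates under that assumption, not figures stated in the report, and the function name is illustrative.

```python
def implied_base(increase: float, cagr: float, years: int) -> float:
    """Base value B such that B * ((1 + cagr) ** years - 1) == increase."""
    growth_factor = (1.0 + cagr) ** years
    return increase / (growth_factor - 1.0)


# Figures quoted above: USD 3.65 billion increase, 4.6% CAGR, 2024 to 2029.
INCREASE_BN = 3.65
base_2024 = implied_base(increase=INCREASE_BN, cagr=0.046, years=5)
end_2029 = base_2024 + INCREASE_BN

print(f"implied 2024 size: USD {base_2024:.2f} billion")
print(f"implied 2029 size: USD {end_2029:.2f} billion")
```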

The market is experiencing significant growth, driven by the proliferation of affordable options and the increasing implementation of portable PCs in educational institutions. The availability of low-cost detachable tablets is expanding the market's reach, making these devices accessible to a broader consumer base. Additionally, the adoption of convertible laptops, which offer the functionality of both a laptop and a tablet, is increasing as users seek versatile devices for both personal and professional use. However, the market faces challenges, including the high price point of premium detachable tablets and the growing competition from other mobile devices, such as smartphones and laptops. Furthermore, the lack of standardization in detachable tablet design and compatibility with various software applications can hinder market growth. Companies looking to capitalize on market opportunities must focus on offering competitive pricing, ensuring compatibility with popular software, and providing innovative features to differentiate their products. Effective navigation of these challenges requires a deep understanding of consumer needs and preferences, as well as a commitment to continuous product innovation.

What will be the Size of the Detachable Tablet Market during the forecast period?

The market continues to evolve, driven by advancements in technology and shifting consumer preferences. Cloud storage solutions enable seamless access to digital content, while long battery life allows for uninterrupted use. The integration of detachable keyboards transforms these devices into productive 2-in-1 laptops, catering to the needs of professionals. Virtual reality applications expand the market's reach, offering immersive experiences in various sectors. Moreover, viewing angles and high-definition displays enhance the user experience, ensuring crisp visuals for multimedia consumption. The fusion of digital content and detachable tablets facilitates multitasking and flexibility, making these devices indispensable in mobile computing.

Battery life, a critical factor, is continuously improving, ensuring longer usage hours. Detachable keyboards, with their magnetic connectors, offer a sleek design and easy portability. Virtual reality applications, with their immersive capabilities, are revolutionizing industries such as education, healthcare, and entertainment. The ongoing development of detachable tablets is further fueled by advancements in digital content, virtual reality, and productivity suites. Machine learning and artificial intelligence enhance user experience, while capacitive touchscreens and digital pens cater to creative professionals. Fingerprint sensors and screen protectors ensure data protection, making these devices secure and reliable. In summary, the market is characterized by continuous innovation and evolving patterns, offering versatile solutions for various applications in mobile computing.

The integration of cloud storage, detachable keyboards, virtual reality, viewing angle, and digital content creates a dynamic market that caters to the ever-changing needs of consumers and businesses alike.

How is this Detachable Tablet Industry segmented?

The detachable tablet industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

OS: Windows, iPadOS, Others
Type: Below 8 inches, 8 inches, Above 8 inches
Application: Personal, Professional
Geography: North America (US, Canada); Europe (France, Germany, UK); APAC (Australia, China, India, Japan, South Korea); Rest of World (ROW)

By OS Insights

The Windows segment is estimated to witness significant growth during the forecast period.

In the dynamic world of technology, the market continues to evolve, integrating advanced features that cater to both personal and professional use. Microsoft's Windows, an operating system installed on the majority of electronic devices, powers many detachable tablets. Windows, with its long-standing history dating back to 1985, offers compatibility with a vast array of software applications. The latest addition to the Windows family, Windows 11, was released in October 2021, and since then, Microsoft has consistently delivered updates to improve user experience. One such update, version 23H2, was made available on October 31, 2023. Data protection is a priority in today's digital age, and detachable tablets are no exception.

Advanced security features like face recognition, fingerprint sensors, and machine learning algorithms...
Windows Vista A Summary
data.wu.ac.at
datahub.io
Updated Oct 10, 2013
Cite
Global (2013). Windows Vista A Summary [Dataset]. https://data.wu.ac.at/schema/datahub_io/M2EyNWVlNjEtNGU4MS00OTk0LWJlNWQtNjcwOGU1NzhjYjM5
Explore at:
Dataset updated
Oct 10, 2013
Dataset provided by
Global
Description
Microsoft promises that Vista could have plen..

After a gap of nearly five years, Microsoft introduced the most recent version of Windows, named Vista. With a name change from Longhorn to Vista, the stable version of Windows is likely to be released in November 2006. The beta version has already been available as a free download. If you have lots of patience or a very high-speed internet connection, you can download Vista, which is almost 2.5 GB in size.

Microsoft promises that Vista will have many new features, including an updated graphical user interface (GUI), completely revamped audio, print, and network subsystems, and Windows DVD Maker, a new creation tool for media. Vista will use peer-to-peer technology to make file sharing between networked computers easier. With the introduction of Virtual PC in Vista, Microsoft claims that running previous versions of Windows simultaneously on the same machine won't be a problem.

Developers can make use of the .NET Framework version 3.0 included in Vista. This version is said to be easier to use than the standard Windows API.

The most frequent complaint about Windows XP is its weak security and its inability to protect the machine from viruses, buffer overflows, and malware. The stated goal of Vista is to improve protection considerably and guard the machine against viruses and malware.

Vista comes with improved performance of the Windows Shell, easier and faster search, a sidebar resembling Apple's Spotlight, PC tools for applets, the latest version of Internet Explorer (Microsoft's controversial browser), Windows Media Player 11, User Account Control, a built-in firewall to control and check outgoing and incoming traffic, Windows Defender (a Windows anti-spyware tool), and Windows Mail, which may replace the existing Outlook Express.

Microsoft states that Vista may help improve the responsiveness of the PC in certain critical areas, including reaction to user actions, starting up, and waking up. Setup will probably be very fast, and while scripts and programs are processed in the background, users can carry on with their preferred tasks.

The new sleep state in Vista combines features of low-power standby with the data security of hibernation.

If you would like to use Vista on your own PC, the system should have a processor running at 800 MHz or faster, 512 MB of RAM, and a recent graphics card capable of DirectX version 9 or above. This is merely a minimum requirement. For best results, a faster processor above 1.2 GHz is recommended.

Cite

The Devastator (2023). US Broadband Usage Across Counties [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-broadband-usage-across-counties-and-zip-codes

US Broadband Usage Across Counties

Utilizing Microsoft's Data to Estimate Access

Explore at:

Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jan 6, 2023

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

The Devastator

Area covered

United States

Description

By Amber Thomas [source]

About this dataset

This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.

According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This dataset aims to address the contrast between estimated availability and actual use by providing more accurate usage numbers, downscaled to the county and zip code levels. Who gets counted as having access is vastly important: it determines who is included in public funding opportunities dedicated to closing the digital divide. The implications can be huge: millions of people around the country could remain invisible if these numbers aren't accurately reported or used properly in decision-making processes.

This dataset includes aggregated information about locations with fewer than 20 devices, for increased accuracy when estimating broadband usage in the United States. Others can use it to develop solutions that improve internet access, or to accurately label problem areas where no real or reliable connectivity exists, in communities large and small throughout the US mainland. Please review the license terms before using these data so that you comply with the stipulations of Microsoft's Open Use of Data Agreement v1.0, for professional and educational use alike.

How to use the dataset

How to Use the US Broadband Usage Dataset

This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.

The dataset contains six columns:
- County – The name of the county for which usage statistics are provided.
- Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected within that county or metropolitan area/micro area/division within states, as reported by the US Census Bureau in 2018[2].
- Population (Households) – Estimated number of households defined according to [3] based on data from the US Census Bureau American Community Survey's 5 Year Estimates[4].
- Average Throughput (Mbps) – Average download speed in Mbps, derived from a combination of data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc.[5]
- Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps, calculated using [6].
- Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps, calculated using [7].
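
As a minimal sketch of how the columns described above might be combined, the example below rolls zip-code rows up to a household-weighted share of fast devices per county. The header names (`county`, `households`, `pct_fast`, and so on), the sample values, and the choice to weight device percentages by household counts are all assumptions for illustration; the real headers in broadband_data_2020October.csv may differ and should be checked first.

```python
import csv
import io
from collections import defaultdict
from typing import Dict

# Hypothetical sample rows; header names and values are assumptions,
# not taken from the actual file.
SAMPLE = """county,zip_code,households,avg_throughput_mbps,pct_fast,pct_slow
Alpha,00001,100,30.0,0.60,0.10
Alpha,00002,300,10.0,0.20,0.40
Beta,00003,200,50.0,0.90,0.05
"""


def weighted_pct_fast(csv_text: str) -> Dict[str, float]:
    """Household-weighted share of fast (> 25 Mbps) devices per county."""
    weighted_sum: Dict[str, float] = defaultdict(float)
    households: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        n = float(row["households"])
        weighted_sum[row["county"]] += n * float(row["pct_fast"])
        households[row["county"]] += n
    # Skip counties with no recorded households to avoid dividing by zero.
    return {c: weighted_sum[c] / households[c] for c in households if households[c] > 0}


result = weighted_pct_fast(SAMPLE)
for county, share in sorted(result.items()):
    print(f"{county}: {share:.2f}")
```

To run this against the real file, the in-memory sample could be replaced by the file's contents, with the column keys adjusted to match its actual header row.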

Research Ideas

Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.

Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or edX, those with limited access could gain new educational options from home.

Establishing public-private partnerships between local governments and telecom providers. Both need better data about gaps in service coverage and usage levels in order to make decisions about investments in new infrastructure buildouts that improve connectivity options for rural communities.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: broadband_data_2020October.csv
