By Amber Thomas [source]
This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.
According to Federal Communications Commission (FCC) estimates, 14.5 million people in the United States don't have access to any kind of broadband connection. This dataset aims to address the contrast between estimated availability and actual use by providing more accurate usage numbers, downscaled to the county and zip code levels. Who gets counted as having access is vastly important: it determines who is included in public funding opportunities dedicated to closing the digital divide. The implications can be huge: millions of people across the country could remain invisible if these numbers aren't accurately reported or used properly in decision-making processes.
This dataset aggregates information for locations with fewer than 20 devices to improve the accuracy of broadband usage estimates in the United States, allowing others to use it to develop solutions that improve internet access, or to accurately identify problem areas where no real or reliable connectivity exists in communities large and small across the US mainland. Please review the license terms before using these data, and adhere to the stipulations set forth in Microsoft's Open Use of Data Agreement v1.0 before using this dataset for professional or educational purposes.
How to Use the US Broadband Usage Dataset
This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.
The dataset contains six columns:
- County – The name of the county for which usage statistics are provided.
- Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected, within the county or metropolitan/micropolitan area as reported by the US Census Bureau in 2018 [2].
- Population (Households) – Estimated number of households, defined according to [3], based on data from the US Census Bureau American Community Survey 5-Year Estimates [4].
- Average Throughput (Mbps) – Average download speed in Mbps, derived from data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, and Xbox Live Core Services [5].
- Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps, calculated using [6].
- Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps, calculated using [7].
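As a minimal sketch of working with the file (broadband_data_2020October.csv), the snippet below uses the column names described above; the sample rows are illustrative, not real values from the dataset.

```python
import csv
import io

# Illustrative sample in the column layout described above; real analysis
# would open broadband_data_2020October.csv instead of this StringIO.
sample = io.StringIO(
    "County,Zip Code,Population (Households),Average Throughput (Mbps),"
    "Percent Fast (> 25 Mbps),Percent Slow (< 3 Mbps)\n"
    "Example County,12345,1000,18.5,40.0,12.0\n"
    "Other County,67890,2500,55.0,80.0,2.0\n"
)

rows = list(csv.DictReader(sample))

# Flag areas where fewer than half of the sampled devices reach the FCC's
# 25 Mbps broadband threshold -- candidates for the usage gap discussed above.
underserved = [
    r["County"] for r in rows
    if float(r["Percent Fast (> 25 Mbps)"]) < 50.0
]
print(underserved)  # -> ['Example County']
```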
Potential use cases include:
- Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns tailored to individuals living in areas where broadband access is scarce or lacking.
- Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
- Establishing public-private partnerships: local governments and telecom providers need better data about gaps in service coverage and usage levels in order to make decisions about investments in new infrastructure buildouts that bring better connectivity options to rural communities.
If you use this dataset in your research, please credit the original authors.

Data Source
See the dataset description for more information.
File: broadband_data_2020October.csv
Introduction. This document provides an overview of an archive composed of four sections.
[1] An introduction (this document) which describes the scope of the project
[2] Yearly folders, from 2002 until 2010, containing the coarse Microsoft Access datasets plus the surveys used to collect information for each year. The word coarse does not mean the information in the Microsoft Access dataset was not corrected for mistakes; it was, but some mistakes and inconsistencies remain, such as with data on age or education. Furthermore, the coarse dataset provides disaggregated information for selected topics, which appear only as summary statistics in the clean dataset. For example, in the coarse dataset one can find the different illnesses afflicting a person during the past 14 days, whereas in the clean dataset only the total number of illnesses appears.
[3] A letter from the Gran Consejo Tsimane’ authorizing the public use of de-identified data collected in our studies among Tsimane’.
[4] A Microsoft Excel document with the unique identification number for each person in the panel study.
Background. During 2002-2010, a team of international researchers, surveyors, and translators gathered longitudinal (panel) data on the demography, economy, social relations, health, nutritional status, local ecological knowledge, and emotions of about 1400 native Amazonians known as Tsimane’ who lived in thirteen villages near and far from towns in the department of Beni in the Bolivian Amazon. A report titled “Too little, too late” summarizes selected findings from the study and is available to the public at the electronic library of Brandeis University:
https://scholarworks.brandeis.edu/permalink/01BRAND_INST/1bo2f6t/alma9923926194001921
A copy of the clean, merged, and appended Stata (V17) dataset is available to the public at the following two web addresses:
[a] Brandeis University:
https://scholarworks.brandeis.edu/permalink/01BRAND_INST/1bo2f6t/alma9923926193901921
[b] Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan (only available to users affiliated with institutions belonging to ICPSR)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/37671/utilization
Chapter 4 of the report “Too little, too late” mentioned above describes the motivation and history of the study, the difference between the coarse and clean datasets, and topics which can be examined only with coarse data.
Aims. The aims of this archive are to:
· Make available in Microsoft Access the coarse de-identified dataset: [1] for each of the seven yearly surveys (2004-2010) and [2] one Access dataset based on quarterly surveys done during 2002 and 2003. Together, these two datasets form one longitudinal dataset of individuals, households, and villages.
· Provide guidance on how to link files within and across years, and
· Make available a Microsoft Excel file with a unique identification number to link individuals across years
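The linking workflow the aims describe can be sketched as follows; the field names (pid, age) are hypothetical stand-ins for the identifiers and variables in the actual tables and the Excel ID file.

```python
# Hypothetical sketch of linking yearly survey records via the unique
# person ID from the Excel identification file. Field names are made up.
ids = {"P001", "P002"}          # unique IDs from the master identification file

survey_2004 = [{"pid": "P001", "age": 30}, {"pid": "P002", "age": 41}]
survey_2005 = [{"pid": "P001", "age": 31}, {"pid": "P003", "age": 25}]

# Build one longitudinal panel keyed by (person, year), keeping only
# people who appear in the master ID list.
panel = {}
for year, table in [(2004, survey_2004), (2005, survey_2005)]:
    for row in table:
        if row["pid"] in ids:
            panel[(row["pid"], year)] = row

print(sorted(panel))  # -> [('P001', 2004), ('P001', 2005), ('P002', 2004)]
```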
The datasets in the archive.
· Eight Microsoft Access datasets with data on a wide range of variables. Except for the Access file covering 2002-2003, the information in each Access file refers to a single year. Within any Access dataset, users will find two types of files:
o Thematic files. The name of a thematic file contains the prefix tbl (e.g., 29_tbl_Demography or tbl_29_Demography). The file name (sometimes in Spanish, sometimes in English) indicates the content of the file. For example, in the Access dataset for one year, the micro file tbl_30_Ventas has all the information on sales for that year. Within each micro file, columns contain information on a variable and the name of the column indicates the content of the variable. For instance, the column heading item in the Sales file would indicate the type of good sold. The exac…
https://creativecommons.org/publicdomain/zero/1.0/
This GPS trajectory dataset was collected in (Microsoft Research) Geolife project by 178 users in a period of over four years (from April 2007 to October 2011). A GPS trajectory of this dataset is represented by a sequence of time-stamped points, each of which contains the information of latitude, longitude and altitude. This dataset contains 17,621 trajectories with a total distance of 1,251,654 kilometers and a total duration of 48,203 hours. These trajectories were recorded by different GPS loggers and GPS-phones, and have a variety of sampling rates. 91 percent of the trajectories are logged in a dense representation, e.g. every 1~5 seconds or every 5~10 meters per point.
This dataset recorded a broad range of users' outdoor movements, including not only routine activities like going home and going to work, but also entertainment and sports activities such as shopping, sightseeing, dining, hiking, and cycling.
Data Format

- Trajectory files: Every folder of this dataset stores one user's GPS log files, which were converted to PLT format. Each PLT file contains a single trajectory and is named by its starting time. To avoid potential confusion over time zones, we use GMT in the date/time property of each point, which differs from our previous release.

- PLT format: Lines 1-6 are headers and can be ignored. Each following line describes one point:
  Field 1: Latitude in decimal degrees.
  Field 2: Longitude in decimal degrees.
  Field 3: All set to 0 for this dataset.
  Field 4: Altitude in feet (-777 if not valid).
  Field 5: Date as the number of days (with fractional part) that have passed since 12/30/1899.
  Fields 6-7: Date and time as strings. Fields 5 and 6-7 represent the same date/time; you may use either.

  Example:
  39.906631,116.385564,0,492,40097.5864583333,2009-10-11,14:04:30
  39.906554,116.385625,0,492,40097.5865162037,2009-10-11,14:04:35

- Transportation mode labels: Possible transportation modes are walk, bike, bus, car, subway, train, airplane, boat, run, and motorcycle. Again, we have converted the date/time of all labels to GMT, even though most of them were created in China.

  Example (Start Time, End Time, Transportation Mode):
  2008/04/02 11:24:21  2008/04/02 11:50:45  bus
  2008/04/03 01:07:03  2008/04/03 11:31:55  train
  2008/04/03 11:32:24  2008/04/03 11:46:14  walk
  2008/04/03 11:47:14  2008/04/03 11:55:07  car
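A minimal parser for the PLT format described above, validated against the example points; it skips the six header lines and converts field 5 (days since 12/30/1899) into a timestamp.

```python
from datetime import datetime, timedelta

def parse_plt(lines):
    """Parse a Geolife PLT file: skip the 6 header lines, one point per line."""
    points = []
    for line in lines[6:]:
        lat, lon, _zero, alt, days, _date_s, _time_s = line.strip().split(",")
        # Field 5 counts days since 12/30/1899; round to whole seconds to
        # avoid floating-point drift relative to the string fields 6-7.
        seconds = round(float(days) * 86400)
        ts = datetime(1899, 12, 30) + timedelta(seconds=seconds)
        points.append((float(lat), float(lon), float(alt), ts))
    return points

lines = ["header"] * 6 + [
    "39.906631,116.385564,0,492,40097.5864583333,2009-10-11,14:04:30",
    "39.906554,116.385625,0,492,40097.5865162037,2009-10-11,14:04:35",
]
pts = parse_plt(lines)
print(pts[0][3])  # -> 2009-10-11 14:04:30
```

Note that the recovered timestamps match the string date/time in fields 6-7, as the format description promises.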
First, you can regard both the taxi and car labels as driving, although we keep them separate for future use. Second, one user could label a light rail trip as train while another uses subway. In fact, no trajectory can be recorded in an underground subway system, since a GPS logger cannot receive a signal there. In Beijing, the light rail and subway systems are seamlessly connected; e.g., line 13 (a light rail) connects with lines 10 and 2, which are subways, and some lines (like line 5) are part subway and part light rail. So users may interpret their transportation modes differently. You can differentiate real train trajectories (connecting two cities) from light rail trajectories (generated within a city) by their distances, or simply treat them the same.
More: User Guide: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/User%20Guide-1.2.pdf
Please cite the following papers when using this GPS dataset. [1] Yu Zheng, Lizhu Zhang, Xing Xie, Wei-Ying Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of the International Conference on World Wide Web (WWW 2009), Madrid, Spain. ACM Press: 791-800.
[2] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, Wei-Ying Ma. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321. [3] Yu Zheng, Xing Xie, Wei-Ying Ma, GeoLife: A Collaborative Social Networking Service among User, location and trajectory. Invited paper, in IEEE Data Engineering Bulletin. 33, 2, 2010, pp. 32-40.
This trajectory dataset can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General description

This dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions:
1. Is the research article an Open Access publication?
2. Does the research article have a Creative Commons license or a similar license?
3. Does the research article contain a data availability statement?
4. Did the authors submit data of their study to a repository such as EMBL, GenBank, Protein Data Bank (PDB), Cambridge Crystallographic Data Centre (CCDC), Dryad, or a similar repository?
5. Does the research article contain supplementary data?
6. Do the supplementary data have a persistent identifier that makes them citable as a defined research output?

Variables

The data were compiled in a Microsoft Excel 365 document that includes the following variables:
1. DOI URL of research article
2. Year of publication
3. Research article published with Open Access
4. License for research article
5. Data availability statement in article
6. Supplementary data added to article
7. Persistent identifier for supplementary data
8. Authors submitted data to NCBI, EMBL, PDB, Dryad, or CCDC

Visualization

Parts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications per year, the number published with open access, and the number containing a data availability statement (Figure 1). The second figure shows the number of publications per year and how many contain supplementary data.
This figure also shows how many of the supplementary datasets have a persistent identifier (Figure 2).

File formats and software

The file formats used in this dataset are:
- .csv (text file)
- .docx (Microsoft Word 365 file)
- .jpg (JPEG image file)
- .pdf/A (Portable Document Format for archiving)
- .png (Portable Network Graphics image file)
- .pptx (Microsoft PowerPoint 365 file)
- .txt (text file)
- .xlsx (Microsoft Excel 365 file)
All files can be opened with Microsoft Office 365 and likely also work with the older versions Office 2019 and 2016.

MD5 checksums

Here is a list of all files of this dataset and their MD5 checksums:
1. Readme.txt (MD5: 795f171be340c13d78ba8608dafb3e76)
2. Manifest.txt (MD5: 46787888019a87bb9d897effdf719b71)
3. Materials_and_methods.docx (MD5: 0eedaebf5c88982896bd1e0fe57849c2)
4. Materials_and_methods.pdf (MD5: d314bf2bdff866f827741d7a746f063b)
5. Materials_and_methods.txt (MD5: 26e7319de89285fc5c1a503d0b01d08a)
6. CBCS_publications_until_date_2023_07_05.xlsx (MD5: 532fec0bd177844ac0410b98de13ca7c)
7. CBCS_publications_until_date_2023_07_05.csv (MD5: 2580410623f79959c488fdfefe8b4c7b)
8. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.xlsx (MD5: 9c67dd84a6b56a45e1f50a28419930e5)
9. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.csv (MD5: fb3ac69476bfc57a8adc734b4d48ea2b)
10. Aggregated_data_from_CBCS_publications_until_2023_07_05.xlsx (MD5: 6b6cbf3b9617fa8960ff15834869f793)
11. Aggregated_data_from_CBCS_publications_until_2023_07_05.csv (MD5: b2b8dd36ba86629ed455ae5ad2489d6e)
12. Figure_1_CBCS_publications_until_2023_07_05_Open_Access_and_data_availablitiy_statement.xlsx (MD5: 9c0422cf1bbd63ac0709324cb128410e)
13. Figure_1.pptx (MD5: 55a1d12b2a9a81dca4bb7f333002f7fe)
14. Image_of_figure_1.jpg (MD5: 5179f69297fbbf2eaaf7b641784617d7)
15. Image_of_figure_1.png (MD5: 8ec94efc07417d69115200529b359698)
16. Figure_2_CBCS_publications_until_2023_07_05_supplementary_data_and_PID_for_supplementary_data.xlsx (MD5: f5f0d6e4218e390169c7409870227a0a)
17. Figure_2.pptx (MD5: 0fd4c622dc0474549df88cf37d0e9d72)
18. Image_of_figure_2.jpg (MD5: c6c68b63b7320597b239316a1c15e00d)
19. Image_of_figure_2.png (MD5: 24413cc7d292f468bec0ac60cbaa7809)
Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:
1- Data Import and Transformation:
2- Data Quality Assessment:
3- Calculating COGS:
4- Discount Analysis:
5- Sales Metrics:
6- Visualization:
7- Report Generation:
Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
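The COGS and discount steps above can be sketched in a few lines of code. Note that the column names and the COGS = Sales - Profit convention are assumptions modeled on the typical Superstore file, not taken from this project's data.

```python
# Hypothetical Superstore-style rows; column names are assumptions, and
# COGS = Sales - Profit follows the usual Superstore convention.
orders = [
    {"sales": 200.0, "profit": 50.0, "discount": 0.10},
    {"sales": 120.0, "profit": -10.0, "discount": 0.30},
]

for o in orders:
    o["cogs"] = o["sales"] - o["profit"]            # cost of goods sold
    # Sales figures are assumed post-discount; recover the discount value.
    list_price = o["sales"] / (1 - o["discount"])
    o["discount_value"] = list_price - o["sales"]

print(round(orders[0]["cogs"], 2), round(orders[0]["discount_value"], 2))
# -> 150.0 22.22
```

In the actual project these formulas live in Power Query or worksheet columns; the sketch only makes the arithmetic explicit.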
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business, and individual level. This is the 2019 release of IPGOD.

IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools, with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.

IP Australia is also providing free trials of a cloud-based analytics platform, the IP Data Platform, with the capabilities to enable working with large intellectual property datasets such as IPGOD through the web browser, without any installation of software.

The following pages can help you gain an understanding of intellectual property administration and processes in Australia to support your analysis of the dataset:
* Patents
* Trade Marks
* Designs
* Plant Breeder's Rights

Due to changes in our systems, some tables have been affected.
* We have added IPGOD 225 and IPGOD 325 to the dataset!
* The IPGOD 206 table is not available this year.
* Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

Data quality has been improved across all tables.
* Null values are simply empty rather than '31/12/9999'.
* All date columns are now in ISO format 'yyyy-mm-dd'.
* All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
* All tables are encoded in UTF-8.
* All tables use the backslash \ as the escape character.
* The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match those in previous releases of IPGOD.
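A sketch of reading an IPGOD-style table under the conventions listed above (UTF-8, backslash escape character, ISO dates, True/False indicators, empty strings for nulls). The column names here are hypothetical; check each table's data dictionary for the real ones.

```python
import csv
import io
from datetime import date

# Illustrative table following the stated conventions; a real script would
# open the file with open(path, encoding="utf-8", newline="").
raw = io.StringIO(
    "ipa_id,filing_date,is_granted\n"
    "123,2018-03-05,True\n"
    "456,,False\n"
)

rows = []
for r in csv.DictReader(raw, escapechar="\\"):
    # Empty string means null; dates are ISO 'yyyy-mm-dd'.
    r["filing_date"] = date.fromisoformat(r["filing_date"]) if r["filing_date"] else None
    # Indicator columns are the literal strings "True"/"False".
    r["is_granted"] = r["is_granted"] == "True"
    rows.append(r)

print(rows[0]["filing_date"], rows[1]["filing_date"])  # -> 2018-03-05 None
```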
https://www.ibisworld.com/about/termsofuse/
The rise in remote work and digital transformation initiatives has accelerated the demand for robust and scalable solutions offered by the database, storage and backup software publishing industry. Cloud adoption has surged, with downstream businesses in finance and healthcare increasingly relying on cloud-based databases and storage systems to ensure accessibility and resilience. To capture demand, publishers have grown revenue through subscription-based offerings, which have expanded the industry's reach and provided recurring revenue over the past five years. Driven by a 47.9% surge in 2021, industry revenue has increased at a CAGR of 10.2% to reach $98.9 billion, including growth of 2.5% in 2025.

Advancements in cloud and digital technology have paved the way for new freemium substitutes, reshaping industry competition and introducing operational challenges. As new, cost-effective solutions emerge, traditional publishers have faced the challenge of differentiating their offerings while maintaining profitability. Leading companies such as Microsoft and Oracle have responded with investments in compatibility capabilities and AI features designed to retain users as more options become available. Combined with the emerging threat of cyber attacks, however, these investments have weighed on industry profitability as greater resources are now needed to support different initiatives.

With freemium models here to stay, industry revenue growth will decelerate moving forward. Users are expected to demand free tiers among leading publishers, who have already deployed these subscription models at the cost of revenue growth. Despite these trends, however, publishers are expected to benefit from data center expansions and upgrades, which will provide them with the necessary infrastructure to develop next-generation AI and edge computing offerings.
With billions of dollars being invested in these areas, industry revenue will be sustained and rise at a CAGR of 2.5% over the next five years to reach $112.0 billion in 2030.
The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations including object categories, keypoints, and more. This makes it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.
While COCO is often touted to comprise over 300k images, it's pivotal to understand that this number includes diverse formats like keypoints, among others. Specifically, the labeled dataset for object detection stands at 123,272 images.
The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. That said, COCO has not released its test set annotations, so the test data comes without labels and is not included in this dataset.
The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:
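For orientation, the snippet below builds a minimal annotation structure in the published COCO object-detection layout (top-level images, annotations, and categories keys) and indexes boxes by image, much as tools like pycocotools do internally. The concrete values are made up for illustration.

```python
# Minimal COCO-style object-detection annotations; values are illustrative.
coco = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 3,
         "bbox": [100.0, 50.0, 80.0, 60.0],  # [x, y, width, height]
         "area": 4800.0, "iscrowd": 0}
    ],
    "categories": [{"id": 3, "name": "car", "supercategory": "vehicle"}],
}

# Index annotations by image for fast per-image lookup.
by_image = {}
for ann in coco["annotations"]:
    by_image.setdefault(ann["image_id"], []).append(ann)

print(len(by_image[1]))  # -> 1
```

A real annotation file is one large JSON document in this shape, loadable with `json.load`.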
Microsoft 365 is used by over * million companies worldwide, with over *** million customers in the United States alone using the office suite software. Office 365 is the brand name previously used by Microsoft for a group of software applications providing productivity-related services to its subscribers. Office 365 applications include Outlook, OneDrive, Word, Excel, PowerPoint, OneNote, SharePoint, and Microsoft Teams. The consumer and small business plans of Office 365 were renamed Microsoft 365 on April 21, 2020.

Global office suite market share

An office suite is a collection of software applications (word processing, spreadsheets, database, etc.) designed to be used for tasks within an organization. Worldwide market share of office suite technologies is split between Google's G Suite and Microsoft's Office 365, with G Suite controlling around ** percent of the global market and Office 365 holding around ** percent. This trend is similar across most worldwide regions.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the list of acquisitions made by the following companies:
Microsoft, Google, IBM, HP, Apple, Amazon, Facebook, Twitter, eBay, Adobe, Citrix, Red Hat, BlackBerry, Disney
The attributes include the date, year, month of the acquisition, name of the company acquired, value or the cost of acquisition, business use-case of the acquisition, and the country from which the acquisition was made. The source of the dataset is Wikipedia, TechCrunch, and CrunchBase.
In 1998, the Adventure Works Cycles company collected a large volume of data about their existing customers, including demographic features and information about purchases they have made. The company is particularly interested in analyzing customer data to determine any apparent relationships between demographic features known about the customers and the likelihood of a customer purchasing a bike. Additionally, the analysis should endeavor to determine whether a customer's average monthly spend with the company can be predicted from known customer characteristics. Using the Adventure Works Cycles customer data, the task is to create a regression model that predicts a customer's average monthly spend. The model should predict average monthly spend for new customers for whom no information about average monthly spend or previous bike purchases is available.
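The regression task can be illustrated with a toy one-feature least-squares fit. The data points and the choice of income as the predictor are fabricated for the sketch; the real exercise uses the Adventure Works customer table and typically several features.

```python
# Fabricated (income, average monthly spend) pairs lying on a perfect line,
# so the fitted model is easy to check by hand.
customers = [(30000, 50.0), (45000, 65.0), (60000, 80.0), (75000, 95.0)]

n = len(customers)
mean_x = sum(x for x, _ in customers) / n
mean_y = sum(y for _, y in customers) / n

# Ordinary least squares for a single feature: slope = cov(x, y) / var(x).
slope = sum((x - mean_x) * (y - mean_y) for x, y in customers) / \
        sum((x - mean_x) ** 2 for x, _ in customers)
intercept = mean_y - slope * mean_x

def predict(income):
    """Predict average monthly spend for a new customer."""
    return intercept + slope * income

print(round(predict(50000), 2))  # -> 70.0
```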
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Research in information science and scholarly communication strongly relies on the availability of openly accessible datasets of metadata and, where possible, their relative payloads. To this end, CrossRef plays a pivotal role by providing free access to its entire metadata collection and allowing other initiatives to link to and enrich its information. As a consequence, a number of key pieces of information end up scattered across diverse datasets and resources freely available online. As a result of this fragmentation, researchers in this domain end up struggling with daily integration problems, producing a plethora of ad hoc datasets, thereby wasting time and resources and infringing open science best practices.
The latest DOIBoost release is a metadata collection that enriches CrossRef (October 2019 release: 108,048,986 publication records) with inputs from Microsoft Academic Graph (October 2019 release: 76,171,072 publication records), ORCID (October 2019 release: 12,642,131 publication records), and Unpaywall (August 2019 release: 26,589,869 publication records) for the purpose of supporting high-quality and robust research experiments. As a result of DOIBoost, CrossRef records have been "boosted" as follows:
This entry consists of three files: doiboost_dump-2019-11-27.tar (a set of partXYZ.gz files, each containing the JSON files for the enriched CrossRef records), schemaAndSample.zip, and termsOfUse.doc (details on the terms of use of DOIBoost).
Note that this record comes with two relationships to other results of this experiment:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic), which are represented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France, and Mexico. For each of these countries, up to 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. Arithmetic averages were calculated for the change (increase) in indicators such as profit and profitability of enterprises, their ranking position (competitiveness), asset value, and number of employees. The arithmetic mean values of these indicators across all countries in the sample were then found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020, on the eve of the second wave of the pandemic.

The data are collected in a single Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics, and it can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the dataset contains formulas rather than ready-made numbers, adding or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data but also charts that provide data visualization, and it contains both actual and forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020.
The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
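The scenario mechanics described above can be sketched with the standard library's normal distribution; the mean and standard deviation below are placeholders, not values from the dataset.

```python
from statistics import NormalDist

# Treat forecast incidence as normally distributed, as the dataset does,
# and read off the probability attached to a given scenario value.
# mu and sigma here are made-up placeholders.
forecast = NormalDist(mu=60000, sigma=8000)   # hypothetical daily case count

pessimistic = 70000
prob_exceed = 1 - forecast.cdf(pessimistic)   # chance the pessimistic level is exceeded
print(round(forecast.cdf(forecast.mean), 2))  # -> 0.5
```

Substituting a different scenario value into `cdf` is the programmatic analogue of plugging predicted morbidity rates into the risk assessment tables.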
Winter Steelhead Distribution, June 2012 Version. This dataset depicts observation-based, stream-level geographic distribution of anadromous winter-run steelhead trout, Oncorhynchus mykiss irideus (O. mykiss), in California. It was developed for the express purpose of assisting with steelhead recovery planning efforts. The distributions reported in this dataset were derived from a subset of the data contained in the Aquatic Species Observation Database (ASOD), a Microsoft Access multi-species observation data capture application. ASOD is an ongoing project designed to capture as complete a set of statewide inland aquatic vertebrate species observation information as possible. Please note: a separate distribution is available for summer-run steelhead; contact information is the same as above. ASOD observation data were used to develop a network of stream segments. These lines are developed by "tracing down" from each observation to the sea using the flow properties of USGS National Hydrography Dataset (NHD) High Resolution hydrography. Lastly, these lines, representing stream segments, were assigned a value of Anad Present (Anadromous present). The end result (i.e., this layer) consists of a set of lines representing the distribution of steelhead based on observations in the Aquatic Species Observation Database. This dataset represents stream reaches that are known or believed to be used by steelhead based on steelhead observations; thus, it contains only positive steelhead occurrences. The absence of distribution on a stream does not necessarily indicate that steelhead do not utilize that stream. Additionally, steelhead may not be found in all streams or reaches each year, due to natural variations in run size, water conditions, and other environmental factors.
The information in this dataset should be used as an indicator of steelhead presence or suspected presence at the time of the observation, as indicated by the 'Late_Yr' (Latest Year) field attribute. The line features in the dataset may not represent the maximum extent of steelhead on a stream; this distribution most likely underestimates the actual distribution of steelhead. The distribution is based on observations found in the ASOD database, and the individual observations may not have occurred at the upper extent of anadromous occupation. In addition, no attempt was made to capture every observation of O. mykiss, so it should not be assumed that this dataset is complete for each stream. The distribution dataset was built solely from the ASOD observational data; no additional data (habitat mapping, barriers data, gradient modeling, etc.) were used to add to or validate it. It is possible that an anadromous observation in this dataset has been recorded above (upstream of) a barrier identified in the Passage Assessment Database (PAD). In the near future, we hope to perform a comparative analysis between this dataset and the PAD to identify and resolve all such discrepancies; such an analysis will add rigor to and help validate both datasets. This dataset recently underwent a review: data source contributors and CDFG fisheries biologists were given the opportunity to review the handling of the information they provided and to suggest edits or additions. The distribution was then posted to an intranet mapping application, where CDFG biologists were invited to review and comment on the dataset and encouraged to add new observation data. The resulting final distribution contains their suggestions and additions.
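The "tracing down" step described above can be illustrated with a toy downstream-pointer table. The segment IDs below are invented; the real dataset derives flow direction from NHD High Resolution flowlines:

```python
# Hypothetical flow table: each stream segment points to the next
# segment downstream (None = the sea). NHD flowlines encode this
# relationship with flow-direction attributes.
downstream = {
    "seg_A": "seg_B",
    "seg_B": "seg_C",
    "seg_C": None,      # reaches the ocean
    "seg_X": "seg_B",   # a tributary joining at seg_B
}

def trace_to_sea(segment: str) -> list[str]:
    """Collect every segment from an observation point down to the sea."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = downstream[segment]
    return path

# The union of traces from all observation segments forms the
# distribution lines, each marked "Anad Present".
observations = ["seg_A", "seg_X"]
present = set()
for obs in observations:
    present.update(trace_to_sea(obs))
print(sorted(present))  # ['seg_A', 'seg_B', 'seg_C', 'seg_X']
```

Note how a single observation on seg_A marks every segment below it as present, which is why the layer contains only positive occurrences.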
Please refer to "Use Constraints" section below.
This synthetic Siberian larch tree crown dataset was created for upscaling and machine learning purposes as part of the SiDroForest (Siberia Drone Forest Inventory) project. The SiDroForest data collection (https://www.pangaea.de/?q=keyword%3A%22SiDroForest%22) consists of vegetation plots covered in Siberia during a 2-month fieldwork expedition in 2018 by the Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research in Germany. During fieldwork, fifty-six 50 x 50-meter vegetation plots were covered by Unmanned Aerial Vehicle (UAV) flights, and Red Green Blue (RGB) and Red Green Near Infrared (RGNIR) photographs were taken with a consumer-grade DJI Phantom 4 quadcopter. The synthetic dataset provided here contains larch (Larix gmelinii (Rupr.) Rupr. and Larix cajanderi Mayr.) tree crowns extracted from the onboard-camera RGB UAV images of five selected vegetation plots from this expedition, placed on top of full-sized images from the same RGB flights. The extracted tree crowns have been rotated, rescaled, and repositioned across the images, resulting in a diverse synthetic dataset that contains 10,000 images for training and 2,000 images for validation of complex machine learning neural networks. In addition, the data are saved in Microsoft's Common Objects in Context (COCO) dataset format (Lin et al., 2014) and can easily be loaded as a dataset for networks such as Mask R-CNN, U-Net, or Faster R-CNN. These are neural networks for instance segmentation tasks that have become more frequently used over the years for forest monitoring purposes. The images included in this dataset are from the field plots EN18062 (62.17° N 127.81° E), EN18068 (63.07° N 117.98° E), EN18074 (62.22° N 117.02° E), EN18078 (61.57° N 114.29° E), and EN18083 (59.97° N 113° E), located in Central Yakutia, Siberia.
These sites were selected based on their vegetation content, their spectral differences in color, the UAV flight angles, and the clarity of the UAV images, which were taken with automatic shutter and white balancing (Brieger et al. 2019). From each site, 35 images were selected in order of acquisition, starting at the fifteenth image in the flight, to make up the backgrounds for the dataset. The first fifteen images were excluded because they often contain a visual representation of the research team. The 117 tree crowns were manually cut out in GIMP software to ensure that they were all Larix trees. Of the tree crowns, 15% are at the margin of the image, to make sure that the algorithm does not rely on a full tree crown in order to detect a tree. As backgrounds for the extracted tree crowns, 35 raw UAV images from each of the five sites were included. The images were selected based on their content: UAV images in which the research teams are visible were excluded from this dataset. The raw UAV images were cropped to 640 by 480 pixels at a resolution of 72 dpi and were later rescaled to 448 by 448 pixels in the process of dataset creation. In total, there were 175 cropped backgrounds. The synthetic images and their corresponding annotations and masks were created using the cocosynth Python software provided by Adam Kelly (2019). The software is open source and available on GitHub: https://github.com/akTwelve/cocosynth. The software takes the tree crowns, rescales and transforms them, and places up to three tree crowns on each of the provided backgrounds. It also creates matching masks that are used by instance segmentation and object detection algorithms to learn the shapes and locations of the synthetic crowns.
COCO annotation files with information about the crowns name and label are also generated. This format can be loaded into a variety of neural networks for training purposes.
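As a rough illustration of the COCO layout, assuming only the standard images/annotations/categories structure (the file name, box coordinates, and category name below are invented):

```python
import json

# Minimal COCO-style annotation structure (a synthetic example; the real
# annotation files follow the same layout with many more entries).
coco = {
    "images": [{"id": 1, "file_name": "EN18062_0001.png",
                "width": 448, "height": 448}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 1,
                     "bbox": [50, 60, 120, 130], "area": 15600}],
    "categories": [{"id": 1, "name": "larch_crown"}],
}

# Round-trip through JSON, as when reading an annotation file from disk.
data = json.loads(json.dumps(coco))

# Group annotations by image id -- the usual first step before feeding
# a detector such as Mask R-CNN.
by_image: dict[int, list] = {}
for ann in data["annotations"]:
    by_image.setdefault(ann["image_id"], []).append(ann)

print(len(by_image[1]))  # crowns annotated on image 1 -> 1
```

Libraries such as pycocotools or the dataset loaders bundled with common detection frameworks consume exactly this structure.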
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 2 rows and is filtered where the book is Windows of opportunity : how nations make wealth. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
At the behest of the Office of Science and Technology Policy in the White House, six institutions, including ours, created an open research dataset, the COVID-19 Open Research Dataset (CORD-19), to facilitate the development of question-answering systems that can assist researchers in finding relevant research on COVID-19. As of May 27, 2020, CORD-19 includes more than 100,000 open access publications from major publishers and PubMed, as well as preprint articles deposited into medRxiv, bioRxiv, and arXiv. Recent years, however, have also seen question-answering and other machine learning systems exhibit behaviors harmful to humans due to biases in the training data. It is imperative for modern scientists to be vigilant in inspecting, and prepared to mitigate, potential biases when working with any dataset. This article describes a framework for examining biases in scientific document collections like CORD-19 by comparing their properties with those derived from the citation behavior of the entire scientific community. In total, three expanded sets are created for the analyses: 1) the enclosure set CORD-19E, composed of CORD-19 articles and their references and citations, mirroring the methodology used in the renowned "A Century of Physics" analysis; 2) the full closure graph CORD-19C, which recursively includes references starting from CORD-19; and 3) the inflection closure CORD-19I, a much smaller subset of CORD-19C that is already appropriate for statistical analysis based on the theory of the scale-free nature of the citation network. Taken together, these expanded datasets show much smoother trends when used to analyze global COVID-19 research. The results suggest that while CORD-19 exhibits a strong tilt toward recent and topically focused articles, the knowledge being explored to attack the pandemic encompasses a much longer time span and is very interdisciplinary.
A question-answering system with such expanded scope of knowledge may perform better in understanding the literature and answering related questions. However, while CORD-19 appears to have topical coverage biases compared to the expanded sets, the collaboration patterns, especially in terms of team sizes and geographical distributions, are captured very well already in CORD-19 as the raw statistics and trends agree with those from larger datasets.
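The enclosure and closure constructions can be sketched as one-hop and recursive traversals of a citation graph. The paper IDs below are toy examples; for brevity the sketch expands references only, while CORD-19E also adds incoming citations:

```python
from collections import deque

# Toy citation graph: paper -> papers it references.
references = {
    "cord1": ["a", "b"],
    "cord2": ["b", "c"],
    "a": ["d"],
    "b": [],
    "c": ["d"],
    "d": [],
}

def enclosure(seed: set[str]) -> set[str]:
    """Seed articles plus their direct references (one hop)."""
    out = set(seed)
    for paper in seed:
        out.update(references.get(paper, []))
    return out

def closure(seed: set[str]) -> set[str]:
    """Recursively include references, as in the full closure graph."""
    seen, queue = set(seed), deque(seed)
    while queue:
        for ref in references.get(queue.popleft(), []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen

seed = {"cord1", "cord2"}
print(sorted(enclosure(seed)))  # ['a', 'b', 'c', 'cord1', 'cord2']
print(sorted(closure(seed)))    # ['a', 'b', 'c', 'cord1', 'cord2', 'd']
```

The closure picks up "d", a paper two hops away that the one-hop enclosure misses, which is exactly the older, interdisciplinary knowledge the expanded sets surface.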
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Content and data source
This dataset contains the results of a manual analysis of Open Science markers in the publications of the Swedish Metabolomics Centre (SMC) between 2016 and 2024. It contains variables similar to those of the "Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data" (Kieselbach, 2023).
The sample of these publications was fetched from SciLifeLab on 5 May 2025 at the URL: https://publications.scilifelab.se/label/Swedish Metabolomics Centre (SMC)
It contains 285 articles that are the source data for the work to create this dataset. Every publication was manually visited at its DOI URL and checked for 23 variables.
Questions studied
Some of the questions that were addressed in the collection of the data are:
Does the article have an open license and what kind of license does it have?
Does the article contain research data that may have restricted access such as personal data and health data?
Does the article contain a data availability statement?
Does the article contain supplementary material that the authors added to it?
Does the supplementary material contain research data?
Does the supplementary material contain metabolomics data such as, for instance, summaries and visualizations?
Did the authors submit metabolomics data to MetaboLights at the EBI or to other repositories?
Did the authors submit other data to other repositories?
Is data available on request from the authors?
Visualization of data
The data was compiled and visualized using Microsoft Excel 365. The visualization includes one table that gives a general overview of the dataset, and four figures that show some results of the analysis.
Figure 1. Percentage of publications between 2016 and 2024 with an Open Access License and with a data availability statement.
Figure 2. Submissions to repositories between 2016 and 2024.
Figure 3. Percentage of publications that contained supplementary material and if this supplementary material contained research data and metabolomics data.
Figure 4. Repositories used by the authors between 2016 and 2024.
List of variables
1. Year of Publication (answer: year)
2. Date of Publication (answer: date)
3. DOI (answer: DOI)
4. DOI URL (answer: DOI URL)
5. Research article (answer: Yes or No)
6. Access to article without paywall (answer: Yes or No)
7. License for research article (answer: Name of the license or No)
8. Data with restricted access (answer: Yes or No)
9. Data availability statement in article (answer: Yes or No)
10. Supplementary material added to article (answer: Yes or No)
11. Access to supplementary material without paywall (answer: Yes or No)
12. Supplementary material contains research data (answer: Yes or No)
13. Supplementary data contains metabolomics data (answer: Yes or No)
14. Persistent identifier for supplementary data (answer: Yes or No)
15. Source data added to the article (answer: Yes or No)
16. Source data contain metabolomics data (answer: Yes or No)
17. Authors submitted metabolomics data to MetaboLights (answer: Yes or No)
18. Authors submitted metabolomics data to another repository (answer: name of the repository or No)
19. Authors submitted other data to a repository (answer: name of the repository or No)
20. Authors submitted other data to a second repository (answer: name of the repository or No)
21. Authors submitted other data to a third repository (answer: name of the repository or No)
22. Authors submitted code to a repository (answer: name of the repository or No)
23. Data available on request from the authors (answer: Yes or No)
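The Yes/No variables above lend themselves to simple tallies like those shown in the figures. A minimal sketch with invented rows and illustrative column names (not the exact headers of the CSV file):

```python
import csv
import io

# Synthetic rows mimicking the Yes/No layout of the analysis file.
raw = """Year,Open license,Data availability statement
2016,Yes,No
2020,Yes,Yes
2024,No,Yes
"""

rows = list(csv.DictReader(io.StringIO(raw)))

def percent_yes(rows: list[dict], column: str) -> float:
    """Share of publications answering 'Yes' for one variable."""
    yes = sum(1 for r in rows if r[column] == "Yes")
    return 100.0 * yes / len(rows)

print(round(percent_yes(rows, "Open license"), 1))                 # 66.7
print(round(percent_yes(rows, "Data availability statement"), 1))  # 66.7
```

Grouping the same tally by the Year column yields the per-year percentages plotted in Figure 1.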
Variables that are available in the source data
1. Title of article
2. Authors
3. Journal
4. Year
5. (Date) Published
6. (Date) E-published
7. Volume
8. Issue
9. Pages
10. DOI
11. PMID
12. Labels
13. Qualifiers
14. IUID
15. URL
16. DOI URL of research article
17. PubMed URL of research article
File formats and software
The file formats used in this dataset are:
.csv (Text file)
.jpg (JPEG image file)
.pdf/A (Portable Document Format for archiving)
.txt (Text file)
.xlsx (Microsoft Excel 365 file)
All files can be opened with Microsoft Office 365.
Reference
Kieselbach, Theresa (2023). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data. Umeå University. Dataset. https://doi.org/10.17044/scilifelab.23641749.v1
Abbreviations
CC BY 4.0: Creative Commons Attribution 4.0 International Public License
CC BY-NC 4.0: Creative Commons Attribution-NonCommercial 4.0 International Public License
CC BY-NC 3.0: Creative Commons Attribution-NonCommercial 3.0 International Public License
CC BY-NC-ND 4.0: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License
DOI: Digital Object Identifier
EBI: European Bioinformatics Institute
EBI-ArrayExpress: The ArrayExpress collection of functional genomics data at the EBI
EBI-ENA: European Nucleotide Archive at the EBI
EBI-Pride: Proteomics Identification Database at the EBI
e!DAL: electronic Data Archive Library at the Leibniz Institute for Plant Genetics and Crop Plant Research
IUID: Item Unique identification
LUDC: Lund University Diabetes Centre
LUDC repository: data repository at the Lund University Diabetes Centre
NCBI: National Center for Biotechnology Information
NCBI-GEO: The Gene Expression Omnibus database repository at the NCBI
NCBI-SRA: The Sequence Read Archive at the NCBI
PMID: PubMed Identifier
URL: Uniform Resource Locator
MD5 Checksums of the files
Manifest.txt (2 KB): 89f32a728fb74ebecef0aef4633130b0
README.txt (6 KB): 34ea4ad9cb9bdea54755fa87f2d0b913
Analysis_SMC_publications_2016_2024_Open_Access_publication_and_access_to_data_status_2025_06_24.csv (46 KB): 9719df26381901bc6aabfd34fdbfab81
Analysis_SMC_publications_2016_2024_Open_Access_publication_and_access_to_data_status_2025_06_24.xlsx (49 KB): 1ec95dc29262645240e7d8714967bcfc
Table_1_Overview_SMC_publications_2016_2024_status_2025_06_11.csv (391 Bytes): 1fd723dc6f52f18251d41c0d343a4f0f
Table_1_Overview_SMC_publications_2016_2024_status_2025_06_11.xlsx (9 KB): 38622a9681c6f1057a6e1a4be56b0285
Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.csv (468 Bytes): 9f9156f8d52603ccdec968f626bc002a
Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.jpg (119 KB): dc9a4d7de4c789e8aea46ce66e007301
Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.xlsx (15 KB): 6527d1ebd0069ef3757bd1b049f0fc74
Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.csv (300 Bytes): 5abc4a0fcf776f8dc4745f41deddacbc
Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.jpg (126 KB): e03e5bf4ba2d942c3b022aebb0a59033
Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.xlsx (15 KB): a80f977c051d4798db221b07733c694b
Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.csv (670 Bytes): a694a3defa98aa52fcdec8ff9e9e3316
Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.jpg (153 KB): 3928bdc1f046ca9b6f66bdbcdf936ca8
Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.xlsx (15 KB): 46dfda56b116b571b4bf8e3674b44512
Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.csv (498 Bytes): 8963a412cc9e458ced2e80883bb93e1a
Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.jpg (137 KB): c9ba447225e99431f24732128a754b7e
Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.xlsx (16 KB): 1e2813d3ccb0ee14991b276947c21b8a
Materials_and_methods_SMC_publications_2016_2024.docx (19 KB): 71776ffc1e530e1b40255763403b2f40
Materials_and_methods_SMC_publications_2016_2024.txt (4 KB): 26c4b91b958b9e33d93d13dc52b25da9
Materials_and_methods_SMC_publications_2026_2024.pdf (172 KB): eee564f452ef4f3cf57bb81a6874fcd4
SMC_publications_2016_2024_status_2025_05_05.csv (143 KB): 5e61d09244ca90b1e5b057a7afdfe5e7
SMC_publications_2016_2024_status_2025_05_05.xlsx (106 KB): 6977fbcac21ff5a12763e40de90c0a91
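To verify a downloaded file against the digests listed above, one can stream it through MD5. The demo below hashes a small temporary file rather than an actual dataset file:

```python
import hashlib
import os
import tempfile

def md5_of(path: str) -> str:
    """Stream a file through MD5 in chunks and return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a temporary file; to verify the dataset, compare
# md5_of(filename) against the digest listed above for that file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
print(md5_of(tmp.name))  # 5d41402abc4b2a76b9719d911017c592
os.remove(tmp.name)
```

Chunked reading keeps memory use constant even for the larger .xlsx and .jpg files.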
https://dataintelo.com/privacy-and-policy
The Non-Relational SQL market size is projected to grow from USD 4.7 billion in 2023 to USD 15.8 billion by 2032, at a compound annual growth rate (CAGR) of 14.5% during the forecast period. This significant growth can be attributed to the rising demand for scalable and flexible database management solutions that efficiently handle large volumes of unstructured data.
One of the primary growth factors driving the Non-Relational SQL market is the exponential increase in data generation from various sources such as social media, IoT devices, and enterprise applications. As businesses seek to leverage this data for gaining insights and making informed decisions, the need for databases that can manage and process unstructured data efficiently has become paramount. Non-Relational SQL databases, such as document stores and graph databases, provide the required flexibility and scalability, making them an ideal choice for modern data-driven enterprises.
Another significant growth factor is the increasing adoption of cloud-based solutions. Cloud deployment offers numerous advantages, including reduced infrastructure costs, scalability, and easier management. These benefits have led to a surge in the adoption of Non-Relational SQL databases hosted on cloud platforms. Major cloud service providers like Amazon Web Services, Microsoft Azure, and Google Cloud offer robust Non-Relational SQL database services, further fueling market growth. Additionally, the integration of AI and machine learning with Non-Relational SQL databases is expected to enhance their capabilities, driving further adoption.
The rapid advancement in technology and the growing need for real-time data processing and analytics are also propelling the market's growth. Non-Relational SQL databases are designed to handle high-velocity data and provide quick query responses, making them suitable for real-time applications such as fraud detection, recommendation engines, and personalized marketing. As organizations increasingly rely on real-time data to enhance customer experiences and optimize operations, the demand for Non-Relational SQL databases is set to rise.
Regional outlook indicates that North America holds the largest share of the Non-Relational SQL market, driven by the presence of major technology companies and early adoption of advanced database technologies. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period, fueled by the rapid digital transformation initiatives and increasing investments in cloud infrastructure. Europe and Latin America also present significant growth opportunities due to the rising adoption of big data and analytics solutions.
When analyzing the Non-Relational SQL market by database type, we observe that document stores hold a significant share of the market. Document stores, such as MongoDB and Couchbase, are particularly favored for their ability to store, retrieve, and manage document-oriented information. These databases are highly flexible, allowing for the storage of complex data structures and providing an intuitive query language. The increasing adoption of document stores can be ascribed to their ease of use and adaptability to various application requirements, making them a popular choice among developers and businesses.
Key-Value stores represent another crucial segment of the Non-Relational SQL market. These databases are known for their simplicity and high performance, making them ideal for caching, session management, and real-time data processing applications. Redis and Amazon DynamoDB are prominent examples of key-value stores that have gained widespread acceptance. The growing need for low-latency data access and the ability to handle massive volumes of data efficiently are key drivers for the adoption of key-value stores in various industries.
The market for column stores is also expanding as businesses require databases that can handle large-scale analytical queries efficiently. Columnar storage formats, such as Apache Cassandra and HBase, optimize read and write performance for analytical processing, making them suitable for big data analytics and business intelligence applications. The ability to perform complex queries on large datasets quickly is a significant advantage of column stores, driving their adoption in industries that rely heavily on data analytics.
Graph databases, such as Neo4j and Amazon Neptune, are gaining traction due to their ability to model
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global Artificial Intelligence in Retail market size is USD 4,951.2 million in 2023 and will expand at a compound annual growth rate (CAGR) of 39.50% from 2023 to 2030.
Enhanced customer personalization to provide viable market output
Demand through the online channel remains higher in the Artificial Intelligence in Retail market.
The machine learning and deep learning category held the highest Artificial Intelligence in Retail market revenue share in 2023.
The North American Artificial Intelligence in Retail market will continue to lead, whereas the Asia-Pacific market will experience the most substantial growth until 2030.
Market Dynamics of the Artificial Intelligence in the Retail Market
Key Drivers for Artificial Intelligence in Retail Market
Enhanced Customer Personalization to Provide Viable Market Output
A primary driver of Artificial Intelligence in the Retail market is the pursuit of enhanced customer personalization. A.I. algorithms analyze vast datasets of customer behaviors, preferences, and purchase history to deliver highly personalized shopping experiences. Retailers leverage this insight to offer tailored product recommendations, targeted marketing campaigns, and personalized promotions. The drive for superior customer personalization not only enhances customer satisfaction but also increases engagement and boosts sales. This focus on individualized interactions through A.I. applications is a key driver shaping the dynamic landscape of A.I. in the retail market.
January 2023 - Microsoft and digital start-up AiFi worked together to offer Smart Store Analytics. It is a cloud-based tracking solution that helps merchants with operational and shopper insights for intelligent, cashierless stores.
Source: techcrunch.com/2023/01/10/aifi-microsoft-smart-store-analytics/
Improved Operational Efficiency to Propel Market Growth
Another pivotal driver is the quest for improved operational efficiency within the retail sector. A.I. technologies streamline various aspects of retail operations, from inventory management and demand forecasting to supply chain optimization and cashier-less checkout systems. By automating routine tasks and leveraging predictive analytics, retailers can enhance efficiency, reduce costs, and minimize errors. The pursuit of improved operational efficiency is a key motivator for retailers to invest in AI solutions, enabling them to stay competitive, adapt to dynamic market conditions, and meet the evolving demands of modern consumers in the highly competitive artificial intelligence (AI) retail market.
January 2023 - EY introduced its Retail Intelligence solution, built on Microsoft Cloud, to give customers a safe and efficient shopping experience. The solution makes use of Microsoft Cloud for Retail and its technologies, including image recognition, analytics, and artificial intelligence (A.I.), to deliver insightful information.
Key Restraints for Artificial Intelligence in Retail Market
Data Security Concerns to Restrict Market Growth
A prominent restraint in Artificial Intelligence in the Retail market is the pervasive concern over data security. As retailers increasingly rely on A.I. to process vast amounts of customer data for personalized experiences, there is a growing apprehension regarding the protection of sensitive information. The potential for data breaches and cyberattacks poses a significant challenge, as retailers must navigate the delicate balance between utilizing customer data for AI-driven initiatives and safeguarding it against potential security threats. Addressing these concerns is crucial to building and maintaining consumer trust in A.I. applications within the retail sector.
Key Trends for Artificial Intelligence in Retail Market
Surge in Voice-Enabled Shopping Interfaces Reshaping Retail Experiences
Voice-enabled A.I. assistants such as Amazon Alexa and Google Assistant are revolutionizing the way consumers engage with retail platforms. Shoppers can now utilize voice commands to search, compare, and purchase products, thereby streamlining and accelerating the buying process. Retailers...
By Amber Thomas [source]
This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.
According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This dataset aims to address the contrast between estimated availability and actual use by providing more accurate usage numbers, downscaled to the county and zip code levels. Who gets counted as having access matters greatly: it determines who gets included in public funding opportunities dedicated to closing the digital divide. The implications can be huge: millions around the country could remain invisible if these numbers aren't accurately reported or used properly in decision-making processes.
This dataset aggregates information for locations with fewer than 20 devices for increased accuracy when estimating broadband usage in the United States, allowing others to use it to develop solutions that improve internet access or to accurately label problem areas where no real or reliable connectivity exists, in communities large and small throughout the US mainland. Please review the license terms before using these data, so that your use, whether professional or educational, complies with the stipulations of Microsoft's Open Use of Data Agreement v1.0.
How to Use the US Broadband Usage Dataset
This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.
The dataset contains six columns:
- County – The name of the county for which usage statistics are provided.
- Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected, within the counties and metropolitan/micropolitan areas and divisions reported by the US Census Bureau in 2018[2].
- Population (Households) – Estimated number of households, defined according to [3], based on data from the US Census Bureau American Community Survey's 5-Year Estimates[4].
- Average Throughput (Mbps) – Average download speed in Mbps, derived from data collected from anonymized devices connected through Microsoft services such as Windows Update, Office 365, and Xbox Live Core Services[5].
- Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps, calculated using [6].
- Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps, calculated using [7].
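A minimal sketch of working with these columns, using invented rows (the real file is broadband_data_2020October.csv; the header names below mirror the column list but are assumptions about the exact CSV headers):

```python
import csv
import io

# Synthetic rows shaped like the dataset's six columns (values invented).
raw = """County,Zip Code (5-Digit),Population (Households),Average Throughput (Mbps),Percent Fast (> 25 Mbps),Percent Slow (< 3 Mbps)
Example County,12345,1200,28.4,0.62,0.08
Example County,12346,800,2.1,0.05,0.71
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Flag zip codes where fewer than half of the measured devices reach
# the FCC's 25 Mbps benchmark despite nominal availability.
underserved = [r["Zip Code (5-Digit)"]
               for r in rows
               if float(r["Percent Fast (> 25 Mbps)"]) < 0.5]
print(underserved)  # ['12346']
```

The same filter, weighted by the household counts, gives an estimate of how many households sit on the wrong side of the usage gap.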
- Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.
- Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
- Establishing public-private partnerships between local governments and telecom providers, which need better data about gaps in service coverage and usage levels in order to make decisions about investments in new infrastructure buildouts that bring better connectivity options to rural communities.
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: broadband_data_2020October.csv