18 datasets found

Global market share of leading desktop search engines 2015-2025
statista.com
Updated Jan 23, 2025
Cite
Statista (2025). Global market share of leading desktop search engines 2015-2025 [Dataset]. https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/
Dataset updated
Jan 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2015 - Jan 2025
Area covered
Worldwide
Description
As of January 2025, online search engine Bing accounted for 12.23 percent of the global desktop search market, while market leader Google had a share of around 78.83 percent. Meanwhile, Yahoo's market share was 3.07 percent.

Google in the global market

Ever since the introduction of Google Search in 1997, the company has dominated the search engine market, while the shares of all other tools have been rather lopsided. The majority of Google revenues are generated through advertising. Its parent corporation, Alphabet, was one of the biggest internet companies worldwide as of 2023, with a market capitalization of 1.6 trillion U.S. dollars. The company has also expanded its services to mail, productivity tools, enterprise products, mobile devices, and other ventures. As a result, Google earned one of the highest tech company revenues in 2023 with roughly 305.6 billion U.S. dollars.

Search engine usage in different countries

Google is the most frequently used search engine worldwide, but in some countries its alternatives lead or compete with it to some extent. As of the last quarter of 2023, more than 63 percent of internet users in Russia used Yandex, whereas Google users were nearly 36 percent. Meanwhile, Baidu was the most used search engine in China, despite a strong percentage decrease of internet users in the country accessing it. In other countries, like Japan and Mexico, people tend to use Yahoo along with Google. In the first quarter of 2022, nearly 56 percent of the respondents in Japan said that they had used Yahoo in the past four weeks. In the same year, over 27 percent of users in Mexico said they used Yahoo. Another search engine, Bing, operated by Microsoft, was the second most popular search engine in the United Kingdom after Google.
Data from: Inventory of online public databases and repositories holding...
agdatacommons.nal.usda.gov
datadiscoverystudio.org
+2 more
txt
Updated Feb 8, 2024
Cite
Erin Antognoli; Jonathan Sears; Cynthia Parr (2024). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. http://doi.org/10.15482/USDA.ADC/1389839
Available download formats: txt
Unique identifier
https://doi.org/10.15482/USDA.ADC/1389839
Dataset updated
Feb 8, 2024
Dataset provided by
Ag Data Commons
Authors
Erin Antognoli; Jonathan Sears; Cynthia Parr
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data.

Purpose

As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:

establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals;
compare how much data is in institutional vs. domain-specific vs. federal platforms;
determine which repositories are recommended by top journals that require or recommend the publication of supporting data;
ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data.

Approach

The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods

We first compiled a list of known domain specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects. We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag specific university repositories and general university repositories that housed a portion of agricultural data. Ag specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.
Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGEA, Protein Data Bank, and Zenodo. Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

Evaluation

We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results

A summary of the major findings from our data review:

Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

See included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:
Resource Title: Journals. File Name: Journals.csv
Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
Financial Well-Being in America (2017)
catalog.data.gov
catalog-dev.data.gov
Updated Aug 16, 2024
Cite
Consumer Financial Protection Bureau (2024). Financial Well-Being in America (2017) [Dataset]. https://catalog.data.gov/dataset/financial-well-being-in-america-2017
Dataset updated
Aug 16, 2024
Dataset provided by
Consumer Financial Protection Bureau (http://www.consumerfinance.gov/)
Area covered
United States
Description
The 2017 National Financial Well-Being in America Survey, conducted for the CFPB Offices of Financial Education and Financial Protection for Older Americans, was an online survey conducted to measure the financial well-being of adults in the United States. These data were created as a foundation for internal and external research into financial well-being and are relevant to work being done by researchers in the Office of Research who have access to the (deidentified) data.
covid-bing-query-gpt4
huggingface.co
Updated Dec 15, 2019
Cite
Aivin Solatorio (2019). covid-bing-query-gpt4 [Dataset]. https://huggingface.co/datasets/avsolatorio/covid-bing-query-gpt4
Available format: Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 15, 2019
Authors
Aivin Solatorio
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Bing x GPT-4 Synthetic Query Dataset

This dataset was used in the paper GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning. Refer to https://arxiv.org/abs/2402.16829 for details. The code for generating the data is available at https://github.com/avsolatorio/GISTEmbed.

Citation

@article{solatorio2024gistembed, title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}… See the full description on the dataset page: https://huggingface.co/datasets/avsolatorio/covid-bing-query-gpt4.
Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...
data.niaid.nih.gov
zenodo.org
Updated Mar 1, 2023
Cite
Haak, Fabian (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
Dataset updated
Mar 1, 2023
Dataset provided by
Haak, Fabian
Schaer, Philipp
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship, left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

Dataset 2: Search Query Suggestions (suggestions.csv)

The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their respective positions as returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server, which is saved in "location".

We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
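
The query expansion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic behind the "up to 270" figure, not the authors' scraper; the function name `build_query_inputs` and the constant `SUGGESTIONS_PER_INPUT` are hypothetical names introduced here.

```python
import string
from typing import List

# Ten autocomplete suggestions are retrieved per query input, per the description.
SUGGESTIONS_PER_INPUT = 10


def build_query_inputs(root_term: str) -> List[str]:
    """Return the bare root term plus the root term extended by each letter a-z."""
    extended = [f"{root_term} {letter}" for letter in string.ascii_lowercase]
    return [root_term] + extended


inputs = build_query_inputs("democrats")
# 1 bare root query + 26 letter extensions = 27 query inputs
max_suggestions = len(inputs) * SUGGESTIONS_PER_INPUT

print(inputs[:3])
print(len(inputs), max_suggestions)
```

Each query input would then be submitted to the autocomplete endpoint of one search engine, so the upper bound applies per topic and per engine.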

AllSides Scraper

At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles at the AllSides balanced news headlines.

We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper that scrapes all available AllSides news articles and gathers available information. By providing the scraper we facilitate access to a recent version of the dataset for other researchers.
Human Well-Being Index (HWBI) for U.S. Counties, 2000-2010
catalog.data.gov
datasets.ai
+1 more
Updated Feb 25, 2025
Cite
U.S. Environmental Protection Agency, Office of Research and Development-National Health and Environmental Effects Research Laboratory-Gulf Ecology Division (Point of Contact) (2025). Human Well-Being Index (HWBI) for U.S. Counties, 2000-2010 [Dataset]. https://catalog.data.gov/dataset/human-well-being-index-hwbi-for-u-s-counties-2000-201012
Dataset updated
Feb 25, 2025
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
United States
Description
The Human Well-being Index (HWBI) for U.S. counties is a set of nationally consistent demonstration results that may be used to characterize community well-being. This composite index was developed by U.S. EPA Office of Research and Development in support of its Sustainable and Healthy Communities (SHC) Research. It serves as an endpoint measure for use in the creation of community decision-support tools. The HWBI characterizes community conditions in the context of the flow of economic, social and ecological services. The index calculation approach used a nested-indicator design. A decade (2000-2010) of cultural, economic, and social data were drawn from publicly available sources (e.g., US Census, Bureau of Economic Analysis, American Community Survey, General Social Survey, Centers for Disease Control) to provide the foundation for well-being related indicators. Indicators are integrated into one of eight domains or sub-indices of well-being. These domains were synthesized to represent different aspects of well-being characteristics common across communities of all sizes. Service indicators reflect the availability of select socio-ecological services that influence well-being. Community decisions often result in changes in the flow of community services. Collectively, well-being and service measures provide a means to evaluate relationships between the availability of certain community services and overall well-being. Data used to generate service indicators were also collected from existing data sources. Detailed information about the attributes of the HWBI, its components and related service indicators are described in Indicators and Methods for Constructing a U.S. Human Well-being Index (HWBI) for Ecosystem Services Research (EPA/600/R-12/023. pp. 121) and Indicators and Methods for Evaluating Economic, Ecosystem and Social Services Provisioning (EPA/600/R-14/184. pp. 174), respectively.
Common languages used for web content 2025, by share of websites
statista.com
Updated Feb 11, 2025
Cite
Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
Dataset updated
Feb 11, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 2025
Area covered
Worldwide
Description
As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, while the content in the German language followed, with 5.6 percent.

English as the leading online language

The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most of the online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

Global internet usage by regions

As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America were leading in terms of internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.
COVID-19 Time-Series Metrics by County and State (ARCHIVED)
data.chhs.ca.gov
data.ca.gov
+1 more
csv, xlsx, zip
Updated Aug 28, 2024
Cite
California Department of Public Health (2024). COVID-19 Time-Series Metrics by County and State (ARCHIVED) [Dataset]. https://data.chhs.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state
Available download formats: csv(7729431), csv(6223281), xlsx(6471), xlsx(11305), csv(3313), xlsx(7811), csv(4836928), zip
Dataset updated
Aug 28, 2024
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Note: This COVID-19 data set is no longer being updated as of December 1, 2023. Access current COVID-19 data on the CDPH respiratory virus dashboard (https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/Respiratory-Viruses/RespiratoryDashboard.aspx) or in open data format (https://data.chhs.ca.gov/dataset/respiratory-virus-dashboard-metrics).

As of August 17, 2023, data is being updated each Friday.

For death data after December 31, 2022, California uses Provisional Deaths from the Centers for Disease Control and Prevention’s National Center for Health Statistics (NCHS) National Vital Statistics System (NVSS). Prior to January 1, 2023, death data was sourced from the COVID-19 registry. The change in data source occurred in July 2023 and was applied retroactively to all 2023 data to provide a consistent source of death data for the year of 2023.

As of May 11, 2023, data on cases, deaths, and testing is being updated each Thursday. Metrics by report date have been removed, but previous versions of files with report date metrics are archived below.

All metrics include people in state and federal prisons, US Immigration and Customs Enforcement facilities, US Marshal detention facilities, and Department of State Hospitals facilities. Members of California's tribal communities are also included.

The "Total Tests" and "Positive Tests" columns show totals based on the collection date. There is a lag between when a specimen is collected and when it is reported in this dataset. As a result, the most recent dates on the table will temporarily show NONE in the "Total Tests" and "Positive Tests" columns. This should not be interpreted as no tests being conducted on these dates. Instead, these values will be updated with the number of tests conducted as data is received.
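
When reading these files programmatically, the NONE placeholder should be treated as "not yet reported" rather than as zero. A minimal sketch, assuming the placeholder appears as the literal text NONE (or an empty cell) in the "Total Tests" and "Positive Tests" columns; the sample rows, the "date" column name, and the helper `parse_count` are hypothetical and only illustrate the idea.

```python
import csv
import io
from typing import List, Optional


def parse_count(raw: str) -> Optional[int]:
    """Convert a count cell to int, mapping NONE or an empty cell to None (missing)."""
    value = raw.strip()
    if value == "" or value.upper() == "NONE":
        return None
    return int(float(value))


# Hypothetical sample rows; the real file has more columns and different values.
sample = io.StringIO(
    "date,Total Tests,Positive Tests\n"
    "2023-11-01,1200,84\n"
    "2023-11-02,NONE,NONE\n"
)

rows = list(csv.DictReader(sample))
total_tests: List[Optional[int]] = [parse_count(r["Total Tests"]) for r in rows]

# Aggregate only over dates that have actually been reported.
reported = [t for t in total_tests if t is not None]
print(total_tests, sum(reported))
```

Keeping missing values as `None` avoids understating test volumes for the most recent dates.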
Commuter Mode Share
data.ccrpc.org
csv
Updated Oct 2, 2024
Cite
Champaign County Regional Planning Commission (2024). Commuter Mode Share [Dataset]. https://data.ccrpc.org/dataset/commuter-mode-share
Available download formats: csv(1639)
Dataset updated
Oct 2, 2024
Dataset provided by
Champaign County Regional Planning Commission
Description
This commuter mode share data shows the estimated percentages of commuters in Champaign County who traveled to work using each of the following modes: drove alone in an automobile; carpooled; took public transportation; walked; biked; went by motorcycle, taxi, or other means; and worked at home. Commuter mode share data can illustrate the use of and demand for transit services and active transportation facilities, as well as for automobile-focused transportation projects.

Driving alone in an automobile is by far the most prevalent means of getting to work in Champaign County, accounting for over 69 percent of all work trips in 2023. This is the same rate as 2019, and the first increase since 2017, both years being before the COVID-19 pandemic began.

The percentage of workers who commuted by all other means to a workplace outside the home also decreased from 2019 to 2021, most of these modes reaching a record low since this data first started being tracked in 2005. The percentage of people carpooling to work in 2023 was lower than every year except 2016 since this data first started being tracked in 2005. The percentage of people walking to work increased from 2022 to 2023, but this increase is not statistically significant.

Meanwhile, the percentage of people in Champaign County who worked at home more than quadrupled from 2019 to 2021, reaching a record high over 18 percent. It is a safe assumption that this can be attributed to the increase of employers allowing employees to work at home when the COVID-19 pandemic began in 2020.

The work from home figure decreased to 11.2 percent in 2023, which is the first statistically significant decrease since the pandemic began. However, this figure is still about 2.5 times higher than 2019, even with the COVID-19 emergency ending in 2023.

Commuter mode share data was sourced from the U.S. Census Bureau’s American Community Survey (ACS) 1-Year Estimates, which are released annually.

As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
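
One common way to take margins of error into account is the significance test described in Census Bureau ACS guidance: convert each 90 percent margin of error to a standard error (MOE / 1.645), then compare the difference of two estimates with the combined standard error. The sketch below assumes the published margins are at the 90 percent level and that the two estimates are independent; the function name and the example figures are hypothetical, not values from this dataset.

```python
import math

Z_90 = 1.645  # critical value matching the 90 percent confidence level of ACS margins of error


def is_significant(est_a: float, moe_a: float, est_b: float, moe_b: float) -> bool:
    """Return True if two estimates differ significantly at the 90 percent level."""
    se_a = moe_a / Z_90
    se_b = moe_b / Z_90
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(z) > Z_90


# Hypothetical percentages with hypothetical margins of error.
print(is_significant(11.0, 1.5, 14.0, 1.8))  # large gap relative to the margins
print(is_significant(10.0, 2.0, 10.5, 2.0))  # small gap relative to the margins
```

A year-over-year change that fails this test should be read as "no detectable change" rather than as a real increase or decrease.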

Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.

For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Means of Transportation to Work.

Sources:
U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (18 September 2024).
U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (10 October 2023).
U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (14 October 2022).
U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).
U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).
U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (13 September 2018).
U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (14 September 2017).
U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (19 September 2016).
U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
A national dataset of rasterized building footprints for the U.S.
catalog.data.gov
datasets.ai
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). A national dataset of rasterized building footprints for the U.S. [Dataset]. https://catalog.data.gov/dataset/a-national-dataset-of-rasterized-building-footprints-for-the-u-s-c24bf
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
United States
Description
The Bing Maps team at Microsoft released a U.S.-wide vector building dataset in 2018, which includes over 125 million building footprints for all 50 states in GeoJSON format. This dataset is extracted from aerial images using deep learning object classification methods. Large-extent modelling (e.g., urban morphological analysis or ecosystem assessment models) or accuracy assessment with vector layers is highly challenging in practice. Although vector layers provide accurate geometries, their use in large-extent geospatial analysis comes at a high computational cost. We used High Performance Computing (HPC) to develop an algorithm that calculates six summary values for each cell in a raster representation of each U.S. state: (1) total footprint coverage, (2) number of unique buildings intersecting each cell, (3) number of building centroids falling inside each cell, and the (4) average, (5) smallest, and (6) largest area of buildings that intersect each cell. These values are represented as raster layers with 30m cell size covering the 48 conterminous states, to better support incorporation of building footprint data into large-extent modelling. This project is funded by NASA’s Biological Diversity and Ecological Forecasting program; Award # 80NSSC18k0341
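
To make the per-cell summaries concrete, here is a heavily simplified sketch of the first three values (footprint coverage, intersecting-building count, centroid count). It is not the HPC algorithm used for the dataset: it assumes buildings are axis-aligned rectangles given in projected metre coordinates, whereas the real footprints are arbitrary polygons, and the function name `summarize` is hypothetical.

```python
from collections import defaultdict
from math import floor

CELL = 30.0  # raster cell size in metres, as stated in the description


def summarize(buildings):
    """Summarize rectangles (xmin, ymin, xmax, ymax) onto a 30 m grid.

    Returns three dicts keyed by (column, row):
    footprint area per cell, intersecting-building count, and centroid count.
    """
    coverage = defaultdict(float)
    intersecting = defaultdict(int)
    centroids = defaultdict(int)
    for xmin, ymin, xmax, ymax in buildings:
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        centroids[(floor(cx / CELL), floor(cy / CELL))] += 1
        for col in range(floor(xmin / CELL), floor(xmax / CELL) + 1):
            for row in range(floor(ymin / CELL), floor(ymax / CELL) + 1):
                # Width and height of the part of the rectangle inside this cell.
                w = min(xmax, (col + 1) * CELL) - max(xmin, col * CELL)
                h = min(ymax, (row + 1) * CELL) - max(ymin, row * CELL)
                if w > 0 and h > 0:
                    coverage[(col, row)] += w * h
                    intersecting[(col, row)] += 1
    return coverage, intersecting, centroids


# Two hypothetical buildings: one inside a single cell, one straddling two cells.
coverage, intersecting, centroids = summarize([(10, 10, 20, 20), (25, 5, 35, 15)])
print(dict(coverage), dict(intersecting), dict(centroids))
```

In this sketch, overlapping footprints would be counted twice in the coverage value; handling real polygons would typically call for a geometry library.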
ACS Context for Senior Well-Being - Boundaries
gis-fema.hub.arcgis.com
covid-hub.gio.georgia.gov
+5 more
Updated Mar 12, 2020
+ more versions
Cite
Esri (2020). ACS Context for Senior Well-Being - Boundaries [Dataset]. https://gis-fema.hub.arcgis.com/maps/e4b16658bc4749c58cb55ced3298d7d2
Explore at:
Dataset updated
Mar 12, 2020
Dataset authored and provided by
Esri (http://esri.com/)
Area covered
North Pacific Ocean, Pacific Ocean
Description
This layer shows demographic context for senior well-being work. This is shown by tract, county, and state boundaries. This service is updated annually to contain the most currently released American Community Survey (ACS) 5-year data, and contains estimates and margins of error. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis. The layer is symbolized to show the percentage of population aged 65 and up (senior population). To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right.

Current Vintage: 2019-2023
ACS Table(s): B01001, B09021, B17020, B18101, B23027, B25072, B25093, B27010, B28005, C27001B-I
Data downloaded from: Census Bureau's API for American Community Survey
Date of API call: December 12, 2024
National Figures: data.census.gov

The United States Census Bureau's American Community Survey (ACS): About the Survey, Geography & ACS, Technical Documentation, News & Updates

This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. For more information about ACS layers, visit the FAQ. Please cite the Census and ACS when using this data.

Data Note from the Census:
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value.
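As a minimal sketch of the margin-of-error arithmetic described above: the bounds are the estimate minus and plus the margin of error. The conversion to a standard error by dividing by 1.645 follows the usual Census Bureau convention for 90 percent margins of error; it is not stated in this description, and the function names and sample numbers are hypothetical.

```python
Z_90 = 1.645  # factor behind a 90 percent margin of error (assumed convention)


def confidence_bounds(estimate, moe):
    """Return the (lower, upper) confidence bounds: estimate -/+ margin of error."""
    return estimate - moe, estimate + moe


def standard_error(moe, z=Z_90):
    """Convert a 90 percent margin of error back to a standard error."""
    return moe / z


# Made-up figures: an estimate of 1,200 seniors with a margin of error of 150.
lower, upper = confidence_bounds(1200, 150)
print(lower, upper)  # 1050 1350
```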
In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

Data Processing Notes:
This layer is updated automatically when the most current vintage of ACS data is released each year, usually in December. The layer always contains the latest available ACS 5-year estimates. It is updated annually within days of the Census Bureau's release schedule. Click here to learn more about ACS data releases.
Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2023 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).
The States layer contains 52 records: all US states, Washington D.C., and Puerto Rico.
Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.
Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.
Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:
The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.
The estimate is controlled. A statistical test for sampling variability is not appropriate.
The data for this geographic area cannot be displayed because the number of sample cases is too small.
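The negative-value rule in the processing notes (negative flags become null, except one flag that becomes zero) could be sketched as follows. The description elides the actual flag values, so `ZERO_FLAGS` below holds a placeholder rather than the real value, and the function name `clean_acs_value` is hypothetical.

```python
# Placeholder only: the description truncates the real flag ("-5555..."),
# so substitute the full value from the raw API data before using this.
ZERO_FLAGS = {-5555}


def clean_acs_value(value, zero_flags=ZERO_FLAGS):
    """Apply the layer's rule to one raw value.

    Flags listed in zero_flags become 0; every other negative value becomes
    None (null); non-negative values pass through unchanged.
    """
    if value is None:
        return None
    if value in zero_flags:
        return 0
    if value < 0:
        return None
    return value


raw = [12, -5555, -4444, 0]  # made-up raw values using the placeholder flags
print([clean_acs_value(v) for v in raw])  # [12, 0, None, 0]
```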
Data from: Estimating Human Trafficking into the United States [Phase I:...
catalog.data.gov
icpsr.umich.edu
+1 more
Updated Mar 12, 2025
+ more versions
Cite
National Institute of Justice (2025). Estimating Human Trafficking into the United States [Phase I: Development of a Methodology] [Dataset]. https://catalog.data.gov/dataset/estimating-human-trafficking-into-the-united-states-phase-i-development-of-a-methodology
Explore at:
Dataset updated
Mar 12, 2025
Dataset provided by
National Institute of Justice (http://nij.ojp.gov/)
Area covered
United States
Description
This research project developed and fully documented a method to estimate the number of females and males trafficked for the purposes of sexual and labor exploitation from eight countries (Colombia, Ecuador, El Salvador, Guatemala, Mexico, Nicaragua, Peru, and Venezuela) into the United States at the Southwest border. The model utilizes only open source data. This research represents the first phase of a two-phase project and:
Provides a conceptual framework for identifying potential data sources to estimate the number of victims at different stages in trafficking.
Develops statistical models to estimate the number of males and females at risk of being trafficked for sexual and labor exploitation from the eight countries, and the number of males and females actually trafficked for sex and labor.
Incorporates into the estimation models the transit journey of trafficking victims from the eight countries to the southwest border of the United States.
Designs the estimation models such that they are highly flexible and modular so that they can evolve as the body of data expands.
Utilizes open source data as inputs to the statistical model, making the model accessible to anyone interested in using it.
Presents preliminary estimates that illustrate the use of the statistical methods.
Illuminates gaps in data sources.
The data included in this collection are the open source data which were primarily used in the models to estimate the number of males and females at risk of being trafficked.
Monthly internet traffic in the U.S. 2018-2023
statista.com
Updated Jan 18, 2023
+ more versions
Cite
Statista (2023). Monthly internet traffic in the U.S. 2018-2023 [Dataset]. https://www.statista.com/statistics/216335/data-usage-per-month-in-the-us-by-age/
Explore at:
Dataset updated
Jan 18, 2023
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2018
Area covered
United States
Description
The statistic shows estimated internet data traffic per month in the United States from 2018 to 2023. In 2018, total internet data traffic was estimated to amount to 33.45 million exabytes per month.
Loss of Work Due to Illness from COVID-19
catalog.data.gov
healthdata.gov
+3 more
Updated Apr 25, 2023
Cite
Centers for Disease Control and Prevention (2023). Loss of Work Due to Illness from COVID-19 [Dataset]. https://catalog.data.gov/dataset/loss-of-work-due-to-illness-from-covid-19
Explore at:
Dataset updated
Apr 25, 2023
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Description
The Research and Development Survey (RANDS) is a platform designed for conducting survey question evaluation and statistical research. RANDS is an ongoing series of surveys from probability-sampled commercial survey panels used for methodological research at the National Center for Health Statistics (NCHS). RANDS estimates are generated using an experimental approach that differs from the survey design approaches generally used by NCHS, including possible biases from different response patterns and sampling frames as well as increased variability from lower sample sizes. Use of the RANDS platform allows NCHS to produce more timely data than would be possible using traditional data collection methods. RANDS is not designed to replace NCHS’ higher quality, core data collections.

Below are experimental estimates of loss of work due to illness with coronavirus for three rounds of RANDS during COVID-19. Data collection for the three rounds of RANDS during COVID-19 occurred between June 9, 2020 and July 6, 2020, August 3, 2020 and August 20, 2020, and May 17, 2021 and June 30, 2021. Information needed to interpret these estimates can be found in the Technical Notes.

RANDS during COVID-19 included a question about the inability to work due to being sick or having a family member sick with COVID-19. The National Health Interview Survey, conducted by NCHS, is the source for high-quality data to monitor work-loss days and work limitations in the United States. For example, in 2018, 42.7% of adults aged 18 and over missed at least 1 day of work in the previous year due to illness or injury and 9.3% of adults aged 18 to 69 were limited in their ability to work or unable to work due to physical, mental, or emotional problems. The experimental estimates on this page are derived from RANDS during COVID-19 and show the percentage of U.S. adults who did not work for pay at a job or business, at any point, in the previous week because either they or someone in their family was sick with COVID-19.

Technical Notes: https://www.cdc.gov/nchs/covid19/rands/work.htm#limitations
Share of law enforcement agencies who reported crime data U.S. 2022, by...
statista.com
Updated Nov 19, 2024
Cite
Statista (2024). Share of law enforcement agencies who reported crime data U.S. 2022, by state [Dataset]. https://www.statista.com/statistics/1368634/crime-data-reported-fbi-by-state-us/
Explore at:
Dataset updated
Nov 19, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2022
Area covered
United States
Description
To estimate national crime trends, the FBI collects crime reports from law enforcement agencies across the country. In 2022, Delaware, the District of Columbia, and Oklahoma had perfect participation rates, with 100 percent of law enforcement agencies in those states reporting crime data to the FBI. In contrast, Florida had the lowest share of law enforcement agencies reporting crime data to the FBI in the United States, at 7.7 percent.

An unreliable source?

Along with being the principal investigative agency of the U.S. federal government, the FBI is also in charge of tracking crimes committed in the United States. In recent years, however, the FBI made significant changes to its crime reporting system, requiring more detailed input from agencies when they report their data. Consequently, less crime data has been reported and the FBI has come under criticism as an unreliable source on crime in the United States. In 2022, the FBI was found to rank low on trustworthiness for Americans when compared to other government agencies, further demonstrating the need for transparent and accurate data.

Importance of crime rates

As crime and policing data can help to analyze emerging issues and policy responses, the inaccuracy of the FBI’s crime reporting system may lead to misinformation which could be used to impact elections and the beliefs of the American public. In addition, the lack of crime data from Republican states such as Florida may prove problematic as 78 percent of Republicans said that crime was a very important issue for them in midterm elections.
Global number of breached user accounts Q1 2020-Q3 2024
statista.com
Updated Nov 8, 2024
Cite
Statista (2024). Global number of breached user accounts Q1 2020-Q3 2024 [Dataset]. https://www.statista.com/statistics/1307426/number-of-data-breaches-worldwide/
Explore at:
Dataset updated
Nov 8, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
During the third quarter of 2024, data breaches exposed more than 422 million records worldwide. Since the first quarter of 2020, the highest number of data records were exposed in the first quarter of 202, more than 818 million data sets. Data breaches remain among the biggest concerns of company leaders worldwide. The most common causes of sensitive information loss were operating system vulnerabilities on endpoint devices.

Which industries see the most data breaches?

Certain conditions make some industry sectors more prone to data breaches than others. According to the latest observations, the public administration sector experienced the highest number of data breaches between 2021 and 2022, with 495 reported data breach incidents with confirmed data loss. Financial institutions were second, with 421 data breach cases, followed by healthcare providers.

Data breach cost

Data breach incidents have various consequences, the most common being financial losses and business disruptions. As of 2023, the average data breach cost across businesses worldwide was 4.45 million U.S. dollars. Meanwhile, a leaked data record cost about 165 U.S. dollars. The United States saw the highest average breach cost globally, at 9.48 million U.S. dollars.
Internet of Things - number of connected devices worldwide 2015-2025
statista.com
Updated Nov 27, 2016
Cite
Statista (2016). Internet of Things - number of connected devices worldwide 2015-2025 [Dataset]. https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/
Explore at:
Dataset updated
Nov 27, 2016
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
By 2025, forecasts suggest that there will be more than 75 billion Internet of Things (IoT) connected devices in use. This would be a nearly threefold increase from the IoT installed base in 2019.

What is the Internet of Things?

The IoT refers to a network of devices that are connected to the internet and can “communicate” with each other. Such devices include everyday tech gadgets such as smartphones and wearables, smart home devices such as smart meters, and industrial devices like smart machines. These smart connected devices are able to gather, share, and analyze information and create actions accordingly. By 2023, global spending on IoT will reach 1.1 trillion U.S. dollars.

How does Internet of Things work?

IoT devices make use of sensors and processors to collect and analyze data acquired from their environments. The data collected from the sensors will be shared by being sent to a gateway or to other IoT devices. It will then be either sent to and analyzed in the cloud or analyzed locally. By 2025, the data volume created by IoT connections is projected to reach a massive total of 79.4 zettabytes.

Privacy and security concerns 

Given the amount of data generated by IoT devices, it is no wonder that data privacy and security are among the major concerns with regard to IoT adoption. Once devices are connected to the Internet, they become vulnerable to possible security breaches in the form of hacking, phishing, etc. Frequent data leaks from social media raise serious concerns about information security standards in today’s world; were the IoT to become the next new reality, serious efforts to create strict security standards would need to be prioritized.
Penalties issued to Meta for EU GDPR violations 2024
statista.com
Updated Nov 15, 2024
Cite
Statista (2024). Penalties issued to Meta for EU GDPR violations 2024 [Dataset]. https://www.statista.com/statistics/1192794/meta-fines-from-eu-and-dpc/
Explore at:
Dataset updated
Nov 15, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Mar 2022 - Sep 2024
Area covered
Europe
Description
In September 2024, the Irish Data Protection Commission fined Meta Ireland 91 million euros after passwords of social media users were stored in 'plaintext' on Meta's internal systems rather than with cryptographic protection or encryption. In May 2023, the EU fined Meta 1.2 billion euros for violating laws on digital privacy and putting the data of EU citizens at risk through Facebook's EU-U.S. data transfers. European privacy legislation is seen as being far stricter than American privacy law, and the sending of EU citizens’ data to the United States resulted in the record-breaking penalty being issued to the tech giant. In January 2023, after it was discovered that Meta Platforms had improperly required that users of Facebook, Instagram, and WhatsApp accept personalized adverts to use the platforms, the company was issued a 390 million euro fine by the European Commission. EU regulators claim that the social media giant broke the General Data Protection Regulation (GDPR) by including the demand in its terms of service. In addition, Meta was fined 405 million euros by the Irish Data Protection Commission (DPC) in September 2022 for violating Instagram's children's privacy settings. In November 2022, the DPC fined Meta a further 265 million euros for failing to protect their users from data scraping.

GDPR violations in 2022

Social media sites and companies are not the only types of online services upon which users' data can potentially be compromised. In 2022, the online service with the biggest fine for violating GDPR was e-commerce and digital powerhouse Amazon, which was issued a 746 million euro fine. Furthermore, in December 2021, Google was penalized 90 million euros for GDPR violations.

What are the most common GDPR violations?

Since GDPR went into effect in May 2018, fines have been imposed for a variety of reasons. As of June 2022, companies' non-compliance with general data processing principles accounted for the largest share of fines, resulting in over 845 million euros worth of penalties. Insufficient legal basis for data processing was the second most common violation, amounting to 447 million euros in fines.
Global market share of leading desktop search engines 2015-2025

Explore at:

495 scholarly articles cite this dataset (View in Google Scholar)
