The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept, though: just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.3/customlicense?persistentId=doi:10.7910/DVN/WIYLEH
Originally published by Harte-Hanks, the CiTDS dataset is now produced by Aberdeen Group, a subsidiary of Spiceworks Ziff Davis (SWZD). It is also referred to as CiTDB (Computer Intelligence Technology Database). CiTDS provides data on the digital investments of businesses across the globe. It includes two types of technology datasets: (i) hardware expenditures and (ii) product installs. Hardware expenditure data is constructed through a combination of surveys and modeling. A survey is administered to a number of companies, and the survey data is used to develop a prediction model of expenditures as a function of firm characteristics. CiTDS uses this model to predict the expenditures of non-surveyed firms and reports them in the dataset. In contrast, CiTDS does not do any imputation for product install data, which comes entirely from web scraping and surveys. A confidence score from 1 to 3 is assigned to indicate how much the source of information can be trusted. A 3 corresponds to 90-100 percent install likelihood, a 2 corresponds to 75-90 percent install likelihood, and a 1 corresponds to 65-75 percent install likelihood. CiTDS reports technology adoption at the site level with a unique DUNS identifier. One of these sites is identified as an “enterprise,” corresponding to the firm that owns the sites. Therefore, it is possible to analyze technology adoption at both the site (establishment) and enterprise (firm) levels. CiTDS sources the site population from Dun and Bradstreet every year and drops sites that are not relevant to its clients. Due to this sample selection, the number of sites varies considerably from year to year: on average, 10-15 percent of sites enter and exit every year in the US data. This number is higher in the EU data. We observe similar year-to-year turnover in the products included in the dataset. Some products have become obsolete, and some new products are added every year.
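The confidence scoring described above can be illustrated with a minimal sketch. It assumes each product-install record is a dictionary with a `confidence` field; the field names, the sample rows and the helper functions are hypothetical and are not taken from the CiTDS documentation. Only the three likelihood bands come from the description.

```python
# Install-likelihood bands as stated in the description (score -> (low, high)).
INSTALL_LIKELIHOOD = {
    3: (0.90, 1.00),
    2: (0.75, 0.90),
    1: (0.65, 0.75),
}


def likelihood_band(score):
    """Return the (low, high) install-likelihood band for a confidence score."""
    if score not in INSTALL_LIKELIHOOD:
        raise ValueError(f"confidence score must be 1, 2 or 3, got {score!r}")
    return INSTALL_LIKELIHOOD[score]


def filter_installs(records, min_score=2):
    """Keep product-install records whose confidence score is at least min_score."""
    return [r for r in records if r["confidence"] >= min_score]


# Hypothetical rows; the real CiTDS column names may differ.
sample = [
    {"duns": "000000001", "product": "A", "confidence": 3},
    {"duns": "000000002", "product": "B", "confidence": 1},
]

kept = filter_installs(sample)
print(kept)
print(likelihood_band(kept[0]["confidence"]))
```

Filtering on a minimum score is one possible way to trade coverage for reliability; whether a threshold of 2 is appropriate depends on the analysis.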
There are two versions of the data: (i) version 3, which covers 2016-2020, and (ii) version 4, which covers 2020-2021. The quality of version 4 is significantly better regarding the information included about the technology products. In version 3, product categories have missing values, and they are abbreviated in a way that is sometimes difficult to interpret. Version 4 does not have any major issues. Since both versions of the data are available for 2020, CiTDS provides a crosswalk between the versions. This makes it possible to use information about products in version 4 for the products in version 3, with the caveat that there is no crosswalk for products that exist in 2016-2019 but not in 2020. Finally, special attention should be paid to data from 2016, where the coverage is significantly different from 2017. From 2017 onwards, coverage is more consistent.
Years of Coverage:
APac: 2019 - 2021
Canada: 2015 - 2021
EMEA: 2019 - 2021
Europe: 2015 - 2018
Latin America: 2015, 2019 - 2021
United States: 2015 - 2021
https://creativecommons.org/publicdomain/zero/1.0/
There are lots of datasets available for different machine learning tasks such as NLP and computer vision. However, I couldn't find any dataset that catered to the domain of software testing. This is one area with a lot of potential for applying machine learning techniques, especially deep learning.
This was the reason I wanted such a dataset to exist. So, I made one.
New version [28th Nov'20] - Uploaded testing-related questions and related details from Stack Overflow. These are query results collected from Stack Overflow using its query viewer. The result set of this query contained posts that had the words "testing web pages".
New version [27th Nov'20] - Created a CSV file containing pairs of test case titles and test case descriptions.
This dataset is very tiny (approximately 200 rows of data). I have collected sample test cases from around the web and created a text file which contains all the test cases that I have collected. This text file has sections and under each section there are numbered rows of test cases.
I would like to thank websites like guru99.com, softwaretestinghelp.com and many other such websites which host great many sample test cases. These were the source for the test cases in this dataset.
My inspiration for creating this dataset was the scarcity of examples showcasing the application of machine learning to the domain of software testing. I would like to see if this dataset can be used to answer questions similar to the following:
* Finding semantic similarity between different test cases ranging across products and applications.
* Automating the elimination of duplicate test cases in a test case repository.
* Can a recommendation system be built for suggesting domain-specific test cases to software testers?
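As a starting point for the first two questions, here is a minimal sketch that flags near-duplicate test cases using Jaccard similarity over word tokens. This is a purely lexical baseline, not true semantic similarity, and it is not part of the dataset; the function names, the 0.8 threshold and the sample test cases are all assumptions made for illustration.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase a test case and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two test cases."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def near_duplicates(cases, threshold=0.8):
    """Return index pairs of test cases whose similarity reaches the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(cases), 2)
        if jaccard(a, b) >= threshold
    ]


# Hypothetical test cases, not rows from the dataset.
cases = [
    "Verify that the login page loads",
    "Verify that the login page loads.",
    "Check that the cart total updates after removing an item",
]

print(near_duplicates(cases))
```

A semantic approach would replace `jaccard` with a similarity over sentence embeddings; the pairwise comparison loop could stay the same for a dataset of roughly 200 rows.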
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users in 2027, a new peak. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
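The stated figures can be cross-checked with a short calculation. The sketch below derives the implied 2023 baseline from the absolute increase (391 million) and the relative increase (14.36 percent), then adds the increase to get the implied 2027 value. The baseline is an inference from those two numbers, not a figure quoted in the description.

```python
# Figures quoted in the description.
increase_millions = 391.0
increase_share = 0.1436  # +14.36 percent

# Implied starting value: increase = baseline * share.
baseline_millions = increase_millions / increase_share

# Implied end value in 2027.
forecast_millions = baseline_millions + increase_millions

print(f"implied 2023 baseline: {baseline_millions / 1000:.2f} billion")
print(f"implied 2027 forecast: {forecast_millions / 1000:.2f} billion")
```

The implied 2027 value rounds to 3.1 billion, which agrees with the peak stated above.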
This dataset provides locations and technical specifications of wind turbines in the United States, almost all of which are utility-scale. Utility-scale turbines are ones that generate power and feed it into the grid, supplying a utility with energy. They are usually much larger than turbines that would feed a house or business. The regularly updated database contains wind turbine records that have been collected, digitized, and locationally verified. Turbine data were gathered from the Federal Aviation Administration's (FAA) Digital Obstacle File (DOF) and Obstruction Evaluation Airport Airspace Analysis (OE-AAA), American Clean Power (ACP) Association (formerly American Wind Energy Association (AWEA)), Lawrence Berkeley National Laboratory (LBNL), and the United States Geological Survey (USGS), and were merged and collapsed into a single dataset. Verification of the turbine positions was done by visual interpretation using high-resolution aerial imagery in ESRI ArcGIS Desktop. A locational error of plus or minus 10 meters for turbine locations was tolerated. Technical specifications for turbines were assigned based on the wind turbine make and models as provided by manufacturers and project developers directly, and via FAA datasets, information on the wind project developer or turbine manufacturer websites, or other online sources. Some facility and turbine information on make and model did not exist or was difficult to obtain. Thus, uncertainty may exist for certain turbine specifications. Similarly, some turbines were not yet built, not built at all, or for other reasons cannot be verified visually. Location and turbine specifications data quality are rated, and a confidence level (1 to 3) is recorded for both. None of the data are field verified.
This dataset shows the Alexa Top 100 international websites and provides metrics on the volume of traffic that these sites were able to handle. The Alexa Top 100 lists the 100 most visited websites in the world and measures various statistical information. I looked up each site's headquarters, either through Alexa or a Whois lookup, to get a street address, which I was then able to geocode. I was only able to successfully geocode 85 of the top 100 sites throughout the world. Source of data: Alexa.com. Source URL: http://www.alexa.com/site/ds/top_sites?ts_mode=global&lang=none. Data is from October 12, 2007. Alexa is updated daily, so for more up-to-date information visit their site directly. They don't have maps, though.
This dataset provides locations and technical specifications of wind turbines in the United States, almost all of which are utility-scale. Utility-scale turbines are ones that generate power and feed it into the grid, supplying a utility with energy. They are usually much larger than turbines that would feed a homeowner or business.
The data formats downloadable from the Minnesota Geospatial Commons contain just the Minnesota turbines. Data, maps and services accessed from the USWTDB website provide nationwide turbines.
The regularly updated database has wind turbine records that have been collected, digitized, and locationally verified. Turbine data were gathered from the Federal Aviation Administration's (FAA) Digital Obstacle File (DOF) and Obstruction Evaluation Airport Airspace Analysis (OE-AAA), the American Wind Energy Association (AWEA), Lawrence Berkeley National Laboratory (LBNL), and the United States Geological Survey (USGS), and were merged and collapsed into a single data set.
Verification of the turbine positions was done by visual interpretation using high-resolution aerial imagery in Esri ArcGIS Desktop. A locational error of plus or minus 10 meters for turbine locations was tolerated. Technical specifications for turbines were assigned based on the wind turbine make and models as provided by manufacturers and project developers directly, and via FAA datasets, information on the wind project developer or turbine manufacturer websites, or other online sources. Some facility and turbine information on make and model did not exist or was difficult to obtain. Thus, uncertainty may exist for certain turbine specifications. Similarly, some turbines were not yet built, not built at all, or for other reasons cannot be verified visually. Location and turbine specifications data quality are rated and a confidence is recorded for both. None of the data are field verified.
The U.S. Wind Turbine Database website provides the national data in many different formats: shapefile, CSV, GeoJSON, web services (cached and dynamic), API, and web viewer. See: https://eerscmap.usgs.gov/uswtdb/
The web viewer provides many options to search; filter by attribute, date and location; and customize the map display. For details and screenshots of these options, see: https://eerscmap.usgs.gov/uswtdb/help/
------------
This metadata record was adapted by the Minnesota Geospatial Information Office (MnGeo) from the national version of the metadata. It describes the Minnesota extract of the shapefile data that has been projected from geographic to UTM coordinates and converted to Esri file geodatabase (fgdb) format. There may be more recent updates available on the national website. Accessing the data via the national web services or API will always provide the most recent data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset includes sightings of insects in Ukraine that have been published in select important scientific publications and authorized online sources. This information is significant in the context of assessing the consequences of the russian-Ukrainian war and therefore mainly includes data on insects from the Steppe zone of Ukraine, which, since 2022, has been almost entirely affected by military actions or has fallen under occupation. Scientific research in this area has been halted, and the territory has suffered considerable damage due to shelling, bombings, the construction of fortifications, and wildfires. Additionally, invasive plant species have begun to spread spontaneously across the region. All previously collected information on the biodiversity of these areas has now become historical and may be used in the future to assess the impacts of the war. Furthermore, there is a pressing need to preserve the data itself. Some authors, particularly those who previously maintained personal websites featuring insect photographs and collections, have ceased work on these sites and left Ukraine. Some such websites have gone offline (for example, https://lepidoptera.crimea.ua/index.htm) and now exist only in web archives. This dataset is dynamic and will be gradually supplemented with new data from additional sources, which will be further processed by the authors. The dataset includes the findings of Lepidoptera representatives in the steppe zone of Ukraine according to several literature resources, namely:
1) Zlatkov B., Budashkin Yu. Taxonomic and distributional remarks on some Palaearctic Cydia of the succedana-group with descriptions of two new species (Tortricidae). Nota lepi. 35 (1): 97 – 107;
2) Савчук В. В., Кайгородова Н. С. Новые сведения по фауне и биологии чешуекрылых (Lepidoptera) Крыма. Часть II. Кавказский энтомологический бюллетень. 2020. 16(2): 255–264;
3) Karolinskiy Ye. A., Demyanenko S. A., Guglya Yu. A., Zhakov A. V., Kavurka V. V., Mushinskiy V. G. On the fauna of Lepidoptera (Insecta) of the national nature park ‘Dvorichanskyi’ (Kharkiv region, Ukraine) and its environs. Contribution 2. The Kharkov Entomol. Soc. Gaz. 2018. Vol. XXVI, iss. 1. P. 55–114;
4) Будашкин Ю. И. Материалы по фауне Чешуекрылых (Lepidoptera) Казантипского природного заповедника. Труды Никитского ботанического сада – Национального научного центра. 2006. Том 126. С. 263-290;
5) Ключко, З. Ф. Аннотированный каталог совок (Lepidoptera, Noctuidae) фауны Украины : монография / З. Ф. Ключко, И. Г. Плющ, П. Н. Шешурак. - Киев : Институт зоологии НАН Украины, 2001. - 880 с.;
6) https://lepidoptera.crimea.ua/index.htm
7) https://alsphotopage.com/
Reporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.
Aggregate Data Collection Process
Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:
Methodology Changes
Several differences exist between the current, weekly-updated dataset and the archived version:
Confirmed and Probable Counts
In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC.* Confirmed and probable case definition criteria are described here:
Council of State and Territorial Epidemiologists (ymaws.com).
Deaths
CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (to
This hosted feature layer has been published in RI State Plane Feet NAD 83. Representative locations of structures and sites throughout Rhode Island. These data include addressed and unaddressed locations as well as occupied and unoccupied structures. These data were originally designed and developed for Rhode Island E 9-1-1 Uniform Emergency Telephone System (RI E 9-1-1) purposes. This dataset continues to be maintained to provide an accurate spatial reference for RI E 9-1-1 telecommunicators. Portions of this dataset were collected as early as 2001. Inaccuracies do exist in these data, which are therefore under constant revision. Any discrepancies, inaccuracies or inconsistencies recognized in these data should be reported to the pertinent municipality, which should alert RI E-911. Users are also encouraged to email ri911gis@akassociates911.com with any suggested updates for this actively maintained dataset.
How many people use social media?
Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.
Who uses social media?
Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.
How much time do people spend on social media?
Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America had the highest average time spent per day on social media.
What are the most popular social media platforms?
Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We assembled occurrence records (presence-only) for all four horseshoe crab species in Asia and Eastern America from our own observations, collaborators, scientific networks as well as through publishing a scratchpad site at http://horseshoecrabs.myspecies.info/. For many species, numerous distribution records exist in the literature, and we manually geo-referenced additional occurrence data from these sources.
This hosted view feature layer has been published in RI State Plane Feet NAD 83. For the complete metadata record, see https://www.rigis.org/datasets/edc::e-911-sites/about
Representative locations of structures and sites coded as P7 Site type throughout Rhode Island. These data include addressed and unaddressed locations as well as occupied and unoccupied structures. These data were originally designed and developed for Rhode Island E 9-1-1 Uniform Emergency Telephone System (RI E 9-1-1) purposes. This dataset continues to be maintained to provide an accurate spatial reference for RI E 9-1-1 telecommunicators. Portions of this dataset were collected as early as 2001. Inaccuracies do exist in these data, which are therefore under constant revision. Any discrepancies, inaccuracies or inconsistencies recognized in these data should be reported to the pertinent municipality, which should alert RI E-911. Users are also encouraged to email ri911gis@akassociates911.com with any suggested updates for this actively maintained dataset.
This hosted view feature layer has been published in RI State Plane Feet NAD 83. For the complete metadata record, see https://www.rigis.org/datasets/edc::e-911-sites/about
Representative locations of structures and sites coded as P6 Site type throughout Rhode Island. These data include addressed and unaddressed locations as well as occupied and unoccupied structures. These data were originally designed and developed for Rhode Island E 9-1-1 Uniform Emergency Telephone System (RI E 9-1-1) purposes. This dataset continues to be maintained to provide an accurate spatial reference for RI E 9-1-1 telecommunicators. Portions of this dataset were collected as early as 2001. Inaccuracies do exist in these data, which are therefore under constant revision. Any discrepancies, inaccuracies or inconsistencies recognized in these data should be reported to the pertinent municipality, which should alert RI E-911. Users are also encouraged to email ri911gis@akassociates911.com with any suggested updates for this actively maintained dataset.
This data set provides industrial-scale onshore wind turbine locations, corresponding facility information, and turbine technical specifications in the United States to March 2014. The database has nearly 49,000 wind turbine records that have been collected, digitized, locationally verified, and internally quality assured and quality controlled. Turbines from the Federal Aviation Administration Digital Obstacle File, product date March 2, 2014, were used as the primary source of turbine data points. Verification of the position of turbines was done by visual interpretation using high-resolution aerial imagery in ESRI ArcGIS Desktop. Turbines without Federal Aviation Administration Obstacle Repository System (FAA ORS) numbers were visually identified and supplemental points were added to the collection. A locational error of plus or minus 10 meters for turbine positions was estimated. Wind farm facility names were identified from publicly available facility data sets. Facility names were then used in a web search of additional industry publications and press releases to attribute additional turbine information (such as manufacturer, model, and technical specifications of wind turbines). Wind farm facility location data from various wind and energy industry sources were used to search for and digitize turbines not in existing databases. Technical specifications were assigned based on the make and model as described in literature, in the Federal Aviation Administration Digital Obstacle File, and in information from the turbine manufacturers' websites. Some facility and turbine information did not exist or was difficult to obtain. Thus, uncertainty may be present. That uncertainty was rated and a confidence was recorded for both location and attribution data quality.
Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data with technical, organizational and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, their building blocks and their slight technical variations. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices make available a majority of the sensitive data records included in this study.
Methodology
We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google and grey literature, focusing on retrieving the following source material: peer-reviewed articles where available, TRE websites, and TRE metadata catalogs. The goal of this literature study is to discover existing TREs and analyze their characteristics and data availability, giving an overview of available infrastructure for sensitive data research, as many European initiatives have been emerging in recent months.
Technical details
This dataset consists of five comma-separated values (.csv) files describing our inventory:
countries.csv: Table of countries with columns id (number), name (text) and code (text, in ISO 3166-1 alpha-3 encoding, optional)
tres.csv: Table of TREs with columns id (number), name (text), countryid (number, referring to column id of table countries), structureddata (bool, optional), datalevel (one of [1=de-identified, 2=pseudonymized, 3=anonymized], optional), outputcontrol (bool, optional), inceptionyear (date, optional), records (number, optional), datatype (one of [1=claims, 2=linked records], optional), statistics_office (bool), size (number, optional), source (text, optional), comment (text, optional)
access.csv: Table of access modes of TREs with columns id (number), suf (bool, optional), physical_visit (bool, optional), external_physical_visit (bool, optional), remote_visit (bool, optional)
inclusion.csv: Table of TREs included in the literature study with columns id (number), included (bool), exclusion reason (one of [peer review, environment, duplicate], optional), comment (text, optional)
major_fields.csv: Table of data categorization into the major research fields with columns id (number), life_sciences (bool, optional), physical_sciences (bool, optional), arts_and_humanities (bool, optional), social_sciences (bool, optional)
Additionally, a MariaDB (10.5 or higher) schema definition .sql file is needed to properly model the database schema:
schema.sql: Schema definition file to create the tables and views used in the analysis.
The analysis was done in a Jupyter Notebook, which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
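For readers who want to explore the tables without setting up MariaDB, the following sketch shows one way to join `tres.csv` to `countries.csv` on `countryid` using only the Python standard library. The column names follow the description above, but the sample rows, the header layout and the boolean encoding are assumptions made for illustration, and the helper functions are hypothetical. This is not the authors' analysis, which lives in the notebook linked above.

```python
import csv
import io


def read_table(handle):
    """Read a CSV file-like object into a list of row dictionaries."""
    return list(csv.DictReader(handle))


def tres_per_country(countries, tres):
    """Count TREs per country name by joining tres.countryid to countries.id."""
    names = {row["id"]: row["name"] for row in countries}
    counts = {}
    for tre in tres:
        country = names.get(tre["countryid"], "unknown")
        counts[country] = counts.get(country, 0) + 1
    return counts


# Made-up rows standing in for countries.csv and tres.csv (assumed layout).
countries_csv = io.StringIO(
    "id,name,code\n"
    "1,Austria,AUT\n"
    "2,Denmark,DNK\n"
)
tres_csv = io.StringIO(
    "id,name,countryid,statistics_office\n"
    "1,Example TRE A,1,true\n"
    "2,Example TRE B,1,false\n"
    "3,Example TRE C,2,true\n"
)

counts = tres_per_country(read_table(countries_csv), read_table(tres_csv))
print(counts)
```

With the real files, the `io.StringIO` objects would be replaced by `open("countries.csv", newline="")` and `open("tres.csv", newline="")`, assuming the files carry a header row.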
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.
Likes and image data from the community art website Behance. This is a small, anonymized, version of a larger proprietary dataset.
Metadata includes
appreciates (likes)
timestamps
extracted image features
Basic Statistics:
Users: 63,497
Items: 178,788
Appreciates (likes): 1,000,000
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied.
Two files contain the preliminary site data and water quality data required by the Bureau of Meteorology (BoM) under the conditions of the Water Act 2008. Please understand that in order to achieve these preliminary files, there has been quite a deal of work over a short amount of time. This has been greatly assisted by (and in fact would not have been possible without) the BoM’s financial assistance in terms of funding of Project NSW 6.1 - Remodelling, update and migration of the DECC water quality database. Note however, that due to the relatively short timeframe involved, a number of caveats still need to be placed on these preliminary files until a full QA/QC and data integrity and consistency check has been completed on the database. This is currently being implemented and it is recommended that additional contact is made with DECC prior to the release or use of this data. DECC will be continuing to refine and QA/QC this database and will inform BoM if this affects any data in these preliminary data files.
Some of this water data has been collected under an agreement with the Murray Darling Basin Commission (now the Murray Darling Authority). Part of this agreement deals with confidentiality regarding the identification of sites on individual landholder properties. In particular: “By providing locations at this (valley name or zone name only) accuracy there is reduced risk that future sampling at that location is confounded by intentional activities at the site. Types of impacts that might be envisaged include the unauthorised collection of rare or endangered fish or macroinvertebrate species at identifiable SRA sample sites, the undesired identification of SRA sites that exist on private property, comparisons of data collected at SRA sites to deduce some causal effect due to the landholding on which those sites exist and so on”. Any supply of/access to/reporting of this data should take such confidentialities into account.
With the Site data, latitudes and longitudes or eastings and northings are still being checked/added for those sites without such data. An updated Site file will be forwarded to BoM once it is finalised.
Lastly, this data is supplied in good faith, exercising all due care and attention. No representation is made about the accuracy, completeness or suitability of the information for any particular purpose. DECC does not accept liability for any damage which may occur to any person or organization taking action or not on the basis of these data.
This data was provided to the Bureau of Meteorology under the water regulations from the NSW Department of Environment & Heritage
NSW - Department of Environment and Heritage (2009) NSW Department of Environment and Heritage Historic Water Quality Data. Bioregional Assessment Source Dataset. Viewed 07 April 2016, http://data.bioregionalassessments.gov.au/dataset/4c5f7318-2567-4614-aa35-46aa0eb045f2.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Internet has dramatically expanded citizens’ access to and ability to engage with political information. On many websites, any user can contribute and edit “crowd-sourced” information about important political figures. One of the most prominent examples of crowd-sourced information on the Internet is Wikipedia, a free and open encyclopedia created and edited entirely by users, and one of the world’s most accessed websites. While previous studies of crowd-sourced information platforms have found them to be accurate, few have considered biases in what kinds of information are included. We report the results of four randomized field experiments that sought to explore what biases exist in the political articles of this collaborative website. By randomly assigning factually true but either positive or negative and cited or uncited information to the Wikipedia pages of U.S. senators, we uncover substantial evidence of an editorial bias toward positivity on Wikipedia: Negative facts are 36% more likely to be removed by Wikipedia editors than positive facts within 12 hours and 29% more likely within 3 days. Although citations substantially increase an edit’s survival time, the editorial bias toward positivity is not eliminated by inclusion of a citation. We replicate this study on the Wikipedia pages of deceased as well as recently retired but living senators and find no evidence of an editorial bias in either. Our results demonstrate that crowd-sourced information is subject to an editorial bias that favors the politically active.