40 datasets found

Amount of data created, consumed, and stored 2010-2023, with forecasts to...
statista.com
tokrwards.com
Updated Jun 30, 2025
Cite
Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Explore at:
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 2024
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

Storage capacity also growing

Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
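The compound annual growth rate (CAGR) mentioned in the description follows the standard formula (end / start)^(1 / years) - 1. The sketch below is only an illustration: the function name `cagr` and the input values are placeholders, because the actual figures are masked in this listing.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10 percent per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Placeholder inputs, NOT values from the dataset: capacity doubling over
# the five-year forecast period 2020 to 2025.
rate = cagr(100.0, 200.0, 2025 - 2020)
print(f"CAGR: {rate:.2%}")
```

Compounding the start value by the returned rate for the same number of years reproduces the end value, which is a quick way to sanity-check a published CAGR against its endpoints.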
Ci Technology DataSet
dataverse.harvard.edu
Updated Feb 26, 2024
Cite
Harte Hanks (2024). Ci Technology DataSet [Dataset]. http://doi.org/10.7910/DVN/WIYLEH
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/WIYLEH
Dataset updated
Feb 26, 2024
Dataset provided by
Harvard Dataverse
Authors
Harte Hanks
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.3/customlicense?persistentId=doi:10.7910/DVN/WIYLEH
Description
Originally published by Harte-Hanks, the CiTDS dataset is now produced by Aberdeen Group, a subsidiary of Spiceworks Ziff Davis (SWZD). It is also referred to as CiTDB (Computer Intelligence Technology Database). CiTDS provides data on digital investments of businesses across the globe. It includes two types of technology datasets: (i) hardware expenditures and (ii) product installs.

Hardware expenditure data is constructed through a combination of surveys and modeling. A survey is administered to a number of companies, and the survey data is used to develop a prediction model of expenditures as a function of firm characteristics. CiTDS uses this model to predict the expenditures of non-surveyed firms and reports them in the dataset. In contrast, CiTDS does not do any imputation for product install data, which comes entirely from web scraping and surveys. A confidence score from 1 to 3 is assigned to indicate how much the source of information can be trusted. A 3 corresponds to 90-100 percent install likelihood, 2 corresponds to 75-90 percent install likelihood and 1 corresponds to 65-75 percent install likelihood.

CiTDS reports technology adoption at the site level with a unique DUNS identifier. One of these sites is identified as an “enterprise,” corresponding to the firm that owns the sites. Therefore, it is possible to analyze technology adoption both at the site (establishment) and enterprise (firm) levels. CiTDS sources the site population from Dun and Bradstreet every year and drops sites that are not relevant to its clients. Due to this sample selection, there is quite a bit of variation in the number of sites from year to year: on average, 10-15 percent of sites enter and exit every year in the US data. This number is higher in the EU data. We observe similar year-to-year turnover in the products included in the dataset. Some products have become obsolete, and some new products are added every year.

There are two versions of the data: (i) version 3, which covers 2016-2020, and (ii) version 4, which covers 2020-2021. The quality of version 4 is significantly better regarding the information included about the technology products. In version 3, product categories have missing values, and they are abbreviated in ways that are sometimes difficult to interpret. Version 4 does not have any major issues. Since both versions of the data are available for 2020, CiTDS provides a crosswalk between the versions. This makes it possible to use information about products in version 4 for the products in version 3, with the caveat that there is no crosswalk for products that exist in 2016-2019 but not in 2020. Finally, special attention should be paid to data from 2016, where the coverage is significantly different from 2017. From 2017 onwards, coverage is more consistent.

Years of Coverage:
APac: 2019 - 2021
Canada: 2015 - 2021
EMEA: 2019 - 2021
Europe: 2015 - 2018
Latin America: 2015, 2019 - 2021
United States: 2015 - 2021
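The confidence scores described above map directly to install-likelihood ranges, so a small lookup can make filtering explicit. This is a minimal sketch: the record layout (a `confidence` key on each dict) and the function names are assumptions for illustration, not the actual CiTDS schema.

```python
# Install-likelihood ranges (percent) per confidence score, as stated in the
# dataset description. Everything else here is hypothetical.
INSTALL_LIKELIHOOD = {
    3: (90, 100),
    2: (75, 90),
    1: (65, 75),
}


def install_likelihood_range(score):
    """Return the (low, high) install likelihood in percent for a score of 1-3."""
    if score not in INSTALL_LIKELIHOOD:
        raise ValueError(f"confidence score must be 1, 2 or 3, got {score!r}")
    return INSTALL_LIKELIHOOD[score]


def filter_installs(records, min_score=2):
    """Keep product-install records whose confidence score is at least min_score."""
    return [r for r in records if r["confidence"] >= min_score]


# Made-up example records; the keys are assumed, not taken from CiTDS.
installs = [
    {"site": "A", "product": "X", "confidence": 3},
    {"site": "B", "product": "X", "confidence": 1},
]
print(filter_installs(installs))
print(install_likelihood_range(2))
```

Choosing `min_score=2` keeps only installs with at least a 75 percent stated likelihood; a stricter analysis could pass `min_score=3`.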
The Test-Case Dataset
kaggle.com
Updated Nov 29, 2020
Cite
sapal6 (2020). The Test-Case Dataset [Dataset]. https://www.kaggle.com/datasets/sapal6/the-testcase-dataset/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 29, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
sapal6
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

There are lots of datasets available for different machine learning tasks such as NLP and computer vision. However, I couldn't find any dataset that catered to the domain of software testing. This is an area with a lot of potential for applying machine learning techniques, especially deep learning.

This was the reason I wanted such a dataset to exist. So, I made one.

Content

New version [28th Nov'20] - Uploaded testing-related questions and related details from Stack Overflow. These are query results collected using Stack Overflow's query viewer. The result set of this query contained posts which had the words "testing web pages".

New version [27th Nov'20] - Created a CSV file containing pairs of test case titles and test case descriptions.

This dataset is very tiny (approximately 200 rows of data). I have collected sample test cases from around the web and created a text file which contains all the test cases that I have collected. This text file has sections and under each section there are numbered rows of test cases.

Acknowledgements

I would like to thank websites like guru99.com, softwaretestinghelp.com and many other such websites which host a great many sample test cases. These were the source for the test cases in this dataset.

Inspiration

My inspiration for creating this dataset was the scarcity of examples showcasing the application of machine learning to the domain of software testing. I would like to see if this dataset can be used to answer questions similar to the following:
* Finding semantic similarity between different test cases ranging across products and applications.
* Automating the elimination of duplicate test cases in a test case repository.
* Can a recommendation system be built for suggesting domain-specific test cases to software testers?
Facebook users worldwide 2017-2027
statista.com
tokrwards.com
+4more
+ more versions
Cite
Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
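The stated increase (391 million users, +14.36 percent) and the stated 2027 total (3.1 billion) can be cross-checked with simple arithmetic. The sketch below only rearranges the figures quoted above; the implied 2023 base is derived here and is not a number published in the description.

```python
increase_millions = 391   # absolute increase 2023-2027, from the description
pct_increase = 14.36      # relative increase in percent, from the description

# If base * (pct / 100) == increase, then base == increase / (pct / 100).
implied_base_2023 = increase_millions / (pct_increase / 100)
implied_total_2027 = implied_base_2023 + increase_millions

print(f"Implied 2023 base:  {implied_base_2023:,.0f} million")
print(f"Implied 2027 total: {implied_total_2027 / 1000:.1f} billion")
```

The implied 2027 total rounds to 3.1 billion, which agrees with the figure quoted in the description.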
United States Wind Turbine Database
catalog.data.gov
data.usgs.gov
Updated Oct 7, 2025
+ more versions
Cite
U.S. Geological Survey (2025). United States Wind Turbine Database [Dataset]. https://catalog.data.gov/dataset/united-states-wind-turbine-database
Explore at:
Dataset updated
Oct 7, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
This dataset provides locations and technical specifications of wind turbines in the United States, almost all of which are utility-scale. Utility-scale turbines are ones that generate power and feed it into the grid, supplying a utility with energy. They are usually much larger than turbines that would feed a house or business. The regularly updated database contains wind turbine records that have been collected, digitized, and locationally verified. Turbine data were gathered from the Federal Aviation Administration's (FAA) Digital Obstacle File (DOF) and Obstruction Evaluation Airport Airspace Analysis (OE-AAA), American Clean Power (ACP) Association (formerly American Wind Energy Association (AWEA)), Lawrence Berkeley National Laboratory (LBNL), and the United States Geological Survey (USGS), and were merged and collapsed into a single dataset. Verification of the turbine positions was done by visual interpretation using high-resolution aerial imagery in ESRI ArcGIS Desktop. A locational error of plus or minus 10 meters for turbine locations was tolerated. Technical specifications for turbines were assigned based on the wind turbine make and models as provided by manufacturers and project developers directly, and via FAA datasets, information on the wind project developer or turbine manufacturer websites, or other online sources. Some facility and turbine information on make and model did not exist or was difficult to obtain. Thus, uncertainty may exist for certain turbine specifications. Similarly, some turbines were not yet built, not built at all, or for other reasons cannot be verified visually. Location and turbine specifications data quality are rated, and a confidence level (1 to 3) is recorded for both. None of the data are field verified.
Alexa, International Top 100 Websites, Global, 10.12.2007
geocommons.com
Updated Apr 29, 2008
Cite
Alexa (2008). Alexa, International Top 100 Websites, Global, 10.12.2007 [Dataset]. http://geocommons.com/search.html
Explore at:
Dataset updated
Apr 29, 2008
Dataset provided by
data
Alexa
Description
This dataset shows the Alexa Top 100 International Websites and provides metrics on the volume of traffic that these sites were able to handle. The Alexa top 100 lists the 100 most visited websites in the world and measures various statistical information. I looked up each site's headquarters, either through Alexa or a Whois lookup, to get a street address, which I was then able to geocode. I was only able to successfully geocode 85 of the top 100 sites throughout the world. Source of data: Alexa.com. Source URL: http://www.alexa.com/site/ds/top_sites?ts_mode=global&lang=none. Data is from October 12, 2007. Alexa is updated daily, so for more up-to-date information visit their site directly. They don't have maps, though.
U.S. Wind Turbine Database, Minnesota and National
gisdata.mn.gov
fgdb, webapp
Updated Jun 24, 2025
Cite
Geospatial Information Office (2025). U.S. Wind Turbine Database, Minnesota and National [Dataset]. https://gisdata.mn.gov/dataset/util-uswtdb
Explore at:
Available download formats: webapp, fgdb
Dataset updated
Jun 24, 2025
Dataset provided by
Geospatial Information Office
Description
This dataset provides locations and technical specifications of wind turbines in the United States, almost all of which are utility-scale. Utility-scale turbines are ones that generate power and feed it into the grid, supplying a utility with energy. They are usually much larger than turbines that would feed a homeowner or business.

The data formats downloadable from the Minnesota Geospatial Commons contain just the Minnesota turbines. Data, maps and services accessed from the USWTDB website provide nationwide turbines.

The regularly updated database has wind turbine records that have been collected, digitized, and locationally verified. Turbine data were gathered from the Federal Aviation Administration's (FAA) Digital Obstacle File (DOF) and Obstruction Evaluation Airport Airspace Analysis (OE-AAA), the American Wind Energy Association (AWEA), Lawrence Berkeley National Laboratory (LBNL), and the United States Geological Survey (USGS), and were merged and collapsed into a single data set.

Verification of the turbine positions was done by visual interpretation using high-resolution aerial imagery in Esri ArcGIS Desktop. A locational error of plus or minus 10 meters for turbine locations was tolerated. Technical specifications for turbines were assigned based on the wind turbine make and models as provided by manufacturers and project developers directly, and via FAA datasets, information on the wind project developer or turbine manufacturer websites, or other online sources. Some facility and turbine information on make and model did not exist or was difficult to obtain. Thus, uncertainty may exist for certain turbine specifications. Similarly, some turbines were not yet built, not built at all, or for other reasons cannot be verified visually. Location and turbine specifications data quality are rated and a confidence is recorded for both. None of the data are field verified.

The U.S. Wind Turbine Database website provides the national data in many different formats: shapefile, CSV, GeoJSON, web services (cached and dynamic), API, and web viewer. See: https://eerscmap.usgs.gov/uswtdb/

The web viewer provides many options to search; filter by attribute, date and location; and customize the map display. For details and screenshots of these options, see: https://eerscmap.usgs.gov/uswtdb/help/

------------
This metadata record was adapted by the Minnesota Geospatial Information Office (MnGeo) from the national version of the metadata. It describes the Minnesota extract of the shapefile data that has been projected from geographic to UTM coordinates and converted to Esri file geodatabase (fgdb) format. There may be more recent updates available on the national website. Accessing the data via the national web services or API will always provide the most recent data.
Records of Insecta species in Ukraine according to some literature and...
gbif.org
demo.gbif.org
Updated Jul 10, 2025
Cite
Maksym Marushchak; Oleksii Marushchak; Olexiy Vasyliuk; Dmytro Pozniak; Anton Biatov (2025). Records of Insecta species in Ukraine according to some literature and authorized Internet resources [Dataset]. http://doi.org/10.15468/bkfd22
Explore at:
Unique identifier
https://doi.org/10.15468/bkfd22
Dataset updated
Jul 10, 2025
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
Ukrainian Nature Conservation Group (NGO)
Authors
Maksym Marushchak; Oleksii Marushchak; Olexiy Vasyliuk; Dmytro Pozniak; Anton Biatov
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The dataset includes sightings of insects in Ukraine that have been published in select important scientific publications and authorized online sources. This information is significant in the context of assessing the consequences of the russian-Ukrainian war and therefore mainly includes data on insects from the Steppe zone of Ukraine, which, since 2022, has been almost entirely affected by military actions or has fallen under occupation. Scientific research in this area has been halted, and the territory has suffered considerable damage due to shelling, bombings, the construction of fortifications, and wildfires. Additionally, invasive plant species have begun to spread spontaneously across the region. All previously collected information on the biodiversity of these areas has now become historical and may be used in the future to assess the impacts of the war. Furthermore, there is a pressing need to preserve the data itself. Some authors, particularly those who previously maintained personal websites featuring insect photographs and collections, have ceased work on these sites and left Ukraine. Some such websites have gone offline (for example, https://lepidoptera.crimea.ua/index.htm) and now exist only in web archives. This dataset is dynamic and will be gradually supplemented with new data from additional sources, which will be further processed by the authors. The dataset includes the findings of Lepidoptera representatives in the steppe zone of Ukraine according to several literature resources, namely:
1) Zlatkov B., Budashkin Yu. Taxonomic and distributional remarks on some Palaearctic Cydia of the succedana-group with descriptions of two new species (Tortricidae). Nota lepi. 35 (1): 97 – 107;
2) Савчук В. В., Кайгородова Н. С. Новые сведения по фауне и биологии чешуекрылых (Lepidoptera) Крыма. Часть II. Кавказский энтомологический бюллетень. 2020. 16(2): 255–264;
3) Karolinskiy Ye. A., Demyanenko S. A., Guglya Yu. A., Zhakov A. V., Kavurka V. V., Mushinskiy V. G. On the fauna of Lepidoptera (Insecta) of the national nature park ‘Dvorichanskyi’ (Kharkiv region, Ukraine) and its environs. Contribution 2. The Kharkov Entomol. Soc. Gaz. 2018. Vol. XXVI, iss. 1. P. 55–114;
4) Будашкин Ю. И. Материалы по фауне Чешуекрылых (Lepidoptera) Казантипского природного заповедника. Труды Никитского ботанического сада – Национального научного центра. 2006. Том 126. С. 263-290.
5) Ключко, З. Ф. Аннотированный каталог совок (Lepidoptera, Noctuidae) фауны Украины : монография / З. Ф. Ключко, И. Г. Плющ, П. Н. Шешурак. - Киев : Институт зоологии НАН Украины, 2001. - 880 с.
6) https://lepidoptera.crimea.ua/index.htm
7) https://alsphotopage.com/
Weekly United States COVID-19 Cases and Deaths by State - ARCHIVED
data.virginia.gov
healthdata.gov
+1more
csv, json, rdf, xsl
Updated Feb 23, 2025
Cite
Centers for Disease Control and Prevention (2025). Weekly United States COVID-19 Cases and Deaths by State - ARCHIVED [Dataset]. https://data.virginia.gov/dataset/weekly-united-states-covid-19-cases-and-deaths-by-state-archived
Explore at:
Available download formats: rdf, csv, json, xsl
Dataset updated
Feb 23, 2025
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Area covered
United States
Description
Reporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.

Aggregate Data Collection Process

Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:
A CDC data team reviews and validates the information obtained from jurisdictions’ state and local websites via an overnight data review process.

If more than one official county data source exists, CDC uses a comprehensive data selection process comparing each official county data source, and takes the highest case and death counts respectively, unless otherwise specified by the state.

CDC compiles these data and posts the finalized information on COVID Data Tracker.

County level data is aggregated to obtain state and territory specific totals.

This process is collaborative, with CDC and jurisdictions working together to ensure the accuracy of COVID-19 case and death numbers. County counts provide the most up-to-date numbers on cases and deaths by report date. CDC may retrospectively update counts to correct data quality issues.
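The selection and aggregation steps above can be sketched as follows. This is an illustration only, not CDC's implementation: the data layout and function names are assumptions, and the "unless otherwise specified by the state" exception is not modeled.

```python
from collections import defaultdict


def select_county_counts(sources):
    """Given counts from several official sources for one county, take the
    highest case count and the highest death count respectively."""
    return {
        "cases": max(s["cases"] for s in sources),
        "deaths": max(s["deaths"] for s in sources),
    }


def aggregate_to_state(county_reports):
    """Sum the selected county counts into state totals.

    county_reports maps (state, county) to a list of source dicts, each with
    'cases' and 'deaths' keys (a hypothetical layout for this sketch).
    """
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for (state, _county), sources in county_reports.items():
        chosen = select_county_counts(sources)
        totals[state]["cases"] += chosen["cases"]
        totals[state]["deaths"] += chosen["deaths"]
    return dict(totals)


# Made-up numbers, for illustration only.
reports = {
    ("VA", "Fairfax"): [{"cases": 100, "deaths": 3}, {"cases": 98, "deaths": 4}],
    ("VA", "Arlington"): [{"cases": 40, "deaths": 1}],
}
print(aggregate_to_state(reports))
```

Note that the highest case count and the highest death count may come from different sources for the same county, which is what "respectively" implies in the description.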

Methodology Changes

Several differences exist between the current, weekly-updated dataset and the archived version:
Source: The current Weekly-Updated Version is based on county-level aggregate count data, while the Archived Version is based on State-level aggregate count data.

Confirmed/Probable Cases/Death breakdown: While the probable cases and deaths are included in the total case and total death counts in both versions (if applicable), they were reported separately from the confirmed cases and deaths by jurisdiction in the Archived Version. In the current Weekly-Updated Version, the counts by jurisdiction are not reported by confirmed or probable status (See Confirmed and Probable Counts section for more detail).

Time Series Frequency: The current Weekly-Updated Version contains weekly time series data (i.e., one record per week per jurisdiction), while the Archived Version contains daily time series data (i.e., one record per day per jurisdiction).

Update Frequency: The current Weekly-Updated Version is updated weekly, while the Archived Version was updated twice daily up to October 20, 2022.
Important note: The counts reflected during a given time period in this dataset may not match the counts reflected for the same time period in the archived dataset noted above. Discrepancies may exist due to differences between county and state COVID-19 case surveillance and reconciliation efforts.

Confirmed and Probable Counts

In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC. Confirmed and probable case definition criteria are described here:

Council of State and Territorial Epidemiologists (ymaws.com).

Deaths

CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (to
E-911 Sites
hub.arcgis.com
rigis.org
Updated May 9, 2024
Cite
Environmental Data Center (2024). E-911 Sites [Dataset]. https://hub.arcgis.com/datasets/0a6ce9b22ac44ba6921f84172871ff11
Explore at:
Dataset updated
May 9, 2024
Dataset authored and provided by
Environmental Data Center
Description
This hosted feature layer has been published in RI State Plane Feet NAD 83.

Representative locations of structures and sites throughout Rhode Island. These data include addressed and unaddressed locations as well as occupied and unoccupied structures. These data were originally designed and developed for Rhode Island E 9-1-1 Uniform Emergency Telephone System (RI E 9-1-1) purposes. This dataset continues to be maintained to provide an accurate spatial reference for RI E 9-1-1 telecommunicators. Portions of this dataset were collected as early as 2001. Inaccuracies do exist in these data, which are therefore under constant revision. Any discrepancies, inaccuracies or inconsistencies recognized in these data should be reported to the pertinent municipality, which should alert RI E-911. Users are also encouraged to email ri911gis@akassociates911.com with any suggested updates for this actively maintained dataset.

Number of global social network users 2017-2028

statista.com
grusthub.com
+4more

Cite

Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

How many people use social media?

Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.

Who uses social media?

Social networking is one of the most popular digital activities worldwide, and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as less developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.

How much time do people spend on social media?

Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America had the highest average time spent per day on social media.

What are the most popular social media platforms?

Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.

Horseshoe crab distribution records from East America and Asia
gbif.org
erddap.eurobis.org
+2more
Updated Sep 17, 2025
Cite
Stine Vestbo; Matthias Obst; Francisco Quevedo Fernandez; Itsara Intanai; Peter Funch (2025). Horseshoe crab distribution records from East America and Asia [Dataset]. http://doi.org/10.14284/293
Explore at:
Unique identifier
https://doi.org/10.14284/293
Dataset updated
Sep 17, 2025
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
Flanders Marine Institute
Authors
Stine Vestbo; Matthias Obst; Francisco Quevedo Fernandez; Itsara Intanai; Peter Funch
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Dec 31, 1970 - Jun 18, 2011
Description
We assembled occurrence records (presence-only) for all four horseshoe crab species in Asia and Eastern America from our own observations, collaborators, scientific networks as well as through publishing a scratchpad site at http://horseshoecrabs.myspecies.info/. For many species, numerous distribution records exist in the literature, and we manually geo-referenced additional occurrence data from these sources.
E-911 Sites (Fire)
hub.arcgis.com
Updated Feb 4, 2025
Cite
Environmental Data Center (2025). E-911 Sites (Fire) [Dataset]. https://hub.arcgis.com/datasets/c1b87dd37faf4628a491416cad72951d
Explore at:
Dataset updated
Feb 4, 2025
Dataset authored and provided by
Environmental Data Center
Description
This hosted view feature layer has been published in RI State Plane Feet NAD 83.

For complete metadata record - https://www.rigis.org/datasets/edc::e-911-sites/about

Representative locations of structures and sites coded as P7 Site type throughout Rhode Island. These data include addressed and unaddressed locations as well as occupied and unoccupied structures. These data were originally designed and developed for Rhode Island E 9-1-1 Uniform Emergency Telephone System (RI E 9-1-1) purposes. This dataset continues to be maintained to provide an accurate spatial reference for RI E 9-1-1 telecommunicators. Portions of this dataset were collected as early as 2001. Inaccuracies do exist in these data, which are therefore under constant revision. Any discrepancies, inaccuracies or inconsistencies recognized in these data should be reported to the pertinent municipality, which should alert RI E-911. Users are also encouraged to email ri911gis@akassociates911.com with any suggested updates for this actively maintained dataset.
E-911 Sites (Police)
hub.arcgis.com
Updated Feb 4, 2025
Cite
Environmental Data Center (2025). E-911 Sites (Police) [Dataset]. https://hub.arcgis.com/datasets/1f04a478bdf2466d9d0872701acae2d9
Explore at:
Dataset updated
Feb 4, 2025
Dataset authored and provided by
Environmental Data Center
Description
This hosted view feature layer has been published in RI State Plane Feet NAD 83.

For complete metadata record - https://www.rigis.org/datasets/edc::e-911-sites/about

Representative locations of structures and sites coded as P6 Site type throughout Rhode Island. These data include addressed and unaddressed locations as well as occupied and unoccupied structures. These data were originally designed and developed for Rhode Island E 9-1-1 Uniform Emergency Telephone System (RI E 9-1-1) purposes. This dataset continues to be maintained to provide an accurate spatial reference for RI E 9-1-1 telecommunicators. Portions of this dataset were collected as early as 2001. Inaccuracies do exist in these data, which are therefore under constant revision. Any discrepancies, inaccuracies or inconsistencies recognized in these data should be reported to the pertinent municipality, which should alert RI E-911. Users are also encouraged to email ri911gis@akassociates911.com with any suggested updates for this actively maintained dataset.
Data from: Onshore Industrial Wind Turbine Locations for the United States...
data.wu.ac.at
data.usgs.gov
+3more
6s6rb +1
Updated Jun 8, 2018
Cite
Department of the Interior (2018). Onshore Industrial Wind Turbine Locations for the United States up to March 2014 [Dataset]. https://data.wu.ac.at/schema/data_gov/NWVjNTI3OTMtYmVmNi00MjM4LWE5NjMtMjM2MWMzNjY4MDcx
Explore at:
Available download formats: vector digital data set (point), 6s6rb
Dataset updated
Jun 8, 2018
Dataset provided by
Department of the Interior
Description
This data set provides industrial-scale onshore wind turbine locations, corresponding facility information, and turbine technical specifications, in the United States to March 2014. The database has nearly 49,000 wind turbine records that have been collected, digitized, locationally verified, and internally quality assured and quality controlled. Turbines from the Federal Aviation Administration Digital Obstacle File, product date March 2, 2014, were used as the primary source of turbine data points. Verification of the position of turbines was done by visual interpretation using high-resolution aerial imagery in ESRI ArcGIS Desktop. Turbines without Federal Aviation Administration Obstacle Repository System (FAA ORS) numbers were visually identified and supplemental points were added to the collection. A locational error of plus or minus 10 meters for turbine positions was estimated. Wind farm facility names were identified from publicly available facility data sets. Facility names were then used in a web search of additional industry publications and press releases to attribute additional turbine information (such as manufacturer, model, and technical specifications of wind turbines). Wind farm facility location data from various wind and energy industry sources were used to search for and digitize turbines not in existing databases. Technical specifications assigned to turbines were based on the make and model as described in literature, in the Federal Aviation Administration Digital Obstacle File, and information from the turbine manufacturers' websites. Some facility and turbine information did not exist or was difficult to obtain. Thus, uncertainty may be present. That uncertainty was rated and a confidence was recorded for both location and attribution data quality.
Trusted Research Environments: Analysis of Characteristics and Data...
b2find.eudat.eu
Updated Mar 28, 2024
(2024). Trusted Research Environments: Analysis of Characteristics and Data Availability - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/9cc5d70e-5668-54dc-ad59-0d7d3f6ceadb
Dataset updated
Mar 28, 2024
Description
Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data with technical, organizational, and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, their building blocks, and their slight technical variations. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices make available the majority of the sensitive data records included in this study.

Methodology

We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google, and grey literature, focusing on retrieving the following source material: peer-reviewed articles where available, TRE websites, and TRE metadata catalogs. The goal of this literature study is to discover existing TREs and analyze their characteristics and data availability, giving an overview of the available infrastructure for sensitive data research, as many European initiatives have been emerging in recent months.

Technical details

This dataset consists of five comma-separated values (.csv) files describing our inventory:

countries.csv: Table of countries with columns id (number), name (text) and code (text, in ISO 3166-A3 encoding, optional)

tres.csv: Table of TREs with columns id (number), name (text), countryid (number, referring to column id of table countries), structureddata (bool, optional), datalevel (one of [1=de-identified, 2=pseudonymized, 3=anonymized], optional), outputcontrol (bool, optional), inceptionyear (date, optional), records (number, optional), datatype (one of [1=claims, 2=linked records], optional), statistics_office (bool), size (number, optional), source (text, optional), comment (text, optional)

access.csv: Table of access modes of TREs with columns id (number), suf (bool, optional), physical_visit (bool, optional), external_physical_visit (bool, optional), remote_visit (bool, optional)

inclusion.csv: Table of TREs included in the literature study with columns id (number), included (bool), exclusion reason (one of [peer review, environment, duplicate], optional), comment (text, optional)

major_fields.csv: Table of data categorization into the major research fields with columns id (number), life_sciences (bool, optional), physical_sciences (bool, optional), arts_and_humanities (bool, optional), social_sciences (bool, optional)

Additionally, a MariaDB (10.5 or higher) schema definition .sql file is needed, properly modelling the schema for databases:

schema.sql: Schema definition file to create the tables and views used in the analysis.

The analysis was done through Jupyter Notebook, which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
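As a rough sketch of how the tables described above relate to each other, the following Python snippet joins TRE rows to country rows through the countryid column. The column names are taken from the description; the sample rows, the TRE names, the "true"/"false" encoding of bool values, and the helper names read_table and join_tres_with_countries are assumptions made for illustration and are not taken from the dataset itself.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the dataset description.
COUNTRIES_CSV = """id,name,code
1,Austria,AUT
2,Germany,DEU
"""

TRES_CSV = """id,name,countryid,statistics_office
10,Example TRE A,1,true
11,Example TRE B,2,false
"""


def read_table(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_tres_with_countries(tres_rows, country_rows):
    """Attach the country name to each TRE via tres.countryid -> countries.id."""
    countries_by_id = {row["id"]: row for row in country_rows}
    joined = []
    for tre in tres_rows:
        country = countries_by_id.get(tre["countryid"])
        joined.append({
            "tre": tre["name"],
            "country": country["name"] if country else None,
            # Assumes bools are stored as the strings "true"/"false".
            "statistics_office": tre["statistics_office"].lower() == "true",
        })
    return joined


rows = join_tres_with_countries(read_table(TRES_CSV), read_table(COUNTRIES_CSV))
for row in rows:
    print(row)
```

To run this against the real files, the in-memory strings could be replaced by reading tres.csv and countries.csv from disk, after checking how the files actually encode optional and bool columns.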
Google Landmarks Dataset v2
github.com
opendatalab.com
Updated Sep 27, 2019
Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
Dataset updated
Sep 27, 2019
Dataset provided by
Google (http://google.com/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.
Behance Community Art Data
cseweb.ucsd.edu
UCSD CSE Research Project, Behance Community Art Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
Likes and image data from the community art website Behance. This is a small, anonymized version of a larger proprietary dataset.

Metadata includes

appreciates (likes)

timestamps

extracted image features

Basic Statistics:

Users: 63,497

Items: 178,788

Appreciates (likes): 1,000,000
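
The basic statistics above give a feel for how sparse the data is. The following Python sketch derives the average number of likes per user and per item and the density of the user-item matrix from the three listed counts. It assumes every like is a distinct (user, item) pair, which the description does not state, and the function name interaction_summary is made up for illustration.

```python
# Counts copied from the "Basic Statistics" list above.
NUM_USERS = 63_497
NUM_ITEMS = 178_788
NUM_APPRECIATES = 1_000_000


def interaction_summary(users, items, interactions):
    """Return average interactions per user and per item, plus matrix density."""
    return {
        "per_user": interactions / users,
        "per_item": interactions / items,
        # Fraction of all possible (user, item) pairs that carry a like,
        # assuming each like is a distinct pair (an assumption, not stated in the source).
        "density": interactions / (users * items),
    }


summary = interaction_summary(NUM_USERS, NUM_ITEMS, NUM_APPRECIATES)
print(f"likes per user: {summary['per_user']:.1f}")
print(f"likes per item: {summary['per_item']:.1f}")
print(f"matrix density: {summary['density']:.2e}")
```

Under that assumption, fewer than one in ten thousand possible user-item pairs carries a like, so the data is very sparse.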
NSW Department of Environment and Heritage Historic Water Quality Data
researchdata.edu.au
data.gov.au
Updated Mar 30, 2016
Bioregional Assessment Program (2016). NSW Department of Environment and Heritage Historic Water Quality Data [Dataset]. https://researchdata.edu.au/nsw-department-environment-quality-data/2993035
Dataset updated
Mar 30, 2016
Dataset provided by
Data.gov (https://data.gov/)
Authors
Bioregional Assessment Program
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Area covered
New South Wales
Description
Abstract

This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied.

Two files contain the preliminary site data and water quality data required by the Bureau of Meteorology (BoM) under the conditions of the Water Act 2008. Please understand that in order to achieve these preliminary files, there has been quite a deal of work over a short amount of time. This has been greatly assisted by (and in fact would not have been possible without) the BoM’s financial assistance in terms of funding of Project NSW 6.1 - Remodelling, update and migration of the DECC water quality database. Note however, that due to the relatively short timeframe involved, a number of caveats still need to be placed on these preliminary files until a full QA/QC and data integrity and consistency check has been completed on the database. This is currently being implemented and it is recommended that additional contact is made with DECC prior to the release or use of this data. DECC will be continuing to refine and QA/QC this database and will inform BoM if this affects any data in these preliminary data files.

Some of this water data has been collected under an agreement with the Murray Darling Basin Commission (now the Murray Darling Authority). Part of this agreement deals with confidentiality regarding the identification of sites on individual landholder properties. In particular: “By providing locations at this (valley name or zone name only) accuracy there is reduced risk that future sampling at that location is confounded by intentional activities at the site. Types of impacts that might be envisaged include the unauthorised collection of rare or endangered fish or macroinvertebrate species at identifiable SRA sample sites, the undesired identification of SRA sites that exist on private property, comparisons of data collected at SRA sites to deduce some causal effect due to the landholding on which those sites exist and so on”. Any supply of/access to/reporting of this data should take such confidentialities into account.

With the Site data, latitudes and longitudes or eastings and northings are still being checked/added for those sites without such data. An updated Site file will be forwarded to BoM once it is finalised.

Lastly, this data is supplied in good faith, exercising all due care and attention. No representation is made about the accuracy, completeness or suitability of the information for any particular purpose. DECC does not accept liability for any damage which may occur to any person or organization taking action or not on the basis of these data.

Dataset History

This data was provided to the Bureau of Meteorology under the water regulations by the NSW Department of Environment & Heritage.

Dataset Citation

NSW - Department of Environment and Heritage (2009) NSW Department of Environment and Heritage Historic Water Quality Data. Bioregional Assessment Source Dataset. Viewed 07 April 2016, http://data.bioregionalassessments.gov.au/dataset/4c5f7318-2567-4614-aa35-46aa0eb045f2.
Editorial Bias in Crowd-Sourced Political Information
plos.figshare.com
datasetcatalog.nlm.nih.gov
Updated Jun 3, 2023
Joshua L. Kalla; Peter M. Aronow (2023). Editorial Bias in Crowd-Sourced Political Information [Dataset]. http://doi.org/10.1371/journal.pone.0136327
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0136327
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Joshua L. Kalla; Peter M. Aronow
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Internet has dramatically expanded citizens’ access to and ability to engage with political information. On many websites, any user can contribute and edit “crowd-sourced” information about important political figures. One of the most prominent examples of crowd-sourced information on the Internet is Wikipedia, a free and open encyclopedia created and edited entirely by users, and one of the world’s most accessed websites. While previous studies of crowd-sourced information platforms have found them to be accurate, few have considered biases in what kinds of information are included. We report the results of four randomized field experiments that sought to explore what biases exist in the political articles of this collaborative website. By randomly assigning factually true but either positive or negative and cited or uncited information to the Wikipedia pages of U.S. senators, we uncover substantial evidence of an editorial bias toward positivity on Wikipedia: Negative facts are 36% more likely to be removed by Wikipedia editors than positive facts within 12 hours and 29% more likely within 3 days. Although citations substantially increase an edit’s survival time, the editorial bias toward positivity is not eliminated by inclusion of a citation. We replicate this study on the Wikipedia pages of deceased as well as recently retired but living senators and find no evidence of an editorial bias in either. Our results demonstrate that crowd-sourced information is subject to an editorial bias that favors the politically active.

