8 datasets found

Repository Analytics and Metrics Portal (RAMP) 2018 data
data.niaid.nih.gov
dataone.org
+2more
zip
Updated Jul 27, 2021
Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2018 data [Dataset]. http://doi.org/10.5061/dryad.ffbg79cvp
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.ffbg79cvp
Dataset updated
Jul 27, 2021
Dataset provided by
University of New Mexico
Montana State University
Authors
Jonathan Wheeler; Kenning Arlitsch
License
https://spdx.org/licenses/CC0-1.0.html
Description
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories (IR). The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2018. For a description of the data collection, processing, and output methods, please see the "Methods" section below. Note that the RAMP data model changed in August 2018, and two sets of documentation are provided to describe data collection and processing before and after the change.

Methods

RAMP Data Documentation – January 1, 2017 through August 18, 2018

Data Collection

RAMP data were downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.

Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

Data Processing

Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
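The flagging step described above can be sketched as a simple check on the file extension in the URL path. This is a minimal illustration only: the function name `citable_content` and the extension list `CONTENT_EXTENSIONS` are assumptions, and the documentation does not state the exact rules RAMP applies.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extension list; the documentation only says "PDF, CSV, etc."
CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip"}


def citable_content(url: str) -> str:
    """Return "Yes" if the URL path ends in a content-file extension, else "No"."""
    path = urlparse(url).path  # drops any query string or fragment
    extension = posixpath.splitext(path)[1].lower()
    return "Yes" if extension in CONTENT_EXTENSIONS else "No"


examples = [
    "https://repo.example.edu/bitstream/1/2/thesis.PDF",
    "https://repo.example.edu/bitstream/1/2/data.csv?sequence=1",
    "https://repo.example.edu/handle/1/2",
]
flags = [citable_content(u) for u in examples]
```

An extension test like this would miss content files served from URLs without a file suffix, so it should be read as an approximation of the idea rather than the RAMP implementation.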

Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

For any specified date range, the steps to calculate CCD are:

Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.
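The two steps above can be sketched in a few lines. The function name `citable_content_downloads` is hypothetical, and the sketch assumes that each row is a dictionary keyed by the documented field names and that the date field is an ISO-formatted string (YYYY-MM-DD), which the documentation does not confirm.

```python
from datetime import date


def citable_content_downloads(rows, start: date, end: date) -> int:
    """Sum clicks on citable-content rows whose date falls within [start, end]."""
    total = 0
    for row in rows:
        day = date.fromisoformat(row["date"])  # assumes ISO dates
        if start <= day <= end and row["citableContent"] == "Yes":
            total += int(row["clicks"])
    return total


# Invented sample rows, for illustration only.
sample_rows = [
    {"date": "2018-01-03", "citableContent": "Yes", "clicks": "4"},
    {"date": "2018-01-03", "citableContent": "No", "clicks": "7"},
    {"date": "2018-01-20", "citableContent": "Yes", "clicks": "1"},
    {"date": "2018-02-01", "citableContent": "Yes", "clicks": "9"},
]
january_ccd = citable_content_downloads(
    sample_rows, date(2018, 1, 1), date(2018, 1, 31)
)
```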

Output to CSV

Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

The data in these CSV files include the following fields:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data so that a single participating repository can be referenced across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
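As an illustration of why aggregation should use repository_id rather than index, the sketch below totals clicks per repository from CSV text in which one repository appears under two index names. The function name `clicks_by_repository`, the repository identifiers, the index names, and the sample rows are all invented for the example.

```python
import csv
import io
from collections import defaultdict


def clicks_by_repository(csv_file) -> dict:
    """Total the clicks field per repository_id, ignoring the index name."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # float() first, in case clicks are serialized as "3.0"
        totals[row["repository_id"]] += int(float(row["clicks"]))
    return dict(totals)


# Hypothetical sample: repo_a appears under two different index names.
sample = io.StringIO(
    "url,clicks,index,repository_id\n"
    "https://a.example.edu/1.pdf,3,repo_a_v1,repo_a\n"
    "https://a.example.edu/2.pdf,2,repo_a_v2,repo_a\n"
    "https://b.example.edu/1.pdf,5,repo_b_v1,repo_b\n"
)
totals = clicks_by_repository(sample)
```

Grouping the same rows by index would split repo_a into two partial totals, which is the inconsistency the repository_id field is meant to avoid.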

Filenames for files containing these data follow the format 2018-01_RAMP_all.csv. Using this example, the file 2018-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2018.
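The naming pattern above can be generated programmatically when looping over monthly files. In this sketch the function name `ramp_monthly_filenames` is hypothetical, and it is an assumption that every month of a given year is published under this exact pattern; files covering the period after the August 2018 data model change may be named differently.

```python
def ramp_monthly_filenames(year: int) -> list[str]:
    """Build candidate monthly filenames following the YYYY-MM_RAMP_all.csv pattern."""
    return [f"{year}-{month:02d}_RAMP_all.csv" for month in range(1, 13)]


names_2018 = ramp_monthly_filenames(2018)
```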

Data Collection from August 19, 2018 Onward

RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.

Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

The second set includes similar information, but instead of being aggregated at the page level, the data are grouped by the country from which the user submitted the corresponding search and by the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

Data Processing

Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.
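The two record shapes used after the data model change can be summarized as a pair of data classes. The class names and the Python types below are assumptions made for illustration; only the field names come from the documentation.

```python
from dataclasses import dataclass, fields


@dataclass
class PageRecord:
    """Page level row: one per URL, with citableContent added on ingest."""
    url: str
    impressions: int
    clicks: int
    clickThrough: float
    position: float
    date: str
    citableContent: str


@dataclass
class CountryDeviceRecord:
    """Country/device row: no URL, so no citableContent flag."""
    country: str
    device: str
    impressions: int
    clicks: int
    clickThrough: float
    position: float
    date: str


page_fields = {f.name for f in fields(PageRecord)}
country_device_fields = {f.name for f in fields(CountryDeviceRecord)}
shared_fields = page_fields & country_device_fields
```

Because the country/device rows carry no URL, citable content downloads can only be computed from the page level data.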

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository
Repository Analytics and Metrics Portal (RAMP) 2017 data
data.niaid.nih.gov
datadryad.org
+1more
zip
Updated Jul 27, 2021
Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2017 data [Dataset]. http://doi.org/10.5061/dryad.r7sqv9scf
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.r7sqv9scf
Dataset updated
Jul 27, 2021
Dataset provided by
Montana State University
University of New Mexico
Authors
Jonathan Wheeler; Kenning Arlitsch
License
https://spdx.org/licenses/CC0-1.0.html
Description
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories (IR). The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2017. For a description of the data collection, processing, and output methods, please see the "Methods" section below.

Methods

RAMP Data Documentation – January 1, 2017 through August 18, 2018

Data Collection

RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
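The clickThrough definition above is a simple ratio, sketched here for reference. The function name `click_through` is hypothetical, and returning 0.0 when there are no impressions is an assumption of this sketch, not documented GSC or RAMP behavior.

```python
def click_through(clicks: int, impressions: int) -> float:
    """Clicks divided by impressions; 0.0 when there are no impressions (assumed)."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


rate = click_through(5, 200)  # 5 clicks out of 200 impressions
```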

Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

Data Processing

Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

For any specified date range, the steps to calculate CCD are:

Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.

Output to CSV

Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

The data in these CSV files include the following fields:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data so that a single participating repository can be referenced across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.

Filenames for files containing these data follow the format 2017-01_RAMP_all.csv. Using this example, the file 2017-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2017.

References

Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.
League of Legends Match Data at Various Time Intervals
explore.openaire.eu
data.niaid.nih.gov
Updated Aug 31, 2023
Jailson Barros da Silva Junior; Claudio Campelo (2023). League of Legends Match Data at Various Time Intervals [Dataset]. http://doi.org/10.5281/zenodo.8303396
Unique identifier
https://doi.org/10.5281/zenodo.8303396
Dataset updated
Aug 31, 2023
Authors
Jailson Barros da Silva Junior; Claudio Campelo
Description
This dataset comprises comprehensive information from ranked matches played in the game League of Legends, spanning the time frame between January 12, 2023, and May 18, 2023. The matches cover a wide range of skill levels, specifically from the Iron tier to the Diamond tier. The dataset is structured based on time intervals, presenting game data at various percentages of elapsed game time, including 20%, 40%, 60%, 80%, and 100%. For each interval, detailed match statistics, player performance metrics, objective control, gold distribution, and other vital in-game information are provided. This collection of data not only offers insights into how matches evolve and strategies change over different phases of the game but also enables the exploration of player behavior and decision-making as matches progress. Researchers and analysts in the field of esports and game analytics will find this dataset valuable for studying trends, developing predictive models, and gaining a deeper understanding of the dynamics within ranked League of Legends matches across different skill tiers.
PRONTO heterogeneous benchmark dataset
zenodo.org
explore.openaire.eu
txt
Updated Aug 2, 2024
+ more versions
Anna Stief; Ruomu Tan; Yi Cao; James R. Ottewill (2024). PRONTO heterogeneous benchmark dataset [Dataset]. http://doi.org/10.5281/zenodo.1341583
Available download formats: txt
Unique identifier
https://doi.org/10.5281/zenodo.1341583
Dataset updated
Aug 2, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anna Stief; Ruomu Tan; Yi Cao; James R. Ottewill
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The PRONTO heterogeneous benchmark dataset is based on an industrial-scale multiphase flow facility. It includes data from heterogeneous sources, including process measurements, alarm records, high frequency ultrasonic flow and pressure measurements, an operation log and video recordings. The study collected data from various operational conditions with and without induced faults to generate a multi-rate, multi-modal dataset. The dataset is suitable for developing and validating algorithms for fault detection and diagnosis (FDD) and data fusion.

When using the dataset please cite the following publication:

A. Stief, R. Tan, Y. Cao, J. R. Ottewill, N. F. Thornhill, J. Baranowski, A heterogeneous benchmark dataset for data analytics: Multiphase flow facility case study, Journal of Process Control, 79 (2019) 41–55, DOI: https://doi.org/10.1016/j.jprocont.2019.04.009

The dataset has been used in the following works:

A. Stief, R. Tan, Y. Cao, J. R. Ottewill. Analytics of heterogeneous process data: Multiphase flow facility case study. IFAC-PapersOnLine, 51(18):363–368, 2018. DOI: https://doi.org/10.1016/j.ifacol.2018.09.327

A. Stief, J. R. Ottewill, R. Tan, Y. Cao. Process and alarm data integration under a two-stage Bayesian framework for fault diagnostics. IFAC-PapersOnLine, 51(24):1220–1226, 2018. DOI: https://doi.org/10.1016/j.ifacol.2018.09.696

A. Stief, J. R. Ottewill, J. Baranowski. Investigation of the diagnostic properties of sensors and features in a multiphase flow facility case study, in: 12th IFAC Symposium on Dynamics and Control of Process Systems (in press), 2019.

M. Lucke, X. Mei, A. Stief, M. Chioua, N. F. Thornhill. Variable selection for fault detection and identification based on mutual information of multi-valued alarm series, in: 12th IFAC Symposium on Dynamics and Control of Process Systems (in press), 2019.

R. Tan, T. Cong, N. F. Thornhill, J. R. Ottewill, J. Baranowski. Statistical monitoring of processes with multiple operating modes, in: 12th IFAC Symposium on Dynamics and Control of Process Systems (in press), 2019.
League of Legends Match Data at Various Time Intervals
zenodo.org
csv
Updated Aug 31, 2023
Jailson Barros da Silva Junior; Claudio Campelo (2023). League of Legends Match Data at Various Time Intervals [Dataset]. http://doi.org/10.5281/zenodo.8303397
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8303397
Dataset updated
Aug 31, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jailson Barros da Silva Junior; Claudio Campelo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises comprehensive information from ranked matches played in the game League of Legends, spanning the time frame between January 12, 2023, and May 18, 2023. The matches cover a wide range of skill levels, specifically from the Iron tier to the Diamond tier.

The dataset is structured based on time intervals, presenting game data at various percentages of elapsed game time, including 20%, 40%, 60%, 80%, and 100%. For each interval, detailed match statistics, player performance metrics, objective control, gold distribution, and other vital in-game information are provided.
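To make the interval structure concrete, the sketch below computes the elapsed-time points that correspond to 20%, 40%, 60%, 80%, and 100% of a match. The function name `snapshot_times` and the 30-minute match duration are hypothetical, and how the dataset authors actually sampled each interval is not described here.

```python
def snapshot_times(duration_seconds: float, percentages=(20, 40, 60, 80, 100)):
    """Return the elapsed time in seconds at each percentage of the match."""
    return [duration_seconds * pct / 100 for pct in percentages]


# Hypothetical 30-minute match (1800 seconds).
times = snapshot_times(1800)
```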

This collection of data not only offers insights into how matches evolve and strategies change over different phases of the game but also enables the exploration of player behavior and decision-making as matches progress. Researchers and analysts in the field of esports and game analytics will find this dataset valuable for studying trends, developing predictive models, and gaining a deeper understanding of the dynamics within ranked League of Legends matches across different skill tiers.

Oil and Gas Analytics Market Report | Global Forecast From 2025 To 2033

dataintelo.com

csv, pdf, pptx

Updated Sep 18, 2023

Dataintelo (2023). Oil and Gas Analytics Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/oil-and-gas-analytics-market

Available download formats: pptx, csv, pdf

Dataset updated

Sep 18, 2023

Dataset authored and provided by

Dataintelo

License

https://dataintelo.com/privacy-and-policy

Time period covered

2024 - 2032

Area covered

Global

Description

The global oil and gas analytics market is anticipated to expand at a CAGR of 18% during the forecast period, 2020 – 2026.
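For readers unfamiliar with CAGR, the sketch below shows what an 18% compound annual growth rate implies over the stated forecast period. The function name `cagr_growth_factor` is hypothetical, and treating 2020 to 2026 as six compounding years is an assumption; the listing gives no base-year market size, so only the relative growth factor is computed.

```python
def cagr_growth_factor(rate: float, years: int) -> float:
    """Overall growth multiple after compounding `rate` annually for `years` years."""
    return (1 + rate) ** years


# 18% CAGR compounded over an assumed six years (2020 to 2026).
factor = cagr_growth_factor(0.18, 6)
```

Under these assumptions the market would be roughly 2.7 times its starting size by the end of the period.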

The oil and gas industry includes the global processes of exploration, extraction, refining, transporting, and marketing of petroleum products. The industry is dominated by a few large firms and is therefore regarded as operating in an oligopoly market.
Global oil demand is rising due to rapid globalization and industrial growth. Big data and analytics are helping companies analyse large quantities of structured and unstructured data from different sources and create real-time insights. Oil companies face organizational data challenges such as poor data quality, data integration, data irrelevancies, data ownership, and limited visibility. New-age big data analytics solutions replace fragmented frameworks with a unified data architecture to address these organizational data challenges.

Market Trends, Drivers, Restraints, and Opportunities:

Adoption of digital technologies to enhance productivity and reduce costs is one of the key trends driving the growth of the oil and gas analytics market.
Increase in demand from refineries for petroleum products is expected to drive the oil and gas analytics market during the forecast period.
Implementation of strict operational safety norms by governments is expected to boost market growth.
Factors such as cheap sensors, widening connectivity, and computing power are boosting data collection by oil and gas companies. This, in turn, is driving the growth of the market.
Rising global demand for fuel, increasing competition, financial capital, public scrutiny, and regulations are the market challenges.
Energy-intensive operations and the environmental impact of carbon emissions are key restraints of the market.
Implementation of cloud-based and integration analytics services to find real-time information is a key market opportunity for the industry.

Scope of the Report

The report on the global oil and gas analytics market includes an assessment of the market, trends, segments, and regional markets. Overview and dynamics have also been included in the report.

Attributes	Details
Base Year	2019
Historic Data	2018–2019
Forecast Period	2020–2026
Regional Scope	Asia Pacific, Europe, North America, Middle East & Africa, and Latin America
Report Coverage	Company Share, Market Analysis and Size, Competitive Landscape, Growth Factors, and Trends, and Revenue Forecast

Gaze-Aware Visualization Design Worksheet
explore.openaire.eu
Updated Nov 10, 2021
Radu Jianu; Nelson Silva; Nils Rodrigues; Tanja Blascheck; Tobias Schreck; Daniel Weiskopf (2021). Gaze-Aware Visualization Design Worksheet [Dataset]. http://doi.org/10.5281/zenodo.5665611
Unique identifier
https://doi.org/10.5281/zenodo.5665611
Dataset updated
Nov 10, 2021
Authors
Radu Jianu; Nelson Silva; Nils Rodrigues; Tanja Blascheck; Tobias Schreck; Daniel Weiskopf
Description
The resource is a practical worksheet that can guide the integration of eye-tracking capabilities into visualization or visual analytic systems by helping identify opportunities, challenges, and benefits of doing so. The resource also includes guidance for its use and three concrete examples. Importantly, this resource is meant to be used in conjunction with the design framework and references detailed in section 4 of: "Gaze-Aware Visualization: Design Considerations and Research Agenda" by R. Jianu, N. Silva, N. Rodrigues, T. Blascheck, T. Schreck, and D. Weiskopf (in Transactions on Visualization and Computer Graphics).

The worksheet encourages designers who wish to integrate eye-tracking into visualization or visual analytics systems to carefully consider 18 fundamental facets that can inform the integration process and whether it is likely to be valuable. Broadly, these relate to:

M1-M3: Measurable data afforded by eye trackers (and other modalities and context data that could be used together with such data)
I1-I6: Inferences that can be made from measured data about users' interests, tasks, intent, and analysis process
S1-S7: Opportunities to use such inferences to support visual search, interaction, exploration, analysis, recall, collaboration, and onboarding
B1-B9: Limitations to beware that arise from eye-tracking technology and the sometimes inscrutable ways in which human perception and cognition work, and which may constrain support possibilities.

To apply the worksheet to inform the design of a gaze-aware visualization or visual analytic system one would:

Progress through its sections and consider the facets they contain step-by-step.
For each facet:
  Refer to the academic paper mentioned above (in particular section 4) for a more detailed discussion about the facet and for supporting references that provide further depth, inspiration, and concrete examples.
  Consider carefully how these details apply to the specific visualization under analysis and its context of use. Consider both the opportunities that eye-tracking affords (M, I, S) and the limitations and challenges (B).
  Use the specific questions under each facet (e.g., "Are lighting conditions too variable for accurate gaze tracking?") to further guide the thought process and capture rough yes/no assessments (if this is possible).
Summarize a design rationale at the end of each worksheet section. This should capture design decisions or options and the motivation behind them, as informed by thought processes and insights facilitated by the design considerations in the section. The format and level of detail of such summaries are up to the designer (a few different options are shown in our examples).

We exemplify this use of the worksheet by conjecturing how eye-tracking could be integrated in three visualization systems (included in the resource). We chose three systems that span a broad range of domains and contexts to exemplify different challenges and opportunities. We also exemplify different ways of capturing design rationales: more detailed/verbose or as bullet points.
Amount of data created, consumed, and stored 2010-2023, with forecasts to...
statista.com
Updated Jun 30, 2025
Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 2024
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

Storage capacity also growing

Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2018 data [Dataset]. http://doi.org/10.5061/dryad.ffbg79cvp

Repository Analytics and Metrics Portal (RAMP) 2018 data

Available download formats: zip

Unique identifier

https://doi.org/10.5061/dryad.ffbg79cvp

Dataset updated

Jul 27, 2021

Dataset provided by

University of New Mexico
Montana State University

Authors

Jonathan Wheeler; Kenning Arlitsch

License

https://spdx.org/licenses/CC0-1.0.html

Description

The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories. The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2018. For a description of the data collection, processing, and output methods, please see the "Methods" section below. Note that the RAMP data model changed in August 2018, and two sets of documentation are provided to describe data collection and processing before and after the change.

Methods

RAMP Data Documentation – January 1, 2017 through August 18, 2018

Data Collection

RAMP data were downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
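
The clickThrough definition above can be sketched in a few lines of Python. This is only an illustration of the stated formula (clicks divided by impressions); the function name, the sample URL, and the handling of rows with zero impressions are assumptions, not something the RAMP documentation specifies.

```python
def click_through(clicks: int, impressions: int) -> float:
    """Return clicks divided by impressions, as the field is described."""
    if impressions == 0:
        # Assumption: report 0.0 rather than raising when a row has no impressions.
        return 0.0
    return clicks / impressions


# Hypothetical row using the field names listed above.
row = {
    "url": "https://repository.example.edu/handle/1/2",
    "impressions": 40,
    "clicks": 5,
}
row["clickThrough"] = click_through(row["clicks"], row["impressions"])
```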

Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

Data Processing

Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
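
One way such a check could work is to look at the file extension in the URL path. The sketch below is an assumption about the approach, not RAMP's actual implementation: the extension list, the function name, and the example URLs are all hypothetical, and the source only names "PDF, CSV, etc." as examples of content files.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical list of content-file extensions; the documentation does not enumerate them.
CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xlsx", ".zip"}


def citable_content(url: str) -> str:
    """Return "Yes" if the URL path ends in a known content-file extension, else "No"."""
    path = urlparse(url).path  # drops any query string or fragment
    suffix = PurePosixPath(path).suffix.lower()
    return "Yes" if suffix in CONTENT_EXTENSIONS else "No"


flags = {
    url: citable_content(url)
    for url in (
        "https://repository.example.edu/bitstream/1/2/thesis.PDF",
        "https://repository.example.edu/handle/1/2",
        "https://repository.example.edu/files/data.csv?sequence=1",
    )
}
```

A purely extension-based test would miss content files served from URLs without an extension, so a real pipeline may need additional rules.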

Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

For any specified date range, the steps to calculate CCD are:

Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.
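
The two steps above can be sketched with Python's standard csv module. The field names come from the documentation; the sample rows, the function name, and the assumption that dates are ISO formatted (YYYY-MM-DD) and that clicks parse as numbers are illustrative only.

```python
import csv
import io
from datetime import date


def citable_content_downloads(rows, start: date, end: date) -> int:
    """Sum clicks on rows flagged as citable content within [start, end]."""
    total = 0
    for row in rows:
        day = date.fromisoformat(row["date"])  # assumes YYYY-MM-DD dates
        if start <= day <= end and row["citableContent"] == "Yes":
            total += int(float(row["clicks"]))
    return total


# Hypothetical sample data standing in for a published RAMP CSV file.
sample = io.StringIO(
    "url,clicks,date,citableContent\n"
    "https://repository.example.edu/a.pdf,3,2018-01-05,Yes\n"
    "https://repository.example.edu/handle/1,7,2018-01-05,No\n"
    "https://repository.example.edu/b.csv,2,2018-01-20,Yes\n"
    "https://repository.example.edu/c.pdf,4,2018-02-02,Yes\n"
)
rows = list(csv.DictReader(sample))
ccd_january = citable_content_downloads(rows, date(2018, 1, 1), date(2018, 1, 31))
```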

Output to CSV

Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

The data in these CSV files include the following fields:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
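
As a minimal sketch of why aggregation should use repository_id rather than index, the example below totals clicks per repository across rows whose index names differ. The index names, repository identifiers, and function name are hypothetical.

```python
from collections import defaultdict


def clicks_by_repository(rows):
    """Total the clicks field per repository_id, ignoring the index name."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["repository_id"]] += int(row["clicks"])
    return dict(totals)


# Hypothetical rows: one repository appears under two index names over time.
rows = [
    {"index": "example_ir_v1", "repository_id": "example_ir", "clicks": 3},
    {"index": "example_ir_v2", "repository_id": "example_ir", "clicks": 4},
    {"index": "other_ir_v1", "repository_id": "other_ir", "clicks": 2},
]
totals = clicks_by_repository(rows)
```

Grouping on index instead would split the first repository's clicks across two keys.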

Filenames for files containing these data follow the format 2018-01_RAMP_all.csv. Using this example, the file 2018-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2018.
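
The naming pattern can be expressed as a small helper. The function name is hypothetical, and which months are actually published under this exact pattern is not stated here, so the month range below is only an illustration.

```python
def ramp_filename(year: int, month: int) -> str:
    """Build a filename following the pattern of the 2018-01_RAMP_all.csv example."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"{year}-{month:02d}_RAMP_all.csv"


# Illustrative only: filenames for the first three months of 2018.
first_quarter = [ramp_filename(2018, month) for month in range(1, 4)]
```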

Data Collection from August 19, 2018 Onward

RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.

Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

The second set includes similar information, but instead of being aggregated at the page level, the data are grouped by the country from which the user submitted the corresponding search and by the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

Data Processing

Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR: one index holds the page level data, and the second holds the country of origin and device type data.

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository
