Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General data collected for the study "Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union".
Four research questions are posed: What percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is that percentage higher or lower than the share provided by direct traffic and by search engines via SEO positioning? Which social networks have the greatest impact? And is there any relationship between the weight of social networks in a cybermedium's web traffic and metrics such as the average duration of the user's visit, the number of page views, or the bounce rate (understood formally as leaving the visited page without performing any interaction beyond reading its content)?
To answer these questions, we first selected the cybermedia with the highest web traffic in each of the 27 countries that currently make up the European Union, following the United Kingdom's departure on December 31, 2020. In each country we selected five media outlets using a combination of the global web traffic metrics provided by Alexa (https://www.alexa.com/), which ceased operating on May 1, 2022, and SimilarWeb (https://www.similarweb.com/). We did not use country-level local metrics, since the results obtained with these two tools were sufficiently significant and our objective is not to establish a ranking of cybermedia by country but to examine the relevance of social networks in their web traffic.
In all cases we selected cybermedia owned by journalistic companies, ruling out those belonging to telecommunications portals or service providers. Some are classic news companies (both newspapers and television stations), while others are digital natives; this distinction does not affect the nature of the proposed research.
We then examined the web traffic data of these cybermedia for the period covering October, November, and December 2021 and January, February, and March 2022. We believe this six-month window smooths out possible one-off monthly variations, reinforcing the precision of the data obtained.
To obtain these data we used SimilarWeb, currently the most precise tool available for examining a portal's web traffic, although it is limited to traffic from desktops and laptops; traffic from mobile devices is currently impossible to determine with the measurement tools on the market.
It includes:
Web traffic general data: average visit duration, pages per visit, and bounce rate
Web traffic origin by country
Percentage of traffic generated from social media over total web traffic
Distribution of web traffic generated from social networks
Comparison of web traffic generated from social networks with direct and search traffic
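The share comparisons described above can be sketched in a few lines; the visit counts and source names below are invented for illustration, not the dataset's actual schema:

```python
# Hypothetical visit counts for one cybermedium over the six-month window.
# Names and numbers are illustrative only.
traffic = {
    "social": 12_000,
    "direct": 45_000,
    "search": 38_000,
    "other": 5_000,
}

total = sum(traffic.values())
# Percentage of total traffic contributed by each source.
shares = {source: round(100 * visits / total, 1) for source, visits in traffic.items()}

print(shares["social"])  # → 12.0 (share of traffic from social networks)
# Is the social share lower than both direct and search shares?
print(shares["social"] < shares["direct"] and shares["social"] < shares["search"])  # → True
```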
Psychological scientists increasingly study web data, such as user ratings or social media postings. However, whether research relying on such web data leads to the same conclusions as research based on traditional data is largely unknown. To test this, we (re)analyzed three datasets, thereby comparing web data with lab and online survey data. We calculated correlations across these different datasets (Study 1) and investigated identical, illustrative research questions in each dataset (Studies 2 to 4). Our results suggest that web and traditional data are not fundamentally different and usually lead to similar conclusions, but also that it is important to consider differences between data types such as populations and research settings. Web data can be a valuable tool for psychologists when accounting for such differences, as it allows for testing established research findings in new contexts, complementing them with insights from novel data sources.
Nursing Home Compare has detailed information about every Medicare and Medicaid nursing home in the country. A nursing home is a place for people who can’t be cared for at home and need 24-hour nursing care. These are the official datasets used on the Medicare.gov Nursing Home Compare Website provided by the Centers for Medicare & Medicaid Services. These data allow you to compare the quality of care at every Medicare and Medicaid-certified nursing home in the country, including over 15,000 nationwide.
This dataset is a real-world web page collection used for research on the automatic extraction of structured data (e.g., attribute-value pairs of entities) from the Web. We hope it could serve as a useful benchmark for evaluating and comparing different methods for structured web data extraction.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Abstract: Measurement studies are essential for research and industry alike to better understand the Web's inner workings and help quantify specific phenomena. Performing such studies is demanding due to the dynamic nature and size of the Web. An experiment's careful design and setup are complex, and many factors might affect the results. However, while several works have independently observed differences in the outcome of an experiment (e.g., the number of observed trackers) based on the measurement setup, it is unclear what causes such deviations. This work investigates the reasons for these differences by visiting 1.7M webpages with five different measurement setups. Based on this, we build 'dependency trees' for each page and cross-compare the nodes in the trees. The results show that the measured trees differ considerably, that the cause of differences can be attributed to specific nodes, and that even identical measurement setups can produce different results.
TechnicalRemarks: This repository hosts the dataset corresponding to the paper "On the Similarity of Web Measurements Under Different Experimental Setups", published in the Proceedings of the 23rd ACM Internet Measurement Conference 2023.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the full dataset of the 2019 Cryo-EM Map-based Model Metrics Challenge sponsored by EMDataResource (www.emdataresource.org, challenges.emdataresource.org, model-compare.emdataresource.org). The goals of this challenge were (1) to assess the quality of models that can be produced using current modeling software, (2) to check the reproducibility of modeling results from different software developers and users, and (3) to compare the performance of current metrics used for evaluation of models. The focus was on near-atomic resolution maps with an innovative twist: three of four target maps formed a resolution series (1.8 to 3.1 Å) from the same specimen and imaging experiment. Tools developed in previous challenges were expanded for managing, visualizing and analyzing the 63 submitted coordinate models, and several new metrics were introduced.
File Descriptions:
2019-EMDataResource-Challenge-web.pdf: Archive of News, Goals, Timeline, Targets, Modelling Instructions, Process, FAQ, Submission Instructions, and Submission Summary Statistics sourced from the EMDR Challenges website
correlation-images.tar.gz: Pairwise correlation tables for selected metric scores from the EMDR Model Compare website
maps.tar.gz: The maps used for Fit-to-Map analyses in the Challenge
models.tar.gz: The 63 models submitted by the modelling teams
results.tar.gz: The output logs for all of the analysis methods
Scores.xlsx: Scores for each model and analysis method, compiled into spreadsheet format
targets.tar.gz: The reference models used in the analysis
Post submission correction to the web archive PDF document: The full list of EMDataResource members on the model committee is as follows: Cathy Lawson, Andriy Kryshtafovych, Greg Pintilie, Mike Schmid, Helen Berman, Wah Chiu.
Monthly site compare scripts and output used to generate the model/observation plots and statistics in the manuscript. The AQS hourly site compare output files are not included, as they were too large to store on ScienceHub. The files contain paired model/observation values for the various air quality networks.
This dataset is associated with the following publication: Appel, W., S. Napelenok, K. Foley, H. Pye, C. Hogrefe, D. Luecken, J. Bash, S. Roselle, J. Pleim, H. Foroutan, B. Hutzell, G. Pouliot, G. Sarwar, K. Fahey, B. Gantt, D. Kang, R. Mathur, D. Schwede, T. Spero, D. Wong, J. Young, and N. Heath. Description and evaluation of the Community Multiscale Air Quality (CMAQ) modeling system version 5.1. Geoscientific Model Development. Copernicus Publications, Katlenburg-Lindau, GERMANY, 10: 1703-1732, (2017).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
More and more customers consult online reviews and comments on the Web to decide between products. In this context, sentiment analysis techniques are the traditional way to summarize users' opinions, highlighting the positive or criticizing the negative aspects of a product. Sentiment analysis of reviews usually relies on extracting positive and negative aspects of products, neglecting comparative opinions. Such opinions do not directly express a positive or negative view but contrast aspects of products from different competitors. Here, we present the first effort to study comparative opinions in Portuguese, creating two new Portuguese datasets with comparative sentences marked by three humans. This repository consists of three important files: (1) a lexicon containing words frequently used to make comparisons in Portuguese; (2) a Twitter dataset with labeled comparative sentences; and (3) a Buscapé dataset with labeled comparative sentences. The lexicon is a set of 176 words frequently used to express a comparative opinion in the Portuguese language. The lexicon is used as a filter to build two sets of comparative sentences from two important contexts: (1) online social networks; and (2) product reviews. For Twitter, we collected all Portuguese tweets published in Brazil on 2018/01/10 and filtered those containing at least one keyword present in the lexicon, obtaining 130,459 tweets. Our work is at the sentence level; thus, all sentences were extracted and a sample of 2,053 sentences was created, which was labeled by three human annotators, reaching 83.2% agreement by Fleiss' Kappa coefficient. For Buscapé, a Brazilian website (https://www.buscape.com.br/) used to compare product prices on the web, the same methodology was applied, creating a set of 2,754 labeled sentences obtained from comments made in 2013.
This dataset was labeled by three humans, reaching an agreement of 83.46% by the Fleiss Kappa coefficient. The Twitter dataset has 2,053 labeled sentences, of which 918 are comparative. The Buscapé dataset has 2,754 labeled sentences, of which 1,282 are comparative.
The datasets contain the following labeled properties:
text: the sentence extracted from the review comment.
entity_s1: the first entity compared in the sentence.
entity_s2: the second entity compared in the sentence.
keyword: the comparative keyword used in the sentence to express comparison.
preferred_entity: the preferred entity.
id_start: the starting position of the keyword in the sentence.
id_end: the final position of the keyword in the sentence.
type: the sentence label, which specifies whether the phrase is a comparison.
Additional information:
1 - The sentences were separated using a sentence tokenizer.
2 - If a compared entity is not specified, the field receives the value "_".
3 - The type property can contain five different values:
0: Non-comparative (Não Comparativa).
1: Non-Equal-Gradable (Gradativa com Predileção).
2: Equative (Equitativa).
3: Superlative (Superlativa).
4: Non-Gradable (Não Gradativa).
If you use this data, please cite our paper as follows: "Daniel Kansaon, Michele A. Brandão, Julio C. S. Reis, Matheus Barbosa, Breno Matos, and Fabrício Benevenuto. 2020. Mining Portuguese Comparative Sentences in Online Reviews. In Brazilian Symposium on Multimedia and the Web (WebMedia '20), November 30-December 4, 2020, São Luís, Brazil. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3428658.3431081"
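As a minimal sketch of working with the labeled sentences, the properties listed above can be read and filtered as follows. The CSV layout and the sample rows are assumed for illustration, not taken from the released files:

```python
import csv
import io

# English labels for the type codes; label 4 follows the Portuguese
# "Não Gradativa" (non-gradable).
TYPE_LABELS = {
    0: "Non-comparative",
    1: "Non-Equal-Gradable",
    2: "Equative",
    3: "Superlative",
    4: "Non-Gradable",
}

# Two made-up rows mimicking the dataset's properties (positions illustrative).
sample = io.StringIO(
    "text,entity_s1,entity_s2,keyword,preferred_entity,id_start,id_end,type\n"
    "produto A é melhor que produto B,produto A,produto B,melhor,produto A,12,18,1\n"
    "gostei do produto,_,_,_,_,-1,-1,0\n"
)

rows = list(csv.DictReader(sample))
# Keep only comparative sentences (type != 0).
comparative = [r for r in rows if int(r["type"]) != 0]
print(len(comparative), TYPE_LABELS[int(comparative[0]["type"])])  # → 1 Non-Equal-Gradable
```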
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Hospital ratings’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/center-for-medicare-and-medicaid/hospital-ratings on 28 January 2022.
--- Dataset description provided by original source is as follows ---
These are the official datasets used on the Medicare.gov Hospital Compare Website provided by the Centers for Medicare & Medicaid Services. These data allow you to compare the quality of care at over 4,000 Medicare-certified hospitals across the country.
Dataset fields:
Dataset was downloaded from [https://data.medicare.gov/data/hospital-compare]
If you just broke your leg, you might need to use this dataset to find the best Hospital to get that fixed!
--- Original source retains full ownership of the source dataset ---
AgentTrek Data Collection
The AgentTrek dataset is the training dataset for the web agent AgentTrek-1.0-32B. It consists of a total of 52,594 dialogue turns, specifically designed to train a language model to perform web-based tasks such as browsing and web shopping. The dialogues simulate interactions in which the agent assists users with tasks like searching for information, comparing products, making purchasing decisions, and navigating websites.
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/AgentTrek.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The current dataset consists of 200 search results extracted from the Google and Bing engines (100 from Google and 100 from Bing). The search terms were selected from the 10 most-searched keywords of 2021 according to Google Trends data. The remaining sheets report the performance of the websites according to three technical evaluation aspects: SEO, speed, and security. The performance data were developed using the CheckBot crawling tool. The whole dataset can help information retrieval scientists compare the two engines in terms of their position/ranking and their performance on these factors.
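One way to compare the two engines' results for a keyword is result-set overlap; the URL lists below are made-up stand-ins for rows of the dataset:

```python
# Invented top results for one search term on each engine.
google_results = ["a.com", "b.com", "c.com", "d.com", "e.com"]
bing_results = ["b.com", "a.com", "f.com", "c.com", "g.com"]

# Shared URLs, and the Jaccard similarity of the two result sets.
overlap = set(google_results) & set(bing_results)
jaccard = len(overlap) / len(set(google_results) | set(bing_results))
print(sorted(overlap), round(jaccard, 2))  # → ['a.com', 'b.com', 'c.com'] 0.43
```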
For more information about the rationale behind the structure of the dataset, please contact the Information Management Lab of the University of West Attica.
Contact persons: Vasilis Ntararas (lb17032@uniwa.gr), Georgios Ntimo (lb17100@uniwa.gr) and Ioannis C. Drivas (idrivas@uniwa.gr)
CMAQv5.1 with a new dust module: IMPROVE sitex files containing 24-hr (every 3rd day) paired model/observation data for the IMPROVE network. This dataset is associated with the following publication: Foroutan, H., J. Young, S. Napelenok, L. Ran, W. Appel, R. Gilliam, and J. Pleim. Development and evaluation of a physics-based windblown dust emission scheme implemented in the CMAQ modeling system. Journal of Advances in Modeling Earth Systems. John Wiley & Sons, Inc., Hoboken, NJ, USA, 9(1): 585-608, (2017).
Do web surveys still yield lower response rates compared with other survey modes? To answer this question, we replicated and extended a meta-analysis done in 2008 which found that, based on 45 experimental comparisons, web surveys had an 11 percentage points lower response rate compared with other survey modes. Fundamental changes in internet accessibility and use since the publication of the original meta-analysis would suggest that people’s propensity to participate in web surveys has changed considerably in the meantime. However, in our replication and extension study, which comprised 114 experimental comparisons between web and other survey modes, we found almost no change: web surveys still yielded lower response rates than other modes (a difference of 12 percentage points in response rates). Furthermore, we found that prenotifications, the sample recruitment strategy, the survey’s solicitation mode, the type of target population, the number of contact attempts, and the country in which the survey was conducted moderated the magnitude of the response rate differences. These findings have substantial implications for web survey methodology and operations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Three experimental data sets (WNRA0103, WNRA0305 and WNRA0506) involving three grapevine varieties and a range of deficit irrigation and pruning treatments are described. The purpose for obtaining the data sets was two-fold, (1) to meet the research goals of the Cooperative Research Centre for Viticulture (CRCV) during its tenure 1999-2006, and (2) to test the capacity of the VineLOGIC grapevine growth and development model to predict timing of bud burst, flowering, veraison and harvest, yield and yield components, berry attributes and components of water balance. A test script, included with the VineLOGIC source code publication (https://doi.org/10.25919/5eb3536b6a8a8), enables comparison between model predicted and measured values for key variables. Key references relating to the model and data sets are provided under Related Links. A description of selected terms and outcomes of regression analysis between values predicted by the model and observed values are provided under Supporting Files. Version 3 included the following amendments: (1) to WNRA0103 – alignment of settings for irrigation simulation control and initial soil water contents for soil layers with those in WNRA0305 and WNRA0506, and addition of missing berry anthocyanin data for season 2002-03; (2) to WNRA0305 - minor corrections to values for berry and bunch number and weight, and correction of target Brix value for harvest to 24.5 Brix; (3) minor corrections to some measured berry anthocyanin concentrations as mg/g fresh weight; minor amendments to treatment names for consistency across data sets, and to the name for irrigation type to improve clarity; and (4) update of regression analysis between VineLOGIC-predicted versus observed values for key variables. Version 4 (this version) includes a metadata only amendment with two additions to Related links: ‘VineLOGIC View’ and a recent publication. 
Lineage: The data sets were obtained at a commercial wine company vineyard in the Mildura region of north western Victoria, Australia. Vines were spaced 2.4 m within rows and 3 m between rows, trained to a two-wire vertical trellis and drip irrigated. The soil was a Nookamka sandy loam. Data Set 1 (WNRA0103): An experiment comparing the effects on grapevine growth and development of three pruning treatments, spur, light mechanical hedging and minimal pruning, involving Shiraz on Schwarzmann rootstock, irrigated with industry standard drip irrigation and collected over three seasons 2000-01, 2001-02 and 2002-03. The experiment was established and conducted by Dr Rachel Ashley with input from Peter Clingeleffer (CSIRO), Dr Bob Emmett (Department of Primary Industries, Victoria) and Dr Peter Dry (University of Adelaide). Seasons in the southern hemisphere span two calendar years, with budburst in the second half of the first calendar year and harvest in the first half of the second calendar year. Data Set 2 (WNRA0305): An experiment comparing the effects of three irrigation treatments, industry standard drip, Regulated Deficit (RDI) and Prolonged Deficit (PD) irrigation involving Cabernet Sauvignon on own roots and pruned by light mechanical hedging, over three seasons 2002-03, 2003-04 and 2004-05. The RDI treatment involved application of a water deficit in the post-fruit set to pre-veraison period. The PD treatment was initially the same as RDI but with an extended period of extreme deficit (no irrigation) after the RDI stress period until veraison. The experiment was established and conducted by Dr Nicola Cooley with input from Peter Clingeleffer and Dr Rob Walker (CSIRO). Data Set 3 (WNRA0506): Compared basic grapevine growth, development and berry maturation post fruit set at three Trial Sites over two seasons 2004-05 and 2005-06. Trial Site one is the same site used to collect Data Set 1. 
Data were collected from all three pruning treatments in season 2004-05 but only from the spur and light mechanical hedging treatments in season 2005-06. Trial Site two involved comparison of two scions, Chardonnay and Shiraz, both on Schwarzmann rootstock, irrigated with industry standard drip irrigation and pruned using light mechanical hedging. Data were collected in season 2004-05. Trial Site three is the same site used to collect Data Set 2. Data were collected from all three irrigation treatments in season 2004-05 but only from the industry standard drip and PD treatments in 2005-06. Establishment and conduct of experiments at Trial Sites one, two and three was by Dr Anne Pellegrino and Deidre Blackmore with input from Peter Clingeleffer and Dr Rob Walker. The decision to develop Data Set 3 followed a mid-term CRCV review and analysis of available Australian data sets and relevant literature, which identified the need to obtain a data set covering all of the required variables necessary to run VineLOGIC and in particular, to obtain data on berry development commencing as soon as possible after fruit set. Most prior data sets were from veraison onwards, which is later than desirable from a modelling perspective. Data Set 1, 2 and 3 compilation for VineLOGIC was by Deidre Blackmore with input from Dr Doug Godwin. Review and testing of the Data Sets with VineLOGIC was conducted by David Benn with input from Dr Paul Petrie (South Australian Research and Development Institute), Dr Vinay Pagay (University of Adelaide) and Drs Everard Edwards and Rob Walker (CSIRO). A collaboration agreement with University of Adelaide established in 2017 enabled further input to review of the Data Sets and their testing with VineLOGIC by Dr Sam Culley.
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
On a quest to compare different cryptoexchanges, I came up with the idea to compare metrics across multiple platforms (at the moment just two). CoinGecko and CoinMarketCap are two of the biggest websites for monitoring both exchanges and cryptoprojects. In response to over-inflated volumes faked by crypto exchanges, both websites came up with independent metrics for assessing the worth of a given exchange.
Collected on May 10, 2020
CoinGecko's data is a bit more holistic, containing metrics across a multitude of areas (you can read more in the original blog post). The data from CoinGecko consists of the following:
- Exchange Name
- Trust Score (on a scale of N/A-10)
- Type (centralized/decentralized)
- AML (risk: how well prepared are they to handle financial crime?)
- API Coverage (blanket measure that includes: (1) Tickers Data (2) Historical Trades Data (3) Order Book Data (4) Candlestick/OHLC (5) WebSocket API (6) API Trading (7) Public Documentation)
- API Last Updated (when was the API last updated?)
- Bid Ask Spread (average buy/sell spread across all pairs)
- Candlestick (available/not)
- Combined Orderbook Percentile (see above link)
- Estimated_Reserves (estimated holdings of major crypto)
- Grade_Score (overall API score)
- Historical Data (available/not)
- Jurisdiction Risk (risk: risk of terrorist activity/bribery/corruption?)
- KYC Procedures (risk: Know Your Customer?)
- License and Authorization (risk: has the exchange sought regulatory approval?)
- Liquidity (don't confuse with "CMC Liquidity"; THIS column is a combination of (1) web traffic & reported volume (2) order book spread (3) trading activity (4) trust score on trading pairs)
- Negative News (risk: any bad news?)
- Normalized Trading Volume (trading volume normalized to web traffic)
- Normalized Volume Percentile (see above blog link)
- Orderbook (available/not)
- Public Documentation (well-documented API available to everyone?)
- Regulatory Compliance (risk rating from a compliance perspective)
- Regulatory Last Updated (last time regulatory metrics were updated)
- Reported Trading Volume (volume as listed by the exchange)
- Reported Normalized Trading Volume (ratio of normalized to reported volume [0-1])
- Sanctions (risk: risk of sanctions?)
- Scale (based on: (1) Normalized Trading Volume Percentile (2) Normalized Order Book Depth Percentile)
- Senior Public Figure (risk: does the exchange have transparent public relations?)
- Tickers (tick tick tick...)
- Trading via API (can data be traded through the API?)
- Websocket (got websockets?)
- Green Pairs (percentage of trading pairs deemed to have good liquidity)
- Yellow Pairs (percentage of trading pairs deemed to have fair liquidity)
- Red Pairs (percentage of trading pairs deemed to have poor liquidity)
- Unknown Pairs (percentage of trading pairs that do not have sufficient order book data)
~
Again, CoinMarketCap has only one metric, which was recently updated and scales from 1-1000 (1000 being very liquid and 1 not). You can check the article out for yourself. In the dataset this is the "CMC Liquidity" column, not to be confused with the "Liquidity" column, which refers to the CoinGecko metric!
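One natural use of the two columns is to compare how each site ranks the same exchanges; the exchange names and scores below are invented for illustration:

```python
# Made-up rows: CoinGecko's "Liquidity" (N/A-10) vs CoinMarketCap's
# "CMC Liquidity" (1-1000) for three hypothetical exchanges.
exchanges = [
    {"name": "ExchangeA", "Liquidity": 9, "CMC Liquidity": 820},
    {"name": "ExchangeB", "Liquidity": 6, "CMC Liquidity": 610},
    {"name": "ExchangeC", "Liquidity": 8, "CMC Liquidity": 450},
]

# Rank exchanges under each metric (most liquid first) and compare orderings.
by_gecko = sorted(exchanges, key=lambda e: -e["Liquidity"])
by_cmc = sorted(exchanges, key=lambda e: -e["CMC Liquidity"])
print([e["name"] for e in by_gecko])  # → ['ExchangeA', 'ExchangeC', 'ExchangeB']
print([e["name"] for e in by_cmc])    # → ['ExchangeA', 'ExchangeB', 'ExchangeC']
```

The two metrics need not agree: here the invented exchanges B and C swap places depending on which site's liquidity score you trust.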
Thanks to coingecko and cmc for making their data scrapable :)
[CMC, you should try to give us a little more access to the figures that define your metric. Thanks!]
The American Community Survey (ACS) is an ongoing survey that provides data every year -- giving communities the current information they need to plan investments and services. The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population. Much of the ACS data provided on the Census Bureau's Web site are available separately by age group, race, Hispanic origin, and sex. Summary files, Subject tables, Data profiles, and Comparison profiles are available for the nation, all 50 states, the District of Columbia, Puerto Rico, every congressional district, every metropolitan area, and all counties and places with populations of 65,000 or more. Comparison profiles are similar to data profiles but also include comparisons with past-year data. The current year data are compared with each of the last four years of data and include statistical significance testing. There are over 1,000 variables in this dataset.
Classical empirical potentials/force-fields (FF) provide atomistic insights into material phenomena through molecular dynamics and Monte Carlo simulations. Despite their wide applicability, a systematic evaluation of materials properties using such potentials and, especially, an easy-to-use user interface for their comparison is still lacking. To address this deficiency, we computed energetics and elastic properties of a variety of materials, such as metals and ceramics, using a wide range of empirical potentials and compared them to density functional theory (DFT) as well as to experimental data, where available. The database currently consists of 3248 entries including energetics and elastic property calculations, and it is still increasing. We also include computational tools for convex-hull plots for DFT and FF calculations. The data covers 1471 materials and 116 force-fields. A major feature of this database is that the web interface offers easy look-up tables to compare at a glance t...
These are the official datasets used on the Medicare.gov Nursing Home Compare Website provided by the Centers for Medicare and Medicaid Services. These data allow you to compare the quality of care at every Medicare and Medicaid-certified nursing home in the country, including over 15,000 nationwide.
Data for multi-site, multi-platform comparison of magnetic resonance imaging (MRI) T1 measurement using the International Society of Magnetic Resonance in Medicine/National Institute of Standards and Technology (ISMRM/NIST) system phantom. Includes data sets for T1 measurement by inversion recovery (IR) and variable flip angle (VFA) methods at 1.5 tesla and 3 tesla. At 1.5 T, data is from 2 different vendor systems, 9 total MRI machines. At 3 T, data is from 3 different vendor systems, 18 total MRI machines.
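For the inversion-recovery (IR) method mentioned above, the magnitude of the recovering signal follows S(TI) = A·(1 − 2·exp(−TI/T1)), which crosses zero at TI = T1·ln 2; that null point gives a quick T1 estimate. The sketch below uses a synthetic T1 value, not the phantom measurements:

```python
import math

def ir_signal(ti, a, t1):
    """Ideal inversion-recovery signal S(TI) = A * (1 - 2*exp(-TI/T1))."""
    return a * (1.0 - 2.0 * math.exp(-ti / t1))

true_t1 = 850.0  # ms, an illustrative value (not from the phantom data)
ti_null = true_t1 * math.log(2.0)  # inversion time where the signal nulls

print(round(ir_signal(ti_null, 1.0, true_t1), 6))  # ≈ 0 at the null point
print(round(ti_null, 1))  # → 589.2 ms; inverting, T1 = TI_null / ln 2
```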
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This users dataset is a preview of a much bigger dataset, with lots of related data (product listings of sellers, comments on listed products, etc...).
My Telegram bot will answer your queries and allow you to contact me.
There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.
Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).
This dataset aims to serve as a benchmark for an e-commerce fashion store. Using it, you can explore what to expect from your users and estimate in advance what your growth may look like.
If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and able to adapt my datasets to better suit your needs.
This dataset is part of a preview of a much larger dataset. Please contact me for more.
The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.
Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.
For other licensing options, contact me.