100+ datasets found
  1. Dataset for: Same Question, Different Answers? An Empirical Comparison of...

    • demo-b2find.dkrz.de
    Updated Sep 22, 2025
    Cite
    (2025). Dataset for: Same Question, Different Answers? An Empirical Comparison of Web Data and Traditional Data - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/bc684dad-c657-5013-b2d4-cc35b4a2e7ee
    Explore at:
    Dataset updated
    Sep 22, 2025
    Description

    Psychological scientists increasingly study web data, such as user ratings or social media postings. However, whether research relying on such web data leads to the same conclusions as research based on traditional data is largely unknown. To test this, we (re)analyzed three datasets, thereby comparing web data with lab and online survey data. We calculated correlations across these different datasets (Study 1) and investigated identical, illustrative research questions in each dataset (Studies 2 to 4). Our results suggest that web and traditional data are not fundamentally different and usually lead to similar conclusions, but also that it is important to consider differences between data types such as populations and research settings. Web data can be a valuable tool for psychologists when accounting for such differences, as it allows for testing established research findings in new contexts, complementing them with insights from novel data sources.

  2. Data from: Nursing Home Compare

    • catalog.data.gov
    • datahub.va.gov
    • +2 more
    Updated Aug 2, 2025
    Cite
    Department of Veterans Affairs (2025). Nursing Home Compare [Dataset]. https://catalog.data.gov/dataset/nursing-home-compare-ed7b0
    Explore at:
    Dataset updated
    Aug 2, 2025
    Dataset provided by
    United States Department of Veterans Affairs (http://va.gov/)
    Description

    Nursing Home Compare has detailed information about every Medicare and Medicaid nursing home in the country. A nursing home is a place for people who can’t be cared for at home and need 24-hour nursing care. These are the official datasets used on the Medicare.gov Nursing Home Compare Website provided by the Centers for Medicare & Medicaid Services. These data allow you to compare the quality of care at every Medicare and Medicaid-certified nursing home in the country, including over 15,000 nationwide.

  3. Top Visited Websites

    • kaggle.com
    zip
    Updated Nov 19, 2022
    Cite
    The Devastator (2022). Top Visited Websites [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-top-websites-in-the-world
    Explore at:
    Available download formats: zip (1286 bytes)
    Dataset updated
    Nov 19, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Top Websites in the World

    How They Change Over Time

    About this dataset

    This dataset consists of the top 50 most visited websites in the world, as well as the category and principal country/territory for each site. The data provides insights into which sites are most popular globally, and what type of content is most popular in different parts of the world.

    How to use the dataset

    This dataset can be used to track the most popular websites in the world over time. It can also be used to compare website popularity between different countries and categories.

    Research Ideas

    • To track the most popular websites in the world over time
    • To see how website popularity changes by region
    • To find out which website categories are most popular

    Acknowledgements

    Dataset by Alexa Internet, Inc. (2019), released on Kaggle under the Open Data Commons Public Domain Dedication and License (ODC-PDDL)

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: df_1.csv

    | Column name | Description |
    |:------------|:------------|
    | Site | The name of the website. (String) |
    | Domain Name | The domain name of the website. (String) |
    | Category | The category of the website. (String) |
    | Principal country/territory | The principal country/territory where the website is based. (String) |

  4. Online Sports Betting

    • kaggle.com
    zip
    Updated Oct 28, 2022
    Cite
    The Devastator (2022). Online Sports Betting [Dataset]. https://www.kaggle.com/datasets/thedevastator/online-sports-betting
    Explore at:
    Available download formats: zip (8915 bytes)
    Dataset updated
    Oct 28, 2022
    Authors
    The Devastator
    Description

    Online Sports Betting

    A State-by-State Comparison

    About this dataset

    How prevalent is sports betting across the United States? This dataset provides information on the legal status of sports betting, revenue generated by sports betting, the number of sports betting outlets, and more. Use this dataset to compare the revenue generated by sports betting across different states.

    How to use the dataset

    This dataset can be used to understand the prevalence of sports betting across the United States and to compare the revenue generated by sports betting across states.

    Research Ideas

    • Understand the prevalence of sports betting across the United States and to compare the revenue generated by sports betting across states
    • Understand how the legal status of sports betting affects revenue generated
    • Understand how the number of sports betting outlets affects revenue generated

    Columns

    File: New Jersey.csv

    | Column name | Description |
    |:------------|:------------|
    | date | The date of the data. (Date) |
    | New Jersey | The amount of money bet on sports in New Jersey. (Numeric) |
    | Pennsylvania | The amount of money bet on sports in Pennsylvania. (Numeric) |
    | Delaware | The amount of money bet on sports in Delaware. (Numeric) |
    | Mississippi | The amount of money bet on sports in Mississippi. (Numeric) |
    | Nevada | The amount of money bet on sports in Nevada. (Numeric) |
    | Rhode Island | The amount of money bet on sports in Rhode Island. (Numeric) |
    | West Virginia | The amount of money bet on sports in West Virginia. (Numeric) |
    | Arkansas | The amount of money bet on sports in Arkansas. (Numeric) |
    | New York | The amount of money bet on sports in New York. (Numeric) |
    | Iowa | The amount of money bet on sports in Iowa. (Numeric) |
    | Indiana | The amount of money bet on sports in Indiana. (Numeric) |
    | Oregon | The amount of money bet on sports in Oregon. (Numeric) |
    | New Hampshire | The amount of money bet on sports in New Hampshire. (Numeric) |
    | Michigan | The amount of money bet on sports in Michigan. (Numeric) |
    | Montana | The amount of money bet on sports in Montana. (Numeric) |
    | Colorado | The amount of money bet on sports in Colorado. (Numeric) |
    | Washington DC | The amount of money bet on sports in Washington DC. (Numeric) |
    | Illinois | The amount of money bet on sports in Illinois. (Numeric) |
    | Tennessee | The amount of money bet on sports in Tennessee. (Numeric) |

    File: PopulationStates.csv

    | Column name | Description |
    |:------------|:------------|
    | State | The state in which the data was collected. (String) |

    File: homeless.csv

    | Column name | Description |
    |:------------|:------------|
    | year | The year the data was collected. (Integer) |
    | unsheltered | The number of people who are unsheltered. (Integer) |

    File: income.csv

    | Column name | Description |
    |:------------|:------------|
    | Pennsylvania | The amount of money bet on sports in Pennsylvania. (Numeric) |
    | Delaware | The amount of money bet on sports in Delaware. (Numeric) |
    | Mississippi | The amount of money bet on sports in Mississippi. (Numeric) |
    | Nevada | The amount of money bet on sports in Nevada. (Numeric) |
    | Rhode Island | The amount of money bet on sports in Rhode Island. (Numeric) |
    | West Virginia | The amount of money bet on sports in West Virginia. (Numeric) |
    | Arkansas | The amount of money bet on sports in Arkansas. (Numeric) |
    | New York | The amount of money bet on sports in New York. (Numeric) |
    | Iowa | The amount of money bet on sports in Iowa. (Numeric) |
    | Indiana | The amount of money bet on sports in Indiana. (Numeric) |
    | New Hampshire | The amount of money bet on sports in New Hampshire. (Numeric) |
    | Michigan | The amount of money bet on sports in Michigan. (Numeric) |
    | Colorado | The amount of money bet on sports in Colorado. (Numeric) |
    | Washington DC | The amount of money bet on sports in Washington DC. (Numeric) |
    | Illinois | The amount of money bet on sports in Illinois. (Nume...

  5. Urban Observatory Compare App

    • fesec-cesj.opendata.arcgis.com
    • gis-for-secondary-schools-schools-be.hub.arcgis.com
    Updated Aug 16, 2013
    Cite
    ArcGIS Maps for the Nation (2013). Urban Observatory Compare App [Dataset]. https://fesec-cesj.opendata.arcgis.com/datasets/nation::urban-observatory-compare-app
    Explore at:
    Dataset updated
    Aug 16, 2013
    Dataset authored and provided by
    ArcGIS Maps for the Nation
    Description

    The Urban Observatory Compare app shows maps of the same subject for three cities in a side-by-side comparison view. The app allows quick visual comparisons of the patterns at work in cities around the world.

    The app allows people to interact with rich datasets for each city. People can use the Urban Observatory web application to easily compare cities by using a simple web browser. As a user zooms in to one digital city map, other city maps will zoom in parallel, revealing similarities and differences in density and distribution. For instance, a person can simultaneously view traffic density for Abu Dhabi and Paris or simultaneously view vegetation in London and Tokyo.

    The Urban Observatory is brought to you by Richard Saul Wurman, creator of Technology/Entertainment/Design (TED) and 19.20.21; Jon Kamen of the Academy Award-, Emmy Award-, and Golden Globe Award-winning film company @radical.media; and Esri president Jack Dangermond. "A map is a pattern made understandable, and patterns must be compared to understand successes, failures, and opportunities of our global cities," says Wurman. "The Urban Observatory demonstrates this new paradigm, using cartographic language and constructive data display. People and cities can use maps as a common language," said Wurman. The application utilizes Esri's ArcGIS API for JavaScript. Once a web map is created, it is added to a group and tagged to indicate its city and subject information. Those tags are read by the application as it starts up in the browser.

  6. PIECE: Plant Intron Exon Comparison and Evolution Database

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). PIECE: Plant Intron Exon Comparison and Evolution Database [Dataset]. https://catalog.data.gov/dataset/piece-plant-intron-exon-comparison-and-evolution-database-84874
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    PIECE is a plant gene structure comparison and evolution database with 25 species. Annotated genes extracted from the species are classified based on the Pfam motif, and phylogenetic trees are reconstructed for each gene category, integrating exon-intron and protein motif information.

    Resources in this dataset: Resource Title: Web Page. File Name: Web Page, url: https://probes.pw.usda.gov/piece/index.php

  7. Comparing Policy and Demographic Environments Across States

    • data.virginia.gov
    • gimi9.com
    • +1 more
    html
    Updated Sep 5, 2025
    Cite
    National Data Archive on Child Abuse and Neglect (2025). Comparing Policy and Demographic Environments Across States [Dataset]. https://data.virginia.gov/dataset/comparing-policy-and-demographic-environments-across-states
    Explore at:
    Available download formats: html
    Dataset updated
    Sep 5, 2025
    Dataset provided by
    National Data Archive on Child Abuse and Neglect
    Description

    Child Welfare Policies and Demographic Characteristics: A Compilation of State-Level Data is a suite of datasets gathered from various sources. All datasets in this suite contain information about states. It is intended to be a resource for researchers doing policy studies in the areas of foster care, adoption, and child abuse, and is intended as a supplement to the AFCARS and NCANDS datasets. It consists of five studies, their data, and final reports (if any).

    The common thread linking this suite of datasets is that the level of analysis is always states. This information can be used to group or classify states in some domain, coupled with using the AFCARS or NCANDS data to explore how states or groups of states compare. The intention is that this process will increase the value of AFCARS and NCANDS for analyzing the effects of policy differences across states. Most of the data were gleaned from reports published by academic or public interest organizations, such as The Urban Institute, the North American Council on Adoptable Children, or the John F. Kennedy School of Government at Harvard University. Each of these reports is available at the organization's web site, and is included in the files that accompany this User Guide in PDF format. The value of this compilation is in providing the data in a form that is readily readable by statistical programs such as SAS, SPSS, and Stata, and in compiling in one place the descriptions of the variables and values contained in the reports. Other data in this suite were collected from the United States Bureau of the Census and Wikipedia, a web-based encyclopedia.

    Investigators: Hansen, Mary & Dineen, Michael

  8. SmarterHome.ai - Compare Local Internet Deals Locations Data for Alabama,...

    • poidata.io
    csv, json
    Updated Nov 3, 2025
    Cite
    Business Data Provider (2025). SmarterHome.ai - Compare Local Internet Deals Locations Data for Alabama, United States [Dataset]. https://poidata.io/brand-report/smarterhomeai-compare-local-internet-deals/united-states/alabama
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Nov 3, 2025
    Dataset authored and provided by
    Business Data Provider
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    Alabama
    Variables measured
    Website URL, Phone Number, Review Count, Business Name, Email Address, Business Hours, Customer Rating, Business Address, Brand Affiliation, Geographic Coordinates
    Description

    Comprehensive dataset containing 6 verified SmarterHome.ai - Compare Local Internet Deals locations in Alabama, United States with complete contact information, ratings, reviews, and location data.

  9. Comparison of Online Tools.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 23, 2013
    Cite
    Bik, Holly M.; Goldstein, Miriam C. (2013). Comparison of Online Tools. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001721306
    Explore at:
    Dataset updated
    Apr 23, 2013
    Authors
    Bik, Holly M.; Goldstein, Miriam C.
    Description

    Comparison of Online Tools.

  10. Data from: CottonGen Map Viewer

    • agdatacommons.nal.usda.gov
    • datasetcatalog.nlm.nih.gov
    • +3 more
    bin
    Updated Feb 13, 2024
    Cite
    Taein Lee; Sook Jung; Ksenija Gasic; Todd Campbell; Jing Yu; Jodi Humann; Heidi Hough; Dorrie Main (2024). CottonGen Map Viewer [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/CottonGen_Map_Viewer/24853266
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    MainLab, Washington State University
    Authors
    Taein Lee; Sook Jung; Ksenija Gasic; Todd Campbell; Jing Yu; Jodi Humann; Heidi Hough; Dorrie Main
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    MapViewer is a graphical tool for viewing and comparing Gossypium spp. genetic maps. It includes dynamically scrollable maps, correspondence matrices, dot plots, links to details about map features, and exporting functionality. It was developed by the MainLab at Washington State University and is available for download for use in other Tripal databases. The query interface allows the user to select Species, Map, and Linkage Group options. Help information includes a video tutorial, user manual, and sample map, correspondence matrix, dot plot, and exported figures.

    Resources in this dataset: Resource Title: Website Pointer for CottonGen Map Viewer. File Name: Web Page, url: https://www.cottongen.org/MapViewer

  11. Federal/State Tribal Lands Comparison Web Map, Minnesota

    • data.wu.ac.at
    • gisdata.mn.gov
    html, jpeg, webapp
    Updated Nov 15, 2017
    Cite
    Transportation Department (2017). Federal/State Tribal Lands Comparison Web Map, Minnesota [Dataset]. https://data.wu.ac.at/schema/gisdata_mn_gov/MjA4ZTdiZjMtMDc5NC00NDcxLWFhY2UtNGFiNWI4MTc1YjEy
    Explore at:
    Available download formats: webapp, html, jpeg
    Dataset updated
    Nov 15, 2017
    Dataset provided by
    Transportation Department
    Area covered
    Minnesota
    Description

    The Federal/State Tribal Data Comparison web map can be used to compare the reservation boundaries that appear on the Minnesota State Highway Map with the U.S. Census Bureau reservation boundaries. This map also shows off-reservation trust land owned by tribes. The map is for informational purposes only. It is not a land survey and does not contain coordinate-correct data. The boundaries shown do not constitute recognition, endorsement, or acceptance by MnDOT or the State of Minnesota.

  12. Data from: Trivago Dataset

    • service.tib.eu
    • resodate.org
    Updated Jan 3, 2025
    Cite
    (2025). Trivago Dataset [Dataset]. https://service.tib.eu/ldmservice/dataset/trivago-dataset
    Explore at:
    Dataset updated
    Jan 3, 2025
    Description

    The Trivago dataset represents a real-world task in the travel metasearch domain. Users who are planning a business or leisure trip can use Trivago's website to compare accommodations and prices from various booking sites.

  13. Mutual Funds Comparison Calculator – 2025 | Free and Fast Online Tool | How...

    • smartinvestello.com
    html
    Updated Sep 2, 2025
    Cite
    Smart Investello (2025). Mutual Funds Comparison Calculator – 2025 | Free and Fast Online Tool | How to Use it - Data Table [Dataset]. https://smartinvestello.com/mutual-funds-comparison-calculator/
    Explore at:
    Available download formats: html
    Dataset updated
    Sep 2, 2025
    Dataset authored and provided by
    Smart Investello
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset extracted from the post Mutual Funds Comparison Calculator – 2025 | Free and Fast Online Tool | How to Use it on Smart Investello.

  14. Structured Web Data Extraction Dataset (SWDE)

    • academictorrents.com
    bittorrent
    Updated Nov 29, 2015
    + more versions
    Cite
    Qiang Hao (2015). Structured Web Data Extraction Dataset (SWDE) [Dataset]. https://academictorrents.com/details/411576c7e80787e4b40452360f5f24acba9b5159
    Explore at:
    Available download formats: bittorrent (207314582)
    Dataset updated
    Nov 29, 2015
    Dataset authored and provided by
    Qiang Hao
    License

    https://academictorrents.com/nolicensespecified

    Description

    Motivation

    This dataset is a real-world web page collection used for research on the automatic extraction of structured data (e.g., attribute-value pairs of entities) from the Web. We hope it could serve as a useful benchmark for evaluating and comparing different methods for structured web data extraction.

    Contents of the Dataset

    Currently the dataset involves:

    • 8 verticals with diverse semantics;
    • 80 web sites (10 per vertical);
    • 124,291 web pages (200 ~ 2,000 per web site), each containing a single data record with detailed information of an entity;
    • 32 attributes (3 ~ 5 per vertical) associated with carefully labeled ground-truth of corresponding values in each web page.

    The goal of structured data extraction is to automatically identify the values of these attributes from web pages. The involved verticals are summarized as follows:

    | Vertical | #Sites | #Pages | #Attributes | Attributes |

  15. Blog or Not Dataset

    • kaggle.com
    zip
    Updated Jul 5, 2021
    Cite
    ozgurdogan (2021). Blog or Not Dataset [Dataset]. https://www.kaggle.com/ozgurdogan646/blog-or-not-dataset
    Explore at:
    Available download formats: zip (1839276 bytes)
    Dataset updated
    Jul 5, 2021
    Authors
    ozgurdogan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Blog or not dataset

    This dataset labels whether a page is a blog or not, based on website URLs. Most of the features are taken from article [1], which you can consult for detailed information. Information about features not included in this dataset will be added soon.

    GitHub Repo

    [1] Vrbančič, G., Fister Jr, I., & Podgorelec, V. (2020). Datasets for phishing websites detection. Data in Brief, 33, 106438.

  16. Comparison of R1 and R2 Online Research Data Services

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Nov 29, 2022
    Cite
    Elizabeth Szkirpan (2022). Comparison of R1 and R2 Online Research Data Services [Dataset]. http://doi.org/10.7910/DVN/SHJABB
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 29, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Elizabeth Szkirpan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Compiled in mid-2022, this dataset contains the raw data file, randomized ranked lists of R1 and R2 research institutions, and files created to support data visualization for Elizabeth Szkirpan's 2022 study regarding availability of data services and research data information via university libraries for online users. Files are available in Microsoft Excel formats.

  17. Comparison of missing values, ‘don’t know’ values and inconsistent values...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 21, 2018
    Cite
    Van Hal, Guido; Van der Heyden, Johan; Braekman, Elise; Charafeddine, Rana; Demarest, Stefaan; Gisle, Lydia; Tafforeau, Jean; Berete, Finaba; Molenberghs, Geert; Drieskens, Sabine (2018). Comparison of missing values, ‘don’t know’ values and inconsistent values between the paper-and-pencil and web-based mode and number of data entry mistakes in the paper-and-pencil mode (n = 149). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000729296
    Explore at:
    Dataset updated
    May 21, 2018
    Authors
    Van Hal, Guido; Van der Heyden, Johan; Braekman, Elise; Charafeddine, Rana; Demarest, Stefaan; Gisle, Lydia; Tafforeau, Jean; Berete, Finaba; Molenberghs, Geert; Drieskens, Sabine
    Description

    Comparison of missing values, ‘don’t know’ values and inconsistent values between the paper-and-pencil and web-based mode and number of data entry mistakes in the paper-and-pencil mode (n = 149).

  18. Portuguese Comparative Sentences: A Collection of Labeled Sentences on...

    • zenodo.org
    • live.european-language-grid.eu
    • +1 more
    csv, json
    Updated Apr 19, 2021
    Cite
    Daniel Kansaon; Michele A. Brandão; Julio C. S. Reis; Matheus Barbosa; Breno Matos; Fabrício Benevenuto (2021). Portuguese Comparative Sentences: A Collection of Labeled Sentences on Twitter and Buscapé [Dataset]. http://doi.org/10.5281/zenodo.4124410
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Apr 19, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniel Kansaon; Michele A. Brandão; Julio C. S. Reis; Matheus Barbosa; Breno Matos; Fabrício Benevenuto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    More and more customers rely on online product reviews and comments on the Web when deciding to buy one product over another. In this context, sentiment analysis techniques are the traditional way to summarize users' opinions that criticize or highlight the positive aspects of a product. Sentiment analysis of reviews usually relies on extracting positive and negative aspects of products, neglecting comparative opinions. Such opinions do not directly express a positive or negative view but contrast aspects of products from different competitors.

    Here, we present the first effort to study comparative opinions in Portuguese, creating two new Portuguese datasets with comparative sentences marked by three humans. This repository consists of three important files: (1) lexicon that contains words frequently used to make a comparison in Portuguese; (2) Twitter dataset with labeled comparative sentences; and (3) Buscapé dataset with labeled comparative sentences.

    The lexicon is a set of 176 words frequently used to express a comparative opinion in the Portuguese language. The lexicon is used as a filter to build two sets of data with comparative sentences from two important contexts: (1) online social networks; and (2) product reviews.

    For Twitter, we collected all Portuguese tweets published in Brazil on 2018/01/10 and kept those that contained at least one keyword present in the lexicon, obtaining 130,459 tweets. Our work operates at the sentence level. Thus, all sentences were extracted and a sample of 2,053 sentences was created and manually labeled by three humans, reaching 83.2% agreement with the Fleiss' Kappa coefficient. For Buscapé, a Brazilian website (https://www.buscape.com.br/) used to compare product prices on the web, the same methodology was followed, creating a set of 2,754 labeled sentences obtained from comments made in 2013. This dataset was labeled by three humans, reaching an agreement of 83.46% with the Fleiss' Kappa coefficient.
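
    As an illustration of the agreement measure named above, the following is a minimal sketch of the standard Fleiss' kappa formula for a fixed number of raters per item. The function name and the toy count matrix are hypothetical, and the source does not state how its reported agreement figures were computed, so this is not a reproduction of the authors' calculation.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every row must sum to the same number of raters (at least 2).
    Undefined (division by zero) if all ratings fall into a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of raters (>= 2)")
    n_categories = len(counts[0])

    # Share of all ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical toy data: 4 sentences, 3 raters, 2 labels
    # (comparative vs. non-comparative).
    toy = [[3, 0], [0, 3], [2, 1], [1, 2]]
    print(fleiss_kappa(toy))
```

    With three raters, as in the datasets described here, each row of the count table sums to 3.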

    The Twitter dataset has 2,053 labeled sentences, of which 918 are comparative. The Buscapé dataset has 2,754 labeled sentences, of which 1,282 are comparative.

    The datasets contain these labeled properties:

    • text: the sentence extracted from the review comment.

    • entity_s1: the first entity compared in the sentence.

    • entity_s2: the second entity compared in the sentence.

    • keyword: the comparative keyword used in the sentence to express comparison.

    • preferred_entity: the preferred entity.

    • id_start: the keyword's initial position in the sentence.

    • id_end: the keyword's final position in the sentence.

    • type: the sentence label, which specifies whether the phrase is a comparison.

    Additional Information:

    1 - The sentences were separated using a sentence tokenizer.

    2 - If the compared entity is not specified, the field will receive a value: "_".

    3 - The property "type" can take one of five values:

    • 0: Non-comparative (Não Comparativa).

    • 1: Non-Equal-Gradable (Gradativa com Predileção).

    • 2: Equative (Equitativa).

    • 3: Superlative (Superlativa).

    • 4: Non-Gradable (Não Gradativa).

    If you use this data, please cite our paper as follows:

    "Daniel Kansaon, Michele A. Brandão, Julio C. S. Reis, Matheus Barbosa, Breno Matos, and Fabrício Benevenuto. 2020. Mining Portuguese Comparative Sentences in Online Reviews. In Brazilian Symposium on Multimedia and the Web (WebMedia ’20), November 30-December 4, 2020, São Luís, Brazil. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3428658.3431081"

    --------------

    Plus Information:

    We make the raw sentences available in the dataset to allow future work to test different pre-processing steps. Therefore, to obtain the exact sentences used in the paper above, you must reproduce the pre-processing step described in the paper (Figure 2).

    For each sentence with more than one keyword in the dataset:

    • Extract three words before and three words after the comparative keyword, creating a new sentence that is labeled with the existing value of the “type” field.
    • The original sentence is thereby divided into n new sentences, where n is the number of keywords in the sentence.
    • Stopwords are not counted as part of this three-word range.

    Note that the final processed sentence can have more than six words, because stopwords are not counted as part of the range.
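    The windowing steps above can be sketched as follows. This is an illustrative reading of the description, not the code used in the paper: the function names, whitespace tokenization, single-token keywords, and the tiny English stopword list are all assumptions, and the example sentence is invented.

```python
from typing import Iterable, List, Set, Tuple


def context_window(tokens: List[str], kw_index: int,
                   stopwords: Set[str], k: int = 3) -> List[str]:
    """Return the keyword plus k non-stopword tokens on each side.

    Stopwords that fall inside the range are kept in the output but
    do not count toward k, so the window can exceed 2 * k words.
    """
    left = kw_index
    counted = 0
    while left > 0 and counted < k:
        left -= 1
        if tokens[left].lower() not in stopwords:
            counted += 1

    right = kw_index
    counted = 0
    while right < len(tokens) - 1 and counted < k:
        right += 1
        if tokens[right].lower() not in stopwords:
            counted += 1

    return tokens[left:right + 1]


def split_by_keywords(text: str, keywords: Iterable[str], label: int,
                      stopwords: Set[str]) -> List[Tuple[str, int]]:
    """Create one (new sentence, label) pair per keyword occurrence."""
    keyword_set = {kw.lower() for kw in keywords}
    tokens = text.split()  # assumption: whitespace tokenization
    return [
        (" ".join(context_window(tokens, i, stopwords)), label)
        for i, tok in enumerate(tokens)
        if tok.lower() in keyword_set
    ]


# Invented English example; the datasets themselves are in Portuguese.
STOPWORDS = {"the", "is", "than", "but", "its"}  # hypothetical list
pieces = split_by_keywords(
    "the new phone is better than the old one but its camera "
    "is worse than the other model",
    keywords={"better", "worse"},
    label=1,
    stopwords=STOPWORDS,
)
# pieces[0][0] == "the new phone is better than the old one but its camera"
# pieces[1][0] == "old one but its camera is worse than the other model"
```

    Each new sentence inherits the label of the original one. The first window contains twelve words even though only three counted words are taken on each side, which matches the note that processed sentences can exceed six words.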

  19. Data from: Internet hand x-rays: A comparison of joint space narrowing and...

    • data.virginia.gov
    • catalog.data.gov
    Updated Sep 6, 2025
    National Institutes of Health (2025). Internet hand x-rays: A comparison of joint space narrowing and erosion scores (Sharp/Genant) of plain versus digitized x-rays in rheumatoid arthritis patients [Dataset]. https://data.virginia.gov/dataset/internet-hand-x-rays-a-comparison-of-joint-space-narrowing-and-erosion-scores-sharp-genant-of-p
    Available download formats: html
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

       Background
       The objective of the study is to examine the reliability of erosion and joint space narrowing scores derived from hand x-rays posted on the Internet compared to scores derived from original plain x-rays.

       Methods
       Left and right x-rays of the hands of 36 patients were first digitized and then posted in standard fashion to a secure Internet website. Both the plain and Internet x-rays were scored for erosions and joint space narrowing using the Sharp/Genant method. All scoring was completed in a blind and randomized manner. Agreement between plain and Internet x-ray scores was calculated using Lin's concordance correlations and Bland-Altman graphical representation.
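       The two agreement measures named in the Methods can be illustrated with a short sketch. It uses the standard definition of Lin's concordance correlation coefficient, 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²) with 1/n variances, and the usual Bland-Altman bias with 95% limits of agreement (bias ± 1.96 SD of the differences). The study's exact computation is not described here, so this is only an assumption about its form; the function names and the scores are invented for illustration.

```python
from statistics import fmean, stdev
from typing import Sequence, Tuple


def lins_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired scores."""
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def bland_altman_limits(x: Sequence[float],
                        y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the paired differences."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = fmean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, bias - spread, bias + spread


# Invented scores, not data from the study.
plain_scores = [1.0, 2.0, 3.0]
internet_scores = [2.0, 3.0, 4.0]

ccc = lins_ccc(plain_scores, internet_scores)
bias, lower, upper = bland_altman_limits(plain_scores, internet_scores)
```

       Unlike a Pearson correlation, the concordance coefficient penalizes a systematic shift between the two readings: the invented scores above are perfectly correlated, yet the constant offset of one point pulls the coefficient well below 1.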
    
    
       Results
       Erosion scores for plain x-rays showed almost perfect concordance with x-rays read on the Internet (concordance 0.887). However, joint space narrowing scores were only "fair" (concordance 0.365). Global scores demonstrated substantial concordance between plain and Internet readings (concordance 0.769). Hand x-rays with less disease involvement showed a tendency to be scored higher on the Internet versions than those with greater disease involvement. This was primarily evident in the joint space narrowing scores.
    
    
       Conclusions
       The Internet represents a valid medium for displaying and scoring hand x-rays of patients with RA. Higher scores from the Internet version may be related to better viewing conditions on the computer screen relative to the plain x-ray viewing, which did not include a magnifying lens or bright light. The capability to view high quality x-rays on the Internet has the potential to facilitate information sharing and education and to encourage collaborative studies.
    
  20. Comparison Between 2019-2020 NSDUH State Prevalence Estimates

    • data.virginia.gov
    • healthdata.gov
    • +1more
    Updated Sep 6, 2025
    Substance Abuse and Mental Health Services Administration (2025). Comparison Between 2019-2020 NSDUH State Prevalence Estimates [Dataset]. https://data.virginia.gov/dataset/comparison-between-2019-2020-nsduh-state-prevalence-estimates
    Available download formats: html
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Substance Abuse and Mental Health Services Administration (https://www.samhsa.gov/)
    Description

    State estimates for these years are no longer available due to methodological concerns with combining 2019 and 2020 data. We apologize for any inconvenience or confusion this may cause. Because of the COVID-19 pandemic, most respondents answered the survey via the web in Quarter 4 of 2020, even though all responses in Quarter 1 were from in-person interviews. It is known that people may respond to the survey differently while taking it online, thus introducing what is called a mode effect. When the state estimates were released, it was assumed that the mode effect was similar for different groups of people. However, later analyses have shown that this assumption should not be made. Because of these analyses, along with concerns about the rapid societal changes in 2020, it was determined that averages across the two years could be misleading. For more detail on this decision, see the 2019-2020 state data page.
