100+ datasets found
  1. Internet Use Characteristics of 27 Participants Who Self-Reported Problem...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Wen Li; Jennifer E. O’Brien; Susan M. Snyder; Matthew O. Howard (2023). Internet Use Characteristics of 27 Participants Who Self-Reported Problem Internet Use. [Dataset]. http://doi.org/10.1371/journal.pone.0117372.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Wen Li; Jennifer E. O’Brien; Susan M. Snyder; Matthew O. Howard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Internet Use Characteristics of 27 Participants Who Self-Reported Problem Internet Use. A YDQ score ≥ 5 indicates Internet addiction (IA). YDQ scores of 3 or 4 indicate potential IA. A CIUS score ≥ 21 indicates compulsive Internet use.
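The cut-offs in this description can be written as a small rule. The sketch below is only an illustration of those thresholds: the function names and the label returned for scores below 3 are assumptions, since the description does not say how lower scores are labelled.

```python
def classify_ydq(score: int) -> str:
    """Apply the YDQ cut-offs quoted in the description above."""
    if score >= 5:
        return "Internet addiction"
    if score in (3, 4):
        return "potential Internet addiction"
    # Assumption: the description gives no label for scores below 3.
    return "below the stated thresholds"


def is_compulsive_cius(score: int) -> bool:
    """CIUS >= 21 indicates compulsive Internet use, per the description."""
    return score >= 21


print(classify_ydq(4), is_compulsive_cius(21))
```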

  2. Number of internet and social media users worldwide 2025

    • statista.com
    • abripper.com
    Updated Oct 16, 2025
    + more versions
    Cite
    Statista (2025). Number of internet and social media users worldwide 2025 [Dataset]. https://www.statista.com/statistics/617136/digital-population-worldwide/
    Explore at:
    Dataset updated
    Oct 16, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    As of October 2025, 6.04 billion individuals worldwide were internet users, which amounted to 73.2 percent of the global population. Of this total, 5.66 billion, or 68.7 percent of the world's population, were social media users.

    Global internet usage

    Connecting billions of people worldwide, the internet is a core pillar of the modern information society. Northern Europe ranked first among worldwide regions by the share of the population using the internet in 2025. In the Netherlands, Norway, and Saudi Arabia, 99 percent of the population used the internet as of February 2025. North Korea was at the opposite end of the spectrum, with virtually no internet usage penetration among the general population, ranking last worldwide. Eastern Asia was home to the largest number of online users worldwide, with over 1.34 billion at the latest count. Southern Asia ranked second, with around 1.2 billion internet users. China, India, and the United States rank ahead of other countries worldwide by the number of internet users.

    Worldwide internet user demographics

    As of 2024, the share of female internet users worldwide was 65 percent, five percent less than that of men. Gender disparity in internet usage was bigger in African countries, with around a 10-percent difference. Worldwide regions, like the Commonwealth of Independent States and Europe, showed a smaller usage gap between these two genders. As of 2024, global internet usage was higher among individuals between 15 and 24 years old across all regions, with young people in Europe representing the most considerable usage penetration, 98 percent. In comparison, the worldwide average for the age group of 15 to 24 years was 79 percent. The income level of the countries was also an essential factor for internet access, as 93 percent of the population of the countries with high income reportedly used the internet, as opposed to only 27 percent of the low-income markets.

  3. Data_Sheet_1_Investigating links between Internet literacy, Internet use,...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Sep 7, 2023
    Cite
    Qiaolei Jiang; Zonghai Chen; Zizhong Zhang; Can Zuo (2023). Data_Sheet_1_Investigating links between Internet literacy, Internet use, and Internet addiction among Chinese youth and adolescents in the digital age.docx [Dataset]. http://doi.org/10.3389/fpsyt.2023.1233303.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Sep 7, 2023
    Dataset provided by
    Frontiers
    Authors
    Qiaolei Jiang; Zonghai Chen; Zizhong Zhang; Can Zuo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    In the current digital era, adolescents’ Internet use has increased exponentially, with the Internet playing a more and more important role in their education and entertainment. However, due to their ongoing cognitive, emotional, and social development, youth and adolescents are more vulnerable to Internet addiction. Attention has been paid to the increased use of the Internet during the COVID-19 pandemic and to the influence of Internet literacy on the prevention of and intervention in Internet addiction.

    Methods

    The present study proposes a conceptual model to investigate the links between Internet literacy, Internet use of different purposes and durations, and Internet addiction among Chinese youth and adolescents. In this study, N = 2,276 adolescents studying in primary and secondary schools in East China were recruited, and they completed self-reports on sociodemographic characteristics, an Internet literacy scale, Internet use, and an Internet addiction scale.

    Results

    The results showed a significant relationship between Internet use and Internet addiction. Specifically, the duration of Internet use significantly and positively affected Internet addiction. With different dimensions of Internet literacy required, entertainment-oriented Internet use had a positive impact on Internet addiction, while education-oriented Internet use exerted negative effects on Internet addiction. As for Internet literacy, knowledge and skills for the Internet (positively) and Internet self-management (negatively) significantly influenced the likelihood of Internet addiction.

    Discussion

    The findings suggest that Internet overuse increases the risk of Internet addiction in youth and adolescents, while entertainment-oriented rather than education-oriented Internet use is addictive. The role of Internet literacy is complicated, with critical Internet literacy preventing the development of Internet addiction among youth and adolescents, while functional Internet literacy increases the risk.

  4. Global number of internet users 2005-2024

    • statista.com
    Updated May 6, 2025
    + more versions
    Cite
    Statista (2025). Global number of internet users 2005-2024 [Dataset]. https://www.statista.com/statistics/273018/number-of-internet-users-worldwide/
    Explore at:
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    As of 2024, the estimated number of internet users worldwide was 5.5 billion, up from 5.3 billion in the previous year. This share represents 68 percent of the global population.

    Internet access around the world

    Easier access to computers, the modernization of countries worldwide, and increased utilization of smartphones have allowed people to use the internet more frequently and conveniently. However, internet penetration often pertains to the current state of development regarding communications networks. As of January 2023, there were approximately 1.05 billion total internet users in China and 692 million total internet users in India.

    Online activities

    Social networking is one of the most popular online activities worldwide, and Facebook is the most popular online network based on active usage. As of the fourth quarter of 2023, there were over 3.07 billion monthly active Facebook users, accounting for well more than half of the internet users worldwide. Connecting with family and friends, expressing opinions, entertainment, and online shopping are amongst the most popular reasons for internet usage.

  5. Data from: Characteristics of Internet Addiction/Pathological Internet Use...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 3, 2015
    Cite
    Howard, Matthew O.; O’Brien, Jennifer E.; Snyder, Susan M.; Li, Wen (2015). Characteristics of Internet Addiction/Pathological Internet Use in U.S. University Students: A Qualitative-Method Investigation [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001897105
    Explore at:
    Dataset updated
    Feb 3, 2015
    Authors
    Howard, Matthew O.; O’Brien, Jennifer E.; Snyder, Susan M.; Li, Wen
    Area covered
    United States
    Description

    Studies have identified high rates and severe consequences of Internet Addiction/Pathological Internet Use (IA/PIU) in university students. However, most research concerning IA/PIU in U.S. university students has been conducted within a quantitative research paradigm, and frequently fails to contextualize the problem of IA/PIU. To address this gap, we conducted an exploratory qualitative study using the focus group approach and examined 27 U.S. university students who self-identified as intensive Internet users, spent more than 25 hours/week on the Internet for non-school or non-work-related activities and who reported Internet-associated health and/or psychosocial problems. Students completed two IA/PIU measures (Young’s Diagnostic Questionnaire and the Compulsive Internet Use Scale) and participated in focus groups exploring the natural history of their Internet use; preferred online activities; emotional, interpersonal, and situational triggers for intensive Internet use; and health and/or psychosocial consequences of their Internet overuse. Students’ self-reports of Internet overuse problems were consistent with results of standardized measures. Students first accessed the Internet at an average age of 9 (SD = 2.7), and first had a problem with Internet overuse at an average age of 16 (SD = 4.3). Sadness and depression, boredom, and stress were common triggers of intensive Internet use. Social media use was nearly universal and pervasive in participants’ lives. Sleep deprivation, academic under-achievement, failure to exercise and to engage in face-to-face social activities, negative affective states, and decreased ability to concentrate were frequently reported consequences of intensive Internet use/Internet overuse. IA/PIU may be an underappreciated problem among U.S. university students and warrants additional research.

  6. Attitudes towards the internet in Australia 2025

    • statista.com
    Updated Apr 11, 2025
    Cite
    Umair Bashir (2025). Attitudes towards the internet in Australia 2025 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Umair Bashir
    Description

    When asked about "Attitudes towards the internet", most Australian respondents pick "It is important to me to have mobile internet access in any place" as an answer. 55 percent did so in our online survey in 2025. Looking to gain valuable insights about users of internet providers worldwide? Check out our reports on consumers who use internet providers. These reports give readers a thorough picture of these customers, including their identities, preferences, opinions, and methods of communication.

  7. Average daily time spent on social media worldwide 2012-2025

    • statista.com
    Cite
    Statista, Average daily time spent on social media worldwide 2012-2025 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    As of February 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of 3 hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just 2 hours and 16 minutes.

    Global social media usage

    Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

    Global impact of social media

    Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.

  8. Same News - Different Sources

    • kaggle.com
    zip
    Updated Oct 28, 2022
    Cite
    The Devastator (2022). Same News - Different Sources [Dataset]. https://www.kaggle.com/datasets/thedevastator/same-news-different-sources
    Explore at:
    Available download formats: zip (262582 bytes)
    Dataset updated
    Oct 28, 2022
    Authors
    The Devastator
    Description

    Same News Different Sources

    How different sources report on the same events

    About this dataset

    Do you ever feel like you're being inundated with news from all sides, and you can't keep up? Well, you're not alone. In today's age of social media and 24-hour news cycles, it can be difficult to know what's going on in the world. And with so many different news sources to choose from, it can be hard to know who to trust.

    That's where this dataset comes in. It captures data on individuals' sentiment toward different news sources. The data was collected by administering a survey to individuals who use different news sources. The survey responses were then analyzed to obtain a sentiment score for each news source.

    So if you're feeling overwhelmed by the news, don't worry: this dataset has you covered. With its insights on which news sources are trustworthy and which ones aren't, you'll be able to make informed decisions about what to read and what to skip.

    How to use the dataset

    The Twitter Sentiment Analysis dataset can be used to analyze the impact of social media on news consumption. This data can be used to study how individuals' sentiments towards different news sources vary based on the source they use. The dataset can also be used to study how different factors, such as the time of day or the topic of the news, affect an individual's sentiments.

    Research Ideas

    • Identify which news sources are most trusted by the public.
    • Understand what topics are most important to the public.
    • Understand how different news sources report on the same issue.

    Columns

    File: news.csv

    | Column name        | Description                                           |
    |:-------------------|:------------------------------------------------------|
    | ****               |                                                       |
    | Title              | The title of the news article. (String)               |
    | Date               | The date the news article was published. (Date)       |
    | Time               | The time the news article was published. (Time)       |
    | Score              | The sentiment score of the news article. (Float)      |
    | Number of Comments | The number of comments on the news article. (Integer) |

    File: news_api.csv

    | Column name | Description                                     |
    |:------------|:------------------------------------------------|
    | ****        |                                                 |
    | Title       | The title of the news article. (String)         |
    | Date        | The date the news article was published. (Date) |
    | Source      | The news source the article is from. (String)   |

    File: politics.csv

    | Column name        | Description                                           |
    |:-------------------|:------------------------------------------------------|
    | ****               |                                                       |
    | Title              | The title of the news article. (String)               |
    | Date               | The date the news article was published. (Date)       |
    | Time               | The time the news article was published. (Time)       |
    | Score              | The sentiment score of the news article. (Float)      |
    | Number of Comments | The number of comments on the news article. (Integer) |

    File: sports.csv

    | Column name        | Description                                           |
    |:-------------------|:------------------------------------------------------|
    | ****               |                                                       |
    | Title              | The title of the news article. (String)               |
    | Date               | The date the news article was published. (Date)       |
    | Time               | The time the news article was published. (Time)       |
    | Score              | The sentiment score of the news article. (Float)      |
    | Number of Comments | The number of comments on the news article. (Integer) |

    File: television.csv

    | Column name        | Description                                           |
    |:-------------------|:------------------------------------------------------|
    | ****               |                                                       |
    | Title              | The title of the news article. (String)               |
    | Date               | The date the news article was published. (Date)       |
    | Time               | The time the news article was published. (Time)       |
    | Score              | The sentiment score of the news article. (Float)      |
    | Number of Comments | The number of comments on the news article. (Integer) |

    File: trending.csv | Column name | Description ...

  9. Replication Data for: Social internet use by people with ID

    • dataverse.nl
    csv, pdf, xlsx
    Updated Mar 24, 2025
    Cite
    Hannah Van Alem; Noud Frielink; Petri J. C. M. Embregts (2025). Replication Data for: Social internet use by people with ID [Dataset]. http://doi.org/10.34894/EFDBZW
    Explore at:
    Available download formats: xlsx (2460512), pdf (679873), csv (5826122), pdf (177390), xlsx (146171), xlsx (113455), pdf (173046)
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    DataverseNL
    Authors
    Hannah Van Alem; Noud Frielink; Petri J. C. M. Embregts
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Data for overview of peer-reviewed articles up to November 2024 on the reasons for social internet usage by people with intellectual disabilities. RQ: Why do people with ID engage in social internet use?

  10. Data from: Internet access - households and individuals

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Aug 7, 2020
    Cite
    Office for National Statistics (2020). Internet access - households and individuals [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/householdcharacteristics/homeinternetandsocialmediausage/datasets/internetaccesshouseholdsandindividualsreferencetables
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Aug 7, 2020
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Annual data on internet usage in Great Britain, including frequency of internet use, internet activities and internet purchasing.

  11. Age distribution of internet users worldwide 2024

    • statista.com
    Cite
    Statista, Age distribution of internet users worldwide 2024 [Dataset]. https://www.statista.com/statistics/272365/age-distribution-of-internet-users-worldwide/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2024
    Area covered
    Worldwide
    Description

    As of February 2024, over a third of online users worldwide were aged between 25 and 34 years. Website visitors in this age bracket constituted the biggest group of online users worldwide. Also, 19 percent of global online users were aged 18 to 24 years. The global digital population aged 65 or older represented approximately 4.2 percent of all internet users worldwide.

    Social media usage and Meta

    Social media is a major driver of internet use, with a global penetration rate of 62.2 percent. On average, internet users spend 143 minutes per day on social media, highlighting its significant impact on daily online activities. Social media usage is dominated by Meta, which owns four of the largest social media platforms. Facebook leads the ranking with over three billion active users, followed by Instagram and WhatsApp.

    Instagram's global popularity

    Meta’s social video platform, Instagram, had long been one of the most engaging social media platforms worldwide, and it was projected to reach 1.44 billion monthly active users. Instagram was particularly favored by users aged 18 to 34, thanks to its ability to offer a variety of interactive content, including images and carousels. This diverse range of content types was a key factor in its popularity among its young user base.

  12. Data Use in Academia Dataset

    • datacatalog.worldbank.org
    csv, utf-8
    Updated Nov 27, 2023
    Cite
    Semantic Scholar Open Research Corpus (S2ORC) (2023). Data Use in Academia Dataset [Dataset]. https://datacatalog.worldbank.org/search/dataset/0065200/data_use_in_academia_dataset
    Explore at:
    Available download formats: csv, utf-8
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Semantic Scholar Open Research Corpus (S2ORC)
    Brian William Stacy
    License

    https://datacatalog.worldbank.org/public-licenses?fragment=cc

    Description

    This dataset contains metadata (title, abstract, date of publication, field, etc) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.


    Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.


    We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and parsed PDF or latex file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF and latex file are important for extracting important information like the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus. Around 30 million articles remain after keeping only articles with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles from the year 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries’ national statistical system: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. Fields that are included are: Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.


    Due to the intensive computer resources required, a set of 1,037,748 articles were randomly selected from the 10 million articles in our restricted corpus as a convenience sample.


    The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.


    To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled non-standardly.
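As a rough illustration of the regular-expression approach described above, the sketch below searches text fields for a fixed list of country names. The name list, function name, and sample strings are hypothetical. They are not taken from the project, which compiles its names from ISO 3166.

```python
import re

# Illustrative subset only; the project describes using ISO 3166 country names.
COUNTRY_NAMES = ["Kenya", "Brazil", "India", "Republic of Korea"]

# Sort longest first so multi-word names are tried before shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(COUNTRY_NAMES, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)
_CANONICAL = {name.lower(): name for name in COUNTRY_NAMES}


def find_countries(*fields: str) -> set[str]:
    """Return the canonical country names found in any of the given text fields."""
    found = set()
    for text in fields:
        for match in _PATTERN.finditer(text or ""):
            found.add(_CANONICAL[match.group(1).lower()])
    return found


print(find_countries("Household surveys in kenya", "We compare Brazil and India."))
print(find_countries("Evidence from Brasil"))  # non-standard spelling is missed
```

The second call shows the exclusion error mentioned above: a non-standard spelling does not match the list.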


    The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify objects from text, utilizing the spaCy Python library. The Named Entity Recognition algorithm splits text into named entities, and NER is used in this project to identify countries of study in the academic articles. SpaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.


    The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, where 3500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:


    Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.

    There are two classification tasks in this exercise:

    1. Identifying whether an academic article is using data from any country.

    2. Identifying from which country that data came.

    For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated or created to produce research findings. For example, a study that reports findings or analysis based on survey data uses data. Clues that a study uses data include whether a survey or census is described, a statistical model is estimated, or a table of means or summary statistics is reported.

    After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]

    For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable. For instance, if the research is theoretical and has no specific country application. In some cases, the research article may involve multiple countries. In these cases, select all countries that are discussed in the paper.

    We expect between 10 and 35 percent of all articles to use data.


    The median amount of time that a worker spent on an article, measured as the time between when the article was accepted to be classified by the worker and when the classification was submitted was 25.4 minutes. If human raters were exclusively used rather than machine learning tools, then the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review at a cost of $3,113,244, which assumes a cost of $3 per article as was paid to MTurk workers.
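The cost and time figures above can be checked with a few lines of arithmetic. This is a sketch using only the numbers quoted in the paragraph. Reading the "around 50 years" figure as continuous, round-the-clock work is an assumption that happens to reproduce it.

```python
ARTICLES = 1_037_748          # articles in the sampled corpus
MINUTES_PER_ARTICLE = 25.4    # median time a worker spent per article
COST_PER_ARTICLE_USD = 3      # payment per article to MTurk workers

total_cost_usd = ARTICLES * COST_PER_ARTICLE_USD
total_minutes = ARTICLES * MINUTES_PER_ARTICLE

# Assumption: "years of human work time" counted as 24 hours a day, 365 days a year.
years_continuous = total_minutes / (60 * 24 * 365)

print(f"Total cost: ${total_cost_usd:,}")
print(f"Years of continuous work: {years_continuous:.2f}")
```

Under a working-hours schedule instead of continuous time, the number of years would be correspondingly larger.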


    A model is next trained on the 3,500 labeled articles. We use a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. 2018). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT, retains 97% of its language understanding capabilities, and is 60% faster (Sanh, Debut, Chaumond, Wolf 2019). We use PyTorch to produce a model to classify articles based on the labeled data. Of the 3,500 articles that were hand coded by the MTurk workers, 900 are fed to the machine learning model. 900 articles were selected because of computational limitations in training the NLP model. A classification of “uses data” was assigned if the model predicted an article used data with at least 90% confidence.
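The 90% confidence rule in the paragraph above can be sketched as a threshold on the classifier's predicted probability. This is not the project's code. It assumes a hypothetical two-class output of raw scores ordered as [does not use data, uses data], and the function names are illustrative.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uses_data(logits, threshold=0.90):
    """Label an article as "uses data" only if that class reaches the threshold.

    Assumption: logits are ordered [does not use data, uses data].
    """
    _, p_uses_data = softmax(logits)
    return p_uses_data >= threshold


print(uses_data([0.0, 3.0]))  # probability of "uses data" is about 0.95
print(uses_data([0.0, 2.0]))  # probability of "uses data" is about 0.88
```

A high threshold like this trades recall for precision: borderline articles are left out of the "uses data" class.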


    The performance of the models classifying articles to countries and as using data or not can be compared to the classification by the human raters. We consider the human raters as giving us the ground truth. This may underestimate the model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People’s Republic of Korea. If both humans and the model perform the same kind of errors, then the performance reported here will be overestimated.


    The model was able to predict whether an article made use of data with 87% accuracy evaluated on the set of articles held out of the model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of

  13. Internet Penetration in Percentage

    • figshare.com
    xlsx
    Updated May 18, 2021
    Cite
    Matheus Lotto (2021). Internet Penetration in Percentage [Dataset]. http://doi.org/10.6084/m9.figshare.14614581.v2
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    May 18, 2021
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Matheus Lotto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw Data of manuscript: "Social isolation intensified the interests in toothache-related digital information during the COVID-19 pandemic"

  14. Internet usage frequency in Germany in 2019, by activity

    • statista.com
    Updated Nov 27, 2025
    Cite
    Statista (2025). Internet usage frequency in Germany in 2019, by activity [Dataset]. https://www.statista.com/statistics/1188894/internet-usage-frequency-activity-germany/
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 26, 2019 - Apr 2, 2019
    Area covered
    Germany
    Description

    In 2019, ** percent of respondents used the internet almost daily. This survey depicts the frequency of online activities in Germany in 2019. Other popular daily activities included reading articles and posts online, as well as using social media.

  15. Stimation results for different internet usage modes.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Aug 30, 2024
    Cite
    Huang, Huan; Li, Xiaodi; Qin, Dongxue; Ma, Zhifei; Zhang, Xiangmin (2024). Stimation results for different internet usage modes. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001288762
    Explore at:
    Dataset updated
    Aug 30, 2024
    Authors
    Huang, Huan; Li, Xiaodi; Qin, Dongxue; Ma, Zhifei; Zhang, Xiangmin
    Description

    Stimation results for different internet usage modes.

  16. Number of global social network users 2017-2028

    • statista.com
    • de.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    How many people use social media?
    Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.

    Who uses social media?
    Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.

    How much time do people spend on social media?
    Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. On average, internet users in Latin America had the highest average time spent per day on social media.

    What are the most popular social media platforms?
    Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
    
  17. Data from: Diagnostic Criteria for Problematic Internet Use among U.S....

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jan 18, 2016
    Cite
    Howard, Matthew O.; O’Brien, Jennifer E.; Li, Wen; Snyder, Susan M. (2016). Diagnostic Criteria for Problematic Internet Use among U.S. University Students: A Mixed-Methods Evaluation [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001533824
    Explore at:
    Dataset updated
    Jan 18, 2016
    Authors
    Howard, Matthew O.; O’Brien, Jennifer E.; Li, Wen; Snyder, Susan M.
    Description

    Empirical studies have identified increasing rates of problematic Internet use worldwide and a host of related negative consequences. However, researchers disagree as to whether problematic Internet use is a subtype of behavioral addiction. Thus, there are not yet widely accepted and validated diagnostic criteria for problematic Internet use. To address this gap, we used mixed methods to examine the extent to which signs and symptoms of problematic Internet use mirror DSM-5 diagnostic criteria for substance use disorder, gambling disorder, and Internet gaming disorder. A total of 27 university students who self-identified as intensive Internet users and who reported Internet-use-associated health and/or psychosocial problems were recruited. Students completed two measures that assess problematic Internet use (Young’s Diagnostic Questionnaire and the Compulsive Internet Use Scale) and participated in focus groups exploring their experiences with problematic Internet use. Results of standardized measures and focus group discussions indicated substantial overlap between students’ experiences of problematic Internet use and the signs and symptoms reflected in the DSM-5 criteria for substance use disorder, gambling disorder, and Internet gaming disorder. These signs and symptoms included: a) using the Internet longer than intended, b) preoccupation with the Internet, c) withdrawal symptoms when unable to access the Internet, d) unsuccessful attempts to stop or reduce Internet use, e) craving, f) loss of interest in hobbies or activities other than the Internet, g) excessive Internet use despite the knowledge of related problems, h) use of the Internet to escape or relieve a negative mood, and i) lying about Internet use. Tolerance, withdrawal symptoms, and recurrent Internet use in hazardous situations were uniquely manifested in the context of problematic Internet use. Implications for research and practice are discussed.

  18. Average treatment effect of internet usage on farmers’ adoption behavior.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Aug 30, 2024
    Cite
    Qin, Dongxue; Li, Xiaodi; Ma, Zhifei; Huang, Huan; Zhang, Xiangmin (2024). Average treatment effect of internet usage on farmers’ adoption behavior. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001288723
    Explore at:
    Dataset updated
    Aug 30, 2024
    Authors
    Qin, Dongxue; Li, Xiaodi; Ma, Zhifei; Huang, Huan; Zhang, Xiangmin
    Description

    Average treatment effect of internet usage on farmers’ adoption behavior.

  19. Digital divide among people with disabilities: Analysis of data from a...

    • plos.figshare.com
    xlsx
    Updated Jun 2, 2023
    Cite
    Mariusz Duplaga (2023). Digital divide among people with disabilities: Analysis of data from a nationwide study for determinants of Internet use and activities performed online [Dataset]. http://doi.org/10.1371/journal.pone.0179825
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Mariusz Duplaga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The Internet is both an opportunity as well as a challenge for people with disabilities. However, this segment of the population is usually indicated among social groups experiencing digital divide. The study is focused on the analysis of factors determining Internet usage and undertaking specific activities online among people with disabilities based on a nationwide study performed in 2013 in Poland.

    Methods

    Secondary analysis was performed on the data of persons who declared disability status in 2013 “Social Diagnosis” study. Multivariate logistic regression models were developed for the use of the Internet and performing three types of activities online.

    Results

    Among 3,556 respondents with disability 51.02% were females, 25.19% 65 years of age and over and 33.05% were Internet users. The predictors of Internet usage included the degree of disability, place of residence, level of education, marital status, occupational status, net income, use of health care service and the use of mobile phone. The odds ratio that a person with disability belonging to the oldest category will use the Internet was only 0.04 (95% CI 0.02–0.09), when compared to the youngest category. The odds that a person with disability from the highest category of education will use the Internet were 18 times higher than in the case of persons with only basic education (OR 18.17, 95% CI 11.70–28.21). Common predictors of online activities (accessing websites of public institutions, checking and sending emails, publishing own content on the Internet) included age category and net income.

    Conclusions

    People with disabilities in Poland are facing a significant digital divide. The factors determining the use of the Internet in this group are similar to those of the general population. On the other hand, people with disabilities who are active online, access diversified types of services including presentation of their own content online.
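    The reported odds ratios come from logistic regression, where an odds ratio is the exponential of a model coefficient. The sketch below works backwards from the published education result (OR 18.17, 95% CI 11.70–28.21) to the implied coefficient and standard error. It assumes the interval is a Wald interval that is symmetric on the log scale, which the description does not state; the function names are illustrative.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def log_odds_from_or(odds_ratio, ci_low, ci_high, z=Z_95):
    """Recover the coefficient and its standard error from an OR and its CI.

    Assumes a Wald interval: exp(beta - z*se) to exp(beta + z*se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def or_with_ci(beta, se, z=Z_95):
    """Turn a coefficient and standard error back into an OR with its CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Figures quoted in the description for the highest education category
beta, se = log_odds_from_or(18.17, 11.70, 28.21)
odds_ratio, low, high = or_with_ci(beta, se)
```

    The round trip reproduces the published bounds closely, which suggests the interval is indeed roughly symmetric around the log odds ratio.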

  20. Data from: WikiReddit: Tracing Information and Attention Flows Between...

    • zenodo.org
    bin
    Updated May 4, 2025
    Cite
    Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms [Dataset]. http://doi.org/10.5281/zenodo.14653265
    Explore at:
    Available download formats: bin
    Dataset updated
    May 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 15, 2025
    Description

    Preprint

    Gildersleve, P., Beers, A., Ito, V., Orozco, A., & Tripodi, F. (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms. arXiv [Cs.CY]. https://doi.org/10.48550/arXiv.2502.04942
    Accepted at the International AAAI Conference on Web and Social Media (ICWSM) 2025

    Abstract

    The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.

    Datasheet

    Motivation

    The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.

    Composition

    WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.

    Collection Process

    Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.

    Preprocessing/cleaning/labeling

    Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
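    The two steps described here, URL extraction with a regular expression and SHA-256 hashing of IDs, might look roughly like the sketch below. The pattern, the function names, and the sample ID are assumptions made for illustration; the authors' actual expressions and any salting they applied are not given in the description.

```python
import hashlib
import re

# Hypothetical pattern: language subdomain, optional mobile "m." prefix, then the article title
WIKI_URL = re.compile(r"https?://([a-z\-]+)\.(?:m\.)?wikipedia\.org/wiki/([^\s)\]>]+)")


def extract_wikipedia_links(text):
    """Return (language subdomain, article title) pairs found in the text."""
    return [(m.group(1), m.group(2)) for m in WIKI_URL.finditer(text)]


def anonymize_id(raw_id):
    """Hash an identifier with SHA-256 and return the hex digest."""
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()


comment = "See https://en.wikipedia.org/wiki/Reddit and (https://de.m.wikipedia.org/wiki/Berlin)."
links = extract_wikipedia_links(comment)
hashed = anonymize_id("t1_example")  # made-up ID, not a real Reddit identifier
```

    Hashing is deterministic, so the same original ID always maps to the same digest, which keeps posts, comments, and subreddits joinable without exposing the raw identifiers.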

    Uses

    We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.

    Distribution

    The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942

    Maintenance

    Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.


    SQL Database Schema

    Table: posts

    Column Name          Type       Description
    subreddit_id         TEXT       The unique identifier for the subreddit.
    crosspost_parent_id  TEXT       The ID of the original Reddit post if this post is a crosspost.
    post_id              TEXT       Unique identifier for the Reddit post.
    created_at           TIMESTAMP  The timestamp when the post was created.
    updated_at           TIMESTAMP  The timestamp when the post was last updated.
    language_code        TEXT       The language code of the post.
    score                INTEGER    The score (upvotes minus downvotes) of the post.
    upvote_ratio         REAL       The ratio of upvotes to total votes.
    gildings             INTEGER    Number of awards (gildings) received by the post.
    num_comments         INTEGER    Number of comments on the post.

    Table: comments

    Column Name          Type       Description
    subreddit_id         TEXT       The unique identifier for the subreddit.
    post_id              TEXT       The ID of the Reddit post the comment belongs to.
    parent_id            TEXT       The ID of the parent comment (if a reply).
    comment_id           TEXT       Unique identifier for the comment.
    created_at           TIMESTAMP  The timestamp when the comment was created.
    last_modified_at     TIMESTAMP  The timestamp when the comment was last modified.
    score                INTEGER    The score (upvotes minus downvotes) of the comment.
    upvote_ratio         REAL       The ratio of upvotes to total votes for the comment.
    gilded               INTEGER    Number of awards (gildings) received by the comment.

    Table: postlinks

    Column Name          Type       Description
    post_id              TEXT       Unique identifier for the Reddit post.
    end_processed_valid  INTEGER    Whether the extracted URL from the post resolves to a valid URL.
    end_processed_url    TEXT       The extracted URL from the Reddit post.
    final_valid          INTEGER    Whether the final URL from the post resolves to a valid URL after redirections.
    final_status         INTEGER    HTTP status code of the final URL.
    final_url            TEXT       The final URL after redirections.
    redirected           INTEGER    Indicator of whether the posted URL was redirected (1) or not (0).
    in_title             INTEGER    Indicator of whether the link appears in the post title (1) or post body (0).
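    To show how the posts and postlinks tables relate, here is a minimal sketch using Python's built-in sqlite3 module. It builds a small in-memory database with a subset of the documented columns and a few invented rows; the real database file, its engine, and its contents are not described here, so treat this only as an illustration of the join on post_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (post_id TEXT, subreddit_id TEXT, language_code TEXT, score INTEGER);
    CREATE TABLE postlinks (post_id TEXT, final_url TEXT, final_valid INTEGER, in_title INTEGER);
    """
)

# Invented rows for illustration only
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [("p1", "s1", "en", 10), ("p2", "s1", "de", 3), ("p3", "s2", "en", 7)],
)
conn.executemany(
    "INSERT INTO postlinks VALUES (?, ?, ?, ?)",
    [
        ("p1", "https://en.wikipedia.org/wiki/Reddit", 1, 0),
        ("p1", "https://en.wikipedia.org/wiki/Wikipedia", 1, 1),
        ("p2", "https://de.wikipedia.org/wiki/Berlin", 1, 0),
        ("p3", "https://en.wikipedia.org/wiki/Missing_page", 0, 0),
    ],
)

# Count valid Wikipedia links per post language
rows = conn.execute(
    """
    SELECT p.language_code, COUNT(*) AS n_links
    FROM posts AS p
    JOIN postlinks AS l ON l.post_id = p.post_id
    WHERE l.final_valid = 1
    GROUP BY p.language_code
    ORDER BY n_links DESC, p.language_code
    """
).fetchall()
conn.close()
```

    The same join pattern would apply to comments and commentlinks through comment_id.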

    Table: commentlinks

    Column Name          Type       Description
    comment_id           TEXT       Unique identifier for the Reddit comment.
    end_processed_valid  INTEGER    Whether the extracted URL from the comment resolves to a valid URL.
    end_processed_url    TEXT       The extracted URL from the comment.
    final_valid          INTEGER    Whether the final URL from the comment resolves to a valid URL after redirections.
    final_status         INTEGER    HTTP status code of the final
