The global number of internet users was forecast to increase continuously between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After fifteen consecutive years of growth, the number of users is estimated to reach a new peak of 7 billion in 2029. Notably, the number of internet users has increased continuously over the past years. Depicted is the estimated number of individuals in the country or region at hand who use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects not taken into account here. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of internet users in countries like the Americas and Asia.
How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of 3 hours and 49 minutes on social media each day. In comparison, daily time spent with social media in the U.S. was just 2 hours and 16 minutes. Global social media usage: currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region, while Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons: users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and with friends. Global impact of social media: social media has a wide-reaching and significant impact not only on online activities but also on offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased political polarization, and heightened everyday distractions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Canada Internet Usage: Search Engine Market Share: Mobile: Haosou data was reported at 0.010 % on 28 Sep 2024. This stayed constant from the previous figure of 0.010 % for 27 Sep 2024. The series is updated daily, with a median of 0.010 % over 11 observations from Sep 2024 to 28 Sep 2024. The data reached an all-time high of 0.010 % on 28 Sep 2024 and a record low of 0.010 % on 28 Sep 2024. The series remains at active status in CEIC and is reported by Statcounter Global Stats. The data is categorized under Global Database’s Canada – Table CA.SC.IU: Internet Usage: Search Engine Market Share.
The global number of smartphone users was forecast to increase continuously between 2024 and 2029 by a total of 1.8 billion users (+42.62 percent). After nine consecutive years of growth, the smartphone user base is estimated to reach a new peak of 6.1 billion users in 2029. Notably, the number of smartphone users has increased continuously over the past years. Smartphone users here are limited to internet users of any age using a smartphone. The shown figures have been derived from survey data that has been processed to estimate missing demographics. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of smartphone users in countries like Australia & Oceania and Asia.
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset collects job offers gathered by web scraping and filtered according to specific keywords, locations and times. The data gives users rich and precise search capabilities to uncover the best working solution for them. With the information collected, users can explore options that match their personal situation, skill set and preferences in terms of location and schedule. The columns provide detailed information on job titles, employer names, locations, time frames and other necessary parameters so you can make a smart choice for your next career opportunity.
This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:
Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by allowing you to search for job postings that contain the word or phrase you are looking for.
Next, consider where the job is located – the Location column tells you where in the world each posting is from so make sure it’s somewhere that suits your needs!
Finally, consider when the position is available – the Temps_Oferta (time frame) column indicates when each posting was made and whether it is a full-time, part-time or casual/temporary role, so make sure it meets your requirements before applying!
Additionally, if details such as hours per week or further schedule information are important criteria, there is also information in the Horari and Temps_Oferta columns. Once all three criteria have been ticked off - keywords, location and time frame - take a look at the Empresa (company name) and Nom_Oferta (post name) columns to get an idea of who would be employing you should you land the gig!
All these pieces of data put together should give any motivated individual what they need to seek out an optimal work solution - a minimal filtering sketch follows below. Keep hunting, and good luck!
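As a rough illustration of combining the three criteria, here is a minimal pandas sketch. The keyword and location values are made up for the example; only the column names documented in the table further below are assumed.

```python
# Minimal filtering sketch (illustrative values; adjust keyword/location to taste).
import pandas as pd

offers = pd.read_csv("web_scraping_information_offers.csv")

keyword = "analista"      # example search term for the post name
location = "Barcelona"    # example value for the Ubicació column

mask = (
    offers["Nom_Oferta"].str.contains(keyword, case=False, na=False)
    & offers["Ubicació"].str.contains(location, case=False, na=False)
)
print(offers.loc[mask, ["Nom_Oferta", "Empresa", "Ubicació", "Temps_Oferta", "Horari"]].head())
```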
- Machine learning can be used to group job offers in order to facilitate the identification of similarities and differences between them. This could allow users to target their search for a work solution more specifically.
- The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better informed decisions in terms of their career options and goals.
- It may also provide insight into the local job market, enabling companies and employers to identify where there is potential for new opportunities or possible trends that may previously have gone unnoticed.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication - No Copyright. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: web_scraping_information_offers.csv

| Column name  | Description                          |
|:-------------|:-------------------------------------|
| Nom_Oferta   | Name of the job offer. (String)      |
| Empresa      | Company offering the job. (String)   |
| Ubicació     | Location of the job offer. (String)  |
| Temps_Oferta | Time of the job offer. (String)      |
| Horari       | Schedule of the job offer. (String)  |
If you use this dataset in your research, please credit the original authors.
The population share with mobile internet access in North America was forecast to increase by a total of 2.9 percentage points between 2024 and 2029. This overall increase does not happen continuously, notably not in 2028 and 2029. Mobile internet penetration is estimated to amount to 84.21 percent in 2029. Notably, the population share with mobile internet access has increased continuously over the past years. The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the population share with mobile internet access in countries like the Caribbean and Europe.
Introducing Job Posting Datasets: Uncover labor market insights!
Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.
Job Posting Dataset Sources:
Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.
Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.
StackShare: Access StackShare datasets to make data-driven technology decisions.
Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.
Choose your preferred dataset delivery options for convenience:
Receive datasets in various formats, including CSV, JSON, and more. Opt for storage solutions such as AWS S3, Google Cloud Storage, and more. Customize data delivery frequencies, whether one-time or per your agreed schedule.
Why Choose Oxylabs Job Posting Datasets:
Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.
Time and resource savings: Focus on data analysis and your core business objectives while we handle the data extraction process efficiently and cost-effectively.
Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.
Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.
Pricing Options:
Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains server logs from the search engine of the library and information center of the Aristotle University of Thessaloniki in Greece (http://search.lib.auth.gr/). The search engine enables users to check the availability of books and other written works, and to search for digitized material and scientific publications. The server logs span an entire month, from March 1 to March 31, 2018, and consist of 4,091,155 requests, with an average of 131,973 requests per day and a standard deviation of 36,996.7 requests. In total, there are requests from 27,061 unique IP addresses and 3,441 unique user-agent strings. The server logs are in JSON format and are anonymized by masking the last 6 digits of the IP address and by hashing the last part of each requested URL (after the last /). The dataset also contains the processed form of the server logs as a labelled dataset of log entries grouped into sessions along with their extracted features (simple semantic features). We make this dataset publicly available, the first one in this domain, in order to provide a common ground for testing web robot detection methods, as well as other methods that analyze server logs.
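For reference, below is a small Python sketch of the anonymisation scheme described above (masking the last six digits of the IP and hashing the final URL segment). It is an illustration of the stated rules, not the authors' actual code, and the sample log entry is invented.

```python
# Illustrative re-implementation of the described anonymisation rules.
import hashlib
import json

def mask_ip(ip: str) -> str:
    """Replace the last 6 digits of the IP string with 'X'."""
    digits_to_mask = 6
    chars = list(ip)
    for i in range(len(chars) - 1, -1, -1):
        if digits_to_mask == 0:
            break
        if chars[i].isdigit():
            chars[i] = "X"
            digits_to_mask -= 1
    return "".join(chars)

def hash_last_segment(url: str) -> str:
    """Hash whatever follows the last '/' in the requested URL."""
    head, _, tail = url.rpartition("/")
    if not tail:
        return url
    return head + "/" + hashlib.sha256(tail.encode()).hexdigest()[:12]

entry = {"ip": "155.207.123.45", "url": "/search/record/12345"}  # invented example
entry["ip"] = mask_ip(entry["ip"])
entry["url"] = hash_last_segment(entry["url"])
print(json.dumps(entry))
```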
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has led to the emergence of a new data-driven research field: mathematical methods from computer science, statistical physics and sociometry provide insights into a wide range of disciplines, from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the web) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and the spreading of epidemics. A few recent works applied this approach to stock prices and market sentiment. However, it remains unclear whether trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate user behavior. We show that the query volume dynamics emerge from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk based on the activity of users of the web.
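To make the kind of relationship described above concrete, here is a hedged pandas sketch comparing same-day and one-day-lead correlations between query volume and trading volume. The file and column names are placeholders for illustration, not the paper's actual data.

```python
# Sketch: does query volume on day t correlate with trading volume on day t+1?
import pandas as pd

df = pd.read_csv("stock_query_volumes.csv", parse_dates=["date"])  # hypothetical file
df = df.sort_values("date")

# shift(1) aligns yesterday's queries with today's trading volume,
# i.e. queries leading trading by one day.
same_day = df["query_volume"].corr(df["trading_volume"])
lead_one = df["query_volume"].shift(1).corr(df["trading_volume"])
print(f"same-day correlation: {same_day:.3f}, one-day lead: {lead_one:.3f}")
```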
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org. Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
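For readers who hit the Excel limitation mentioned above, here is a minimal pandas sketch that reads the exported CSV in chunks and tallies incidents. The file name is whatever you chose at export time, and the "Primary Type" column name is an assumption that should be checked against your download.

```python
# Chunked read of the exported crimes CSV, counting incidents per primary type.
import pandas as pd

counts = pd.Series(dtype="float64")
for chunk in pd.read_csv(
    "Crimes_-_2001_to_Present.csv",      # assumed export file name
    usecols=["Primary Type"],            # assumed column name; verify in your file
    chunksize=100_000,
):
    counts = counts.add(chunk["Primary Type"].value_counts(), fill_value=0)

print(counts.sort_values(ascending=False).astype(int).head(10))
```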
https://www.ibisworld.com/about/termsofuse/
In the past five years, the web portal industry in Germany has seen dynamic growth, driven by high internet penetration and the increased use of mobile devices. Demand for digital services has remained robust across all sectors, with advertising revenue, premium models and commission business establishing themselves as key revenue pillars. At the same time, competition from international technology groups, increasing regulatory requirements and growing data protection awareness are intensifying the pressure to innovate. Web portals are increasingly investing in mobile applications, personalisation and a differentiated range of services in order to maintain user intensity and user loyalty despite increasing saturation and growing digital detox trends. Industry revenue increased by an average of 9.6% per year between 2020 and 2025 and is expected to reach 14 billion euros in the current year. In 2025, industry turnover is expected to increase by 3.9%. The industry is currently characterised by a greater awareness of data protection and user trust. New studies show that many users are sceptical about web portals with inadequate data protection measures and are switching away from them. At the same time, content and community-orientated portals are gaining massive visibility, while traditional e-commerce and technology portals are coming under pressure. Increasing mobile use and the trend towards digital self-regulation functions are influencing development priorities. To ensure their competitiveness, providers are increasingly focussing on transparent data protection solutions, innovative content and cross-service platform strategies. In the next five years, turnover in the sector is expected to increase by an average of 3.2% per year to 16.5 billion euros. The web portal industry is undergoing a phase of profound change, primarily characterised by stricter data protection regulations, higher technological requirements and new tax regulations. In particular, complex compliance with data protection regulations is hampering innovation and making the development of data-based business models more difficult. In addition, the minimum tax law deprives international providers of an important locational advantage and thus changes the competitive landscape. In response, companies are driving forward automation and the use of artificial intelligence in order to fulfil regulatory requirements more efficiently. At the same time, there is a strategic focus on the integration and diversification of digital services. The bundling of email, cloud, calendar and other services increases user loyalty and advertising revenue, but at the same time increases the pressure to consolidate and makes it more difficult for smaller providers to participate in the market.
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often. Storage capacity is also growing: only a small percentage of this newly created data is kept, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
The Digital Humanities 2016 conference took place in Kraków, Poland, between Monday 11 July and Saturday 16 July 2016. #DH2016 was the conference's official hashtag.

What This Output Is
This is an Excel spreadsheet file with three sheets containing a total of 3,478 Tweets publicly published with the hashtag #DH2016. The archive starts with a Tweet published on Sunday 10 July 2016 at 00:03:41 +0000 and finishes with a Tweet published on Tuesday 12 July 2016 at 23:55:47 +0000. The original collection has been organised into conference days, one sheet per day (GMT and Central European Times included). A breakdown of Tweets per day:
Sunday 10 July 2016: 179 Tweets
Monday 11 July 2016: 981 Tweets
Tuesday 12 July 2016: 2,318 Tweets

Methodology and Limitations
The Tweets contained in this file were collected by Ernesto Priego using Martin Hawksey's TAGS 6.0. Only users with at least 1 follower were included in the archive. Retweets have been included (Retweets count as Tweets). The collection spreadsheet was customised to reflect the time zone and geographical location of the conference. The profile_image_url and entities_str metadata were removed before public sharing in this archive. Please bear in mind that the conference hashtag has been spammed, so some Tweets collected may be from spam accounts. Some automated refining has been performed to remove Tweets not related to the conference, but the data is likely to require further refining and deduplication. Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might "over-represent the more central users", not offering "an accurate picture of peripheral activity" (Gonzalez-Bailon, Sandra, et al. 2012). Apart from the filters and limitations already declared, it cannot be guaranteed that this file contains each and every Tweet tagged with #dh2016 during the indicated period, and the dataset is shared for archival, comparative and indicative educational research purposes only. Only content from public accounts is included, obtained from the Twitter Search API. The shared data is publicly available to all Twitter users via the Twitter Search API and to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps, without the need of a Twitter account. Each Tweet and its contents were published openly on the Web with the queried hashtag and are the responsibility of the original authors. No private personal information is shared in this dataset. The collection and sharing of this dataset is enabled and allowed by Twitter's Privacy Policy. The sharing of this dataset complies with Twitter's Developer Rules of the Road. This dataset is shared to archive, document and encourage open educational research into scholarly activity on Twitter.

Other Considerations
Tweets published publicly by scholars during academic conferences are often tagged (labelled) with a hashtag dedicated to the conference in question. The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labelled information/outputs (Tweets in this case). A hashtag is metadata users choose freely to use so their content is associated, directly linked to and categorised with the chosen hashtag.
Though every reason for Tweeters' use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour. In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter's Privacy and data sharing policies. Professional associations like the Modern Language Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter's search API has well-known temporal limitations for retrospective historical search and collection.Beyond individual tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. To date, collecting in real time is the only relatively accurate method to archive tweets at a small scale. Though these datasets have limitations and are not thoroughly systematic, it is hoped they can contribute to developing new insights into the discipline's presence on Twitter over time.The CC-BY license has been applied to the output in the repository as a curated dataset. Authorial/curatorial/collection work has been performed on the file in order to make it available as part of the scholarly record. The data contained in the deposited file is otherwise freely available elsewhere through different methods and anyone not wishing to attribute the data to the creator of this output is needless to say free to do their own collection and clean their own data.
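As a practical aside, here is a hedged pandas sketch for reproducing the per-day counts from the deposited spreadsheet. The sheet layout (one sheet per conference day) is as described above; the file name is hypothetical and the "created_at" column name follows TAGS conventions, so treat both as assumptions.

```python
# Load the TAGS export (one sheet per day) and recount tweets per sheet.
import pandas as pd

sheets = pd.read_excel("dh2016_tweets.xlsx", sheet_name=None)  # dict of DataFrames, one per sheet
for name, frame in sheets.items():
    created = pd.to_datetime(frame["created_at"], errors="coerce", utc=True)
    print(f"{name}: {len(frame)} tweets, spanning {created.dt.date.nunique()} distinct day(s)")
```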
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The overall aim of the PhD is to explore the potential use of Internet of Things (IoT) to support the creation and sustainment of community technology/infrastructure and other publicly shared objects. For the second year, the Dundee West End Community Fridge was considered to allow continuity from the first year.
The research followed a mixed approach consisting of a prototype and a field visit, supplemented with storyboards and a Miro board. Seven UK-based participants contributed to the research through in-person and two virtual interviews and workshops. A five-day field visit was organised to allow in-person participation, as online recruitment did not yield many participants.
This Dataset contains:
Field notes
Assorted photographs of the prototypes and other activities
Storyboards
Sample forms, documents and announcements
Participant recruitment tools
Other data related to the research
The NOAA Monthly U.S. Climate Gridded Dataset (NClimGrid) consists of four climate variables derived from the GHCN-D dataset: maximum temperature, minimum temperature, average temperature and precipitation. Each file provides monthly values in a 5x5 lat/lon grid for the Continental United States. Data is available from 1895 to the present. On an annual basis, approximately one year of "final" nClimGrid will be submitted to replace the initially supplied "preliminary" data for the same time period. Users should be sure to ascertain which level of data is required for their research.
EpiNOAA is an analysis-ready dataset that consists of a daily time series of nClimGrid measures (maximum temperature, minimum temperature, average temperature, and precipitation) at the county scale. Each file provides daily values for the Continental United States. Data are available from 1951 to the present. Daily data are updated every 3 days with a preliminary data file and replaced with the scaled (i.e., quality controlled) data file every three months. This derivative data product is an enhancement of the original daily nClimGrid dataset in that all four weather parameters are now packaged into one file and assembled in a daily time-series format. In addition to a direct download option, an R package and a web interface have been developed to streamline access to the final data product. These options allow end users three separate access modes to arrive at a customized dataset unique to each end user’s application. Users should be sure to review the data documentation to inform which level of data is required for their research.
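As an illustration of working with the county-scale daily time series, here is a hedged pandas sketch that aggregates daily values to monthly means. The file name and the column names (fips, date, tavg) are assumptions made for the example; check the product documentation for the exact schema of the file you download.

```python
# Aggregate assumed county-level daily values to monthly mean temperature.
import pandas as pd

daily = pd.read_csv("epinoaa_county_daily.csv", parse_dates=["date"])  # hypothetical file/schema
monthly_tavg = (
    daily.groupby(["fips", daily["date"].dt.to_period("M")])["tavg"].mean()
)
print(monthly_tavg.head())
```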
The number of newspaper publications on the Internet increases every day. There are many news agencies, newspapers and magazines with digital publications on the web. Published documents are made available to users, who in turn use search engines to find them. To deliver the documents closest to a search, these documents must first be indexed and classified. With the huge volume of documents published every day, much research has been carried out in order to find ways of dealing with automatic document classification.
The "Tribuna" database is of journalistic origin with its digital publication, a factor that may be important for professionals of the area, also serving to understand other similar datasets. In order to carry out the experiment, we adopted the "A Tribuna" database, whose main characteristics presented previously, show that the collection is a good source of research, since it is already classified by specialists and has 21 classes that can be Displayed in the table below.
My thanks to the company "A Tribuna", which provided all these text files for experiments at the Federal University of Espírito Santo, to the High Performance Computing Laboratory (LCAD) for all the help with the experiments, and to Prof. Elias Oliveira, PhD, for all the knowledge shared.
There are two questions involving this dataset (a baseline sketch follows the list):
What is the best algorithm for classifying these documents?
What are the elements that describe each of the 21 classes in the collection?
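As a hedged starting point for the first question, the sketch below trains a simple TF-IDF plus linear-model baseline with scikit-learn. The function load_tribuna_corpus() is a hypothetical loader standing in for however you read the text files and their 21 class labels.

```python
# Baseline sketch: TF-IDF features + logistic regression over the 21 classes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts, labels = load_tribuna_corpus()  # hypothetical loader returning parallel lists

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0
)
model = make_pipeline(
    TfidfVectorizer(max_features=50_000),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```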
Data Access: The data in the research collection provided may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes. Due to these restrictions, the collection is not open data. Please download the Agreement at Data Sharing Agreement and send the signed form to fakenewstask@gmail.com.
Citation
Please cite our work as
@article{shahi2021overview,
  title   = {Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
  author  = {Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
  journal = {Working Notes of CLEF},
  year    = {2021}
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English.
Subtask 3A: Multi-class fake news detection of news articles (English). Subtask 3A addresses fake news detection as a four-class classification problem. The training data will be released in batches of roughly 900 articles with their respective labels. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other- An article that cannot be categorised as true, false, or partially false due to lack of evidence about its claims. This category includes articles in dispute and unproven articles.
Subtask 3B: Topical Domain Classification of News Articles (English). Fact-checkers require background expertise to identify the truthfulness of an article, and this categorisation will help to automate the sampling process from a stream of data. Given the text of a news article, determine its topical domain (English). This is a classification problem: the task is to categorise fake news articles into six topical categories such as health, crime, climate, election, and education. This task will be offered for a subset of the data of Subtask 3A.
Input Data
The data will be provided with the columns Id, title, text, rating, and domain; the description of the columns is as follows:
Task 3a
Task 3b
Output data format
Task 3a
Sample File
public_id, predicted_rating
1, false
2, true
Task 3b
Sample file
public_id, predicted_domain
1, health
2, crime
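A minimal sketch of producing a submission file in the sample format above (shown for Task 3a; Task 3b is identical with a predicted_domain column). The predictions dictionary is a placeholder for real model output.

```python
# Write a Task 3a submission CSV: public_id, predicted_rating.
import csv

predictions = {1: "false", 2: "true"}  # public_id -> predicted_rating (placeholder values)

with open("subtask3a_submission.csv", "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in predictions.items():
        writer.writerow([public_id, rating])
```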
Additional data for Training
To train your model, participants can use additional data in a similar format; some datasets are available on the web. We do not provide the ground truth for those datasets. For testing, we will not use any articles from other datasets. Some of the possible sources:
IMPORTANT!
Evaluation Metrics
This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams. There is a limit of 5 runs (total and not per day), and only one person from a team is allowed to submit runs.
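For clarity, macro-averaged F1 as used for the ranking can be computed with scikit-learn as follows; the labels here are purely illustrative.

```python
# Macro-averaged F1 over the four rating classes (example labels only).
from sklearn.metrics import f1_score

y_true = ["false", "true", "partially false", "other"]
y_pred = ["false", "true", "false", "other"]
print(f1_score(y_true, y_pred, average="macro"))
```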
Submission Link: https://competitions.codalab.org/competitions/31238
Related Work
https://creativecommons.org/publicdomain/zero/1.0/
Kaggle has fixed the issue with gzip files, and Version 510 should now reflect properly working files.
Please use version 508 of the dataset, as 509 is broken. See the link below for the version of the dataset that is working properly: https://www.kaggle.com/datasets/bwandowando/ukraine-russian-crisis-twitter-dataset-1-2-m-rows/versions/508
The context and history of the current ongoing conflict can be found https://en.wikipedia.org/wiki/2022_Russian_invasion_of_Ukraine.
[Jun 16] (🌇Sunset) Twitter has finally pulled the plug on all of my remaining TWITTER API accounts as part of their efforts to push developers to migrate to the new API. The last tweets that I pulled were dated Jun 14, and there is no more data from Jun 15 onwards. It was fun while it lasted, and I hope that this dataset has helped and will continue to help a lot. I'll just leave the dataset here for future download and reference. Thank you all!
[Apr 19] Two additional developer accounts have been permanently suspended; expect a lower throughput in the next few weeks. I will pull data until they ban my last account.
[Apr 08] I woke up this morning and saw that Twitter has banned/permanently suspended 4 of my developer accounts. I have a few more, but it is just a matter of time until all my accounts most likely get banned as well. This was a fun project that I maintained for as long as I could. I will pull data until my last account gets banned.
[Feb 26] I've started to pull in RETWEETS again, so I am expecting a significant amount of throughput in tweets again on top of the dedicated processes I have that get NON-RETWEETS. If you don't want RETWEETS, just filter them out (a short filtering sketch follows below).
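For anyone who wants the filtering step spelled out, here is a hedged pandas sketch that drops retweets by the conventional "RT @" prefix. The file name and the "text" column are assumptions about the CSV schema, so adjust them to the files you actually downloaded.

```python
# Drop retweets from one of the gzipped daily CSVs (schema assumed, not guaranteed).
import pandas as pd

tweets = pd.read_csv("20230101_UkraineCombinedTweets.csv.gz", compression="gzip")  # hypothetical file
no_retweets = tweets[~tweets["text"].astype(str).str.startswith("RT @")]
print(len(tweets), "->", len(no_retweets), "rows after removing retweets")
```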
[Feb 24] It's been a year since I started getting tweets about this conflict, and I had no idea that a year later it would still be ongoing. Almost everyone assumed that Ukraine would crumble in a matter of days, but that is not the case. To those who have been using my dataset, I hope that I am helping all of you in one way or another. I'll do my best to keep updating this dataset as long as I can.
[Feb 02] I seem to be getting fewer tweets as my crawlers are getting throttled; I used to get 2,500 tweets per 15 minutes, but around 2-3 of my crawlers are getting throttling limit errors. There may be some kind of update that Twitter has made to rate limits or something similar. I will try to find ways to increase the throughput again.
[Jan 02] For all new datasets, it will now be prefixed by a year, so for Jan 01, 2023, it will be 20230101_XXXX.
[Dec 28] For those looking for a cleaned version of my dataset, with the retweets removed from before Aug 08, here is a dataset by @vbmokin: https://www.kaggle.com/datasets/vbmokin/russian-invasion-ukraine-without-retweets
[Nov 19] I noticed that one of my developer accounts, which ISN'T TWEETING ANYTHING and is just pulling data out of Twitter, has been permanently banned by Twitter.com, hence the decrease in unique tweets. I will try to come up with a solution to increase my throughput and sign up for a new developer account.
[Oct 19] I just noticed that this dataset is finally "GOLD", after roughly seven months since I first uploaded my gzipped csv files.
[Oct 11] Sudden spike in number of tweets revolving around most recent development(s) about the Kerch Bridge explosion and the response from Russia.
[Aug 19 - IMPORTANT] I raised the missing dataset issue to the Kaggle team and they confirmed it was a bug caused by a ReactJS upgrade; the conversation and details can be seen here: https://www.kaggle.com/discussions/product-feedback/345915. It has already been fixed and I've re-uploaded all the gzipped files that were lost PLUS the new files that were generated AFTER the issue was identified.
[Aug 17] It seems the latest version of my dataset lost around 100+ files; good thing this dataset is versioned, so one can just go back to the previous version(s) and download them. Version 188 HAS ALL THE LOST FILES. I won't be re-uploading all datasets as that would be tedious; I've already deleted them locally and I only store the latest 2-3 days.
[Aug 10] 3/5 of my Python processes errored out, resulting in around 10-12 hours of NO data gathering for those processes, hence the sharp decrease in tweets for the Aug 09 dataset. I've added exception/error checking to prevent this from happening again.
[Aug 09] Significant drop in tweets extracted, but I am now getting ORIGINAL/ NON-RETWEETS.
[Aug 08] I've noticed that I had a spike of Tweets extracted, but they are literally thousands of retweets of a single original tweet. I also noticed that my crawlers seem to deviate because of this tactic being used by some Twitter users where they flood Twitter w...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVIDcast displays signals related to COVID-19 activity levels across the United States, derived from a variety of anonymized, aggregated data sources made available by multiple partners.
One of the COVIDcast streams displays results from a CMU-run symptom survey, advertised through Facebook.
This dataset is gathered using the delphi-epidata API (a small query sketch is shown below) and contains the covidcast_meta and covidcast data sources.
Presently the dataset contains the fb-survey data signal, which is based on CMU-run symptom surveys advertised through Facebook. Using this survey data, CMU estimates the percentage of people in a given location, on a given day, that have CLI (covid-like illness: fever along with cough, shortness of breath, or difficulty breathing) and, separately, that have ILI (influenza-like illness: fever along with cough or sore throat).
Files are organized in folders based on the spatial resolution of fb-survey data (state, county, hrr, msa).
Each file contains the percentage of people in a given location, on a given day that have CLI or ILI. Data consists of raw and smoothed estimates and is gathered for all time values available at delphi-epidata.
Each file contains the following columns:
- geo_value - location code
- time_value - time unit (e.g. date) over which underlying events happened
- direction - trend classifier (+1 increasing, 0 steady or not determined, -1 decreasing)
- value - value (statistic) derived from the underlying data source
- stderr - standard error of the statistic with respect to its sampling distribution, null when not applicable
- sample_size - number of "data points" used in computing the statistic, null when not applicable
Additionally, the dataset contains the most recent covidcast_meta where you can find the summary statistics for fb-survey data.
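For convenience, here is a hedged Python sketch of querying the same fb-survey signal directly from the delphi-epidata API. The endpoint, parameter names and signal name follow the public API documentation at the time of writing and should be verified before relying on them.

```python
# Fetch one day of the smoothed fb-survey CLI signal for one state.
import requests

resp = requests.get(
    "https://api.delphi.cmu.edu/epidata/covidcast/",
    params={
        "data_source": "fb-survey",
        "signal": "smoothed_cli",
        "time_type": "day",
        "geo_type": "state",
        "time_values": "20200501",
        "geo_value": "pa",
    },
    timeout=30,
)
resp.raise_for_status()
for row in resp.json().get("epidata", []):
    print(row["geo_value"], row["time_value"], row["value"], row["stderr"], row["sample_size"])
```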
Switzerland leads the ranking of population share with mobile internet access, recording 95.06 percent. Following closely behind is Ukraine with 95.06 percent, while Moldova trails the ranking with 46.83 percent, a difference of 48.23 percentage points to the ranking leader, Switzerland. The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).