29 datasets found

Data from: HTTPS traffic classification
kaggle.com
Updated Mar 11, 2024
Cite
Đinh Ngọc Ân (2024). HTTPS traffic classification [Dataset]. https://www.kaggle.com/datasets/inhngcn/https-traffic-classification/code
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 11, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Đinh Ngọc Ân
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Researchers from the Czech Republic have published a dataset for HTTPS traffic classification.

Since the data were captured mainly in a real backbone network, IP addresses and ports were omitted. The datasets consist of features calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and times. For more information, please visit the ipfixprobe repository.

During the research, they divided HTTPS traffic into five categories: L -- Live Video Streaming, P -- Video Player, M -- Music Player, U -- File Upload, D -- File Download, W -- Website, and other traffic.

They chose service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the 500 most popular websites for each category. They also used several popular websites that primarily target a Czech audience. The identified traffic classes and their representatives are provided below:

Traffic class	Representatives
Live Video Stream	Twitch, Czech TV, YouTube Live
Video Player	DailyMotion, Stream.cz, Vimeo, YouTube
Music Player	AppleMusic, Spotify, SoundCloud
File Upload/Download	FileSender, OwnCloud, OneDrive, Google Drive
Website and Other Traffic	Websites from Alexa Top 1M list
Number of internet users worldwide 2014-2029
statista.com
Updated Apr 11, 2025
Cite
Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
World
Description
The global number of internet users was forecast to increase continuously between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After the fifteenth consecutive year of growth, the number of users is estimated to reach 7 billion and therefore a new peak in 2029. Notably, the number of internet users has increased continuously over the past years.

Depicted is the estimated number of individuals in the country or region at hand that use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects, not taken into account here.

The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of internet users in regions like the Americas and Asia.
What social Media People like the most and why?
kaggle.com
Updated Feb 17, 2023
Cite
Nina Luquez (2023). What social Media People like the most and why? [Dataset]. https://www.kaggle.com/ninaluquez/what-social-media-people-like-the-most-and-why/discussion
Explore at:
Dataset updated
Feb 17, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nina Luquez
Description
Dataset

This dataset was created by Nina Luquez

Contents
IMDB Top 250 Lists (1996 - 2021)
kaggle.com
Updated Jan 20, 2022
Cite
Mustafa Cicek (2022). IMDB Top 250 Lists (1996 - 2021) [Dataset]. https://www.kaggle.com/mustafacicek/imdb-top-250-lists-1996-2020/discussion
Explore at:
Dataset updated
Jan 20, 2022
Dataset provided by
Kaggle
Authors
Mustafa Cicek
Description
Context

IMDB (Internet Movie Database) is one of the most popular websites, or databases, about movies, TV shows and similar content. IMDB's Top 250 list is also an important reference for finding good movies. Rankings are calculated from users' votes. In addition, IMDB's pollmaster account shares the Top 250 lists of previous years. The Top 250 list changes all the time, so each list here is a snapshot taken at midnight PST on December 31st of that year.

Content

This dataset contains IMDB Top 250 lists from 1996 to 2020 with every movie's basic information; release year, ranking, score, stars, etc.

Acknowledgements

This data was scraped from IMDB, and you can reach the scraping part from here

Inspiration

Time travel... You can look into the lists for the last 25 years. Analyze the best movies according to voters, genre preferences, the most successful directors and stars, ranking changes over time, et cetera. There are lots of things to do. Be creative and visualize them.
Global Starlink Web Cache Latency & Traceroute Measurement Dataset
zenodo.org
Updated Feb 6, 2025
Cite
Qi Zhang; Zeqi Lai; Qian Wu; Jihao Li; HEWU LI (2025). Global Starlink Web Cache Latency & Traceroute Measurement Dataset [Dataset]. http://doi.org/10.5281/zenodo.14800115
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.14800115
Dataset updated
Feb 6, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Qi Zhang; Zeqi Lai; Qian Wu; Jihao Li; HEWU LI
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains global web cache latency measurements collected via RIPE Atlas probes equipped with Starlink terminals across five continents, spanning over 24 hours and resulting in ~2 Million measurements. The measurements aim to evaluate the user-perceived latency of accessing popular websites through low-earth orbit (LEO) satellite networks.

This dataset is a product of Spache, a research project on web caching from space. Please refer to its WWW'25 paper for more details and analysis results.

Dataset File Content

The dataset includes the following files:

Metadata

Target website list: A list of the top 50 most popular websites according to Alexa ranking.

RIPE Atlas Measurement IDs: For each website, the corresponding RIPE Atlas Measurement IDs for both Ping and Traceroute measurements are provided.

Note: microsoftonline.com (originally ranked 41st) is not included in the list due to its unresolvable domain name.

Measurement results - Raw Data

Ping and Traceroute results: Raw measurement results for each target website, including detailed information on each measurement.

Note: For details on the measurement result formats, please refer to the RIPE Atlas documentation.

Measurement results - Preprocessed Latency

Ping RTT latency: Preprocessed data containing the minimum RTT (Round Trip Time, in milliseconds) for each Ping measurement to all target websites.

Probe information: Corresponding Probe IDs, along with their respective countries and continents at the time of measurement.
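
As a rough illustration of the preprocessing step described above, the minimum RTT of one Ping measurement could be extracted along the following lines. This is a minimal sketch, not the authors' pipeline: it assumes each raw result is a JSON-like object with a `result` list whose reply entries carry an `rtt` field (timeouts carry no `rtt`), and the helper name `min_rtt_ms` and the sample record are hypothetical.

```python
from typing import Optional


def min_rtt_ms(measurement: dict) -> Optional[float]:
    """Return the smallest RTT (ms) among the replies of one ping result.

    Assumption: replies are dicts with an "rtt" key; entries without
    "rtt" (for example timeouts) are skipped. Returns None if no reply
    carried an RTT.
    """
    rtts = [
        reply["rtt"]
        for reply in measurement.get("result", [])
        if isinstance(reply, dict) and "rtt" in reply
    ]
    return min(rtts) if rtts else None


# Hypothetical record; the probe ID and RTT values are invented.
sample = {
    "prb_id": 1000001,
    "result": [{"rtt": 48.2}, {"x": "*"}, {"rtt": 41.7}],
}
print(min_rtt_ms(sample))
```

Returning `None` for measurements without any reply keeps lost probes distinguishable from genuinely low latencies; the published preprocessed files may handle that case differently.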

This dataset is intended to support research on web caching, particularly in the context of satellite Internet. Please cite both this dataset and the associated paper if you find this data useful.
Dataset for collaborative prediction of web service quality based on user...
search.dataone.org
datadryad.org
Updated May 4, 2025
Cite
Yang Song (2025). Dataset for collaborative prediction of web service quality based on user preferences and services [Dataset]. http://doi.org/10.5061/dryad.5dv41ns4s
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.5dv41ns4s
Dataset updated
May 4, 2025
Dataset provided by
Dryad Digital Repository
Authors
Yang Song
Time period covered
Jan 1, 2020
Description
The prediction of web service quality plays an important role in improving user services; it has been one of the most popular topics in the field of Internet services. In traditional collaborative filtering methods, differences in the personalization and preferences of different users have been ignored. In this paper, we propose a prediction method for web service quality based on different types of quality of service (QoS) attributes. Different extraction rules are applied to extract the user preference matrices from the original web data, and the negative value filtering-based top-K method is used to merge the optimization results into the collaborative prediction method. Thus, the individualized differences are fully exploited, and the problem of inconsistent QoS values is resolved. The experimental results demonstrate the validity of the proposed method. Compared with other methods, the proposed method performs better, and the results are closer to the real values.
Context Ad Clicks Dataset
kaggle.com
Updated Feb 9, 2021
Cite
Möbius (2021). Context Ad Clicks Dataset [Dataset]. https://www.kaggle.com/arashnic/ctrtest/code
Explore at:
Dataset updated
Feb 9, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Möbius
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The dataset was generated by an e-commerce website which sells a variety of products on its online platform. The website records the behaviour of its customers and stores it as a log. However, most of the time users do not buy products instantly, and there is a time gap during which the customer might surf the internet and perhaps visit competitor websites. To improve product sales, the website owner has hired an adtech company, which built a system that shows ads for the owner's products on its partner websites. If a user comes to the owner's website and searches for a product, and then visits these partner websites or apps, their previously viewed items or similar items are shown as an ad. If the user clicks this ad, they are redirected to the owner's website and might buy the product.

The task is to predict the probability of a user clicking an ad shown to them on the partner websites over the next 7 days, on the basis of historical view log data, ad impression data and user data.

Content

You are provided with the view log of users (2018/10/15 - 2018/12/11) and the product descriptions collected from the owner's website. We also provide the training and test data containing details of ad impressions at the partner websites (Train + Test). The train data contains the impression logs from 2018/11/15 – 2018/12/13 along with the label which specifies whether the ad was clicked or not. Your model will be evaluated on the test data, which has impression logs from 2018/12/12 – 2018/12/18 without the labels. You are provided with the following files:

train.zip: This contains 3 files and description of each is given below:

train.csv

view_log.csv

item_data.csv

test.csv: The test file contains the impressions for which the participants need to predict the click rate.

sample_submission.csv: This file contains the format in which you have to submit your predictions.

Inspiration

Predict the probability of a user clicking an ad shown to them on the partner websites over the next 7 days, on the basis of historical view log data, ad impression data and user data.

The evaluation metric could be the "area under the ROC curve" between the predicted probability and the observed target.
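
For readers unfamiliar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen clicked impression receives a higher predicted score than a randomly chosen non-clicked one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the toy labels and scores are illustrative and not part of the dataset. The quadratic loop is meant for explanation only; a library implementation would be the practical choice on the full data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 targets (1 = ad clicked).
    scores: iterable of predicted click probabilities, same length.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (values invented): three of the four pairs are ranked correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```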

Imgur Most Viral and Secret Santa

kaggle.com

Updated Apr 18, 2020

Cite

Ghalib93 (2020). Imgur Most Viral and Secret Santa [Dataset]. https://www.kaggle.com/ghalib93/imgur-most-viral-and-secret-santa/code

Explore at:

Dataset updated

Apr 18, 2020

Dataset provided by

Kaggle

Authors

Ghalib93

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

Imgur is an image hosting and sharing website founded in 2009. It became one of the most popular websites around the world, with approximately 250 million users. The website does not require registration and anyone can browse its content. However, an account must be created to be able to post. It is famous for an event that it created in 2013, where members register to send gifts to and receive gifts from other members on the website. The event takes place during Christmas time, and people share their gifts via the website, where they post pictures of the process or of what they received under a specific tag. The data provided covers two sections that I think are important to understanding certain patterns within the Imgur community. The first is the Most Viral section and the second is the Secret Santa tag.

I have participated twice in the Imgur Secret Santa event and have always found funny and interesting posts in its Most Viral section. With the help of the Kaggle community, I would like to identify trends in the data provided and maybe make a comparison between the Secret Santa data and the Most Viral data.

Content

There are two Dataframes included and they are almost identical in the number of columns:

The first Dataframe is Imgur Most Viral posts. This contains many of the posts that were labelled as Viral by The Imgur community and team using specific algorithms to track number of likes and dislikes across multiple platforms. The posts might be videos, gifs, pictures or just text.

The second Dataframe is Imgur Secret Santa Tag. Secret Santa is an annual Imgur tradition where members can sign up to send gifts to and receive gifts from other members during the Christmas holiday. This contains many of the posts that were tagged with Secret Santa by the Imgur community. The posts might be videos, gifs, pictures or just text. There is a (is_viral) column in this Dataframe that is not available in the Most Viral Dataframe, since all of the posts there are viral.

Data Dictionary

Feature	Type	Dataset	Description
account_id	object	Imgur_Viral/imgur_secret_santa	Unique Account ID per member
comment_count	float64	Imgur_Viral/imgur_secret_santa	Number of comments made in the post
datetime	float64	Imgur_Viral/imgur_secret_santa	TimeStamp containing Date and Time Details
downs	float64	Imgur_Viral/imgur_secret_santa	Number of dislikes for the post
favorite_count	float64	Imgur_Viral/imgur_secret_santa	Number of users who marked the post as a favourite
id	object	Imgur_Viral/imgur_secret_santa	Unique Post ID. Even if it was posted by the same member, different posts will have different IDs
images_count	float64	Imgur_Viral/imgur_secret_santa	Number of images included in the post
points	float64	Imgur_Viral/imgur_secret_santa	Each post will have calculated points based on (ups - downs)
score	float64	Imgur_Viral/imgur_secret_santa	Ticket number
tags	object	Imgur_Viral/imgur_secret_santa	Tags are sub albums that the post will show under
title	object	Imgur_Viral/imgur_secret_santa	Title of the post
ups	float64	Imgur_Viral/imgur_secret_santa	Number of likes for the post
views	float64	Imgur_Viral/imgur_secret_santa	Number of people that viewed the post
is_most_viral	boolean	imgur_secret_santa	If the post is viral or not

Acknowledgements

I would like to thank Imgur for providing an API that made collecting data from its website easier. With their help we might be able to better understand certain trends that emerge from its community.

Inspiration

There is no problem to solve with this data; it is just a fun way to explore and learn more about programming and analyzing data. I hope you enjoy playing with the data as much as I enjoyed collecting it and browsing the website.

Demir, Nurullah, Urban, Tobias, Pohlmann, Norbert, Wressnegger, Christian...
service.tib.eu
Updated Nov 28, 2024
+ more versions
Cite
(2024). Demir, Nurullah, Urban, Tobias, Pohlmann, Norbert, Wressnegger, Christian (2023). Dataset: Dataset: a large-scale study of cookie banner interaction tools and their impact on users' privacy / part2. https://doi.org/10.35097/1717 [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1717
Explore at:
Dataset updated
Nov 28, 2024
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
Abstract: Cookie notices (or cookie banners) are a popular mechanism for websites to provide (European) Internet users a tool to choose which cookies the site may set. Banner implementations range from merely providing information that a site uses cookies, over offering the choice of accepting or denying all cookies, to allowing fine-grained control of cookie usage. Users frequently get annoyed by the banners' pervasiveness as they interrupt "natural" browsing on the Web. As a remedy, different browser extensions have been developed to automate the interaction with cookie banners. In this work, we perform a large-scale measurement study comparing the effectiveness of extensions for "cookie banner interaction." We configured the extensions to express different privacy choices (e.g., accepting all cookies, accepting functional cookies, or rejecting all cookies) to understand their capabilities to execute a user's preferences. The results show statistically significant differences in which cookies are set, how many of them are set, and which types are set, even for extensions that aim to implement the same cookie choice. Extensions for "cookie banner interaction" can effectively reduce the number of set cookies compared to no interaction with the banners. However, all extensions increase the tracking requests significantly except when rejecting all cookies.

TechnicalRemarks: This repository hosts the dataset corresponding to the paper "A Large-Scale Study of Cookie Banner Interaction Tools and their Impact on Users’ Privacy", which was published at the Privacy Enhancing Technologies Symposium (PETS) in 2024.
Data from: Revealing QoE of Web Users from Encrypted Network Traffic
figshare.com
zip
Updated Jun 16, 2020
Cite
Alexis Huet; Antoine Saverimoutou; Zied Ben Houidi; Hao Shi; Shengming Cai; Jinchun Xu; Bertrand Mathieu; Dario Rossi (2020). Revealing QoE of Web Users from Encrypted Network Traffic [Dataset]. http://doi.org/10.6084/m9.figshare.12459293.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.12459293.v1
Dataset updated
Jun 16, 2020
Dataset provided by
figshare
Authors
Alexis Huet; Antoine Saverimoutou; Zied Ben Houidi; Hao Shi; Shengming Cai; Jinchun Xu; Bertrand Mathieu; Dario Rossi
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
We present a dataset targeting a large set of popular pages (Alexa top-500), from probes from several ISPs networks, browsers software (Chrome, Firefox) and viewport combinations, for over 200,000 experiments realized in 2019.

We purposely collect two distinct sets with two different tools, namely Web Page Test (WPT) and Web View (WV), varying a number of relevant parameters and conditions, for a total of 200K+ web sessions, roughly equally split among WV and WPT. Our dataset comprises variations in terms of geographical coverage, scale, diversity and representativeness (location, targets, protocol, browser, viewports, metrics).

For Web Page Test, we used the online service www.webpagetest.org at different locations worldwide (Europe, Asia, USA) and private WPT instances in three locations in China (Beijing, Shanghai, Dongguan). The list of target URLs comprised the main pages and five random subpages from Alexa top-500 worldwide and China. We varied network conditions: native connections and 4G, FIOS, 3GFast, DSL, and custom shaping/loss conditions. The other elements in the configuration were fixed: Chrome browser on desktop with a fixed screen resolution, HTTP/2 protocol and IPv4.

For Web View, we collected experiments from three machines located in France. We selected two versions of two browser families (Chrome 75/77, Firefox 63/68), two screen sizes (1920x1080, 1440x900), and employ different browser configurations (one half of the experiments activate the AdBlock plugin) from two different access technologies (fiber and ADSL). From a protocol standpoint, we used both IPv4 and IPv6, with HTTP/2 and QUIC, and performed repeated experiments with cached objects/DNS. Given the settings diversity, we restricted the number of websites to about 50 among the Alexa top-500 websites, to ensure statistical relevance of the collected samples for each page.

The two archives IFIPNetworking2020_WebViewOrange.zip and IFIPNetworking2020_Webpagetest.zip correspond respectively to the Web View experiments and to the Web Page Test experiments.

Each archive contains three files:
- config.csv: Description of parameters and conditions for each run
- metrics.csv: Value of different metrics collected by the browser
- progressionCurves.csv: Progression curves of the bytes progress as seen by the network, from 0 to 10 seconds by steps of 100 milliseconds
- listUrl folder: Indexes the sets of urls

Regarding config.csv, the columns are:
- index: Index for this set of conditions
- location: Location of the machine
- listUrl: List of urls, located in the folder listUrl
- browserUsed: Internet browser and version
- terminal: Desktop or Mobile
- collectionEnvironment: Identification of the collection environment
- networkConditionsTrafficShaping (WPT only): Whether native condition or traffic shaping (4G, FIOS, 3GFast, DSL, or custom Emulator conditions)
- networkConditionsBandwidth (WPT only): Bandwidth of the network
- networkConditionsDelay (WPT only): Delay in the network
- networkConditions (WV only): network conditions
- ipMode (WV only): requested L3 protocol
- requestedProtocol (WV only): requested L7 protocol
- adBlocker (WV only): Whether adBlocker is used or not
- winSize (WV only): Window size

Regarding metrics.csv, the columns are:
- id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- DOM Content Loaded Event End (ms): DOM time
- First Paint (ms) (WV only): First paint time
- Load Event End (ms): Page Load Time from W3C
- RUM Speed Index (ms) (WV only): RUM Speed Index
- Speed Index (ms) (WPT only): Speed Index
- Time for Full Visual Rendering (ms) (WV only): Time for Full Visual Rendering
- Visible portion (%) (WV only): Visible portion
- Time to First Byte (ms) (WPT only): Time to First Byte
- Visually Complete (ms) (WPT only): Visually Complete used to compute the Speed Index
- aatf: aatf using ATF-chrome-plugin
- bi_aatf: bi_aatf using ATF-chrome-plugin
- bi_plt: bi_plt using ATF-chrome-plugin
- dom: dom using ATF-chrome-plugin
- ii_aatf: ii_aatf using ATF-chrome-plugin
- ii_plt: ii_plt using ATF-chrome-plugin
- last_css: last_css using ATF-chrome-plugin
- last_img: last_img using ATF-chrome-plugin
- last_js: last_js using ATF-chrome-plugin
- nb_ress_css: nb_ress_css using ATF-chrome-plugin
- nb_ress_img: nb_ress_img using ATF-chrome-plugin
- nb_ress_js: nb_ress_js using ATF-chrome-plugin
- num_origins: num_origins using ATF-chrome-plugin
- num_ressources: num_ressources using ATF-chrome-plugin
- oi_aatf: oi_aatf using ATF-chrome-plugin
- oi_plt: oi_plt using ATF-chrome-plugin
- plt: plt using ATF-chrome-plugin

Regarding progressionCurves.csv, the columns are:
- id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- url: Url of the current page. SUBPAGE stands for a path.
- run: Current run (linked with index of the config for WPT)
- filename: Filename of the pcap
- fullname: Fullname of the pcap
- har_size: Size of the HAR for this experiment
- pagedata_size: Size of the page data report
- pcap_size: Size of the pcap
- App Byte Index (ms): Application Byte Index as computed from the har file (in the browser)
- bytesIn_APP: Total bytes in as seen in the browser
- bytesIn_NET: Total bytes in as seen in the network
- X_BI_net: Network Byte Index computed from the pcap file (in the network)
- X_bin_0_for_B_completion to X_bin_99_for_B_completion: X_bin_k_for_B_completion is the bytes progress reached after k*100 milliseconds

If you use these datasets in your research, you can reference to the appropriate paper:

@inproceedings{qoeNetworking2020,
  title={Revealing QoE of Web Users from Encrypted Network Traffic},
  author={Huet, Alexis and Saverimoutou, Antoine and Ben Houidi, Zied and Shi, Hao and Cai, Shengming and Xu, Jinchun and Mathieu, Bertrand and Rossi, Dario},
  booktitle={2020 IFIP Networking Conference (IFIP Networking)},
  year={2020},
  organization={IEEE}
}
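
The X_bin_k_for_B_completion columns of progressionCurves.csv encode a time series in their names: column k holds the bytes progress reached after k*100 milliseconds. A minimal sketch of turning one row into (time, progress) pairs follows. The helper names and the sample row are hypothetical, and the sketch makes no assumption about whether the stored progress is a fraction or a byte count.

```python
def bin_columns(n_bins: int = 100, step_ms: int = 100) -> dict:
    """Map each X_bin_k_for_B_completion column name to its time offset in ms."""
    return {f"X_bin_{k}_for_B_completion": k * step_ms for k in range(n_bins)}


def progression_curve(row: dict) -> list:
    """Return (time_ms, bytes_progress) pairs for the bin columns present in row.

    Columns missing from the row are skipped; values are parsed as floats
    because CSV readers typically return strings.
    """
    return [
        (time_ms, float(row[name]))
        for name, time_ms in bin_columns().items()
        if name in row
    ]


# Hypothetical row with only the first three bins filled in (values invented).
row = {
    "id": "0_1",
    "X_bin_0_for_B_completion": "0.0",
    "X_bin_1_for_B_completion": "0.12",
    "X_bin_2_for_B_completion": "0.4",
}
print(progression_curve(row))
```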
Reddit r/brasil data
kaggle.com
Updated Apr 13, 2021
Cite
Leonardo Araujo (2021). Reddit r/brasil data [Dataset]. https://www.kaggle.com/araleo/redditbrasildata/activity
Explore at:
Dataset updated
Apr 13, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Leonardo Araujo
License
https://www.reddit.com/wiki/api
Area covered
Brazil
Description
Data from /r/brasil, the most popular Brazilian subreddit. Reddit is a social network in which users gather on forums known as subreddits. Each subreddit has its own theme, rules and topics of conversation. Reddit's most popular subreddits include /r/funny, /r/pics, /r/gaming, /r/movies and so on.

The available data was gathered hourly for over four months. Fields include information such as online users count, total number of subscribers and top post at the moment.

All data was collected using the official Reddit API and a simple Python script.

File info: Encoding: UTF-8 Separator: ;
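
Because the file uses `;` rather than a comma as its separator, a default CSV reader would treat each line as a single field. A minimal sketch of reading such a file with Python's standard `csv` module is shown below; the column names in the in-memory sample are invented for illustration and are not taken from the dataset.

```python
import csv
import io


def read_rows(handle):
    """Parse a semicolon-separated file object into a list of dicts."""
    return list(csv.DictReader(handle, delimiter=";"))


# In-memory stand-in for the real file; column names are hypothetical.
sample = io.StringIO(
    "timestamp;online_users;subscribers\n"
    "2021-01-01 00:00;1234;500000\n"
)
rows = read_rows(sample)
print(rows[0]["online_users"])

# For a file on disk, open it with the stated encoding, for example:
#   with open(path, encoding="utf-8", newline="") as handle:
#       rows = read_rows(handle)
```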

Images by https://unsplash.com/@introspectivedsgn
Evolution of Web search engine interfaces through SERP screenshots and HTML...
rdm.inesctec.pt
Updated Jul 26, 2021
Cite
(2021). Evolution of Web search engine interfaces through SERP screenshots and HTML complete pages for 20 years - Dataset - CKAN [Dataset]. https://rdm.inesctec.pt/dataset/cs-2021-003
Explore at:
Dataset updated
Jul 26, 2021
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
This dataset was extracted for a study on the evolution of Web search engine interfaces since their appearance. The well-known list of “10 blue links” has evolved into richer interfaces, often personalized to the search query, the user, and other aspects. We used the most searched queries by year to extract a representative sample of SERP from the Internet Archive. The Internet Archive has been keeping snapshots and the respective HTML version of webpages over time, and its collection contains more than 50 billion webpages. We used Python and Selenium Webdriver, for browser automation, to visit each capture online, check if the capture is valid, save the HTML version, and generate a full screenshot. The dataset contains all the extracted captures. Each capture is represented by a screenshot, an HTML file, and a files' folder. We concatenate the initial of the search engine (G) with the capture's timestamp for file naming. The filename ends with a sequential integer "-N" if the timestamp is repeated. For example, "G20070330145203-1" identifies a second capture from Google on March 30, 2007. The first is identified by "G20070330145203". Using this dataset, we analyzed how SERP evolved in terms of content, layout, design (e.g., color scheme, text styling, graphics), navigation, and file size. We have registered the appearance of SERP features and analyzed the design patterns involved in each SERP component. We found that the number of elements in SERP has been rising over the years, demanding a more extensive interface area and larger files. This systematic analysis portrays evolution trends in search engine user interfaces and, more generally, web design. We expect this work will trigger other, more specific studies that can take advantage of the dataset we provide here. This graphic represents the diversity of captures by year and search engine (Google and Bing).
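
The file-naming convention described above (engine initial, 14-digit timestamp, optional "-N" suffix) can be parsed mechanically. The following is a sketch under stated assumptions: the timestamp is taken to be in YYYYMMDDHHMMSS form, as the example "G20070330145203" suggests, and any single uppercase letter is accepted as the engine initial, since the description only confirms "G" for Google. The function name `parse_capture_name` is hypothetical.

```python
import re
from datetime import datetime

# Engine initial, 14-digit timestamp, optional "-N" repeat suffix.
CAPTURE_RE = re.compile(r"^(?P<engine>[A-Z])(?P<ts>\d{14})(?:-(?P<seq>\d+))?$")


def parse_capture_name(name: str):
    """Split a capture name into (engine_initial, timestamp, sequence).

    sequence is 0 for the first capture at a timestamp and N for names
    ending in "-N".
    """
    match = CAPTURE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized capture name: {name!r}")
    timestamp = datetime.strptime(match["ts"], "%Y%m%d%H%M%S")
    sequence = int(match["seq"]) if match["seq"] else 0
    return match["engine"], timestamp, sequence


print(parse_capture_name("G20070330145203"))
print(parse_capture_name("G20070330145203-1"))
```

Strip the file extension before parsing if the names are read from disk, since the pattern matches the bare capture identifier only.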
Dataset for Report: "The Increasing Prominence of Prejudice and Social...
data.niaid.nih.gov
Updated Jun 13, 2022
Cite
David Rozado (2022). Dataset for Report: "The Increasing Prominence of Prejudice and Social Justice Rhetoric in UK News Media" [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6482344
Explore at:
Dataset updated
Jun 13, 2022
Dataset authored and provided by
David Rozado
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
United Kingdom
Description
This data set contains frequency counts of target words in 16 million news and opinion articles from 10 popular news media outlets in the United Kingdom. The target words are listed in the associated report and are mostly words that denote prejudice or are often associated with social justice discourse. A few additional words that do not denote prejudice are also included, since the report uses them to illustrate the method.

The textual content of news and opinion articles from the outlets is available in the outlets' online domains and/or public cache repositories such as Google cache (https://webcache.googleusercontent.com), The Internet Wayback Machine (https://archive.org/web/web.php), and Common Crawl (https://commoncrawl.org). We derived word frequency counts from these sources. The textual content included in our analysis is limited to article headlines and the main body text of the articles and does not include other article elements such as figure captions.

Targeted textual content was located in the raw HTML data using outlet-specific XPath expressions. Tokens were lowercased prior to estimating frequency counts. To prevent outlets with sparse text content for a year from distorting aggregate frequency counts, we only include outlet frequency counts from years for which there are at least 1 million words of article content from an outlet. This threshold was chosen to maximize inclusion in our analysis of outlets with sparse amounts of article text per year.

The yearly usage frequency of a target word in an outlet was estimated by dividing the total number of occurrences of the target word in all of the outlet's articles in a given year by the total number of words in all articles of that year. This method of estimating frequency accounts for the variable volume of total article output over time.
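The estimate described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's pipeline: the function names and the simple regex tokenizer are hypothetical, and only the lowercasing step, the 1-million-word threshold, and the division of target occurrences by total words come from the description.

```python
import re
from collections import Counter

MIN_WORDS_PER_YEAR = 1_000_000  # inclusion threshold stated in the description


def tokenize(text):
    # Lowercase first, then keep runs of letters/apostrophes.
    # The tokenizer itself is an assumption; the source does not specify one.
    return re.findall(r"[a-z']+", text.lower())


def yearly_frequency(articles, target, min_words=MIN_WORDS_PER_YEAR):
    """Relative frequency of `target` across one outlet's articles for one year.

    Returns None when the outlet-year has fewer than `min_words` words,
    mirroring the exclusion rule for sparse outlet-years.
    """
    counts = Counter()
    total_words = 0
    for text in articles:
        tokens = tokenize(text)
        counts.update(tokens)
        total_words += len(tokens)
    if total_words < min_words:
        return None
    return counts[target.lower()] / total_words
```

Because the result is a ratio of counts, a year with twice as many articles does not by itself double the reported frequency.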

In a small percentage of articles, outlet-specific XPath expressions might fail to properly capture the content of the article due to the heterogeneity of HTML elements and CSS styling combinations with which article text is arranged in the outlets' online domains. As a result, the total and target word count metrics for a small subset of articles are not precise. In a random sample of articles and outlets, manual estimation of target word counts overlapped with the automatically derived counts for over 90% of the articles.

Most incorrect frequency counts are minor deviations from the actual counts, such as counting the word "Facebook" in an article footnote that encourages readers to follow the journalist’s Facebook profile and that the XPath expression mistakenly included as part of the article's main text.

To conclude, in a data analysis of over 16 million articles, we cannot manually check the correctness of frequency counts for every single article, and one hundred percent accuracy at capturing articles’ content is elusive due to the small number of difficult-to-detect boundary cases, such as incorrect HTML markup syntax in online domains. Overall, however, we are confident that our frequency metrics are representative of word prevalence in print news media content (see Figure 2 of main manuscript for supporting evidence of the temporal precision of the method).
Opendata - Tuscany Region - Sites - CKAN Ecosystem Catalog
catalog.civicdataecosystem.org
Updated May 13, 2025
Cite
(2025). Opendata - Tuscany Region - Sites - CKAN Ecosystem Catalog [Dataset]. https://catalog.civicdataecosystem.org/dataset/opendata-tuscany-region
Dataset updated
May 13, 2025
Area covered
Tuscany
Description
What was the average price of a house in the United Kingdom in 1935? When will India's population surpass that of China? Where can you admire publicly funded works of art in Seattle? The data to answer many, many questions like these exists somewhere on the Internet - but it's not always easy to find. The Open Data platform, created as part of the actions foreseen by the Tuscan Digital Agenda, makes reusable public data available in open format, thus maximizing transparency and ease of access to the many pieces of information available to the Tuscany Region. The goal is to publish, through a gradual process, the many datasets whose ownership belongs to the Tuscany Region and other Public Administrations of the regional territory adhering to the Tuscan Regional Telematics Network (RTRT), creating an infrastructure that will allow public and private entities and civil society to create new services and applications capable of improving access to information, transparency, and therefore the social, cultural, and economic life of the entire Tuscan territory. This site is based on a powerful open-source data cataloging software, called CKAN, developed by the Open Knowledge Foundation. Each 'dataset' entry on CKAN contains a description of the data and other useful information, such as the available formats, the holder, the freedom of access and reuse, and the topics that the data address. Other users can improve or modify this information (CKAN keeps a history of all these changes). CKAN is used for several data catalogs on the Internet. The Data Hub is a freely editable and reusable catalog, in the style of Wikipedia. The British government uses CKAN for the data.gov.uk portal, which currently has about 8000 government datasets. The official public data of most European countries are collected in a CKAN catalog on publicdata.eu. There is also a list of these catalogs from all over the world on datacatalogs.org, which is in turn based on CKAN. 
Most of the data on the Tuscany Region Open Data portal is freely accessible and reusable: anyone has the right to use and reuse the data in any way they prefer. Maybe someone will take that nice dataset on the city's works of art that you found, and add it to a tourist map - or develop a new app for your smartphone, which will help you find monuments when you visit the city. Open data means more enterprise, collaborative scientific research, and transparent public administration. You can learn more about this topic in the Open Data Handbook. The Open Knowledge Foundation is a non-profit organization that promotes free knowledge: the development and constant improvement of CKAN is one of the ways to achieve this goal. If you want to participate in the design or development, join the public discussion or development lists, or check out the OKFN website to discover the other ongoing projects. CKAN is the world's leading platform for open-source data portals. CKAN is a complete and ready-to-use software solution that makes data accessible and usable – providing tools to optimize its publication, search and use (including data storage and the availability of robust APIs). CKAN is aimed at organizations that publish data (national and local governments, companies and institutions) and want to make it open and accessible to all. CKAN is used by governments and user groups around the world to manage a wide range of data portals for official and community bodies, including portals for local, national and international governments, such as data.gov.uk in the UK and publicdata.eu of the European Union, dados.gov.br in Brazil, government portals of the Netherlands, as well as city and municipal administration sites in the USA, the United Kingdom, Argentina, Finland and other countries. CKAN: http://ckan.org/ Tour of CKAN: http://ckan.org/tour/ Overview of functions: http://ckan.org/features/ CKAN's page-view tracking feature is enabled.
Translated from the Italian original text.
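Since the description mentions that CKAN exposes APIs, here is a minimal sketch of querying a CKAN catalog through its Action API `package_search` endpoint. The helper names are hypothetical, and the base URL of any particular portal is an assumption to be supplied by the reader; only the endpoint path and the `success`/`result` response envelope are standard CKAN behaviour.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url, query, rows=10):
    """Build a package_search request URL for a CKAN portal."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    # Fall back to the machine-readable name when a title is missing.
    return [pkg.get("title") or pkg["name"] for pkg in payload["result"]["results"]]


def search(base_url, query, rows=10):
    """Run a live search; base_url must point at a real CKAN instance."""
    with urlopen(package_search_url(base_url, query, rows), timeout=30) as resp:
        return dataset_titles(resp.read().decode("utf-8"))
```

Separating URL construction and response parsing from the network call keeps both pieces easy to check without contacting a live portal.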
Rostock Open Data Portal - Sites - CKAN Ecosystem Catalog
catalog.civicdataecosystem.org
Updated Apr 22, 2025
Cite
(2025). Rostock Open Data Portal - Sites - CKAN Ecosystem Catalog [Dataset]. https://catalog.civicdataecosystem.org/dataset/rostock-open-data-portal
Dataset updated
Apr 22, 2025
Area covered
Rostock
Description
Where exactly was that elementary school again that's closest to your home and that your children can easily reach without having to cross many streets? Can you reach your workplace entirely via bike paths? Will you have to wait at the construction site again next Sunday on the way to the sports field? There is a lot of data on the internet that can answer these and similar questions – but finding it is not always easy. OpenData.HRO is a web application that serves as a catalog for many useful datasets. The application is operated by the Hanseatic and University City of Rostock, which is also the owner and publisher of the data. You can use the application to search for, view, and download data for yourself and/or others. Depending on the type of dataset, OpenData.HRO also offers it as database content, providing you with some useful statistical and/or visualization tools. The present web application is based on the powerful open-source software CKAN, maintained and further developed by the Open Knowledge Foundation. Each dataset in CKAN consists of a description of the contained data as well as the data itself. The description includes important information such as the type of file formats in which the data is offered, the license under which it is provided, and the categories and subject areas to which it is assigned. The data and their descriptions can be updated or supplemented, with CKAN always recording all changes by means of automatic versioning. CKAN is used by a large number of data catalogs on the internet. The Data Hub, for example, is a publicly editable data catalog in the Wikipedia style. The British government uses CKAN to operate data.gov.uk – currently with approximately 8,000 government datasets. The official public data of most European countries are listed in the CKAN catalog on europeandataportal.eu. You can find a complete list of catalogs like this on dataportals.org, a page that is also operated with CKAN. 
Unless otherwise stated, the data on OpenData.HRO are subject to a free license. This means that you can freely use and exploit the data in compliance with the conditions set out in the terms of use (and they are anything but restrictive). Perhaps you would like to use the data on art in public spaces to build a smartphone app that helps to make a tour of Rostock culturally sophisticated? Go for it! Open Data promotes entrepreneurship, collaborative science, and transparent administration. You can learn more about Open Data in the Open Data Handbook. The Open Knowledge Foundation is a non-profit organization for the promotion of open knowledge: developing and improving CKAN is one of the ways to achieve this. If you would like to contribute to CKAN with design or code, you can join the developer mailing lists or visit the OKFN pages to learn more about CKAN and other projects. Translated from the German original text.
womanru-posts
huggingface.co
Cite
nyuuzyou, womanru-posts [Dataset]. https://huggingface.co/datasets/nyuuzyou/womanru-posts
Authors
nyuuzyou
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for Woman.ru Forum Posts

Dataset Summary

This dataset contains 1,308,238 forum posts from Woman.ru, a popular Russian-language information and entertainment portal. Woman.ru is one of the most visited women's sites in Runet (Russian Internet). The dataset covers posts from around 2005 to 2024, providing a comprehensive view of discussions on the platform over nearly two decades. The content includes original posts and replies on various topics, offering… See the full description on the dataset page: https://huggingface.co/datasets/nyuuzyou/womanru-posts.
Corresponding-Colour Datasets - Luo and Rhodes (1999)
zenodo.org
data.niaid.nih.gov
bin
Updated Nov 30, 2020
Cite
M. Ronnier Luo; Peter A. Rhodes (2020). Corresponding-Colour Datasets - Luo and Rhodes (1999) [Dataset]. http://doi.org/10.5281/zenodo.3270903
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.3270903
Dataset updated
Nov 30, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
M. Ronnier Luo; Peter A. Rhodes
Description
Source URL: https://web.archive.org/web/20031123133629/http://colour.derby.ac.uk:80/colour/info/catweb/
Source DOI: https://doi.org/10.1002/(SICI)1520-6378(199908)24:4%3C295::AID-COL10%3E3.0.CO;2-K

M. Ronnier Luo and Peter A. Rhodes

Colour & Imaging Institute
University of Derby
Derby
England

INTRODUCTION

A chromatic adaptation transform is capable of predicting corresponding colours. Corresponding colours are described by two sets of tristimulus values that give rise to the same perceived colour when the two samples are viewed under test and reference light sources or illuminants. The two light sources or illuminants differ in terms of their colour temperatures (or chromaticity coordinates). A chromatic adaptation transform can be effectively used for numerous industrial applications such as the evaluation of colour inconstancy for surface samples, the calculation of colour difference between pairs of samples assessed under non-daylight sources or illuminants, the provision of a colour rendering index for assessing the quality of light sources, or the prediction of coloured images across different sources or illuminants.

In October 1998, the CIE formed a new technical committee, TC 1-52, on Chromatic Adaptation Transforms during its interim meeting in Baltimore, USA with Professor M. R. Luo as its chairman. The objective of this committee is to review certain chromatic adaptation transforms with a view to making a CIE recommendation. The performance of chromatic adaptation transforms is normally evaluated using corresponding-colour experimental data sets in which each colour is defined by two sets of tristimulus values under two illuminants. Many experiments were carried out using a variety of psychophysical methods under different viewing conditions. A comprehensive collection of these data sets has been accumulated by Luo and Hunt [1] for the purposes of deriving and evaluating the CIE colour appearance model, CIECAM97s [2], and the CMC chromatic adaptation transform, CMCCAT97 [3]. The Committee has decided to make these data sets available via the Internet for public assessment. This task has been completed and the resulting database is now available via the world wide web at http://colour.derby.ac.uk. Researchers or industrialists are welcome to acquire this database for further study. This paper gives a brief description of each data set and describes the format of the data.

EXPERIMENTAL DATA SETS

Fourteen data sets have been accumulated from nine sources [4-11]: the Color Science Association of Japan (CSAJ), Helson, Lam and Rigg, LUTCHI, Kuo and Luo, Breneman, Braun and Fairchild, and McCann. Each data set includes a number of corresponding-colour pairs in which both colours in a pair appear the same when each is viewed under different viewing conditions. Table I summarises the experimental conditions in each data set including the number of phases (as defined by a set of viewing conditions), the number of corresponding-colour pairs and the viewing parameters used. The parameters considered are the light sources used for the test and reference conditions, illuminance (lux), the luminance factor of the neutral background (Y%), sample size, media and psychophysical method.

The CSAJ [4] data was divided into three sets: -C, -Hunt and -Stevens according to studies on chromatic adaptation, Hunt and Stevens effects respectively. The Helson [5], Lam and Rigg [6] data sets include corresponding colours between the test source (A) and reference source (D65). The LUTCHI [7] data includes three sets - A, D50 and WF - which are the test illuminants against a reference D65 simulator. Similarly, there are two sets for Kuo and Luo [8] data: A and TL84, which are the test light sources against a reference D65 simulator. The only data set based upon transparent media in this category is the Breneman [9] data which was divided into two sets: -C and -L according to investigations on chromatic adaptation and illuminance effects respectively. The Braun and Fairchild [10] data was accumulated by asking observers to adjust monitor colours to match those presented on reflection prints. The McCann [11] data were obtained by investigating the chromatic adaptation effect using a Mondrian figure viewed under highly chromatic test illuminants with low illuminances. Its original data was further analysed to obtain corresponding tristimulus values by Nayatani et al [12].

In total, 746 corresponding-colour pairs were gathered from experiments involving 38 phases of viewing conditions. The psychophysical methods used are haploscopic matching, memory matching and magnitude estimation.

Table I: Summary of the corresponding-colour data sets

| Data Set | No. of Phases | No. of Samples | Illuminant (Test) | Illuminant (Ref.) | Illuminance (lux) | Background (Y%) | Sample Size | Medium | Experimental Method |
|-------------------|---------------|----------------|-------------------|-------------------|-------------------|-----------------|-------------|-------------|---------------------|
| CSAJ-C | 1 | 87 | D65 | A | 1000 | 20 | S | Refl. | Haploscopic |
| CSAJ-Hunt | 4 | 20 | D65 | D65 | 10-3000 | 20 | S | Refl. | Haploscopic |
| CSAJ-Stevens | 4 | 19 | D65 | D65 | 10-3000 | 20 | S | Refl. | Haploscopic |
| Helson | 1 | 59 | D65 | A | 1000 | 20 | S | Refl. | Memory |
| Lam & Rigg | 1 | 58 | D65 | A | 1000 | 20 | L | Refl. | Memory |
| Lutchi (A) | 1 | 43 | D65 | A | 1000 | 20 | S | Refl. | Magnitude |
| Lutchi (D50) | 1 | 44 | D65 | D50 | 1000 | 20 | S | Refl. | Magnitude |
| Lutchi (WF) | 1 | 41 | D65 | WF | 1000 | 20 | S | Refl. | Magnitude |
| Kuo & Luo (A) | 1 | 40 | D65 | A | 1000 | 20 | L | Refl. | Magnitude |
| Kuo & Luo (TL84) | 1 | 41 | D65 | TL84 | 1000 | 20 | S | Refl. | Magnitude |
| Breneman-C | 9 | 107 | D65, D55 | A, P, G | 50-3870 | 30 | S | Trans. | Magnitude |
| Breneman-L | 3 | 36 | D55 | D55 | 50-3870 | 30 | S | Trans. | Haploscopic |
| Braun & Fairchild | 4 | 66 | D65 | D30, D65, D95 | 129 | 20 | S | Mon., Refl. | Matching |
| McCann | 5 | 85 | D65 | R, Y, G, B | 14-40 | 30 | S | Refl. | Haploscopic |
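As a quick consistency check, the sample counts listed in Table I can be summed and compared with the total of 746 corresponding-colour pairs stated in the text. The snippet below is only an illustrative sketch; the dictionary name is arbitrary and the values are transcribed from the "No. of Samples" column.

```python
# Sample counts transcribed from the "No. of Samples" column of Table I.
samples_per_data_set = {
    "CSAJ-C": 87,
    "CSAJ-Hunt": 20,
    "CSAJ-Stevens": 19,
    "Helson": 59,
    "Lam & Rigg": 58,
    "Lutchi (A)": 43,
    "Lutchi (D50)": 44,
    "Lutchi (WF)": 41,
    "Kuo & Luo (A)": 40,
    "Kuo & Luo (TL84)": 41,
    "Breneman-C": 107,
    "Breneman-L": 36,
    "Braun & Fairchild": 66,
    "McCann": 85,
}

total_pairs = sum(samples_per_data_set.values())
number_of_data_sets = len(samples_per_data_set)
print(f"{number_of_data_sets} data sets, {total_pairs} corresponding-colour pairs")
```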

DATA FILE DESCRIPTION

Mobile internet penetration in Europe 2024, by country
statista.com
Updated Feb 5, 2025
Cite
Statista Research Department (2025). Mobile internet penetration in Europe 2024, by country [Dataset]. https://www.statista.com/topics/779/mobile-internet/
Dataset updated
Feb 5, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
Switzerland leads the ranking by population share with mobile internet access, recording 95.06 percent. Following closely behind is Ukraine with 95.06 percent, while Moldova is trailing the ranking with 46.83 percent, resulting in a difference of 48.23 percentage points to the ranking leader, Switzerland. The penetration rate refers to the share of the total population having access to the internet via a mobile broadband connection. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Countries with the most Facebook users 2024

statista.com
ai-chatbox.pro

Updated Jun 17, 2025

Cite

Stacy Jo Dixon (2025). Countries with the most Facebook users 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Dataset updated

Jun 17, 2025

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

Which country has the most Facebook users?

There are more than 378 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. To put this into context, if India’s Facebook audience were a country then it would be ranked third in terms of largest population worldwide. Apart from India, there are several other markets with more than 100 million Facebook users each: The United States, Indonesia, and Brazil with 193.8 million, 119.05 million, and 112.55 million Facebook users respectively.

Facebook – the most used social media

Meta, the company that was previously called Facebook, owns four of the most popular social media platforms worldwide: WhatsApp, Facebook Messenger, Facebook, and Instagram. As of the third quarter of 2021, there were around 3.5 billion cumulative monthly users of the company’s products worldwide. With around 2.9 billion monthly active users, Facebook is the most popular social media worldwide. With an audience of this scale, it is no surprise that the vast majority of Facebook’s revenue is generated through advertising.

Facebook usage by device

As of July 2021, it was found that 98.5 percent of active users accessed their Facebook account from mobile devices. In fact, almost 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone. Facebook is not only available through mobile browsers, as the company has published several mobile apps for users to access its products and services. As of the third quarter of 2021, the four core Meta products were leading the ranking of most downloaded mobile apps worldwide, with WhatsApp amassing approximately six billion downloads.

Mobile internet users worldwide 2020-2029
statista.com
Updated Feb 5, 2025
Cite
Statista Research Department (2025). Mobile internet users worldwide 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
Dataset updated
Feb 5, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
The global number of smartphone users was forecast to increase continuously between 2024 and 2029 by a total of 1.8 billion users (+42.62 percent). After the ninth consecutive increasing year, the smartphone user base is estimated to reach 6.1 billion users and therefore a new peak in 2029. Notably, the number of smartphone users has increased continuously over the past years. Smartphone users here are limited to internet users of any age using a smartphone. The shown figures have been derived from survey data that has been processed to estimate missing demographics. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of smartphone users in countries like Australia & Oceania and Asia.

Facebook

Twitter

Click to copy link

Link copied

Cite
