Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Popular Website Traffic Over Time’ provided by Analyst-2 (analyst-2.ai), based on the source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Background
Have you ever been in a conversation where the question comes up: who uses Bing? This question comes up occasionally because people wonder whether such sites get any views at all. In this study, we explore the traffic of many popular websites.
Methodology
The data collected originates from SimilarWeb.com.
Source
For the analysis and study, go to The Concept Center
This dataset was created by Chase Willden. Its features include Social Media, technical information, and monthly traffic columns such as 11/1/2016, 12/1/2016, 1/1/2017, 2/1/2017, 3/1/2017 and 4/1/2017. Possible analyses include:
- Analyze traffic for 11/1/2016 in relation to 2/1/2017
- Compare traffic for 1/1/2017 with 4/1/2017
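Both comparisons come down to a percentage change between two monthly columns for the same site. The sketch below assumes that each CSV row is keyed by the column headers quoted above (for example `11/1/2016`), that the site name sits in a hypothetical `Website` column, and that values may contain thousands separators. These header and format details are assumptions, not something the source confirms.

```python
import csv
import io
from typing import Dict


def to_number(value: str) -> float:
    """Parse a traffic figure such as '1,234,567' into a float."""
    return float(value.replace(",", "").strip())


def percent_change(row: Dict[str, str], start_col: str, end_col: str) -> float:
    """Percentage change in traffic from start_col to end_col for one site."""
    start = to_number(row[start_col])
    end = to_number(row[end_col])
    if start == 0:
        raise ValueError(f"{start_col} is zero; percentage change is undefined")
    return (end - start) / start * 100.0


# Hypothetical two-site sample; the real column names and values may differ.
SAMPLE = (
    "Website,11/1/2016,2/1/2017\n"
    'example-a.com,"1,000","1,250"\n'
    "example-b.com,800,600\n"
)

for row in csv.DictReader(io.StringIO(SAMPLE)):
    change = percent_change(row, "11/1/2016", "2/1/2017")
    print(f"{row['Website']}: {change:+.1f}%")
```

With the sample above, this prints a +25.0% change for the first site and -25.0% for the second.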
If you use this dataset in your research, please credit Chase Willden
--- Original source retains full ownership of the source dataset ---
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Researchers from the Czech Republic have published a dataset for HTTPS traffic classification.
Because the data were captured mainly on a real backbone network, they omitted IP addresses and ports. The datasets consist of features calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times, as well as a sequence of packet bursts and their times. For more information, see the ipfixprobe repository.
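To make "a sequence of packet bursts" concrete, the sketch below groups a flow's per-packet lengths, times and directions into bursts. It assumes a burst is a maximal run of consecutive packets travelling in the same direction, optionally also split by a pause longer than `max_gap` seconds. This definition, the +1/-1 direction encoding and the function name `packets_to_bursts` are illustrative assumptions, not ipfixprobe's exact algorithm or export format.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Burst:
    direction: int   # +1 or -1; which side counts as +1 is an assumption here
    packets: int     # number of packets in the burst
    size: int        # total bytes in the burst
    start: float     # timestamp of the first packet (seconds)
    end: float       # timestamp of the last packet (seconds)


def packets_to_bursts(
    lengths: Sequence[int],
    times: Sequence[float],
    directions: Sequence[int],
    max_gap: Optional[float] = None,
) -> List[Burst]:
    """Group consecutive same-direction packets into bursts.

    If max_gap is given, a pause longer than max_gap seconds also starts
    a new burst, even when the direction does not change.
    """
    if not (len(lengths) == len(times) == len(directions)):
        raise ValueError("lengths, times and directions must have equal length")
    bursts: List[Burst] = []
    for length, t, d in zip(lengths, times, directions):
        last = bursts[-1] if bursts else None
        same_dir = last is not None and last.direction == d
        within_gap = max_gap is None or (last is not None and t - last.end <= max_gap)
        if same_dir and within_gap:
            # Extend the current burst.
            last.packets += 1
            last.size += length
            last.end = t
        else:
            # Direction changed (or the pause was too long): start a new burst.
            bursts.append(Burst(d, 1, length, t, t))
    return bursts
```

Burst-level summaries such as packet count, byte volume and duration are the kind of per-flow information that distinguishes, for example, a large download from interactive website traffic.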
During the research, they divided HTTPS traffic into five categories: L (Live Video Streaming), P (Video Player), M (Music Player), U/D (File Upload/Download), and W (Website and other traffic).
They chose the service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the 500 most popular websites for each category. They also used several popular websites that primarily target a Czech audience. The identified traffic classes and their representatives are listed below:
- Live Video Stream: Twitch, Czech TV, YouTube Live
- Video Player: DailyMotion, Stream.cz, Vimeo, YouTube
- Music Player: AppleMusic, Spotify, SoundCloud
- File Upload/Download: FileSender, OwnCloud, OneDrive, Google Drive
- Website and Other Traffic: websites from the Alexa Top 1M list
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
The Entertainment_KSA.csv dataset contains data on various entertainment spots in Saudi Arabia. With over 500 rows of data, this dataset provides information on the name, rating, review count, genre, location, and best comment for each entertainment spot. This dataset can be used to analyze the entertainment industry in Saudi Arabia and understand the types of entertainment spots available in the country.
One way to create datasets like Entertainment_KSA.csv is by web scraping information from public sources such as Google Maps or Yelp. Web scraping is the process of automatically extracting data from websites using software tools. In this case, a web scraper would be programmed to visit the relevant pages on Google Maps or Yelp and extract information on each entertainment spot, such as name, rating, review count, genre, location, and best comment.
The scraped data can then be saved in a CSV file, like the Entertainment_KSA.csv dataset. Once the data is collected, it can be cleaned and processed to remove any errors or duplicates and then analyzed to gain insights into the entertainment industry in Saudi Arabia.
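The cleaning step described above could look like the following sketch. It uses only the standard library and assumes hypothetical column headers `name`, `rating`, `review_count`, `genre`, `location` and `best_comment` (the actual headers of Entertainment_KSA.csv may differ). It also treats two rows as duplicates when they share a name and location, ignoring case, which is a simplifying assumption.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a rating such as '4.5'; return None if missing or malformed."""
    try:
        return float((value or "").strip())
    except ValueError:
        return None


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a review count such as '1,200'; return None if missing or malformed."""
    try:
        return int((value or "").replace(",", "").strip())
    except ValueError:
        return None


def clean_rows(rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Trim text, parse numbers, and drop unnamed rows and duplicate spots."""
    seen = set()
    cleaned: List[Dict[str, object]] = []
    for row in rows:
        name = (row.get("name") or "").strip()
        location = (row.get("location") or "").strip()
        if not name:
            continue  # a spot without a name cannot be identified
        key = (name.lower(), location.lower())
        if key in seen:
            continue  # same spot at the same location was already kept
        seen.add(key)
        cleaned.append({
            "name": name,
            "rating": _to_float(row.get("rating")),
            "review_count": _to_int(row.get("review_count")),
            "genre": (row.get("genre") or "").strip(),
            "location": location,
            "best_comment": (row.get("best_comment") or "").strip(),
        })
    return cleaned


# Usage (file name from the text; encoding is an assumption):
# with open("Entertainment_KSA.csv", newline="", encoding="utf-8") as f:
#     rows = clean_rows(csv.DictReader(f))
```

Unparseable ratings become `None` instead of being dropped, so a later analysis can decide how to treat them.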
As for inspiration, datasets like Entertainment_KSA.csv can be used for a variety of purposes, including market research, trend analysis, and predictive modeling. Researchers and data analysts can use this dataset to explore the types of entertainment spots available in Saudi Arabia, identify popular spots, and understand the factors that influence customer reviews and ratings.
For example, this dataset could be used to predict which new entertainment spots are likely to be successful based on their genre, location, and other factors. It could also be used to identify trends in the entertainment industry in Saudi Arabia, such as the increasing popularity of certain genres or the growth of entertainment spots in specific regions.
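As a first step toward identifying popular spots or genre trends, the sketch below groups cleaned rows by genre (or any other field, such as location) and reports the number of spots, the mean rating and the total review count. It assumes rows shaped like the hypothetical cleaned records above, and it uses total reviews as a rough popularity proxy. That choice of proxy is an assumption, not a method from the source.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def summarise_by(rows: Iterable[Dict[str, object]], field: str = "genre") -> List[Dict[str, object]]:
    """Group spots by `field` and report count, mean rating and total reviews."""
    groups: Dict[str, List[Dict[str, object]]] = defaultdict(list)
    for row in rows:
        groups[str(row.get(field) or "unknown")].append(row)
    summary = []
    for key, items in groups.items():
        # Ignore missing ratings when averaging.
        ratings = [r["rating"] for r in items if r.get("rating") is not None]
        summary.append({
            field: key,
            "spots": len(items),
            "mean_rating": round(mean(ratings), 2) if ratings else None,
            "total_reviews": sum(r.get("review_count") or 0 for r in items),
        })
    # Most-reviewed groups first, as a rough proxy for popularity.
    summary.sort(key=lambda s: s["total_reviews"], reverse=True)
    return summary
```

Calling `summarise_by(rows, field="location")` gives the same view per region. Tracking it across scrapes taken at different dates would be one way to detect growth in specific regions.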
Introduction
The GiGL Spaces to Visit dataset provides locations and boundaries for open space sites in Greater London that are available to the public as destinations for leisure, activities and community engagement. It includes green corridors that provide opportunities for walking and cycling.
The dataset has been created by Greenspace Information for Greater London CIC (GiGL). As London’s Environmental Records Centre, GiGL mobilises, curates and shares data that underpin our knowledge of London’s natural environment. We provide impartial evidence to support informed discussion and decision making in policy and practice.
GiGL maps under licence from the Greater London Authority.
Description
This dataset is a sub-set of the GiGL Open Space dataset, the most comprehensive dataset available of open spaces in London. Sites are selected for inclusion in Spaces to Visit based on their public accessibility and likelihood that people would be interested in visiting.
The dataset is a mapped Geographic Information System (GIS) polygon dataset where one polygon (or multi-polygon) represents one space. As well as site boundaries, the dataset includes information about a site’s name, size and type (e.g. park, playing field etc.).
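A site's size can be derived from its polygon with the shoelace formula. The sketch below assumes that coordinates are in a projected coordinate system measured in metres (for example British National Grid, which is an assumption about this dataset), and that each multi-polygon is given as (exterior ring, holes) pairs. In practice, a GIS library such as Shapely, whose geometries expose an `area` property, would normally be used instead.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]


def ring_area(ring: Ring) -> float:
    """Unsigned area of a simple ring via the shoelace formula.

    Works whether or not the first vertex is repeated at the end,
    because a repeated closing vertex contributes a zero term.
    """
    n = len(ring)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_area(exterior: Ring, holes: Iterable[Ring] = ()) -> float:
    """Area of a polygon minus its interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(h) for h in holes)


def multipolygon_area(parts: Iterable[Tuple[Ring, Sequence[Ring]]]) -> float:
    """Total area of a multi-polygon given as (exterior, holes) pairs."""
    return sum(polygon_area(ext, holes) for ext, holes in parts)


def square_metres_to_hectares(area_m2: float) -> float:
    """Convert square metres to hectares (1 ha = 10,000 m^2)."""
    return area_m2 / 10_000.0
```

For example, a 100 m by 50 m rectangle with a 10 m by 10 m hole has an area of 4,900 m², or 0.49 ha.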
GiGL developed the Spaces to Visit dataset to support anyone who is interested in London’s open spaces - including community groups, web and app developers, policy makers and researchers - with an open licence data source. More detailed and extensive data are available under GiGL data use licences for GIGL partners, researchers and students. Information services are also available for ecological consultants, biological recorders and community volunteers – please see www.gigl.org.uk for more information.
Please note that access and opening times are subject to change (particularly at the current time) so if you are planning to visit a site check on the local authority or site website that it is open.
The dataset is updated on a quarterly basis. If you have questions about this dataset please contact GiGL’s GIS and Data Officer.
Data sources
The boundaries and information in this dataset are a combination of data collected during the London Survey Method habitat and open space survey programme (1986-2008) and information provided to GiGL from other sources since then. These sources include London borough surveys, land use datasets, volunteer surveys, feedback from the public, park friends’ groups, and updates made as part of GiGL’s ongoing data validation and verification process.
Due to data availability, some areas are more up-to-date than others. We are continually working on updating and improving this dataset. If you have any additional information or corrections for sites included in the Spaces to Visit dataset please contact GiGL’s GIS and Data Officer.
NOTE: The dataset contains OS data © Crown copyright and database rights 2025. The site boundaries are based on Ordnance Survey mapping, and the data are published under Ordnance Survey's 'presumption to publish'. When using these data please acknowledge GiGL and Ordnance Survey as the source of the information using the following citation:
‘Dataset created by Greenspace Information for Greater London CIC (GiGL), 2025 – Contains Ordnance Survey and public sector information licensed under the Open Government Licence v3.0 ’
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you have to have a G+ account (for YouTube, location services, or other reasons), here's how you can make it totally private! No one will be able to add you, send you spammy links, or otherwise annoy you. Visit the "Audience Settings" page at https://plus.google.com/u/0/settings/audience and set a "custom audience". Usually you would use this to restrict your account to people from a specific geographic location or within a specific age range; in this case, choose a custom audience of "No-one". Check the box and click Save. Now, when people try to visit your Google+ profile, they'll see a "restricted" message. You can visit my G+ profile to see this working (https://plus.google.com/114725651137252000986). If these steps are unclear, you can follow this website: http://www.livehuntz.com/google-plus/support-phone-number
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Face Recognition, Face Detection, Male Photo Dataset 👨
If you are interested in biometric data - visit our website to learn more and buy the dataset :)
110,000+ photos of 74,000+ men from 141 countries. The dataset includes photos of people's faces. All people presented in the dataset are men. The dataset contains a variety of images capturing individuals from diverse backgrounds and age groups. Our dataset will diversify your data by adding more photos of men of… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/male-selfie-image-dataset.
For more information on CDC.gov metrics please see http://www.cdc.gov/metrics/
Open Data Commons Attribution License (ODC-BY): https://choosealicense.com/licenses/odc-by/
🍷 FineWeb
15 trillion tokens of the finest data the 🌐 web has to offer
What is it?
The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and was run on 🏭 datatrove, our large-scale data processing library. 🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Face Recognition, Face Detection, Female Photo Dataset 👩
If you are interested in biometric data - visit our website to learn more and buy the dataset :)
90,000+ photos of 46,000+ women from 141 countries. The dataset includes photos of people's faces. All people presented in the dataset are women. The dataset contains a variety of images capturing individuals from diverse backgrounds and age groups. Our dataset will diversify your data by adding more photos of women of… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/female-selfie-image-dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Use Case 1: Gender-Based Retail Analytics. By analyzing customer demographics in retail stores, the "man vrouw dataset 1" can help retailers understand the gender distribution of their shoppers, empowering them to make informed decisions on store layout, marketing strategies, and product placement.
Use Case 2: Crowd Monitoring and Event Management. This model can help enhance safety and optimize the visitor experience at crowded events, such as concerts or festivals, by identifying the gender distribution of attendees, enabling promoters to adjust services, restroom allocation, and security measures accordingly.
Use Case 3: Digital Advertising and Marketing. Using the "man vrouw dataset 1" model, businesses can better target their digital advertisements by understanding the key demographics visiting specific websites or engaging with specific content, allowing for tailored ad campaigns aimed at male or female audiences.
Use Case 4: Smart Surveillance and Security Systems. The model can be used in surveillance and security systems to help identify and track people by class (man or vrouw, Dutch for "woman") in premises like airports or corporate buildings, allowing security teams to analyze patterns and prevent potential threats.
Use Case 5: Social Media Image Analysis. The "man vrouw dataset 1" model can be used to analyze the gender composition of social media images, providing insights into the trends, preferences, and behaviors of different gender groups on social platforms. This information can then be used for targeted marketing or social research.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The dataset features 24 professional actors (12 female, 12 male) vocalizing two lexically matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16-bit, 48 kHz .wav), Audio-Video (720p H.264, AAC 48 kHz, .mp4), and Video-only (no sound). Note that there are no song files for Actor_18.
The RAVDESS was developed by Dr Steven R. Livingstone, who now leads the Affective Data Science Lab, and Dr Frank A. Russo who leads the SMART Lab.
Citing the RAVDESS
The RAVDESS is released under a Creative Commons Attribution license, so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS ONE paper. Personal works, such as machine learning projects or blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS ONE paper would also be appreciated.
Academic paper citation
Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.
Personal use citation
Include a link to this Zenodo page - https://zenodo.org/record/1188976
Commercial Licenses
Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.
Contact Information
If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.
Emotion Classification Users
If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set on its Zenodo project page.
Construction and Validation
Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.
The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.
Contents
Audio-only files
Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each).
Audio-Visual and Video-only files
Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each) and are split into separate speech and song downloads.
File Summary
In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).
File naming convention
Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:
Filename identifiers
Filename example: 02-01-06-01-02-01-12.mp4
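Because each filename encodes its stimulus characteristics, metadata can be recovered by splitting the name on hyphens. The sketch below decodes the seven fields. Its code tables follow the identifier scheme documented on the RAVDESS Zenodo page (modality, vocal channel, emotion, emotional intensity, statement, repetition, actor), so check them against that page before relying on them. The function name `parse_ravdess_filename` is hypothetical.

```python
from pathlib import Path
from typing import Dict, Union

MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
# Neutral expressions exist only at normal intensity.
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {
    "01": "Kids are talking by the door",
    "02": "Dogs are sitting by the door",
}


def parse_ravdess_filename(name: str) -> Dict[str, Union[str, int]]:
    """Decode a name like '02-01-06-01-02-01-12.mp4' into its seven fields."""
    parts = Path(name).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 identifiers, got {len(parts)}: {name}")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    actor_id = int(actor)
    return {
        "modality": MODALITY[modality],
        "vocal_channel": VOCAL_CHANNEL[channel],
        "emotion": EMOTION[emotion],
        "intensity": INTENSITY[intensity],
        "statement": STATEMENT[statement],
        "repetition": int(repetition),
        "actor": actor_id,
        # Odd-numbered actors are male, even-numbered actors are female.
        "actor_sex": "male" if actor_id % 2 == 1 else "female",
    }
```

Applied to the example filename, this yields a video-only, spoken, fearful, normal-intensity rendition of "Dogs are sitting by the door" (first repetition) by actor 12, who is female.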
License information
The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0
Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.
Singapore Open Data Licence: https://data.gov.sg/open-data-licence
Dataset from People's Association. For more information, visit https://data.gov.sg/datasets/d_4d2c99ea159f1f6dd67beb58bf9cbe8d/view
The dataset collection 'Care Days in Institutional Care for People Aged 75 and Over per 1000 Persons of Same Age in Finland' includes data sourced from the 'Sotkanet' website in Finland. This dataset collection consists of one table providing information on the number of care days in institutional care for individuals aged 75 and over per 1000 persons of the same age in Finland. The data within this collection provides insights into the level of institutional care provided to elderly individuals in Finland. Please note that 'Sotkanet' is the English name of the website owner.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Quarterly estimates of overseas residents’ visits and spending. Also includes data on nights, purpose, region of UK visited and mode of travel. Breakdowns by nationality and area of residence are covered. This dataset is published quarterly. The versions published for Quarters 1 (Jan to Mar), 2 (Apr to June) and 3 (July to Sept) are on a separate webpage under the name "Estimates of overseas residents' visits and spending".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents travel duration, season, lodging, well-liked tourist destinations, cuisine, dining options, and details of cultural events in the hill tract regions of Bangladesh. The major purpose of the dataset is to support the development of a tourist chatbot for the hilly tourist destinations of Bangladesh. Four hill tract regions in Bangladesh (Khagrachhari, Rangamati, Bandarban, and Sylhet) are included. Data were gathered from sources such as travelagency.com, community-based travel websites, online and offline surveys, Google Maps, and more. The dataset includes 502 records from 502 unique users: 130 records for Khagrachhari, 141 for Rangamati, 103 for Bandarban, and 128 for Sylhet. Fifteen variables (features) were recorded for all 502 records: user ID, district, vehicle, travel time, time to reach destination, season, tourist spots, similar spots, resorts/hotels, restaurants, traditional food, indigenous group, traditional dress/attire, traditional dress shop, and minimum cost (per day).
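A chatbot built on these records needs some form of lookup, for example "which spots in a given district fit my daily budget?". The sketch below shows one simple retrieval step. The field names `district`, `tourist_spot` and `min_cost_per_day` are hypothetical stand-ins for the district, tourist spots and minimum cost (per day) features, and the assumption that each record names a single spot is a simplification.

```python
from typing import Dict, Iterable, List, Tuple


def suggest_spots(
    records: Iterable[Dict[str, str]], district: str, daily_budget: float
) -> List[Tuple[str, float]]:
    """Return unique spots in `district` whose minimum daily cost fits the budget.

    Results are sorted cheapest first. When several users mention the same
    spot, the lowest reported cost is kept.
    """
    wanted = district.strip().lower()
    best: Dict[str, float] = {}
    for rec in records:
        if rec["district"].strip().lower() != wanted:
            continue
        cost = float(rec["min_cost_per_day"])
        if cost > daily_budget:
            continue
        spot = rec["tourist_spot"].strip()
        if spot not in best or cost < best[spot]:
            best[spot] = cost
    return sorted(best.items(), key=lambda item: (item[1], item[0]))
```

A chatbot front end would extract the district and budget from the user's message and phrase the returned list as a reply. That language-understanding part is outside the scope of this sketch.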
This page contains data for the immigration system statistics up to March 2023.
For current immigration system data, visit ‘Immigration system statistics data tables’.
Asylum applications, initial decisions and resettlement (MS Excel Spreadsheet, 9.13 MB): https://assets.publishing.service.gov.uk/media/64625e6894f6df0010f5eaab/asylum-applications-datasets-mar-2023.xlsx
Asy_D01: Asylum applications raised, by nationality, age, sex, UASC, applicant type, and location of application
Asy_D02: Outcomes of asylum applications at initial decision, and refugees resettled in the UK, by nationality, age, sex, applicant type, and UASC
This is not the latest data
Asylum applications awaiting a decision (MS Excel Spreadsheet, 1.26 MB): https://assets.publishing.service.gov.uk/media/64625ec394f6df0010f5eaac/asylum-applications-awaiting-decision-datasets-mar-2023.xlsx
Asy_D03: Asylum applications awaiting an initial decision or further review, by nationality and applicant type
This is not the latest data
Outcome analysis of asylum applications (MS Excel Spreadsheet, 410 KB): https://assets.publishing.service.gov.uk/media/62fa17698fa8f50b54374371/outcome-analysis-asylum-applications-datasets-jun-2022.xlsx
Asy_D04: The initial decision and final outcome of all asylum applications raised in a period, by nationality
This is not the latest data
Age disputes (MS Excel Spreadsheet, 178 KB): https://assets.publishing.service.gov.uk/media/64625ef1427e41000cb437cb/age-disputes-datasets-mar-2023.xlsx
Asy_D05: Age disputes raised and outcomes of age disputes
This is not the latest data
Asylum appeals lodged and determined (MS Excel Spreadsheet, 817 KB): https://assets.publishing.service.gov.uk/media/64625f0ca09dfc000c3c17cf/asylum-appeals-lodged-datasets-mar-2023.xlsx
Asy_D06: Asylum appeals raised at the First-Tier Tribunal, by nationality and sex
Asy_D07: Outcomes of asylum appeals raised at the First-Tier Tribunal, by nationality and sex
This is not the latest data
Asylum claims certified under Section 94 (MS Excel Spreadsheet, 150 KB): https://assets.publishing.service.gov.uk/media/64625f29427e41000cb437cd/asylum-claims-certified-section-94-datasets-mar-2023.xlsx
Asy_D08: Initial decisions on asylum applications certified under Section 94, by nationality
This is not the latest data
Asylum seekers in receipt of support (MS Excel Spreadsheet, 2.16 MB): https://assets.publishing.service.gov.uk/media/6463a618d3231e000c32da99/asylum-seekers-receipt-support-datasets-mar-2023.xlsx
Asy_D09: Asylum seekers in receipt of support at end of period, by nationality, support type, accommodation type, and UK region
This is not the latest data
https://assets.publishing.service.gov.uk/media/63ecd7388fa8f5612a396c40/applications-section-95-support-datasets-dec-2022.xlsx">Applications for section 95 su
Bearded seals (Erignathus barbatus) are one of the most important subsistence resources for the indigenous people of coastal northern and western Alaska, as well as key components of Arctic marine ecosystems, yet relatively little about their abundance, seasonal distribution, migrations, or foraging behaviors has been documented scientifically. Ice-associated seal populations may be negatively...
The global number of internet users was forecast to increase continuously between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After the fifteenth consecutive year of growth, the number of users is estimated to reach 7 billion, a new peak, in 2029. Notably, the number of internet users has been increasing continuously over the past years. Depicted is the estimated number of individuals in the country or region at hand who use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects that are not taken into account here. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of internet users in regions like the Americas and Asia.
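The headline figures can be cross-checked with a short calculation. If the increase of 1.3 billion users is 23.66 percent of the 2024 figure, the implied values are as follows. Both inputs are rounded, so the results are approximate.

```latex
N_{2024} \approx \frac{1.3 \times 10^{9}}{0.2366} \approx 5.49 \times 10^{9},
\qquad
N_{2029} \approx N_{2024} + 1.3 \times 10^{9} \approx 6.8 \times 10^{9}
```

Rounded to the nearest billion, the implied 2029 value agrees with the 7 billion stated in the forecast.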
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the ground truth data used to evaluate the musical pitch, tempo and key estimation algorithms developed during the AudioCommons H2020 EU project and which are part of the Audio Commons Audio Extractor tool. It also includes ground truth information for the single-eventness audio descriptor also developed for the same tool.
This ground truth data has been used to generate the following documents:
All these documents are available in the materials section of the AudioCommons website.
All ground truth data in this repository is provided in the form of CSV files. Each CSV file corresponds to one of the individual datasets used in one or more evaluation tasks of the aforementioned deliverables. This repository does not include the audio files of each individual dataset, but includes references to the audio files. The following paragraphs describe the structure of the CSV files and give some notes about how to obtain the audio files in case these would be needed.
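Since each CSV file pairs an audio reference with an annotated value, evaluating an estimator amounts to comparing its outputs against those values. The sketch below scores tempo estimates. The column names `audio_id` and `tempo` are hypothetical placeholders for the real headers. The two metrics are a common convention in the tempo-estimation literature, not necessarily the exact procedure of the AudioCommons deliverables: Accuracy1 counts estimates within 4% of the reference, and Accuracy2 also accepts double, triple, half and third tempo.

```python
import csv
import io
from typing import Dict, Sequence, Tuple


def load_ground_truth(csv_text: str, key_col: str, value_col: str) -> Dict[str, float]:
    """Map each audio reference (key_col) to its annotated value (value_col)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_col]: float(row[value_col]) for row in reader}


def _close(estimate: float, reference: float, tol: float) -> bool:
    """True if estimate lies within +/- tol (relative) of reference."""
    return abs(estimate - reference) <= tol * reference


def accuracy1(pairs: Sequence[Tuple[float, float]], tol: float = 0.04) -> float:
    """Share of (reference, estimate) pairs within tol of the reference tempo."""
    if not pairs:
        raise ValueError("no pairs to evaluate")
    hits = sum(_close(est, ref, tol) for ref, est in pairs)
    return hits / len(pairs)


def accuracy2(
    pairs: Sequence[Tuple[float, float]],
    tol: float = 0.04,
    factors: Tuple[float, ...] = (1.0, 2.0, 3.0, 1 / 2, 1 / 3),
) -> float:
    """Like accuracy1, but also accepts double/triple/half/third-tempo errors."""
    if not pairs:
        raise ValueError("no pairs to evaluate")
    hits = sum(any(_close(est, ref * f, tol) for f in factors) for ref, est in pairs)
    return hits / len(pairs)
```

Reporting both metrics separates estimates that are simply wrong from estimates that are off by a metrical factor, which is a frequent failure mode of tempo estimators.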
Structure of the CSV files
All CSV files in this repository (with the sole exception of SINGLE EVENT - Ground Truth.csv) feature the following 5 columns:
The remaining CSV file, SINGLE EVENT - Ground Truth.csv, has only the following 2 columns:
How to get the audio data
In this section we provide some notes about how to obtain the audio files corresponding to the ground truth annotations provided here. Note that due to licensing restrictions we are not allowed to re-distribute the audio data corresponding to most of these ground truth annotations.