Overview
This data set consists of links to social network items for 34 different forensic events that took place between August 14th, 2018 and January 6th, 2021. The majority of the text and images are from Twitter (a minor part is from Flickr, Facebook and Google+), and every video is from YouTube.
Data Collection
We used Social Tracker, along with the social media APIs, to gather most of the collections. For a minor part, we used Twint. In both cases, we provided keywords related to the event in order to receive the data. It is worth mentioning that, in procedures like this one, usually only a small fraction of the collected data is actually related to the event and useful for further forensic analysis.
Content
We have data from 34 events, and for each of them we provide the following files:
items_full.csv: Links to every social media post that was collected.
images.csv: Lists the images collected. Some files contain a field called "ItemUrl", which refers to the social network post (e.g., a tweet) that mentions that media item.
video.csv: URLs of the YouTube videos gathered about the event.
video_tweet.csv: IDs of tweets and IDs of YouTube videos. A tweet whose ID is in this file has a video in its content; in turn, a YouTube video whose ID is in this file was mentioned by at least one collected tweet. Only two collections have this file.
description.txt: Standard information about the event, and possibly comments about any issue specific to it.
Most of the collections do not include all of the files above, owing to changes in our collection procedure over the course of this work.
Events
We divided the events into six groups:
Fire: A devastating fire is the main subject of the event, so most of the informative pictures show flames or burned buildings. 14 events.
Collapse: Most of the relevant images depict collapsed buildings, bridges, etc. (not caused by fire). 5 events.
Shooting: Typically images of guns and police officers, with little or no destruction of the environment. 5 events.
Demonstration: Large numbers of people on the streets. Problems may have occurred, but in most cases the demonstration itself is the event. 7 events.
Collision: Traffic collision; pictures of damaged vehicles in an urban landscape, possibly including victims on the street. 1 event.
Flood: Events ranging from fierce rain to a tsunami; many pictures depict water. 2 events.
Media Content
Due to the social networks' terms of use, we do not make the collected texts, images and videos publicly available. However, extra media content related to one or more events can be provided by contacting the authors.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accident Detection Model is made using YOLOv8, Google Colab, Python, Roboflow, Deep Learning, OpenCV, Machine Learning, and Artificial Intelligence. It can detect an accident from a live camera feed, an image, or a video. The model is trained on a dataset of 3,200+ images, which were annotated on Roboflow.
Survey image: https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png
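A minimal inference sketch with the Ultralytics YOLOv8 Python API is shown below; the weights file name (accident_best.pt) and the input image path are illustrative assumptions, since the trained weights' file name is not published in this record.

```python
# Minimal inference sketch with the Ultralytics YOLOv8 API.
# "accident_best.pt" and "crash_frame.jpg" are hypothetical paths, not files
# published with this dataset.
from ultralytics import YOLO

model = YOLO("accident_best.pt")  # load the trained accident-detection weights
results = model.predict(source="crash_frame.jpg", conf=0.5)  # also accepts video paths or stream URLs

for r in results:
    for box in r.boxes:
        label = model.names[int(box.cls)]                 # predicted class name
        print(label, float(box.conf), box.xyxy.tolist())  # confidence and bounding box
```

The same predict() call accepts a video file path or a camera/stream source, which is how live detection would be wired up.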
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classify video clips with natural scenes of actions performed by people visible in the videos.
See the UCF101 Dataset web page: https://www.crcv.ucf.edu/data/UCF101.php#Results_on_UCF101
This example dataset consists of the 5 most numerous video classes from the UCF101 dataset. For the top-10 version, see: https://doi.org/10.5281/zenodo.7882861 .
Based on this code: https://keras.io/examples/vision/video_classification/ (which needs to be updated, if it has not been already; see the issue: https://github.com/keras-team/keras-io/issues/1342).
Testing if data can be downloaded from figshare with `wget`, see: https://github.com/mojaveazure/angsd-wrapper/issues/10
For generating the subset, see this notebook: https://colab.research.google.com/github/sayakpaul/Action-Recognition-in-TensorFlow/blob/main/Data_Preparation_UCF101.ipynb -- however, it also needs to be adjusted (if it has not been already; in that case, I will post a link to the updated notebook here or elsewhere, e.g., in the corrected notebook with the Keras example).
I would like to thank Sayak Paul for contacting me about his example in the Keras documentation being out of date.
Cite this dataset as:
Soomro, K., Zamir, A. R., & Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402. https://doi.org/10.48550/arXiv.1212.0402
To download the dataset via the command line, please use:
wget -q https://zenodo.org/record/7924745/files/ucf101_top5.tar.gz -O ucf101_top5.tar.gz
tar xf ucf101_top5.tar.gz
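After extraction, a quick way to sanity-check the clips is to read their frames with OpenCV. This is only a sketch: the folder name and the assumption that the archive contains .avi clips in per-class subfolders are guesses about the layout, not verified facts.

```python
# Sanity-check sketch: read frames from one extracted clip with OpenCV.
# The folder name "ucf101_top5" and the assumption of .avi clips inside it
# are guesses about the archive layout.
import cv2
from pathlib import Path

clips = sorted(Path("ucf101_top5").rglob("*.avi"))  # hypothetical extraction folder
cap = cv2.VideoCapture(str(clips[0]))

frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(cv2.resize(frame, (224, 224)))  # resize to a typical CNN input size
cap.release()

print(f"{clips[0].name}: {len(frames)} frames read")
```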
The data was collected using a Google Form asking which video streaming platforms people use. The purpose of the collection was to apply the Apriori algorithm to it, but I am posting it here to see what else can be done with it.
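For illustration, a typical Apriori workflow on this kind of one-hot survey data might look like the following sketch using mlxtend; the platform column names and values are invented, not taken from the actual form responses.

```python
# Illustrative Apriori run with mlxtend on made-up one-hot survey responses;
# the platform columns are invented and do not reflect the actual form.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One row per respondent, one boolean column per streaming platform (toy data).
df = pd.DataFrame({
    "Netflix": [True, True, False, True],
    "YouTube": [True, True, True, False],
    "Disney+": [False, True, False, True],
})

frequent = apriori(df, min_support=0.5, use_colnames=True)   # frequent itemsets
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```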
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The IdiapVideoAge dataset is a set of YouTube video IDs with age labels to facilitate research in audio-visual age verification, with a focus on detecting the ages of people below 18 years old. The dataset contains 4260 IDs of YouTube videos that come from two existing video databases: VoxCeleb2 and the child speech dataset from Google. Our main contribution is the age labels of the people in the videos. Three different human annotators were used for labeling. They were instructed to give a valid age label only if a person's face in a video is visible in more than 80% of the frames and it is clear that the audible speech matches the person in the video. As the age label, we used the average of the three annotations. Of the 4260 videos, 1973 are of minors below 18 years old.
Reference
If you use this dataset, please cite the following publication:
Pavel Korshunov and Sebastien Marcel, "Face Anthropometry Aware Audio-visual Age Verification", ACM Multimedia international conference (MM'22), October 2022.
https://publications.idiap.ch/index.php/publications/show/4862
GitHub is how people build software and is home to the largest community of open source developers in the world, with over 12 million people contributing to 31 million projects on GitHub since 2008. This 3TB+ dataset comprises the largest released source of GitHub activity to date. It contains a full snapshot of the content of more than 2.8 million open source GitHub repositories including more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files, all of which are searchable with regular expressions. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
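As a hedged illustration, a query against this public dataset can be run from Python with the google-cloud-bigquery client; this assumes a configured Google Cloud project with the BigQuery API enabled and application-default credentials, which is setup not covered by the dataset description itself.

```python
# Hedged sketch: count repositories per language in the public GitHub dataset.
# Assumes a configured Google Cloud project and application-default credentials.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT lang.name AS language, COUNT(*) AS repo_count
    FROM `bigquery-public-data.github_repos.languages`,
         UNNEST(language) AS lang
    GROUP BY language
    ORDER BY repo_count DESC
    LIMIT 10
"""
for row in client.query(sql).result():
    print(row.language, row.repo_count)
```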
IoTeX is a decentralized crypto system, a new generation of blockchain platform for the development of the Internet of Things (IoT). The project team believes that users currently lack an application compelling enough to motivate adoption of IoT technology in everyday life, and until such an application exists, people will not want to spend money and time on IoT. The developers of IoTeX therefore decided to build not the application itself, but a platform for creating such applications. It is through this platform that innovative steps in the IoT space will be encouraged. Learn more... This dataset is one of many crypto datasets that are available within the Google Cloud Public Datasets. As with other Google Cloud public datasets, you can query this dataset for free, up to 1TB of processing per month. Watch this short video to learn how to get started with the public datasets. Want to know how the data from these blockchains was brought into BigQuery, and learn how to analyze the data? Learn more...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
I3D Video Features, Labels and Splits for Multicamera Overlapping Datasets Pets-2009, HQFS and Up-Fall
The Inflated 3D (I3D) video features, ground truths, and train/test splits for the multicamera datasets PETS-2009, HQFS, and Up-Fall are available here. We relabeled two of the datasets (HQFS and PETS-2009) for the task of video anomaly detection with multiple-instance learning (VAD-MIL) under multiple cameras. Three I3D feature variants are available: I3D-RGB, I3D-OF, and the linear concatenation of the two. These datasets can be used as benchmarks for the video anomaly detection task under multiple-instance learning with multiple overlapping cameras.
Preprocessed Datasets
PETS-2009 is a benchmark dataset (https://cs.binghamton.edu/~mrldata/pets2009) aggregating different scene sets with multiple overlapping camera views and distinct events involving crowds. We labeled the scenes at frame level as anomalous or normal events. Scenes with background, people walking individually or in a crowd, and regular passing of cars are considered normal patterns. Frames with occurrences of people running (individually or in a crowd), crowding of people in the middle of the traffic intersection, and people moving against the flow were considered anomalous patterns. Videos of scenes containing anomalous frames are labeled as anomalous, while videos without anomalies are marked as normal. The High-Quality Fall Simulation Data (HQFS) dataset (https://iiw.kuleuven.be/onderzoek/advise/datasets/fall-and-adl-meta-data) is an indoor scenario with five overlapping cameras and occurrences of fall incidents. We consider a person falling on the floor an uncommon event. We also relabeled the frame annotations to include the intervals where the person remains lying on the ground after the fall. The multi-class Up-Fall detection dataset (https://sites.google.com/up.edu.mx/har-up/) contains two overlapping camera views and infrared sensors in a laboratory scenario.
Video Feature Extraction
We use Inflated 3D (I3D) features to represent video clips of 16 frames. We use the Video Features library (https://github.com/v-iashin/video_features), which uses a model pre-trained on the Kinetics 400 dataset. For this procedure, the frame sequence length from which to get the video clip feature representation (or window size) and the number of frames to step before extracting the next features were both set to 16 frames. After the feature extraction process, each video from each camera corresponds to a matrix with dimension n x 1024, where n is the variable number of segments and 1024 is the number of attributes (I3D attributes referring to RGB appearance information or I3D attributes referring to optical flow information). It is important to note that the videos (bags) are divided into clips with a fixed number of frames. Consequently, each video bag contains a variable number of clips. A clip can be completely normal, completely anomalous, or a mix of normal and anomalous frames. There are three possible deep feature dispositions: I3D features generated from RGB only (1024 I3D features from RGB data), from optical flow (1024 I3D features from optical flow data), and the combination of both (by simple linear concatenation). We also make 10-crop features available (https://pytorch.org/vision/main/generated/torchvision.transforms.TenCrop.html), yielding 10 crops for a given video clip.
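As a small illustration of the feature layout described above, the sketch below loads per-video I3D matrices and forms the concatenated RGB + optical-flow variant; the .npy file names are assumptions about how the features are stored on disk.

```python
# Illustration of the feature layout described above: load per-video I3D
# matrices and build the concatenated RGB + optical-flow variant.
# The .npy file names are assumptions about how the features are stored.
import numpy as np

rgb = np.load("camera1_video01_rgb.npy")    # shape (n_clips, 1024), hypothetical file
flow = np.load("camera1_video01_flow.npy")  # shape (n_clips, 1024), hypothetical file

combined = np.concatenate([rgb, flow], axis=1)  # simple linear concatenation -> (n_clips, 2048)
print(rgb.shape, flow.shape, combined.shape)
```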
File Description
center-crop.zip: Folder with I3D features of Pets-2009, HQFS and Up-Fall datasets;
10-crop.zip: Folder with I3D features (10-crop) of Pets-2009, HQFS and Up-Fall datasets;
gts.zip: Folder with ground truths at frame-level and video-level of Pets-2009, HQFS and Up-Fall datasets;
splits.zip: Folder with Lists of training and test splits of Pets-2009, HQFS and Up-Fall datasets;
A portion of the preprocessed I3D feature sets was leveraged in the studies outlined in these publications:
Pereira, S. S., & Maia, J. E. B. (2024). MC-MIL: video surveillance anomaly detection with multi-instance learning and multiple overlapped cameras. Neural Computing and Applications, 36(18), 10527-10543. Available at https://link.springer.com/article/10.1007/s00521-024-09611-3.
Pereira, S. S. L., Maia, J. E. B., & Proença, H. (2024, September). Video Anomaly Detection in Overlapping Data: The More Cameras, the Better?. In 2024 IEEE International Joint Conference on Biometrics (IJCB) (pp. 1-10). IEEE. Available at https://ieeexplore.ieee.org/document/10744502.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Because this dataset was used in a competition, we had to hide some of the data to prepare the competition's test set. Thus, in the previous version of the dataset, only the train.csv file existed.
This dataset represents 10 different physical poses that can be used to distinguish 5 exercises. The exercises are Push-up, Pull-up, Sit-up, Jumping Jack and Squat. For every exercise, 2 different classes have been used to represent the terminal positions of that exercise (e.g., “up” and “down” positions for push-ups).
About 500 videos of people doing the exercises were used to collect this data. The videos are from the Countix Dataset, which contains YouTube links to several human activity videos. Using a simple Python script, the videos of the 5 different physical exercises were downloaded. From every video, at least 2 frames were manually extracted. The extracted frames represent the terminal positions of the exercise.
For every frame, the MediaPipe framework is used to apply pose estimation, which detects the skeleton of the person in the frame. The landmark model in MediaPipe Pose predicts the location of 33 pose landmarks (see the figure below). Visit the MediaPipe Pose Classification page for more details.
Figure: 33 pose landmarks (https://mediapipe.dev/images/mobile/pose_tracking_full_body_landmarks.png)
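A minimal sketch of this pose-estimation step using the MediaPipe Pose solution is given below; the input frame path is illustrative.

```python
# Minimal pose-estimation sketch with the MediaPipe Pose solution.
# "pushup_up_frame.jpg" is a hypothetical extracted frame.
import cv2
import mediapipe as mp

image = cv2.imread("pushup_up_frame.jpg")
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input

with mp.solutions.pose.Pose(static_image_mode=True) as pose:
    results = pose.process(image_rgb)

if results.pose_landmarks:
    # 33 landmarks, each with normalized x, y, z and a visibility score
    for i, lm in enumerate(results.pose_landmarks.landmark):
        print(i, round(lm.x, 3), round(lm.y, 3), round(lm.visibility, 3))
```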
This work develops a spatially-led practice to negotiate and share individuals' perspectives of their own life course. The technique is designed particularly for researching the culture(s) and feeling(s), the everyday life (Highmore, 2011), attached to a given epoch. The focus of my ESRC Postdoctoral Fellowship project is to understand the increasingly suburban and car-oriented places built in the 1960s and 1970s. The technique relies upon online mapping systems and technologies that allow video conversations to be recorded. The broad methodology takes essential elements of one-to-one biographical walking interviews. Sometimes referred to as go-alongs (Carpiano, 2009), the participant leads the way to show spaces and places significant to their life, with the interviewer guiding the conversation. Covid-19 restrictions limited face-to-face interviews (Hall, Gaved, & Sargent, 2021) but also opened the possibility for many conversations to move onto digital platforms. Spatially-led interviews are hosted on digital platforms such as Zoom, where participants and researchers share walks through media such as Google Maps. The conversation is digitally recorded, providing a complete visual record of the spaces visited during the conversation alongside the faces of the participants and their commentary.
There are three specific films in this record. They concern an interview with Pat Wright, who was happy for her likeness to be used.
• Moving to Newport in 1963. Gives context about the advantages of modern housing in the 1960s compared to older terraced houses with no central heating.
• Demolitions in Newport, mid-1970s. An account of the plan to build a bypass road through Newport, with interesting background on the renewal of the urban fabric of towns and cities in the UK and the rise of Civic Trusts to protect the built environment.
• Video of the opening of Newport Library, 1968. Context about the opening of Newport Library; reveals the power of geography to connect people with memories.
Two other individuals were interviewed using this technique, and their data may be made available at a later date.
Theoretical considerations
Walking approaches allow us to explore the affective connections that people have to spaces such as streets and neighbourhoods. Though less atmospheric and embodied than being on an outdoor walk, the walk through digitally mapped space prompts the interviewee to recall memories and feelings. The non-verbal elements of "vitality, performativity, corporeality, sensuality, and mobility" (Vannini, 2015, p. 318) are partly captured through the visual records. These interviews complement other biographical or life story techniques and are particularly useful for meeting people some distance away. In my case I seek to explore the attitudes and values of people who are now considered to be older. The main application for my project is to develop participatory walking tours (Evans & Jones, 2011). The stories that people share through these interviews are interpreted by performance artists, whose playful approach helps to communicate with the public (people of all ages).
This is an edited 2-minute film captured using the spatially-led digital walking interview technique developed through my project. The participant reveals her memories of Newport Library being opened on April 5th, 1968. This ESRC Fellowship project will explore the sensibilities that attach to post-war aesthetics and how those born in the late 1940s, 1950s and early 1960s are navigating the present.
Through a focus on the environment in which the UK's ageing population grew up, and spaces including semi-detached houses, cul-de-sacs, red-brick university campuses, primary schools, and shopping centres, the research will examine how these spaces still influence contemporary life and maintain an affective appeal. Spaces built between the late 1950s and early 1970s form a large bulk of the UK's built environment. But beyond architecture and planning, they also attract deeper affective, sub-emotional or unconscious connections (Pile, 2010). This study will generate insights on how these generations are adapting to and navigating social and cultural change. Included within this record are three short films edited down from a longer filmed recording of an interview. They were captured using the spatially-led digital walking interview technique, where conversations take place on an online video chat facility, such as Zoom. We use an online mapping system, such as Google Maps, to navigate a given place. See the record for the participant information sheet and topic guide.
An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on their website and browse for courses.
The company markets its courses on several websites and search engines like Google. Once these people land on the website, they might browse the courses, fill out a form for a course, or watch some videos. When these people fill out a form providing their email address or phone number, they are classified as a lead. Moreover, the company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X Education is around 30%.
Now, although X Education gets a lot of leads, its lead conversion rate is very poor. For example, if they acquire 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the most promising leads, also known as 'Hot Leads'. If they successfully identify this set of leads, the lead conversion rate should go up, as the sales team will focus on communicating with the potential leads rather than making calls to everyone.
There are a lot of leads generated in the initial stage (top) but only a few of them come out as paying customers from the bottom. In the middle stage, you need to nurture the potential leads well (i.e. educating the leads about the product, constantly communicating, etc. ) in order to get a higher lead conversion.
X Education wants to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires you to build a model that assigns a lead score to each lead such that customers with a higher lead score have a higher conversion chance and customers with a lower lead score have a lower conversion chance. The CEO, in particular, has given a ballpark target lead conversion rate of around 80%. (A minimal modelling sketch follows the variable descriptions below.)
Variables Description
* Prospect ID - A unique ID with which the customer is identified.
* Lead Number - A lead number assigned to each lead procured.
* Lead Origin - The origin identifier with which the customer was identified to be a lead. Includes API, Landing Page Submission, etc.
* Lead Source - The source of the lead. Includes Google, Organic Search, Olark Chat, etc.
* Do Not Email - An indicator variable selected by the customer indicating whether or not they want to be emailed about the course.
* Do Not Call - An indicator variable selected by the customer indicating whether or not they want to be called about the course.
* Converted - The target variable. Indicates whether a lead has been successfully converted or not.
* TotalVisits - The total number of visits made by the customer on the website.
* Total Time Spent on Website - The total time spent by the customer on the website.
* Page Views Per Visit - Average number of pages on the website viewed during the visits.
* Last Activity - Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc.
* Country - The country of the customer.
* Specialization - The industry domain in which the customer worked before. Includes the level 'Select Specialization' which means the customer had not selected this option while filling the form.
* How did you hear about X Education - The source from which the customer heard about X Education.
* What is your current occupation - Indicates whether the customer is a student, unemployed or employed.
* What matters most to you in choosing this course - An option selected by the customer indicating their main motive for taking the course.
* Search - Indicating whether the customer had seen the ad in any of the listed items.
* Magazine
* Newspaper Article
* X Education Forums
* Newspaper
* Digital Advertisement
* Through Recommendations - Indicates whether the customer came in through recommendations.
* Receive More Updates About Our Courses - Indicates whether the customer chose to receive more updates about the courses.
* Tags - Tags assigned to customers indicating the current status of the lead.
* Lead Quality - Indicates the quality of the lead based on the data and on the intuition of the employee who has been assigned to the lead.
* Update me on Supply Chain Content - Indicates whether the customer wants updates on the Supply Chain Content.
* Get updates on DM Content - Indicates whether the customer wants updates on the DM Content.
* Lead Profile - A lead level assigned to each customer based on their profile.
* City - The city of the customer.
* Asymmetric Activity Index - An index and score assigned to each customer based on their activity and their profile
* Asymmetric Profile Index
* Asymmetric Activity Score
* Asymmetric Profile Score
* I agree to pay the amount through cheque - Indicates whether the customer has agreed to pay the amount through cheque or not.
* a free copy of Mastering The Interview - Indicates whether the customer wants a free copy of 'Mastering the Interview' or not.
* Last Notable Activity - The last notable activity performed by the student.
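The sketch below is one minimal way to produce such a lead score, not the case study's prescribed solution: a logistic-regression pipeline over a handful of the variables listed above, assuming the data is available as a Leads.csv file with those column names (the file name is an assumption).

```python
# Minimal lead-scoring sketch (not the case study's prescribed solution).
# Assumes the data is in "Leads.csv" with column names matching the list above.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("Leads.csv")  # hypothetical file name
numeric = ["TotalVisits", "Total Time Spent on Website", "Page Views Per Visit"]
categorical = ["Lead Origin", "Lead Source", "Last Activity"]

df = df.dropna(subset=numeric)                       # keep the sketch simple
df[categorical] = df[categorical].fillna("Unknown")

X, y = df[numeric + categorical], df["Converted"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = Pipeline([
    ("pre", ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Lead score: predicted conversion probability scaled to 0-100.
scores = (model.predict_proba(X_test)[:, 1] * 100).round().astype(int)
print(pd.Series(scores, index=X_test.index, name="lead_score").sort_values(ascending=False).head())
```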
UpGrad Case Study
Your data will be in front of the world's largest data science community. What questions do you want to see answered?
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
In Chapter 3 of my dissertation (tentatively titled "Becoming Users: Layers of People, Technology, and Power on the Internet"), I describe how online user activities are datafied and monetized in subtle and often obfuscated ways. The chapter focuses on Google's reCAPTCHA, a popular implementation of a CAPTCHA challenge. A CAPTCHA, or "Completely Automated Public Turing test to tell Computers and Humans Apart", is a simple task or challenge intended to differentiate between genuine human users and those who may be using software or other automated means to interact maliciously with a website, such as for spam, mass data scraping, or denial-of-service attacks. reCAPTCHA challenges are increasingly hidden from the direct view of the user, instead assessing our mouse movements, browsing patterns, and other data to evaluate the likelihood that we are "authentic" users. These hidden challenges raise the stakes of understanding our own construction as Users because they obfuscate practices of surveillance and the ways that our activities as users are commodified by large corporations (Pettis, 2023). By studying the specifics of how such data collection works, that is, how we are called upon and situated as Users, we can make more informed decisions about how we engage with the contemporary internet.
This data set contains metadata for the 214 reCAPTCHA elements that I encountered during my personal use of the Web over one year (September 2022 through September 2023). Of these reCAPTCHAs, 137 were visible challenges, meaning there was some indication of the presence of a reCAPTCHA challenge. The remaining 77 reCAPTCHAs were entirely hidden on the page. If I had not been running my browser extension, I would likely never have been aware of the use of a reCAPTCHA on the page. The data set also includes screenshots for 174 of the reCAPTCHAs. Screenshots that contain sensitive or private information have been excluded from public access. Researchers can request access to these additional files by contacting Ben Pettis (bpettis@wisc.edu). A browsable and searchable version of the data is also available at https://capturingcaptcha.com
Methods
I developed a custom Google Chrome extension which detects when a page contains a reCAPTCHA and prompts the user to save a screenshot or screen recording while also collecting basic metadata. During Summer 2022, I began work on this website to collate and present the screen captures that I save throughout the year. The purpose of collecting these examples of websites where reCAPTCHAs appear is to understand how this Web element is situated within websites and presented to users, along with sketching out the frequency of their use and the kinds of websites on which they appear. Given that I will only be collecting records of my own interactions with reCAPTCHAs, this will not be a comprehensive sample that I can generalize as representative of all Web users. Though my experiences of reCAPTCHA will differ from those of any other person, this collection will nevertheless be useful for demonstrating how the interface element may be embedded within websites and presented to users. Following Niels Brügger's descriptions of Web history methods, these screen capture techniques provide an effective way to preserve a portion of the Web as it was actually encountered by a person, as opposed to methods such as automated scraping.
Therefore my dissertation offers a methodological contribution to Web historians by demonstrating a technique for identifying and preserving a representation of one Web element within a page, as opposed to focusing an analysis on a whole page or entire website. The browser extension is configured to store data in a cloud-based document database running in MongoDB Atlas. Any screenshots or video recordings are uploaded to a Google Cloud Storage bucket. Both the database and cloud storage bucket are private and are restricted from direct access. The data and screenshots are viewable and searchable at https://capturingcaptcha.com. This data set represents an export of the database as of June 10, 2024. After this date, it is possible that data collection will be resumed, causing more information to be displayed in the online website. The data was exported from the database to a single JSON file (lines format) using the mongoexport command line tool:
mongoexport --uri mongodb+srv://[database-url].mongodb.net/production --collection submissions --out captcha-out.json --username [databaseuser]
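For downstream analysis, the JSON-lines export produced by mongoexport can be loaded directly with pandas; this is a generic sketch and assumes only the captcha-out.json file name shown above.

```python
# Generic sketch: load the mongoexport JSON-lines dump for analysis.
# Only the "captcha-out.json" file name comes from the text above.
import pandas as pd

records = pd.read_json("captcha-out.json", lines=True)  # one MongoDB document per line
print(len(records), "reCAPTCHA records")
print(records.head())
```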
The American Community Survey (ACS) is an ongoing survey that provides vital information on a yearly basis about our nation and its people by contacting over 3.5 million households across the country. The resulting data provides incredibly detailed demographic information across the US aggregated at various geographic levels, which helps determine how more than $675 billion in federal and state funding are distributed each year. Businesses use ACS data to inform strategic decision-making. ACS data can be used as a component of market research, provide information about concentrations of potential employees with a specific education or occupation, and indicate which communities could be good places to build offices or facilities. For example, someone scouting a new location for an assisted-living center might look for an area with a large proportion of seniors and a large proportion of people employed in nursing occupations. Through the ACS, we know more about jobs and occupations, educational attainment, veterans, whether people own or rent their homes, and other topics. Public officials, planners, and entrepreneurs use this information to assess the past and plan the future. For more information, see the Census Bureau's ACS Information Guide. This public dataset is hosted in Google BigQuery as part of the Google Cloud Public Datasets Program, with Carto providing cleaning and onboarding support. It is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
As of April 2024, men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Facebook's second largest audience base was men aged 18 to 24 years.
Facebook connects the world
Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known under the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.
Facebook users
The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia also all have more than 100 million Facebook users each.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Researchers from the Czech Republic are publishing a dataset for HTTPS traffic classification.
Since the data were captured mainly in a real backbone network, IP addresses and ports were omitted. The datasets consist of features calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and times. For more information, please visit the ipfixprobe repository.
During the research, they divided HTTPS traffic into the following categories: L -- Live Video Streaming, P -- Video Player, M -- Music Player, U -- File Upload, D -- File Download, and W -- Website and other traffic.
They chose service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the 500 most popular websites for each category. They also used several popular websites that primarily target a Czech audience. The identified traffic classes and their representatives are provided below, followed by a small illustrative classification sketch:
Live Video Stream: Twitch, Czech TV, YouTube Live
Video Player: DailyMotion, Stream.cz, Vimeo, YouTube
Music Player: AppleMusic, Spotify, SoundCloud
File Upload/Download: FileSender, OwnCloud, OneDrive, Google Drive
Website and Other Traffic: Websites from the Alexa Top 1M list
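To illustrate how per-flow packet-length sequences like those exported by ipfixprobe can feed a classifier, the sketch below pads toy sequences to a fixed width and trains a random forest; the flows, labels, and padding length are invented, and the real dataset's file format is not assumed.

```python
# Toy illustration: fixed-width padding of per-flow packet-length sequences
# feeding a random forest. The flows, labels, and padding length are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

flows = [
    [1500, 1500, 1500, 1500, 64],   # download-like flow (toy)
    [120, 80, 150, 90],             # website browsing (toy)
    [1400, 1400, 1400, 1400, 1400],
    [100, 60, 130],
]
labels = ["D", "W", "D", "W"]

def pad(seq, length=30):
    """Truncate or zero-pad a packet-length sequence to a fixed width."""
    return (seq + [0] * length)[:length]

X = np.array([pad(f) for f in flows])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
print(clf.predict([pad([1450, 1450, 1450])]))
```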
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The expansion of Internet connectivity has revolutionized our daily lives, with people increasingly relying on smartphones and laptops for various tasks. This technological evolution has prompted the development of innovative solutions to enhance the quality of life for diverse populations, including the elderly and individuals with disabilities. Among the most impactful advancements are voice-command-enabled technologies such as SIRI and Google voice commands, which are built upon the foundation of Speech Recognition modules, a critical component in facilitating human-machine communication.
Automatic Speech Recognition (ASR) has witnessed significant progress in achieving human-like performance through data-driven methods. In the context of our research, we have meticulously crafted an Arabic voice command dataset to facilitate advancements in ASR and other speech processing tasks. This dataset comprises 10 distinct commands spoken by 10 unique speakers, each repeated 10 times. Despite its modest size, the dataset has demonstrated remarkable performance across a range of speech processing tasks.
The dataset was rigorously evaluated, yielding exceptional results. In ASR, it achieved an accuracy of 95.9%, showcasing its potential for effectively transcribing spoken Arabic commands. Furthermore, the dataset excelled in speaker identification, gender recognition, accent recognition, and spoken language understanding, with macro F1 scores of 99.67%, 100%, 100%, and 97.98%, respectively.
This Arabic Voice Command Dataset represents a valuable resource for researchers and developers in the field of speech processing and human-machine interaction. Its quality and diversity make it a robust foundation for developing and testing ASR and other related systems, ultimately contributing to the advancement of voice-command technologies and their widespread accessibility.
This dataset contains 2.2 million digitized books stretching back two centuries, encompassing the complete English-language public domain collections of HathiTrust. These collections have been processed using the GDELT Global Knowledge Graph and are available in Google BigQuery. More than a billion pages stretching back 215 years have been examined to compile a list of all people, organizations, and other names, fulltext geocoded to render them fully mappable, and more than 4,500 emotions and themes compiled. All of this computed metadata is combined with all available book-level metadata, including title, author, publisher, and subject tags as provided by the contributing libraries. HathiTrust data includes all English language public domain books 1800-2015. They were provided as part of a special research extract and only public domain volumes are included. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Gum bleeding is a common dental problem, and numerous patients seek health-related information on this topic online. The YouTube website is a popular resource for people searching for medical information. To our knowledge, no recent study has evaluated content related to bleeding gums on YouTube™. Therefore, this study aimed to conduct a quantitative and qualitative analysis of YouTube videos related to bleeding gums. A search was performed on YouTube using the keyword "bleeding gums" from Google Trends. Of the first 200 results, 107 videos met the inclusion criteria. The descriptive statistics for the videos included the time since upload, the video length, and the number of likes, views, comments, subscribers, and viewing rates. The global quality score (GQS), usefulness score, and DISCERN were used to evaluate the video quality. Statistical analysis was performed using the Kruskal–Wallis test, Mann–Whitney test, and Spearman correlation analysis. The majority (n = 69, 64.48%) of the videos observed were uploaded by hospitals/clinics and dentists/specialists. The highest coverage was for symptoms (95.33%). Only 14.02% of the videos were classified as "good". The average video length of the videos rated as "good" was significantly longer than the other groups (p <0.05), and the average viewing rate of the videos rated as "poor" (63,943.68%) was substantially higher than the other groups (p <0.05). YouTube videos on bleeding gums were of moderate quality, but their content was incomplete and unreliable. Incorrect and inadequate content can significantly influence patients’ attitudes and medical decisions. Effort needs to be expended by dental professionals, organizations, and the YouTube platform to ensure that YouTube can serve as a reliable source of information on bleeding gums.
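For readers who want to reproduce this style of analysis, the named tests are available in scipy.stats; the values below are toy numbers, not the study's data.

```python
# The tests named in the abstract, applied to toy viewing-rate values;
# these numbers are illustrative, not the study's data.
from scipy import stats

good = [1200, 950, 1800, 1400]        # videos rated "good" (toy viewing rates)
moderate = [5300, 4100, 6100, 4800]
poor = [66000, 59000, 71000, 64000]

print(stats.kruskal(good, moderate, poor))              # do the three groups differ?
print(stats.mannwhitneyu(good, poor))                   # pairwise comparison
print(stats.spearmanr([3, 8, 12, 20], [2, 5, 9, 15]))   # e.g. video length vs. likes (toy)
```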
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset compiles georeferenced media - including videos (480), articles (20), and datasets (6) - specifically curated to facilitate the understanding of reef habitats across northern Australia. It was designed as a research tool for virtual fieldwork with a particular focus on identifying sources of information that allow an understanding of both inshore and offshore reef environments. This dataset provides a record of the literature and media that was reviewed as part of mapping the reef boundaries from remote sensing as part of project NESP MaC 3.17.
This dataset only focuses on media that is useful for understanding shallow reef habitats. It includes videos of snorkelling, diving, spearfishing, and aerial drone imagery. It includes websites, books and journal papers that talk about the structure of reefs and datasets that provide fine scale benthic mapping.
This dataset is likely not comprehensive. While considerable time was put into collecting relevant media, finding all available information sources is very difficult and time consuming.
A relatively comprehensive search was conducted on:
- AIMS Metadata catalogue for benthic habitat mapping with towed video and BRUVS
- A review of the eAtlas for benthic habitat mapping
- YouTube searches for video media of fishing, cruises, snorkelling of many named locations.
The dataset is far less comprehensive regarding existing literature from journals, reports and datasets.
As the NESP MaC 3.17 project progresses we will continue to expand the dataset.
Changelog:
Changes made to the dataset will be noted in the change log and indicated in the dataset via the 'Revision' date.
1st Ed. - 2024-04-10 - Initial release of the dataset
Methods:
Identifying media - YouTube videos
The initial discovery of videos for a given area was achieved by searching for place names in YouTube search using terms such as diving, snorkeling or spearfishing combined with the location name.
Each potential video was reviewed to:
1. Determine if the video had any visual content that would be useful for understanding the marine environment.
2. Determine if the footage could be georeferenced to a specific location, the more specific the better.
In cases where the YouTube channel was making travel videos that were of a high quality, then all the relevant videos in that channel were reviewed. A high proportion of the most useful videos were found using this technique.
The most useful videos were those that had named specific locations (typically in their title or description) and contained drone footage and underwater footage. The drone footage would often show enough of the landscape for features to be matched with satellite imagery allowing precise geolocation of the imagery.
To minimise the time required to find relevant videos, the scrubbing feature on YouTube was used to quickly review the timeline of each video for relevant scenes. The scrubbing feature shows a very quick, low-resolution preview of the video as the cursor is moved along the video timeline. This scrubbing was used to quickly look through the videos for any scenes that contained drone footage or underwater footage. This was particularly useful for travel videos that contained significant footage of overland travel mixed in with boating or shoreline activities. It was also useful for fishing videos, where all the fishing activities could be quickly skipped over to focus on any available drone footage or underwater footage from snorkeling or spearfishing.
Where a video lacked direct clues to its location (such as in the title), but contained particularly relevant and useful footage, additional effort was made to listen to the conversations and examine other footage in the video for clues. This includes people in the video mentioning the names of locations, any marine charts visible in the footage, or preceding and following scenes whose locations could be determined, adding constraints to the location of the relevant scene. Where the location could not be precisely determined, but the footage was still useful, it was added to a video playlist for the region.
In many remote locations there were so few videos that the bar for including the videos was quite low as these videos would at least provide some general indication of the landscape.
When working on a PC, Google Maps was used to look up locations and act as reference satellite imagery for locating places, QGIS was used to record the polygons of locations, and YouTube in a browser was used for video review.
YouTube Playlists:
The initial collection of videos was compiled into YouTube playlists corresponding to relatively large regions. Using playlists was the most convenient way to record useful videos when viewing YouTube from an iPad. This compilation was done prior to the setup of this dataset.
Localising Playlists:
For YouTube playlists, the region digitised was based on the region represented by the playlist name and the collection of videos. Google Maps was used to help determine the locations of each region. Where a particularly useful video was found in one of the playlists and its location could be determined accurately, that video was entered into this database as an individual video with its own finer-scale mapping. However, this process of migrating videos from the playlists to more precisely georeferenced individual videos in the dataset is incomplete.
The playlists are really a catch-all for potentially useful videos.
Localising individual videos:
Candidate videos were quickly assessed for likely usefulness by reviewing the title and quickly scrubbing through the video looking for any marine footage, in water or as drone footage. If a video had a useful section, the focus was to determine the location of that part of the footage as accurately as possible. This was done by searching for locations listed in the title, chapter markers, or video description, or mentioned in the video, which were then looked up in Google Maps. In general we would start with any drone footage that shows a large area with distinct features that could be matched with satellite imagery. The region around named locations was scanned for matching coastline and marine features. Once a match was found, the footage was reviewed to track the likely area that the video covers across multiple scenes.
The video region was then digitised approximately in QGIS into the AU_AIMS_NESP-3-17_Reef-map-geo-media.shp shapefile. Notes were then added about the important features seen in the footage, along with a link to the video including the time code so that it starts at the relevant portion of the video. Long videos showing multiple locations were added as multiple entries, each with a separate polygon location and a different URL link with a different start time.
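A minimal GeoPandas sketch for reading the digitised polygons is shown below; only the shapefile name comes from the text above, and the attribute names used for filtering come from the data dictionary later in this record.

```python
# Minimal GeoPandas sketch for reading the digitised polygons.
# The shapefile name is taken from the text above; the attribute names used
# for filtering come from the data dictionary later in this record.
import geopandas as gpd

gdf = gpd.read_file("AU_AIMS_NESP-3-17_Reef-map-geo-media.shp")
videos = gdf[gdf["MediaType"] == "Video"]
print(videos[["RegionName", "State"]].head())
```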
Articles and Datasets
While this dataset primarily focuses on videos, we started adding relevant datasets, websites, articles and reports. These categories of media are not complete in this version of the dataset.
Data dictionary:
RegionName: (String, 255 characters): Name of the location, Examples: 'Oyster Stacks Snorkelling Area', 'Kurrajong Campground', 'South Lefroy Bay'
State: (String, 30 characters): Abbreviation of the state that the region corresponds to. For example: 'WA', 'QLD', 'NT'. For locations far offshore link the location to the closest state or to an existing well known region name. For example: Herald Cay -> Coral Sea, Rowley shoals -> WA.
MediaType: (String, 20 characters): One of the following:
- Video
- Video Playlist
- Website
- Report
- EIS
- Book
- Journal Paper
HabitatRef: (Int): An indication that this resource shows high-accuracy spatial habitat information that can be used to improve the UQ habitat reference datasets. This attribute indicates which resources should be reviewed and converted to habitat reference patches. It should be reserved for cases where a habitat can be located on satellite imagery with sufficient precision that it has high confidence. Media that corresponds to information deeper than 15 m is excluded (assigned a HabitatRef of 0) as this is too deep to be used by the UQ habitat mapping.
- 1 - Use for habitat reference data.
- 0 - Only provides general information about the patch; the imagery cannot be spatially located accurately or the detail is insufficient.
Highlight: (String, 255 characters): This records the classification of reef mapping, or research question that this video is most useful for. Not all videos need this classification. In general this attribute should be reserved for those videos that have the highest level of useful information. Think of it as a shortlist of videos that someone trying to understand a particular aspect of categorising reefs from satellite imagery should review. The following are some of the questions associated with each category that the videos provide some answers.
- High tidal range fringing reef: Here we want to understand the structure of fringing reefs in the Kimberleys and Northern Territory where the tides are large and the water is turbid. Is there coral on the tops of the reef flats? Won't the coral dry out if it grows on the reef flat? How will it get enough light if it grows on the reef slope?
- Ancient coastline: Along many parts of WA there are shallow rocky reefs off the coast that appear to be ancient coastline. What is the nature of these reefs? Does coral or macroalgae grow on them?
- Seagrass: What does seagrass look like from satellite imagery
- Ningaloo backreef coral: Ningaloo is a very large reef system with a large sandy back. Should the whole back reef
San Francisco Ford GoBike, managed by Motivate, provides the Bay Area's bike share system. Bike share is a convenient, healthy, affordable, and fun form of transportation. It involves a fleet of specially designed bikes that are locked into a network of docking stations. Bikes can be unlocked from one station and returned to any other station in the system. People use bike share to commute to work or school, run errands, get to appointments, and more. The dataset contains trip data from 2013-2018, including start time, end time, start station, end station, and latitude/longitude for each station. See the detailed metadata for historical and real-time data. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.