Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset shows how many people visited attractions in Lincolnshire by calendar year. Visitor numbers for a wide range of attractions are shown, along with other key information such as entrance fees.
The data's source is the Annual Survey of Visits to Visitor Attractions, run by Visit England. As is usual with survey data, there are some limitations: not all visitor attractions participate in the survey, and where visitor numbers are estimated, this is indicated in the data.
This dataset is updated annually from statistics published by Visit Britain; see the Source link for more information.
For more information on CDC.gov metrics, please see http://www.cdc.gov/metrics/
This dataset contains global COVID-19 case and death data by country, collected directly from the official World Health Organization (WHO) COVID-19 Dashboard. It provides a comprehensive view of the pandemic’s impact worldwide, covering the period up to 2025. The dataset is intended for researchers, analysts, and anyone interested in understanding the progression and global effects of COVID-19 through reliable, up-to-date information.
The World Health Organization is the United Nations agency responsible for international public health. The WHO COVID-19 Dashboard is a trusted source that aggregates official reports from countries and territories around the world, providing daily updates on cases, deaths, and other key metrics related to COVID-19.
This dataset can be used for:
- Tracking the spread and trends of COVID-19 globally and by country
- Modeling and forecasting pandemic progression
- Comparative analysis of the pandemic’s impact across countries and regions
- Visualization and reporting
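As a minimal sketch of the first use case, tracking trends by country, the snippet below computes a trailing 7-day average of daily new cases for each country. The column names `Date_reported`, `Country` and `New_cases` are assumptions modeled on WHO's downloadable CSV; check them against the file you actually use. Dates are assumed to be ISO strings (YYYY-MM-DD), and a blank `New_cases` value is treated as zero, which is also an assumption.

```python
import csv
import io
from collections import defaultdict, deque


def trailing_average(csv_text: str, window: int = 7) -> dict[str, list[tuple[str, float]]]:
    """Return, per country, (date, trailing mean of New_cases) pairs.

    Assumes columns Date_reported, Country, New_cases and ISO dates, so that
    sorting the date strings also sorts them chronologically.
    """
    rows_by_country: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row["New_cases"].strip()
        cases = int(raw) if raw else 0  # assumption: blank means zero new cases
        rows_by_country[row["Country"]].append((row["Date_reported"], cases))

    result: dict[str, list[tuple[str, float]]] = {}
    for country, rows in rows_by_country.items():
        rows.sort()  # chronological order for ISO date strings
        recent: deque = deque(maxlen=window)  # keeps only the last `window` days
        series = []
        for day, cases in rows:
            recent.append(cases)
            # Emit a value only once a full window of days is available.
            if len(recent) == window:
                series.append((day, sum(recent) / window))
        result[country] = series
    return result
```

Averaging over a full week dampens day-of-week reporting effects; because reporting practices vary by country, cross-country comparisons still call for care.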
The data is sourced from the WHO, widely regarded as the most authoritative source for global health statistics. However, reporting practices and data completeness may vary by country and may be subject to revision as new information becomes available.
Special thanks to the WHO for making this data publicly available and to all those working to collect, verify, and report COVID-19 statistics.
NOTE: This dataset pertains only to the 2020-2021 school year and is no longer being updated. For additional data on COVID-19, visit data.ct.gov/coronavirus.
This dataset includes the leading and secondary metrics identified by the Connecticut Department of Public Health (DPH) and the Department of Education (CSDE) to support local district decision-making on whether to use an in-person, hybrid (blended), or remote learning model for Pre-K-12 education.
Data represent daily averages for two-week periods by date of specimen collection (cases and positivity), date of hospital admission, or date of ED visit. Hospitalization data come from the Connecticut Hospital Association and are based on hospital location, not county of patient residence. COVID-19-like illness includes fever together with cough, shortness of breath, or difficulty breathing, or the presence of a coronavirus diagnosis code, and excludes patients with influenza-like illness. All data are preliminary.
These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).
These metrics were adapted from recommendations by the Harvard Global Health Institute and supplemented by existing DPH measures.
For national data on COVID-19, see COVID View, the national weekly surveillance summary of U.S. COVID-19 activity, at https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html
DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020, but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County/rpph-4ysy
As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable than 7-day rates, particularly at the town level. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.
With respect to geography, we have also learned that many people are looking at the town-level data to inform decision making, despite the emphasis on county-level metrics in the published addenda. This is understandable, as there has been variation in COVID-19 activity within counties (for example, rates that are higher in one town than in most other towns in the county).
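To make the two-week reporting window concrete, here is a minimal sketch (not DPH's code) that finds the two most recent complete Sunday-Saturday (MMWR) weeks before a given date and turns 14 daily case counts into an average daily rate per 100,000 residents. The function names are hypothetical, and expressing the case metric per 100,000 population is an assumption for illustration.

```python
from datetime import date, timedelta


def previous_two_mmwr_weeks(today: date) -> tuple[date, date]:
    """Return (first Sunday, last Saturday) of the two most recent complete
    Sunday-Saturday weeks that end strictly before `today`."""
    # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    days_since_saturday = (today.weekday() - 5) % 7
    # If today is a Saturday, its week is not complete yet, so step back a full week.
    end = today - timedelta(days=days_since_saturday or 7)
    start = end - timedelta(days=13)  # 14 days, inclusive of both ends
    return start, end


def average_daily_rate_per_100k(daily_cases: list[int], population: int) -> float:
    """Average daily cases per 100,000 residents over the given window."""
    # Multiply before dividing to keep the arithmetic exact for integer inputs.
    return sum(daily_cases) * 100_000 / (len(daily_cases) * population)
```

For example, `previous_two_mmwr_weeks(date(2020, 10, 15))` returns `(date(2020, 9, 27), date(2020, 10, 10))`, that is, Sunday 27 September through Saturday 10 October 2020. Passing 7 instead of 14 daily counts to the rate function gives the older 7-day variant; with fewer days, each day carries more weight, which is why the 14-day figure is more stable for small towns.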
This dataset supports the Biomarker: Vector-Borne Viruses page on the Tempe Wastewater BioIntel Program site. Wastewater collection areas consist of merged sewage drainage basins that flow to a shared testing location for the Tempe Wastewater BioIntel Program. The wastewater collection areas represent a geographic area for which virus activity is tested. People infected with a virus excrete the virus in their feces in a process known as “shedding”. The municipal wastewater treatment system (sewage system) collects and aggregates these bathroom contributions across communities. The process begins at the sampling site where, over a period of 24 hours, a wastewater sample is collected along the sewer line. After the sample is acquired, it is immediately transferred to a lab where scientists prepare the sample. The laboratory analysis seeks to determine whether there is a signal (or detectable presence) of the biomarker in the wastewater. Please see the Tempe Wastewater BioIntel Program site for more information on the wastewater testing process at https://wastewater.tempe.gov/.
About the data: These data illustrate the trend in the signal of the weekly average or weekly results for Tempe wastewater biomarker groups. The dashboard and collection area map do not depict the number of individuals infected. Each collection area includes at least one sampling location, which collects wastewater from across the collection area; the results do not reflect the specific location where the deposit occurs. While testing can successfully quantify the results, research has not yet determined the relationship between these values and the number of people contributing to the signals. The influence of these data on future community health decisions is unknown. The data are being used to depict overall weekly trends and should not be interpreted without a holistic assessment of public health data.
The purpose of these weekly data is to support research and to identify overall trends in genome copies per liter of wastewater in each collection area. We share this information with the public with the disclaimer that only the future can tell how much “diagnostic value” we can and should attribute to the numeric measurements we obtain from the sewer. However, we know what we measure is real, and we share that information with our community. Data are shared as testing results become available. Because results may not be released at the same time, results for a given area may not yet appear for a given day or week. The dashboard presents the weekly averages. Data are collected 2-7 days per week. For Collection Area 1, Tempe's wastewater co-mingles with wastewater from a regional sewage line; Tempe's sewage makes up most of the Collection Area 1 samples. For Collection Area 3, Tempe's wastewater co-mingles with wastewater from a regional sewage line. For analysis and reporting, Tempe’s wastewater is separated from regional sewage. The week start date is the first day of the testing week, which runs from Monday to Sunday.
Additional Information:
Source: The Translational Genomics Research Institute (TGen), part of City of Hope, is an Arizona-based, nonprofit medical research institute.
Contact: Kimberly Sotelo
Contact email: kimberly_sotelo@tempe.gov
Preparation Method: Initial values are provided by TGen. Tempe makes additional calculations to determine the weekly averages or weekly results for each biomarker.
Publish Frequency: Weekly or as data becomes available
Publish Method: Manual
Data Dictionary
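The exact calculations Tempe applies to TGen's initial values are not described here. As an assumption, the sketch below takes a plain arithmetic mean of the sample results within each Monday-Sunday testing week, keyed by collection area and week start date; the tuple layout and area names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def week_start(day: date) -> date:
    """Monday of the Monday-Sunday testing week that contains `day`."""
    return day - timedelta(days=day.weekday())  # weekday(): Monday == 0


def weekly_averages(samples: list[tuple[str, date, float]]) -> dict[tuple[str, date], float]:
    """Mean result per (collection area, week start date).

    Each sample is a hypothetical (area, sample_date, genome_copies_per_liter) tuple.
    Weeks with 2 samples and weeks with 7 samples are both averaged the same way.
    """
    grouped: dict[tuple[str, date], list[float]] = defaultdict(list)
    for area, day, copies in samples:
        grouped[(area, week_start(day))].append(copies)
    return {key: mean(values) for key, values in grouped.items()}
```

Keying on the Monday keeps a Sunday sample in the same week as the preceding Monday, matching the week definition above.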
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Web Camera People Behavior Dataset for computer vision tasks
The dataset includes 2,300+ individuals, who contributed a total of 53,800+ videos and 9,300+ images captured via webcams. It is designed to study social interactions and behaviors in various remote meetings, including video calls, video conferencing, and online meetings. By leveraging this dataset, developers and researchers can enhance their understanding of human behavior in digital communication settings, contributing to… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/web-camera-people-behavior.
Introduction
The GiGL Spaces to Visit dataset provides locations and boundaries for open space sites in Greater London that are available to the public as destinations for leisure, activities and community engagement. It includes green corridors that provide opportunities for walking and cycling.
The dataset has been created by Greenspace Information for Greater London CIC (GiGL). As London’s Environmental Records Centre, GiGL mobilises, curates and shares data that underpin our knowledge of London’s natural environment. We provide impartial evidence to support informed discussion and decision making in policy and practice.
GiGL maps under licence from the Greater London Authority.
Description
This dataset is a sub-set of the GiGL Open Space dataset, the most comprehensive dataset available of open spaces in London. Sites are selected for inclusion in Spaces to Visit based on their public accessibility and likelihood that people would be interested in visiting.
The dataset is a mapped Geographic Information System (GIS) polygon dataset where one polygon (or multi-polygon) represents one space. As well as site boundaries, the dataset includes information about a site’s name, size and type (e.g. park, playing field).
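Each record carries a site's size, but if you need to recompute area from the polygon geometry, the shoelace formula is a minimal way to do it. This sketch assumes the coordinates are in a projected reference system measured in metres (for example British National Grid), which you should confirm for the file you download; in practice a GIS library would normally be used instead. A multi-polygon is modeled here as a list of polygons, each an outer ring followed by optional hole rings, which is a hypothetical in-memory layout.

```python
def polygon_area(ring: list[tuple[float, float]]) -> float:
    """Area enclosed by a simple polygon ring, via the shoelace formula.

    Works whether or not the ring repeats its first point at the end.
    Units are the square of the coordinate units (m^2 for metre coordinates).
    """
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def multipolygon_area_ha(polygons: list[list[list[tuple[float, float]]]]) -> float:
    """Total area in hectares; each polygon is [outer_ring, *hole_rings]."""
    total_m2 = 0.0
    for outer, *holes in polygons:
        total_m2 += polygon_area(outer) - sum(polygon_area(h) for h in holes)
    return total_m2 / 10_000  # 1 ha = 10,000 m^2
```

For example, a 100 m by 100 m square with a 10 m by 10 m hole comes out at 0.99 ha.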
GiGL developed the Spaces to Visit dataset to support anyone who is interested in London’s open spaces - including community groups, web and app developers, policy makers and researchers - with an open licence data source. More detailed and extensive data are available under GiGL data use licences for GiGL partners, researchers and students. Information services are also available for ecological consultants, biological recorders and community volunteers – please see www.gigl.org.uk for more information.
Please note that access and opening times are subject to change (particularly at the current time), so if you are planning to visit a site, check the local authority or site website to confirm that it is open.
The dataset is updated on a quarterly basis. If you have questions about this dataset please contact GiGL’s GIS and Data Officer.
Data sources
The boundaries and information in this dataset are a combination of data collected during the London Survey Method habitat and open space survey programme (1986 – 2008) and information provided to GiGL from other sources since then. These sources include London borough surveys, land use datasets, volunteer surveys, feedback from the public, park friends’ groups, and updates made as part of GiGL’s ongoing data validation and verification process.
Due to data availability, some areas are more up-to-date than others. We are continually working on updating and improving this dataset. If you have any additional information or corrections for sites included in the Spaces to Visit dataset please contact GiGL’s GIS and Data Officer.
NOTE: The dataset contains OS data © Crown copyright and database rights 2025. The site boundaries are based on Ordnance Survey mapping, and the data are published under Ordnance Survey's 'presumption to publish'. When using these data please acknowledge GiGL and Ordnance Survey as the source of the information using the following citation:
‘Dataset created by Greenspace Information for Greater London CIC (GiGL), 2025 – Contains Ordnance Survey and public sector information licensed under the Open Government Licence v3.0’
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This users dataset is a preview of a much bigger dataset with lots of related data (sellers' product listings, comments on listed products, etc.).
My Telegram bot will answer your queries and allow you to contact me.
There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.
Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since users are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).
This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you may want to try to understand what you can expect of your users and determine in advance how your user base may grow.
If you think this kind of dataset may be useful or if you liked it, don't forget to show your support or appreciation with an upvote or comment. You may even include how you think this dataset might be of use to you. This way, I will be more aware of specific needs and able to adapt my datasets to better suit your needs.
This dataset is part of a preview of a much larger dataset. Please contact me for more.
The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 and then expanded worldwide.
Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.
For other licensing options, contact me.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘CT School Learning Model Indicators by County (14-day metrics) - ARCHIVE’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/feda0dbb-905d-48c8-81ec-590689a6da8f on 26 January 2022.
--- Dataset description provided by original source is as follows ---
NOTE: This dataset pertains only to the 2020-2021 school year and is no longer being updated. For additional data on COVID-19, visit data.ct.gov/coronavirus.
This dataset includes the leading and secondary metrics identified by the Connecticut Department of Public Health (DPH) and the Department of Education (CSDE) to support local district decision-making on whether to use an in-person, hybrid (blended), or remote learning model for Pre-K-12 education.
Data represent daily averages for two-week periods by date of specimen collection (cases and positivity), date of hospital admission, or date of ED visit. Hospitalization data come from the Connecticut Hospital Association and are based on hospital location, not county of patient residence. COVID-19-like illness includes fever together with cough, shortness of breath, or difficulty breathing, or the presence of a coronavirus diagnosis code, and excludes patients with influenza-like illness. All data are preliminary.
These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).
These metrics were adapted from recommendations by the Harvard Global Health Institute and supplemented by existing DPH measures.
For national data on COVID-19, see COVID View, the national weekly surveillance summary of U.S. COVID-19 activity, at https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html
DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County/rpph-4ysy
As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.
With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).
--- Original source retains full ownership of the source dataset ---
This Location Data & Foot Traffic dataset, available for all countries, includes enriched raw mobility data and visitation at POIs to answer questions such as:
- How often do people visit a location? (daily, monthly, absolute, and averages)
- What type of places do they visit? (parks, schools, hospitals, etc.)
- Which social characteristics do people at a certain POI have? Breakdown by type: residents, workers, visitors.
- What is their mobility like during night hours and day hours?
- What is the frequency of visits, partitioned by day of the week and hour of the day?
Extra insights:
- Visitors' relative income level.
- Visitors' preferences, as derived from their visits to shopping venues, parks, sports facilities, churches, among others.
Overview & Key Concepts
Each record corresponds to a ping from a mobile device at a particular moment in time and at a particular latitude and longitude. We procure this data from reliable technology partners, which obtain it through partnerships with location-aware apps. The entire process complies with applicable privacy laws.
We clean and process these massive datasets with a number of complex, computationally intensive calculations to make them easier to use in different data science and machine learning applications, especially those related to understanding customer behavior.
Featured attributes of the data
Device speed: based on the distance between each observation and the previous one, we estimate the speed at which the device is moving. This is particularly useful to differentiate between vehicles, pedestrians, and stationary observations.
Night base of the device: we calculate the approximated location of where the device spends the night, which is usually their home neighborhood.
Day base of the device: we calculate the most common daylight location during weekdays, which is usually their work location.
Income level: we use the night neighborhood of the device, and intersect it with available socioeconomic data, to infer the device’s income level. Depending on the country, and the availability of good census data, this figure ranges from a relative wealth index to a currency-calculated income.
POI visited: we intersect each observation with a number of POI databases, to estimate check-ins to different locations. POI databases can vary significantly, in scope and depth, between countries.
Category of visited POI: for each observation that can be attributed to a POI, we also include a standardized location category (park, hospital, among others).
Coverage: Worldwide.
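The provider does not publish how device speed is computed. A minimal sketch, assuming one device's pings as (timestamp, latitude, longitude) tuples, divides the great-circle (haversine) distance between consecutive pings by the elapsed time. The thresholds in `movement_class` are hypothetical values chosen only to illustrate separating stationary, pedestrian, and vehicle observations.

```python
import math
from datetime import datetime
from typing import Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_kmh(pings: list[tuple[datetime, float, float]]) -> list[Optional[float]]:
    """Speed at each ping (in chronological order) relative to the previous ping.

    The first ping, and any ping sharing a timestamp with the previous one, gets None.
    """
    ordered = sorted(pings)  # chronological order
    speeds: list[Optional[float]] = [None] if ordered else []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(ordered, ordered[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds <= 0:
            speeds.append(None)
        else:
            speeds.append(haversine_m(lat0, lon0, lat1, lon1) / seconds * 3.6)  # m/s to km/h
    return speeds


def movement_class(speed_kmh: Optional[float]) -> str:
    """Label a speed; thresholds are hypothetical, for illustration only."""
    if speed_kmh is None:
        return "unknown"
    if speed_kmh < 1.0:
        return "stationary"
    if speed_kmh < 7.0:
        return "pedestrian"
    return "vehicle"
```

Real ping data is noisy, so a production pipeline would likely smooth speeds or discard implausible jumps before classifying them.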
Delivery schemas
We can deliver the data in three different formats:
Full dataset: one record per mobile ping. These datasets are very large and should only be consumed by experienced teams with large computing budgets.
Visitation stream: one record per attributable visit. This dataset is considerably smaller than the full one but retains most of its valuable elements. It helps you understand who visited a specific POI and characterize consumer behavior.
Audience profiles: one record per mobile device in a given period of time (usually monthly). The entire visitation stream is aggregated by category. This is the most condensed version of the dataset and is very useful for quickly understanding the types of consumers in a particular area and for creating cohorts of users.
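Going from a visitation stream to audience profiles is an aggregation step. A minimal sketch, assuming each visit is a hypothetical (device_id, visit_date, poi_category) tuple and a monthly period, counts visits per category for each device and month; the provider's actual field names and schema are not described here.

```python
from collections import Counter, defaultdict
from datetime import date


def audience_profiles(visits: list[tuple[str, date, str]]) -> dict[tuple[str, str], Counter]:
    """Collapse a visitation stream into one record per device per month.

    Output maps (device_id, "YYYY-MM") to a Counter of visits per POI category.
    """
    profiles: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for device_id, visit_date, category in visits:
        month = visit_date.strftime("%Y-%m")
        profiles[(device_id, month)][category] += 1
    return dict(profiles)
```

Cohorts can then be formed by filtering profiles, for example devices whose monthly visits are dominated by one category.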
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The database for this study (Briganti et al. 2018; the same database was used for the Braun study analysis) was composed of 1973 French-speaking students in several universities or schools for higher education in the following fields: engineering (31%), medicine (18%), nursing school (16%), economic sciences (15%), physiotherapy (4%), psychology (11%), law school (4%) and dietetics (1%). The subjects were 17 to 25 years old (M = 19.6 years, SD = 1.6 years); 57% were female and 43% were male. Although the full dataset comprised 1973 participants, only 1270 answered the full questionnaire. Missing data are handled using pairwise complete observations when estimating a Gaussian Graphical Model, meaning that all available information from every subject is used.
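Using pairwise complete observations means each pairwise statistic is computed from the subjects who answered both items, so the number of subjects can differ from pair to pair. The sketch below shows this for a correlation matrix, the kind of input a Gaussian Graphical Model is estimated from; it does not fit the model itself. Pearson correlation and the `None`-for-missing convention are assumptions, since the correlation type used in the original analysis is not stated here.

```python
import math
from typing import Optional

Column = list[Optional[float]]  # one item's answers, None where missing


def pearson_pairwise(x: Column, y: Column) -> Optional[float]:
    """Pearson correlation using only subjects who answered both items."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough overlapping answers
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # undefined when an item is constant over the overlap
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlation_matrix(columns: list[Column]) -> list[list[Optional[float]]]:
    """k x k matrix where each cell uses its own set of complete pairs."""
    k = len(columns)
    return [[pearson_pairwise(columns[i], columns[j]) for j in range(k)] for i in range(k)]
```

Because each cell uses a different subset of subjects, the resulting matrix is not guaranteed to be positive semi-definite, which downstream estimators may need to handle.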
The feature set is composed of 28 items meant to assess the following four components: fantasy, perspective taking, empathic concern and personal distress. In the questionnaire, the items are mixed, and reversed items (items 3, 4, 7, 12, 13, 14, 15, 18, 19) are present. Items are scored from 0 to 4, where “0” means “Doesn’t describe me very well” and “4” means “Describes me very well”; reverse-scoring is applied afterwards. The questionnaires were anonymized. The reanalysis of the database in this retrospective study was approved by the ethical committee of the Erasmus Hospital.
Size: 1973 × 28 (participants × features)
Number of features: 28
Ground truth: No
Type of Graph: Mixed graph
The following gives the description of the variables:
Feature | FeatureLabel | Domain | Item meaning from Davis 1980 |
---|---|---|---|
001 | 1FS | Green | I daydream and fantasize, with some regularity, about things that might happen to me. |
002 | 2EC | Purple | I often have tender, concerned feelings for people less fortunate than me. |
003 | 3PT_R | Yellow | I sometimes find it difficult to see things from the “other guy’s” point of view. (Reversed) |
004 | 4EC_R | Purple | Sometimes I don’t feel very sorry for other people when they are having problems. (Reversed) |
005 | 5FS | Green | I really get involved with the feelings of the characters in a novel. |
006 | 6PD | Red | In emergency situations, I feel apprehensive and ill-at-ease. |
007 | 7FS_R | Green | I am usually objective when I watch a movie or play, and I don’t often get completely caught up in it. (Reversed) |
008 | 8PT | Yellow | I try to look at everybody’s side of a disagreement before I make a decision. |
009 | 9EC | Purple | When I see someone being taken advantage of, I feel kind of protective towards them. |
010 | 10PD | Red | I sometimes feel helpless when I am in the middle of a very emotional situation. |
011 | 11PT | Yellow | I sometimes try to understand my friends better by imagining how things look from their perspective. |
012 | 12FS_R | Green | Becoming extremely involved in a good book or movie is somewhat rare for me. (Reversed) |
013 | 13PD_R | Red | When I see someone get hurt, I tend to remain calm. (Reversed) |
014 | 14EC_R | Purple | Other people’s misfortunes do not usually disturb me a great deal. (Reversed) |
015 | 15PT_R | Yellow | If I’m sure I’m right about something, I don’t waste much time listening to other people’s arguments. (Reversed) |
016 | 16FS | Green | After seeing a play or movie, I have felt as though I were one of the characters. |
017 | 17PD | Red | Being in a tense emotional situation scares me. |
018 | 18EC_R | Purple | When I see someone being treated unfairly, I sometimes don’t feel very much pity for them. (Reversed) |
019 | 19PD_R | Red | I am usually pretty effective in dealing with emergencies. (Reversed) |
020 | 20FS | Green | I am often quite touched by things that I see happen. |
021 | 21PT | Yellow | I believe that there are two sides to every question and try to look at them both. |
022 | 22EC | Purple | I would describe myself as a pretty soft-hearted person. |
023 | 23FS | Green | When I watch a good movie, I can very easily put myself in the place of a leading character. |
024 | 24PD | Red | I tend to lose control during emergencies. |
025 | 25PT | Yellow | When I’m upset at someone, I usually try to “put myself in his shoes” for a while. |
026 | 26FS | Green | When I am reading an interesting story or novel, I imagine how I would feel if the events in the story were happening to me. |
027 | 27PD | Red | When I see someone who badly needs help in an emergency, I go to pieces. |
028 | 28PT | Yellow | Before criticizing somebody, I try to imagine how I would feel if I were in their place. |
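Each FeatureLabel in the table encodes the item number, the subscale (FS fantasy, EC empathic concern, PT perspective taking, PD personal distress) and an `_R` suffix for reversed items. A minimal sketch, assuming complete responses keyed by FeatureLabel, reverse-scores on the 0-4 scale as 4 minus the raw answer and sums each subscale; summing into subscale totals is a common convention for this questionnaire but an assumption here.

```python
import re

LABEL = re.compile(r"^(\d+)(FS|EC|PT|PD)(_R)?$")  # e.g. "7FS_R" -> ("7", "FS", "_R")
MAX_SCORE = 4  # items are scored 0-4


def score_subscales(responses: dict[str, int]) -> dict[str, int]:
    """Sum each subscale after reverse-scoring items whose label ends in _R.

    `responses` maps a FeatureLabel to a raw 0-4 answer; missing items are not handled.
    """
    totals = {"FS": 0, "EC": 0, "PT": 0, "PD": 0}
    for label, raw in responses.items():
        match = LABEL.match(label)
        if match is None:
            raise ValueError(f"unexpected label: {label}")
        _, subscale, reversed_flag = match.groups()
        score = MAX_SCORE - raw if reversed_flag else raw  # 0 <-> 4, 1 <-> 3
        totals[subscale] += score
    return totals
```

Driving the reversal from the label suffix, rather than a separate hard-coded item list, keeps the scoring consistent with the column names.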
More information about the dataset is contained in empathy_description.html file.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
So, if you have to have a G+ account (for YouTube, location services, or other reasons), here's how you can make it totally private! No one will be able to add you, send you spammy links, or otherwise annoy you. You need to visit the "Audience Settings" page (https://plus.google.com/u/0/settings/audience). You can then set a "custom audience". Usually you would use this to restrict your account to people from a specific geographic location or within a specific age range; in this case, we're going to choose a custom audience of "No-one". Check the box and hit save. Now, when people try to visit your Google+ profile, they'll see this "restricted" message. You can visit my G+ profile if you want to see this working (https://plus.google.com/114725651137252000986). If you are not able to follow these steps, see this website: http://www.livehuntz.com/google-plus/support-phone-number
This dataset supports the Biomarker: Other Viruses page on the Tempe Wastewater BioIntel Program site. Wastewater collection areas consist of merged sewage drainage basins that flow to a shared testing location for the Tempe Wastewater BioIntel Program. The wastewater collection areas represent a geographic area for which virus activity is tested. People infected with a virus excrete the virus in their feces in a process known as “shedding”. The municipal wastewater treatment system (sewage system) collects and aggregates these bathroom contributions across communities. The process begins at the sampling site where, over a period of 24 hours, a wastewater sample is collected along the sewer line. After the sample is acquired, it is immediately transferred to a lab where scientists prepare the sample. The laboratory analysis seeks to determine whether there is a signal (or detectable presence) of the biomarker in the wastewater. Please see the Tempe Wastewater BioIntel Program site for more information on the wastewater testing process at https://wastewater.tempe.gov/.
About the data: These data illustrate the trend in the signal of the weekly average or weekly results for Tempe wastewater biomarker groups. The dashboard and collection area map do not depict the number of individuals infected. Each collection area includes at least one sampling location, which collects wastewater from across the collection area; the results do not reflect the specific location where the deposit occurs. While testing can successfully quantify the results, research has not yet determined the relationship between these values and the number of people contributing to the signals. The influence of these data on future community health decisions is unknown. The data are being used to depict overall weekly trends and should not be interpreted without a holistic assessment of public health data.
The purpose of these weekly data is to support research and to identify overall trends in genome copies per liter of wastewater in each collection area. We share this information with the public with the disclaimer that only the future can tell how much “diagnostic value” we can and should attribute to the numeric measurements we obtain from the sewer. However, we know what we measure is real, and we share that information with our community. Data are shared as testing results become available. Because results may not be released at the same time, results for a given area may not yet appear for a given day or week. The dashboard presents the weekly averages. Data are collected 2-7 days per week. For Collection Area 1, Tempe's wastewater co-mingles with wastewater from a regional sewage line; Tempe's sewage makes up most of the Collection Area 1 samples. For Collection Area 3, Tempe's wastewater co-mingles with wastewater from a regional sewage line. For analysis and reporting, Tempe’s wastewater is separated from regional sewage. The week start date is the first day of the testing week, which runs from Monday to Sunday.
Additional Information:
Source: The Translational Genomics Research Institute (TGen), part of City of Hope, is an Arizona-based, nonprofit medical research institute.
Contact: Kimberly Sotelo
Contact email: kimberly_sotelo@tempe.gov
Preparation Method: Initial values are provided by TGen. Tempe makes additional calculations to determine the weekly averages or weekly results for each biomarker.
Publish Frequency: Weekly or as data becomes available
Publish Method: Manual
Data Dictionary
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories: person, boat, lifebuoy, surfboard, and wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.
In this repository, we provide:
66 Full HD video clips (total size: 5.5 GB)
126,170 images extracted from the videos at a rate of 30 FPS (total size: 243 GB)
3 annotation files for the extracted images that follow the MS COCO data format (for more info see https://cocodataset.org/#format-data):
annotations_5_custom_classes.json: this file contains annotations for all five categories; please note that class ids do not correspond to those of the MS COCO standard, since we account for two new classes not previously considered in the MS COCO dataset: lifebuoy and wood.
annotations_3_coco_classes.json: this file contains annotations for the three classes also covered by the MS COCO dataset: person, boat, and surfboard. Class ids correspond to those provided by the MS COCO standard.
annotations_person_coco_classes.json: this file contains annotations concerning only the 'person' class. Class id corresponds to the one provided by the MS COCO standard.
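Because all three files follow the MS COCO format, each entry in `annotations` points to a category through `category_id`, and `categories` maps those ids to names. Since the ids differ between the custom and COCO-class files, the minimal sketch below counts boxes by category name rather than by id; it has not been run against the released files.

```python
import json
from collections import Counter


def boxes_per_category(coco_json: str) -> Counter:
    """Count bounding boxes per category name in a COCO-format annotation file."""
    data = json.loads(coco_json)
    # Build the id -> name lookup from the file itself, so it works for either id scheme.
    id_to_name = {cat["id"]: cat["name"] for cat in data["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in data["annotations"])
```

To apply it to a downloaded file: `with open("annotations_5_custom_classes.json") as f: print(boxes_per_category(f.read()))`.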
The MOBDrone dataset is intended as a test data benchmark. However, for researchers interested in using our data also for training purposes, we provide training and test splits:
More details about data generation and the evaluation protocol can be found in our MOBDrone paper: https://arxiv.org/abs/2203.07973
The code to reproduce our results is available in this GitHub repository: https://github.com/ciampluca/MOBDrone_eval
See also http://aimh.isti.cnr.it/dataset/MOBDrone
Citing the MOBDrone
The MOBDrone is released under a Creative Commons Attribution license, so please cite it if you use it in your work in any form.
Published academic papers should use the academic paper citation for our MOBDrone paper, in which we evaluated several pre-trained state-of-the-art object detectors on the detection of people overboard:
@inproceedings{MOBDrone2021,
  title={MOBDrone: a Drone Video Dataset for Man OverBoard Rescue},
  author={Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
  booktitle={ICIAP2021: 21st International Conference on Image Analysis and Processing},
  year={2021}
}
and this Zenodo dataset:
@dataset{donato_cafarelli_2022_5996890,
  author={Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
  title = {{MOBDrone: a large-scale drone-view dataset for man overboard detection}},
  month = feb,
  year = 2022,
  publisher = {Zenodo},
  version = {1.0.0},
  doi = {10.5281/zenodo.5996890},
  url = {https://doi.org/10.5281/zenodo.5996890}
}
Personal works, such as machine learning projects or blog posts, should provide a URL to the MOBDrone Zenodo page (https://doi.org/10.5281/zenodo.5996890), though a reference to our MOBDrone paper would also be appreciated.
Contact Information
If you would like further information about the MOBDrone or if you experience any issues downloading files, please contact us at mobdrone[at]isti.cnr.it
Acknowledgements
This work was partially supported by the NAUSICAA project ("NAUtical Safety by means of Integrated Computer-Assistance Appliances 4.0"), funded by the Tuscany region (CUP D44E20003410009). The data collection was carried out in collaboration with the Fly&Sense Service of the CNR of Pisa, for the flight operations of remotely piloted aerial systems, and with the Institute of Clinical Physiology (IFC) of the CNR, for the water immersion operations.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Biometric Attack Dataset, Black People
A similar dataset that includes all ethnicities: Anti Spoofing Real Dataset
This dataset for face anti-spoofing and face recognition includes images and videos of Black people. It helps improve model performance by providing a wider range of data for a specific ethnic group. The videos were gathered by capturing the faces of genuine individuals presenting spoofs, using facial presentations. Our dataset proposes… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/black-people-liveness-detection-video-dataset.
https://www.usa.gov/government-works
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.
For more information:
NNDSS Supports the COVID-19 Response | CDC.
The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.
All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.
To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.
CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
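The suppression rule can be illustrated with a simplified sketch. It is not CDC's actual procedure, which also covers indirect identifiers such as dates; it only shows the idea of re-coding rare combinations of sex, age group, and race/ethnicity (fewer than 5 records) to NA without ever dropping a record. The field names are hypothetical.

```python
from collections import Counter

# Hypothetical field names for the demographic quasi-identifiers.
QUASI_IDENTIFIERS = ("sex", "age_group", "race_ethnicity")
THRESHOLD = 5  # combinations with fewer than 5 records are suppressed


def suppress_rare_combinations(
    records: list[dict],
    fields: tuple[str, ...] = QUASI_IDENTIFIERS,
    threshold: int = THRESHOLD,
) -> list[dict]:
    """Re-code `fields` to "NA" wherever their combination occurs fewer than `threshold` times.

    Every input record appears in the output (records are never removed),
    and the input dicts are left unmodified.
    """
    combo_counts = Counter(tuple(r[f] for f in fields) for r in records)
    result = []
    for r in records:
        out = dict(r)  # copy so the caller's data is not mutated
        if combo_counts[tuple(r[f] for f in fields)] < threshold:
            for f in fields:
                out[f] = "NA"
        result.append(out)
    return result
```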
For questions, please contact Ask SRRG (eocevent394@cdc.gov).
COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These
https://choosealicense.com/licenses/odc-by/
🍷 FineWeb
15 trillion tokens of the finest data the 🌐 web has to offer
What is it?
The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and was run on 🏭 datatrove, our large-scale data processing library. 🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb.
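FineWeb's deduplication was run with datatrove, and its actual configuration is not described here. As a rough illustration of how near-duplicate web documents can be detected, the following is a simplified MinHash sketch; the shingle size, number of hash functions, and 0.8 threshold are illustrative assumptions, and large-scale pipelines typically add locality-sensitive hashing (LSH) buckets so that documents are not compared pairwise.

```python
import hashlib


def shingles(text: str, n: int = 5) -> set[str]:
    """Word n-grams ("shingles") of a lowercased, whitespace-tokenized document."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def _seeded_hash(shingle: str, seed: int) -> int:
    """Deterministic 64-bit hash; each seed acts as a different hash function."""
    digest = hashlib.blake2b(f"{seed}:{shingle}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(text: str, num_hashes: int = 64, n: int = 5) -> list[int]:
    """For each hash function, keep the minimum hash value over the document's shingles."""
    grams = shingles(text, n)
    return [min(_seeded_hash(g, seed) for g in grams) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature positions estimates the Jaccard similarity of the shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    """Flag two documents as near-duplicates when their estimated similarity reaches `threshold`."""
    return estimated_jaccard(minhash_signature(doc_a), minhash_signature(doc_b)) >= threshold
```

Changing only the last word of a 30-word document alters a single 5-word shingle, so the true Jaccard similarity stays at 25/27, and the MinHash estimate should land close to that value.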
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the Biomarker: Gastrointestinal Viruses page on the Tempe Wastewater BioIntel Program site. Wastewater collection areas consist of merged sewage drainage basins that flow to a shared testing location for the Tempe Wastewater BioIntel Program. Each collection area represents the geographic area for which virus activity is tested. People infected with a virus excrete it in their feces, a process known as “shedding”. The municipal wastewater treatment system (sewage system) collects and aggregates these bathroom contributions across communities. The process begins at a sampling site where a wastewater sample is collected along the sewer line over a 24-hour period. Once acquired, the sample is immediately transferred to a lab where scientists prepare it. The laboratory analysis seeks to determine whether there is a signal (a detectable presence) of the biomarker in the wastewater. For more information on the wastewater testing process, see the Tempe Wastewater BioIntel Program site at https://wastewater.tempe.gov/.
About the data: These data show the trend in the signal of the weekly averages or weekly results for Tempe wastewater biomarker groups. The dashboard and collection area map do not depict the number of infected individuals. Each collection area includes at least one sampling location, which collects wastewater from across the collection area; the results do not reflect the specific location where a deposit occurs. Although testing can quantify the results, research has not yet determined the relationship between these values and the number of people contributing to the signals. How these data will influence future community health decisions is unknown. The data are being used to depict overall weekly trends and should not be interpreted without a holistic assessment of public health data.
https://creativecommons.org/publicdomain/zero/1.0/
The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank
This dataset combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries.
Update Frequency: Biannual
For more information, see the World Bank website.
https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics
https://cloud.google.com/bigquery/public-data/world-bank-hnp
Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Citation: The World Bank: Health Nutrition and Population Statistics
Banner Photo by @till_indeman from Unsplash.
What’s the average age of first marriages for females around the world?
DPH note about change from 7-day to 14-day metrics: As of 10/15/2020, this dataset is no longer being updated. Starting on 10/15/2020, the school learning model indicator metrics will be calculated using a 14-day average rather than a 7-day average. The new school learning model indicators dataset using 14-day averages can be accessed here: https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County-14-d/e4bh-ax24
As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because they are more stable than 7-day rates, particularly at the town level. In addition, since DPH initially published the school indicators last summer, CDC has recommended 14-day rates, and other states (e.g., Massachusetts) have also started to implement 14-day metrics for monitoring COVID transmission. With respect to geography, we have also learned that many people are looking at the town-level data to inform decision making, despite the emphasis on county-level metrics in the published addenda. This is understandable, as COVID-19 activity has varied within counties (for example, rates that are higher in one town than in most other towns in the county).
This dataset includes the leading and secondary metrics identified by the Connecticut Department of Public Health (DPH) and the Connecticut State Department of Education (CSDE) to support local district decision-making on whether to use in-person, hybrid (blended), or remote learning models for PreK-12 education. Data represent daily averages for each week by date of specimen collection (cases and positivity), date of hospital admission, or date of ED visit. Hospitalization data come from the Connecticut Hospital Association and are based on hospital location, not county of patient residence.
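The difference between 7-day and 14-day metrics can be shown with a minimal sketch. It assumes the metric is the average number of new cases per day over a trailing window, expressed per 100,000 residents; DPH's exact formula is not given here, and the function name, town population, and case counts are hypothetical.

```python
def trailing_daily_rate_per_100k(daily_cases: list[int], population: int, window: int) -> list[float]:
    """Average daily new cases per 100,000 residents over a trailing window.

    Entry k covers days k .. k + window - 1; no value is produced until a
    full window of days is available.
    """
    if window <= 0 or population <= 0:
        raise ValueError("window and population must be positive")
    rates = []
    for end in range(window, len(daily_cases) + 1):
        avg_daily = sum(daily_cases[end - window:end]) / window
        rates.append(avg_daily * 100_000 / population)
    return rates


# Hypothetical town of 50,000: 13 days with 2 cases each, then one day with 30.
cases = [2] * 13 + [30]
seven_day = trailing_daily_rate_per_100k(cases, 50_000, 7)
fourteen_day = trailing_daily_rate_per_100k(cases, 50_000, 14)
```

Here the 7-day rate jumps from 4.0 to 12.0 per 100,000 on the spike day, while the single 14-day rate is 8.0, which illustrates why a longer window is less sensitive to one unusual day in a small town.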
COVID-19-like illness includes fever together with cough, shortness of breath, or difficulty breathing, or the presence of a coronavirus diagnosis code, and excludes patients with influenza-like illness. All data are preliminary. These data are updated weekly; the reporting period for each update is the previous Sunday-Saturday week, known as an MMWR week (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf). The date listed is the date the dataset was last updated and corresponds to a reporting period of the previous MMWR week. For instance, the data for 8/20/2020 correspond to a reporting period of 8/9/2020-8/15/2020. These metrics were adapted from recommendations by the Harvard Global Health Institute and supplemented by existing DPH measures. For national data on COVID-19, see COVID View, the national weekly surveillance summary of U.S. COVID-19 activity, at https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html
Notes: 9/25/2020: Data for Mansfield and Middletown for the week of Sept 13-19 were unavailable at the time of reporting due to delays in lab reporting.
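The relationship between the update date and its reporting period can be sketched as a small date calculation. This assumes the reporting period is the last complete Sunday-to-Saturday week before the week containing the update date; it returns only the date range, not the MMWR week number, and the function name is hypothetical.

```python
from datetime import date, timedelta


def previous_mmwr_week(update_date: date) -> tuple[date, date]:
    """Return (Sunday, Saturday) of the week before the Sunday-Saturday week containing update_date."""
    # weekday(): Monday == 0 ... Sunday == 6, so this counts days back to the most recent Sunday.
    days_since_sunday = (update_date.weekday() + 1) % 7
    current_sunday = update_date - timedelta(days=days_since_sunday)
    start = current_sunday - timedelta(days=7)
    return start, start + timedelta(days=6)
```

For example, `previous_mmwr_week(date(2020, 8, 20))` returns `(date(2020, 8, 9), date(2020, 8, 15))`, matching the example in the text.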