Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
There are many developers in the world of video games. Here they are!
This short dataset contains information about video-game publishers, providing some background on each of them.
It is intended to complement the video-games-sales-2019 dataset.
According to the survey, just under 18 percent of respondents identified PostgreSQL as one of the most-wanted database skills. MongoDB ranked second, with 17.89 percent stating that they are not developing with it but want to.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
| Language   | Number of Samples |
| ---------- | ----------------- |
| Java       | 153,119           |
| Ruby       | 233,710           |
| Go         | 137,998           |
| JavaScript | 373,598           |
| Python     | 472,469           |
| PHP        | 294,394           |
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
The primary World Bank collection of development indicators, compiled from officially recognized international sources. It presents the most current and accurate global development data available, and includes national, regional, and global estimates.
This is a dataset hosted by the World Bank. The organization has an open data platform found here, and they update their information according to the amount of data that is brought in. Explore the World Bank using Kaggle and all of the data sources available through the World Bank organization page!
This dataset is maintained using the World Bank's APIs and Kaggle's API.
Cover photo by Alex Block on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Monthly average radiance composite images using nighttime data from the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB). As these data are composited monthly, there are many areas of the globe where it is impossible to get good quality data coverage for that month. This can be due to …
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
More details about each file are in the individual file descriptions.
This is a dataset hosted by the World Bank. The organization has an open data platform found here, and they update their information according to the amount of data that is brought in. Explore the World Bank using Kaggle and all of the data sources available through the World Bank organization page!
This dataset is maintained using the World Bank's APIs and Kaggle's API.
Cover photo by Markus Spiske on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
The Gridded Surface Meteorological dataset provides high spatial resolution (~4-km) daily surface fields of temperature, precipitation, winds, humidity and radiation across the contiguous United States from 1979. The dataset blends the high resolution spatial data from PRISM with the high temporal resolution data from the National Land Data Assimilation System (NLDAS) to produce spatially and temporally continuous fields that lend themselves to additional land surface modeling. This dataset contains provisional products that are replaced with updated versions when the complete source data become available. Products can be distinguished by the value of the 'status' property. At first, assets are ingested with status='early'. After several days, they are replaced by assets with status='provisional'. After about 2 months, they are replaced by the final assets with status='permanent'.
After 2022-01-25, Sentinel-2 scenes with PROCESSING_BASELINE '04.00' or above have their DN (value) range shifted by 1000. The HARMONIZED collection shifts data in newer scenes to be in the same range as in older scenes. Sentinel-2 is a wide-swath, high-resolution, multi-spectral imaging mission supporting Copernicus Land Monitoring studies, including the monitoring of vegetation, soil and water cover, as well as observation of inland waterways and coastal areas. The Sentinel-2 data contain 13 UINT16 spectral bands representing TOA reflectance scaled by 10000. See the Sentinel-2 User Handbook for details. QA60 is a bitmask band that contained rasterized cloud mask polygons until Feb 2022, when these polygons stopped being produced. Starting in February 2024, legacy-consistent QA60 bands are constructed from the MSK_CLASSI cloud classification bands. For more details, see the full explanation of how cloud masks are computed. Each Sentinel-2 product (zip archive) may contain multiple granules. Each granule becomes a separate Earth Engine asset. EE asset ids for Sentinel-2 assets have the following format: COPERNICUS/S2/20151128T002653_20151128T102149_T56MNN. Here the first numeric part represents the sensing date and time, the second numeric part represents the product generation date and time, and the final 6-character string is a unique granule identifier indicating its UTM grid reference (see MGRS). The Level-2 data produced by ESA can be found in the collection COPERNICUS/S2_SR. For datasets to assist with cloud and/or cloud shadow detection, see COPERNICUS/S2_CLOUD_PROBABILITY and GOOGLE/CLOUD_SCORE_PLUS/V1/S2_HARMONIZED. For more details on Sentinel-2 radiometric resolution, see this page.
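The QA60 band mentioned above is a bitmask in which bit 10 flags opaque clouds and bit 11 flags cirrus, per the catalog documentation. A minimal sketch of testing whether a pixel is cloud-free:

```python
# Decode the Sentinel-2 QA60 cloud bitmask. Bit 10 flags opaque
# clouds and bit 11 flags cirrus; a pixel is considered clear only
# when both bits are zero.
CLOUD_BIT = 1 << 10    # opaque clouds
CIRRUS_BIT = 1 << 11   # cirrus clouds

def is_clear(qa60_value: int) -> bool:
    """Return True if neither cloud bit is set in a QA60 pixel value."""
    return (qa60_value & (CLOUD_BIT | CIRRUS_BIT)) == 0
```

The same bit test is what Earth Engine cloud-masking examples apply per pixel with `bitwiseAnd` before compositing.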
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
More details about each file are in the individual file descriptions.
This is a dataset hosted by the World Bank. The organization has an open data platform found here, and they update their information according to the amount of data that is brought in. Explore the World Bank using Kaggle and all of the data sources available through the World Bank organization page!
This dataset is maintained using the World Bank's APIs and Kaggle's API.
Cover photo by İrfan Simsar on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle, and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.
Database Management Systems
As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world's growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Google Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth's surface.
Explore our interactive timelapse viewer to travel back in time and see how the world has changed over the past twenty-nine years. Timelapse is one example of how Earth Engine can help gain insight into petabyte-scale datasets.
The public data archive includes more than thirty years of historical imagery and scientific datasets, updated and expanded daily. It contains over twenty petabytes of geospatial data instantly available for analysis.
The Earth Engine API is available in Python and JavaScript, making it easy to harness the power of Google's cloud for your own geospatial analysis.
Use our web-based code editor for fast, interactive algorithm development with instant access to petabytes of data.
Scientists and non-profits use Earth Engine for remote sensing research, predicting disease outbreaks, natural resource management, and more.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To advance the field of continuous user authentication, we have crafted two comprehensive datasets, COUNT-OS-I and COUNT-OS-II, each with unique characteristics but sharing common design principles. Both comprise performance counters extracted from the Windows operating system, providing data for evaluating and refining authentication models in real-world scenarios.
Both datasets were generated in real-world settings within public organizations in Brazil, ensuring their applicability and relevance to practical scenarios. Volunteers from diverse professional backgrounds participated in the data collection, contributing to the richness and variability of the data. Both datasets were collected at a rate of one sample every 5 seconds, providing a dense and detailed view of user interactions and system performance. Pseudonymization was applied meticulously to both datasets to safeguard individual identities while maintaining data integrity and statistical robustness.
The COUNT-OS-I dataset was generated in a real-world scenario to evaluate our work on continuous user authentication. It consists of performance counters extracted from the Windows operating system of 26 computers, representing 26 individual users. The data were collected on the computers of the Information Technology Department of a public organization in Brazil. The participants were volunteers, aged between 20 and 45 years, consisting of both males and females. The majority were systems analysts and software developers performing their routine work activities. No specific restrictions were imposed on the tasks the participants performed during data collection. The participants used a variety of software applications as part of their regular work, including web browsers such as Firefox, Chrome, and Edge; developer tools such as Eclipse and SQL Developer; office programs such as Microsoft Office Word, Excel, and PowerPoint; and chat applications such as WhatsApp. This list is not exhaustive, and participants were not limited to these applications. The COUNT-OS-I computers differ in hardware, operating system versions, and installed software; this diversity ensures a representative sample of real-world scenarios and allows for a comprehensive evaluation of the authentication model. Each sample was recorded every 5 seconds, capturing system data over a period of approximately 26 hours, on average, per user. Each sample corresponds to a feature vector comprising 159 attributes.
The COUNT-OS-II dataset was used to evaluate our work in a real-world setting. It comprises performance counters extracted from the Windows operating system of 37 computers with identical hardware configurations (CPU, memory, network, disk), operating systems, and installed software. The data collection was conducted within various departments of a public organization in Brazil. The participants (37 users) were voluntary administration assistants who performed various administrative tasks as part of their routine work; no restrictions were imposed on the specific tasks they were assigned. They commonly used the Chrome browser and office applications such as Word, Excel, and PowerPoint, in addition to the WhatsApp chat application. The data were collected over six days (approximately 48 hours), with samples collected at a 5-second interval. Each sample corresponds to a feature vector composed of 218 attributes. Pseudonymization was also applied to this dataset to hide users' sensitive information.
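The 5-second sampling cadence described above can be sketched as a simple loop. The body of collect_sample below is a placeholder; the actual datasets record Windows performance counters (CPU, memory, network, disk):

```python
import time

def collect_sample():
    # Placeholder: in the real datasets each sample is a feature
    # vector of Windows performance counters (159 or 218 attributes).
    return {"timestamp": time.time()}

def sample_counters(n_samples, interval=5.0, collect=collect_sample, sleep=time.sleep):
    """Collect n_samples feature vectors, one every `interval` seconds."""
    samples = []
    for _ in range(n_samples):
        samples.append(collect())
        sleep(interval)
    return samples
```

The collect and sleep parameters are injected so the loop can be tested without waiting out the real interval.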
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Region of Interest (ROI) comprises Belgium, the Netherlands, and Luxembourg.
We use the communes administrative division, which is standardized across Europe by EUROSTAT at https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units and is roughly equivalent to the notion of municipalities in most countries.
From the link above, the communes definitions are taken from COMM_RG_01M_2016_4326.shp and the country borders from NUTS_RG_01M_2021_3035.shp.
images: Sentinel-2 RGB from 2020-01-01 to 2020-12-31. Pixels with clouds during the observation period were filtered out according to the QA60 band, following the example given in the GEE dataset info page, and the median of the remaining pixels was taken.
see https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED
see also https://github.com/rramosp/geetiles/blob/main/geetiles/defs/sentinel2rgbmedian2020.py
labels: ESA WorldCover 10m V100 labels mapped to the interval [1,11] according to the following map: { 0:0, 10:1, 20:2, 30:3, 40:4, 50:5, 60:6, 70:7, 80:8, 90:9, 95:10, 100:11 }. Pixel value zero is reserved for invalid data. See https://developers.google.com/earth-engine/datasets/catalog/ESA_WorldCover_v100
see also https://github.com/rramosp/geetiles/blob/main/geetiles/defs/esaworldcover.py
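The mapping above can be applied to raw label values with a simple lookup; a minimal sketch, treating the raster as a flat sequence of class codes:

```python
# Remap raw ESA WorldCover class codes to the compact interval [1, 11],
# with 0 reserved for invalid pixels, using the mapping from the
# dataset description.
WORLDCOVER_MAP = {0: 0, 10: 1, 20: 2, 30: 3, 40: 4, 50: 5,
                  60: 6, 70: 7, 80: 8, 90: 9, 95: 10, 100: 11}

def remap_labels(raw_pixels):
    """Map raw WorldCover codes to [0, 11]; unknown codes map to 0 (invalid)."""
    return [WORLDCOVER_MAP.get(v, 0) for v in raw_pixels]
```

In practice the same lookup would be vectorized over the full label raster.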
_aschips.geojson: the image chip geometries along with their label proportions, for easy visualization with QGIS, GeoPandas, etc.
_communes.geojson: the commune geometries with their label proportions, for easy visualization with QGIS, GeoPandas, etc.
splits.csv: contains two splits of the image chips into train, test, and val: (1) geographical bands at 45° angles in the NW-SE direction, and (2) the same as above, reorganized so that all chips within the same commune fall within the same split.
data/: a pickle file for each image chip containing a dict with the 100x100 RGB Sentinel-2 chip image, the 100x100 chip-level labels, the label proportions of the chip, and the aggregated label proportions of the commune the chip belongs to.
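The per-chip pickle files can be loaded with Python's pickle module. The key names below are hypothetical, since the description lists the contents of each dict but not its exact keys:

```python
import pickle

# Stand-in for one chip dict; in the real data the arrays are
# 100x100 and the files live under data/, loaded with
# pickle.load(open(path, "rb")).
sample_chip = {
    "image": [[0, 0, 0]] * 4,          # placeholder for the RGB chip image
    "labels": [[1, 1, 1, 1]],          # placeholder for chip-level labels
    "chip_proportions": {1: 0.6, 5: 0.4},
    "commune_proportions": {1: 0.5, 5: 0.5},
}

# Round-trip through pickle, as a loader would.
blob = pickle.dumps(sample_chip)
chip = pickle.loads(blob)
```

The commune-level proportions are the weak labels one would train a label-proportions model against, with the chip-level labels held out for evaluation.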
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for replicating The Global Spatial Distribution of Economic Activity: Nature, History, and the Role of Trade (forthcoming 2018; with Vernon Henderson, Tim Squires and David N. Weil), Quarterly Journal of Economics.
We explore the role of natural characteristics in determining the worldwide spatial distribution of economic activity, as proxied by lights at night, observed across 240,000 grid cells. A parsimonious set of 24 physical geography attributes explains 47% of worldwide variation and 35% of within-country variation in lights. We divide geographic characteristics into two groups, those primarily important for agriculture and those primarily important for trade, and confront a puzzle. In examining within-country variation in lights, among countries that developed early, agricultural variables incrementally explain over 6 times as much variation in lights as do trade variables, while among late developing countries the ratio is only about 1.5, even though the latter group is far more dependent on agriculture. Correspondingly, the marginal effects of agricultural variables as a group on lights are larger in absolute value, and those for trade smaller, for early developers than for late developers. We show that this apparent puzzle is explained by persistence and the differential timing of technological shocks in the two sets of countries. For early developers, structural transformation due to rising agricultural productivity began when transport costs were still high, so cities were localized in agricultural regions. When transport costs fell, these agglomerations persisted. In late-developing countries, transport costs fell before structural transformation. To exploit urban scale economies, manufacturing agglomerated in relatively few, often coastal, locations. Consistent with this explanation, countries that developed earlier are more spatially equal in their distribution of education and economic activity than late developers.
This dataset is part of the Global Research Program on Spatial Development of Cities funded by the Multi-Donor Trust Fund on Sustainable Urbanization of the World Bank and supported by the U.K. Department for International Development.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ORBIT-India (Object Recognition for Blind Image Training) dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, home to 90% of the world's population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.
Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.
The image dataset is stored in the ‘Dataset’ folder, organized into folders assigned to each data collector (P1, P2, ..., P12). Each collector's folder includes sub-folders named with the object labels provided by our data collectors. Within each object folder there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside an ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) with keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is true if the object is not present in the image, and the ‘pii_present_issue’ key is true if personally identifiable information (PII) is present in the image. Note that all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized 1080 × 1920; an unscaled version of the dataset will follow soon.
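Based on the JSON structure quoted above, a sketch of filtering one annotation file down to frames where the object is actually visible:

```python
import json

# Inline stand-in for one per-video annotation file: one key per
# frame, each with the two boolean quality flags described above.
annotation_json = """
{
  "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg":
      {"object_not_present_issue": false, "pii_present_issue": false},
  "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg":
      {"object_not_present_issue": true, "pii_present_issue": false}
}
"""

annotations = json.loads(annotation_json)

# Keep only frames where the object is present.
usable = [frame for frame, flags in annotations.items()
          if not flags["object_not_present_issue"]]
```

For a real video, the same loop would run over `json.load(open(path))` for each file in the ‘Annotations’ folder.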
This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.
REFERENCES:
Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597
microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset
Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
About Dataset (strawberries, peaches, pomegranates). Photo requirements: 1- white background; 2- .jpg format; 3- image size 300×300. The required number of photos is 250 of each fruit when it is fresh and 250 of each fruit when it is rotten, for a total of 1,500 images.
Diverse Collection: With a diverse collection of product images, the dataset provides an excellent foundation for developing and testing machine learning models designed for image recognition and classification. Each image is captured under different lighting conditions and backgrounds, offering a realistic challenge for algorithms to overcome.
Real-World Applications: The variability in the dataset ensures that models trained on it can generalize well to real-world scenarios, making them robust and reliable. The dataset includes common fruits such as apples, bananas, oranges, and strawberries, among others, allowing for comprehensive training and evaluation.
Industry Use Cases: One of the significant advantages of the Fruits Dataset for Classification is its applicability in fields such as agriculture, retail, and the food industry. In agriculture, it can help automate fruit sorting and grading, enhancing efficiency and reducing labor costs. In retail, it can be used to develop automated checkout systems that accurately identify fruits, streamlining the purchasing process.
Educational Value: The dataset is also valuable for educational purposes, providing students and educators with a practical tool for learning and teaching machine learning concepts. By working with this dataset, learners can gain hands-on experience in data preprocessing, model training, and evaluation.
Conclusion: The Fruits Dataset for Classification is a versatile resource for advancing the field of image classification. Its diverse, high-quality images and practical applications make it a go-to dataset for researchers, developers, and educators aiming to innovate in machine learning and computer vision.
This dataset is sourced from Kaggle.
https://www.reddit.com/wiki/api
This dataset contains labeled data for gun detection collected from various videos on YouTube. The dataset has been specifically curated and labeled by me to aid in training machine learning models, particularly for real-time gun detection tasks. It is formatted for easy use with YOLO (You Only Look Once), one of the most popular object detection models.
Key Features:
Source: The videos were sourced from YouTube and feature diverse environments, including indoor and outdoor settings, with varying lighting conditions and backgrounds.
Annotations: The dataset is fully labeled with bounding boxes around guns, following the YOLO format (.txt files for annotations). Each annotation provides the class (gun) and the coordinates of the bounding box.
YOLO-Compatible: The dataset is ready to be used with any YOLO model (YOLOv3, YOLOv4, YOLOv5, etc.), ensuring seamless integration for object detection training.
Realistic Scenarios: The dataset includes footage of guns from various perspectives and angles, making it useful for training models that can generalize to real-world detection tasks.
This dataset is ideal for researchers and developers working on gun detection systems, security applications, or surveillance systems that require fast and accurate detection of firearms.
GitHub is how people build software and is home to the largest community of open source developers in the world, with over 12 million people contributing to 31 million projects on GitHub since 2008.
This 3TB+ dataset comprises the largest released source of GitHub activity to date. It contains a full snapshot of the content of more than 2.8 million open source GitHub repositories including more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files, all of which are searchable with regular expressions.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. Fork this kernel to get started and learn how to safely manage analyzing large BigQuery datasets.
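As a sketch of the kind of query this enables: the table name below follows the bigquery-public-data.github_repos.[TABLENAME] pattern, and the commits table with its repeated repo_name column is an assumption based on the public schema.

```python
# Hypothetical query against the public GitHub dataset: top repos
# by commit count. Table and column names are assumptions from the
# public bigquery-public-data.github_repos schema.
TABLE = "bigquery-public-data.github_repos.commits"

QUERY = f"""
SELECT repo_name, COUNT(*) AS n_commits
FROM `{TABLE}`, UNNEST(repo_name) AS repo_name
GROUP BY repo_name
ORDER BY n_commits DESC
LIMIT 10
"""

# With the BigQuery client library (in a Kernel or with GCP credentials):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   rows = list(client.query(QUERY).result())
```

Because repo_name is a repeated field in the commits table, the UNNEST attributes each commit to every repository that contains it.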
This dataset was made available per GitHub's terms of service. This dataset is available via Google Cloud Platform's Marketplace, GitHub Activity Data, as part of GCP Public Datasets.
Description:
This dataset consists of a diverse collection of images featuring Paimon, a popular character from the game Genshin Impact. The images have been sourced from in-game gameplay footage and capture Paimon from various angles and in different sizes (scales), making the dataset suitable for training YOLO object detection models.
The dataset provides a comprehensive view of Paimon in different lighting conditions, game environments, and positions, ensuring the model can generalize well to similar characters or object detection tasks. While most annotations are accurately labeled, a small number of annotations may include minor inaccuracies due to manual labeling errors. This is ideal for researchers and developers working on character recognition, object detection in gaming environments, or other AI vision tasks.
Dataset Features:
Image Format: .jpg files in 640×320 resolution.
Annotation Format: .txt files in YOLO format, containing bounding box data with:
class_id
x_center
y_center
width
height
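Given the fields above, a YOLO annotation line can be converted back to pixel coordinates for the 640×320 images. This helper is an illustrative sketch, not part of the dataset:

```python
# Convert one YOLO-format annotation line to pixel-space corner
# coordinates. Field order matches the list above: class_id,
# x_center, y_center, width, height, with all values but class_id
# normalized to [0, 1] relative to image size.
def yolo_to_pixels(line: str, img_w: int = 640, img_h: int = 320):
    class_id, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    x_min, y_min = xc - w / 2, yc - h / 2
    x_max, y_max = xc + w / 2, yc + h / 2
    return int(class_id), (x_min, y_min, x_max, y_max)

# A box centered in the image, a quarter of its width and half its height:
cid, box = yolo_to_pixels("0 0.5 0.5 0.25 0.5")   # → (0, (240.0, 80.0, 400.0, 240.0))
```

The inverse of this transform is what a labeling tool performs when writing the .txt files.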
Use Cases:
Character Detection in Games: Train YOLO models to detect and identify in-game characters or NPCs.
Gaming Analytics: Improve recognition of specific game elements for AI-powered game analytics tools.
Research: Contribute to academic research focused on object detection or computer vision in animated and gaming environments.
Data Structure:
Images: High-quality .jpg images captured from multiple perspectives, ensuring robust model training across various orientations and lighting scenarios.
Annotations: Each image has an associated .txt file that follows the YOLO format. The annotations are structured to include class identification, object location (center coordinates), and bounding box dimensions.
Key Advantages:
Varied Angles and Scales: The dataset includes Paimon from multiple perspectives, aiding in creating more versatile and adaptable object detection models.
Real-World Scenario: Extracted from actual gameplay footage, the dataset simulates real-world detection challenges such as varying backgrounds, motion blur, and changing character scales.
Training Ready: Suitable for training YOLO models and other deep learning frameworks that require object detection capabilities.
This dataset is sourced from Kaggle.