## Overview
Head Data Set 2 is a dataset for object detection tasks - it contains Heads QiDz annotations for 2,342 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
This dataset was created by Mira Küçük
Dataset Card for "AI-Generated-vs-Real-Images-Datasets"
More Information needed
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of OR tables between the interaction of rs7522462 and rs11945978 in the WTCCC data with the shared controls (left) and the interaction of the proxy SNPs, rs296533 and rs2089509 in the IBDGC data (right). The legend to this table is the same as that of Table 3.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25, and 50 µM for 6-OHDA, and 0.03, 0.06, 0.125, and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples precluded a full and rigorous statistical analysis of the results; nevertheless, it allowed the identification of relevant hidden patterns and trends.
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure, and area under the ROC curve (AUC) metrics.
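For readers who prefer scripting over Orange's GUI, the cross-validated tree can be sketched with scikit-learn. This is an assumption, not the paper's pipeline: the authors used Orange, and scikit-learn offers the entropy criterion rather than Orange's gain ratio. The data here are synthetic placeholders with the paper's stated shape (36 samples, 11 features, 9 balanced classes).

```python
# Hypothetical sketch: approximating the reported decision-tree settings
# with scikit-learn instead of Orange.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(36, 11))       # 36 samples x 11 numeric features
y = np.repeat(np.arange(9), 4)      # 9 classes, 4 samples per class

clf = DecisionTreeClassifier(
    criterion="entropy",            # stand-in for Orange's gain ratio
    min_samples_leaf=2,             # minimum samples in leaves: 2
    min_samples_split=5,            # minimum samples to split a node: 5
)
# stratified cross-validation, as in the paper
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=4))
print(scores.mean())
```

With random placeholder features the accuracy is near chance; the point is only the settings and the stratified evaluation loop.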
The data set description provides a detailed account of the type of data used within the peer-reviewed literature. The data involve special instrumentation, such as hyperspectral imaging cameras that produce images composed of thousands of pixels, much like a television screen. Other data are used to derive absorbance spectra from infrared spectrometers, which are compared against reference data to confirm the presence of a desired, tested chemical. This dataset is associated with the following publication: Baseley, D., L. Wunderlich, G. Phillips, K. Gross, G. Perram, S. Willison, M. Magnuson, S. Lee, R. Phillips, and W. Harper Jr. Hyperspectral Analysis for Standoff Detection of Dimethyl Methylphosphonate on Building Materials [HS7.52.01]. Journal of Environmental Management. Elsevier Science Ltd, New York, NY, USA, 135-142, (2016).
Models and external data of 3rd place efficiency solution for https://www.kaggle.com/competitions/pii-detection-removal-from-educational-data competition.
See https://www.kaggle.com/code/devinanzelmo/piidd-efficiency-3rd-process-external-data for links to external data and processing code.
See https://www.kaggle.com/code/devinanzelmo/piidd-efficiency-3rd-train for the training code that generated the models.
See https://www.kaggle.com/code/devinanzelmo/piidd-efficiency-3rd-inference for the inference code.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a couple of great open source models that we can load on Kaggle, including NousResearch/Nous-Hermes-Llama2-13b (I didn't manage to load anything larger than 13B). It also bundles curated-transformers, which should allow for easier modifications of the underlying architectures, along with all the dependencies we need to load the model in 8-bit, if that is what you would like to do (updated versions of transformers, accelerate, etc.).
I show how to load and run Nous-Hermes-Llama2-13b
in the following notebook:
👉 💡 Best Open Source LLM Starter Pack 🧪🚀
If you find this dataset helpful, please leave an upvote! 🙂 Thank you! 🙏
https://creativecommons.org/publicdomain/zero/1.0/
The dataset captures various parameters, network conditions, and their corresponding optimized performance metrics. It is designed to analyze and optimize communication performance for UAV networks under diverse scenarios.
Dataset Features:
- Scenario ID: unique identifier for each simulation scenario.
- UAV Density: the number of UAVs in the network, ranging from 20 to 200.
- Movement Pattern: the movement pattern of UAVs in the network. Options: Random, Circular, Linear.
- Weather: environmental conditions affecting the network. Options: Clear, Windy, Rainy, Foggy.
- Initial Energy (J): initial energy of the UAVs in joules.
- Bandwidth (MHz): the bandwidth allocated for communication in megahertz.
- Latency (ms): initial communication latency in milliseconds.
- Energy Consumption (J): energy consumed during network operations in joules.
- Throughput (Mbps): initial data throughput in megabits per second.
- Packet Loss (%): percentage of packets lost during communication.

Optimized Metrics:
- Optimized Latency (ms): improved latency after optimization.
- Optimized Throughput (Mbps): enhanced data throughput after optimization.
- Optimized Energy Efficiency (%): energy efficiency after optimization, calculated as the percentage of remaining energy.

Target Column:
- Performance Target: classification of each scenario based on the optimized metrics:
  - High Performance: scenarios with optimal latency, throughput, and energy efficiency.
  - Moderate Performance: scenarios meeting moderate optimization criteria.
  - Low Performance: scenarios failing to meet high or moderate criteria.
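A minimal pandas sketch for exploring the target distribution. The column names follow the feature list above; the CSV file name is a placeholder, and the three rows here are synthetic stand-ins for real records:

```python
# Hypothetical sketch: inspecting the Performance Target distribution.
import pandas as pd

# In practice: df = pd.read_csv("uav_network_optimization.csv")
df = pd.DataFrame({
    "UAV Density": [20, 100, 200],
    "Optimized Latency (ms)": [12.0, 45.0, 90.0],
    "Optimized Throughput (Mbps)": [95.0, 60.0, 20.0],
    "Performance Target": ["High Performance", "Moderate Performance", "Low Performance"],
})
print(df["Performance Target"].value_counts())
```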
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
This file contains raw data for cameras and wearables of the ConfLab dataset.
./cameras
contains the overhead video recordings for 9 cameras (cam2-10) in MP4 files.
These cameras cover the whole interaction floor, with camera 2 capturing the
bottom of the scene layout, and camera 10 capturing the top of the scene layout.
Note that cam5 ran out of battery before the other cameras, so its recordings
are cut short. However, cam4 and cam6 overlap significantly with cam5, allowing
any needed information to be reconstructed.
Note that the annotations are made and provided in 2 minute segments.
The annotated portions of the video include the last 3min38sec of x2xxx.MP4
video files, and the first 12 min of x3xxx.MP4 files for cameras (2,4,6,8,10),
with "x" being the placeholder character in the mp4 file names. If one wishes
to separate the video into 2 min segments as we did, the "video-splitting.sh"
script is provided.
./camera-calibration contains the camera intrinsic files obtained from
https://github.com/idiap/multicamera-calibration. Camera extrinsic parameters can
be calculated using the existing intrinsic parameters and the instructions in the
multicamera-calibration repo. The coordinates in the image are provided by the
crosses marked on the floor, which are visible in the video recordings.
The crosses are 1 m (100 cm) apart.
./wearables
subdirectory includes the IMU, proximity and audio data from each
participant at the ConfLab event (48 in total). In the directory numbered
by participant ID, the following data are included:
1. raw audio file
2. proximity (Bluetooth) pings (RSSI) file (raw and csv) and a visualization
3. Tri-axial accelerometer data (raw and csv) and a visualization
4. Tri-axial gyroscope data (raw and csv) and a visualization
5. Tri-axial magnetometer data (raw and csv) and a visualization
6. Game rotation vector (raw and csv), recorded in quaternions.
All files are timestamped.
The sampling frequencies are:
- audio: 1250 Hz
- all other sensors: around 50 Hz; however, the sample rate is not fixed,
  so the timestamps should be used instead.
For rotation, the game rotation vector's output frequency is limited by the
actual sampling frequency of the magnetometer. For more information, please refer to
https://invensense.tdk.com/wp-content/uploads/2016/06/DS-000189-ICM-20948-v1.3.pdf
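Because the non-audio streams are not sampled at a fixed rate, one common approach is to interpolate them onto a uniform 50 Hz grid using the recorded timestamps. A minimal sketch with synthetic data (the jittered timestamps stand in for a real sensor CSV):

```python
# Sketch: resampling a variably-sampled sensor stream (~50 Hz) onto a
# fixed 50 Hz grid using its timestamps.
import numpy as np

# toy data: jittered timestamps around 50 Hz over ~2 seconds
rng = np.random.default_rng(1)
t = np.cumsum(rng.uniform(0.018, 0.022, size=100))   # seconds
accel_x = np.sin(2 * np.pi * 1.0 * t)                # one accelerometer axis

t_fixed = np.arange(t[0], t[-1], 1.0 / 50.0)         # uniform 50 Hz grid
accel_x_fixed = np.interp(t_fixed, t, accel_x)       # linear interpolation
print(len(t_fixed))
```

Linear interpolation is adequate for slowly varying signals at this rate; for the quaternion rotation stream, spherical interpolation would be more appropriate.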
Audio files in this folder are in raw binary form. The following can be used to convert
them to WAV files (1250 Hz):
ffmpeg -f s16le -ar 1250 -ac 1 -i /path/to/audio/file /path/to/output.wav
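If ffmpeg is not available, the same conversion can be sketched with Python's standard wave module. This assumes the raw files are little-endian signed 16-bit mono, matching the ffmpeg flags above; the four synthetic samples at the end are only a round-trip demo:

```python
# Sketch: wrapping a raw s16le mono stream (1250 Hz) in a WAV header.
import os
import struct
import tempfile
import wave

def raw_to_wav(raw_path: str, wav_path: str, rate: int = 1250) -> None:
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)   # 1250 Hz, per the readme
        w.writeframes(pcm)

# quick round-trip demo with four synthetic samples
tmp = tempfile.mkdtemp()
raw_path = os.path.join(tmp, "audio.raw")
wav_path = os.path.join(tmp, "audio.wav")
with open(raw_path, "wb") as f:
    f.write(struct.pack("<4h", 0, 1000, -1000, 0))
raw_to_wav(raw_path, wav_path)
with wave.open(wav_path, "rb") as w:
    n_frames, rate = w.getnframes(), w.getframerate()
print(n_frames, rate)
```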
Synchronization of cameras and wearables data
Raw videos contain timecode information which matches the timestamps of the data in
the "wearables" folder. The starting timecode of a video can be read as:
ffprobe -hide_banner -show_streams -i /path/to/video
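Once ffprobe reports the starting timecode, it can be converted to seconds for alignment with the wearable timestamps. A hedged sketch: the HH:MM:SS:FF timecode format and the frame rate used below are assumptions to check against the actual video streams:

```python
# Sketch: converting an SMPTE-style timecode (as reported by ffprobe)
# into seconds since midnight.
def timecode_to_seconds(tc: str, fps: float) -> float:
    # tc is "HH:MM:SS:FF"; FF is the frame count within the second
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return hh * 3600 + mm * 60 + ss + ff / fps

# hypothetical starting timecode and frame rate
offset = timecode_to_seconds("14:05:30:12", fps=60.0)
print(offset)
```

Drop-frame timecodes (written with ";" separators) need a correction factor not handled here.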
./audio
./sync: contains WAV files for each subject
./sync_files: auxiliary csv files used to sync the audio. Can be used to improve the synchronization.
The code used for syncing the audio can be found here:
https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/audio
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The application of Artificial Intelligence (AI) has recently become evident in the agricultural sector. The main goal of AI in agriculture is to improve crop yield, control crop pests/diseases, and reduce cost. The agricultural sector in developing countries faces severe challenges in the form of disease and pest infestation, the knowledge gap between farmers and technology, and a lack of storage facilities, among others. To help address some of these challenges, this work presents crop pest/disease datasets sourced from local farms in Ghana. The dataset is presented in two parts: the raw images, consisting of 24,881 images (6,549 Cashew, 7,508 Cassava, 5,389 Maize, and 5,435 Tomato), and the augmented images, further split into train and test sets and consisting of 102,976 images (25,811 Cashew, 26,330 Cassava, 23,657 Maize, and 27,178 Tomato), categorized into 22 classes. All images are de-identified, validated by expert plant virologists, and freely available for use by the research community.
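A small sketch for verifying the per-crop image counts after download. The folder layout and extensions are assumptions, not part of the published dataset description:

```python
# Hypothetical sketch: tallying images under each crop folder to check
# the counts reported above.
from pathlib import Path

def count_images(root: str) -> dict:
    exts = {".jpg", ".jpeg", ".png"}
    return {
        crop.name: sum(1 for p in crop.rglob("*") if p.suffix.lower() in exts)
        for crop in Path(root).iterdir()
        if crop.is_dir()
    }
```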
These datasets each consist of a folder containing a personal geodatabase of the NHD, and shapefiles used in the HydroDEM process. These files are provided as a means to document exactly which lines were used to develop the HydroDEMs. Each folder contains a line shapefile named for the 8-digit HUC code, containing the NHD flowlines that comprise the coastline for that island. The “hydrolines.shp” shapefile contains the lines that were burned into the DEM. These lines were selected from the NHD flowlines, with some minor editing in places. The “wbpolys.shp” shapefile contains the water-body polygons that were selected from the NHD and used in the bathymetric gradient process. The folders for HUCs 20010000 (Hawaii) and 20020000 (Maui) also contain a “walls.shp” shapefile, which contains the lines that were superimposed on the surface as “walls.”
AgentTrek Data Collection
AgentTrek dataset is the training dataset for the Web agent AgentTrek-1.0-32B. It consists of a total of 52,594 dialogue turns, specifically designed to train a language model for performing web-based tasks, such as browsing and web shopping. The dialogues in this dataset simulate interactions where the agent assists users in tasks like searching for information, comparing products, making purchasing decisions, and navigating websites.
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/AgentTrek.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
What this collection is: a curated, binary-classified image dataset of grayscale (single-band), 400 x 400-pixel image chips in JPEG format, extracted from processed Sentinel-1 Synthetic Aperture Radar (SAR) satellite scenes acquired over various regions of the world, featuring clear open-ocean chips, look-alikes (wind or biogenic features), and oil slick chips.
This binary dataset contains chips labelled as:
- "0" for chips not containing any oil features (look-alikes or clean seas)
- "1" for those containing oil features.
This binary dataset is imbalanced, and biased towards "0" labelled chips (i.e., no oil features), which correspond to 66% of the dataset. Chips containing oil features, labelled "1", correspond to 34% of the dataset.
Why: This dataset can be used for training, validation and/or testing of machine learning, including deep learning, algorithms for the detection of oil features in SAR imagery. Directly applicable for algorithm development for the European Space Agency Sentinel-1 SAR mission (https://sentinel.esa.int/web/sentinel/missions/sentinel-1 ), it may be suitable for the development of detection algorithms for other SAR satellite sensors.
Overview of this dataset: the total number of chips (both classes) is N = 5,630.
- Class 0: 3,725 chips
- Class 1: 1,905 chips
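One common way to handle the reported 66/34 skew during training is inverse-frequency class weighting; a sketch using the counts above (the weighting scheme is a suggestion, not part of the dataset's documentation):

```python
# Sketch: inverse-frequency class weights for the imbalanced chip dataset.
counts = {0: 3725, 1: 1905}                 # chips per class
n_total = sum(counts.values())              # 5,630 chips in total
weights = {c: n_total / (len(counts) * n) for c, n in counts.items()}
print(weights)
```

These weights give each class equal total influence on the loss: the minority "oil" class is up-weighted roughly twofold relative to the majority class.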
Further information and description is found in the ReadMe file provided (ReadMe_Sentinel1_SAR_OilNoOil_20221215.txt)
https://choosealicense.com/licenses/odc-by/
SmolLM-Corpus
This dataset is a curated collection of high-quality educational and synthetic data designed for training small language models. You can find more details about the models trained on this dataset in our SmolLM blog post.
Dataset subsets
Cosmopedia v2
Cosmopedia v2 is an enhanced version of Cosmopedia, the largest synthetic dataset for pre-training, consisting of over 39 million textbooks, blog posts, and stories generated by… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus.
Techsalerator's Corporate Actions Dataset in Israel offers a comprehensive collection of data fields related to corporate actions, providing valuable insights for investors, traders, and financial institutions. This dataset includes crucial information about the various financial instruments of all 473 companies traded on the Tel-Aviv Stock Exchange (XTAE).
Top 5 used data fields in the Corporate Actions Dataset for Israel:
Dividend Declaration Date: The date on which a company's board of directors announces the dividend payout to its shareholders. This information is crucial for investors who rely on dividends as a source of income.
Stock Split Ratio: The ratio by which a company's shares are split to increase liquidity and affordability. This field is essential for understanding changes in share structure.
Merger Announcement Date: The date on which a company officially announces its intention to merge with another entity. This field is crucial for investors assessing the impact of potential mergers on their investments.
Rights Issue Record Date: The date on which shareholders must be on the company's books to be eligible for participating in a rights issue. This data helps investors plan their participation in fundraising events.
Bonus Issue Ex-Date: The date on which a company's shares start trading without the value of the bonus issue. This information is vital for investors to adjust their portfolios accordingly.
Top 5 corporate actions in Israel:
Technology and Startups: Corporate actions in Israel's renowned technology sector, including mergers, acquisitions, and initial public offerings (IPOs), are crucial for the country's innovation ecosystem and its reputation as the "Startup Nation."
Healthcare and Life Sciences: Corporate actions related to pharmaceuticals, medical research, and healthcare startups contribute to Israel's reputation as a hub for medical innovation and cutting-edge research.
Cybersecurity and Defense Technology: Corporate actions in the cybersecurity and defense technology sectors reflect Israel's expertise in developing advanced cybersecurity solutions and defense systems.
Renewable Energy and Cleantech: Corporate actions related to renewable energy projects and cleantech initiatives align with Israel's efforts to develop sustainable energy sources and address environmental challenges.
Financial Services and Fintech: Corporate actions involving financial technology (fintech) startups, digital payment solutions, and blockchain technology contribute to Israel's financial services sector's modernization.
Top 5 financial instruments with corporate action Data in Israel
Israel Stock Exchange (ISE) Domestic Company Index: The main index that tracks the performance of domestic companies listed on the Israel Stock Exchange. This index would provide insights into the performance of the Israeli stock market.
Israel Stock Exchange (ISE) Foreign Company Index: The index that tracks the performance of foreign companies listed on the Israel Stock Exchange, if foreign listings were present. This index would give an overview of foreign business involvement in Israel.
SuperMart Israel: An Israel-based supermarket chain with operations in multiple regions. SuperMart focuses on providing essential products to local communities and contributing to the retail sector's growth.
FinanceIsrael: A financial services provider in Israel with a focus on promoting financial inclusion and access to banking services, particularly among underserved communities.
AgriTech Israel: A company dedicated to advancing agricultural technology in Israel, focusing on optimizing crop yields and improving food security to support the country's agricultural sector.
If you're interested in accessing Techsalerator's Corporate Actions Dataset for Israel, please contact info@techsalerator.com with your specific requirements. Techsalerator will provide you with a customized quote based on the number of data fields and records you need. The dataset can be delivered within 24 hours, and ongoing access options can be discussed if needed.
Data fields included:
Dividend Declaration Date, Stock Split Ratio, Merger Announcement Date, Rights Issue Record Date, Bonus Issue Ex-Date, Stock Buyback Date, Spin-Off Announcement Date, Dividend Record Date, Merger Effective Date, Rights Issue Subscription Price
Q&A:
How much does the Corporate Actions Dataset cost in Israel?
The cost of the Corporate Actions Dataset may vary depending on factors such as the number of data fields, the frequency of updates, and the total records count. For precise pricing details, it is recommended to directly consult with a Techsalerator Data specialist.
How complete is the Corporate Actions Dataset coverage in Israel?
Techsalerator provides comprehensive coverage of Corporate Actions Data for various companies and...
This dataset was created by JINSHEL GEORGE
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a preprocessed version of the publicly available MNIST handwritten digit dataset, formatted for use in the research paper "A fast dictionary-learning-based classification scheme using undercomplete dictionaries". The data have been converted into vector form and sorted into .mat files by class label, ranging from 0 to 9. The files are split into training and testing sets: X_train holds the vectorized images and Y_train the corresponding labels, with X_test and Y_test the images and labels of the testing set.

Contents:
- X_train_vector_sort_MNIST
- Y_train_MNIST
- X_test_vector_MNIST
- Y_test_MNIST

Usage: the dataset is intended for direct use with the code available at https://github.com/saeedmohseni97/fast-udl-classification
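A minimal sketch for loading the files with SciPy. The variable name stored inside each .mat file is an assumption; inspect the returned dict's keys to find the actual names:

```python
# Sketch: reading one of the sorted MNIST .mat files.
from scipy.io import loadmat

def load_mnist_mat(path: str, var_name: str):
    # loadmat returns a dict of arrays keyed by variable name, plus
    # '__header__', '__version__', and '__globals__' metadata entries.
    data = loadmat(path)
    return data[var_name]
```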
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Kiawah Island population by age. The dataset can be utilized to understand the age distribution and demographics of Kiawah Island.
The dataset comprises the following three datasets
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Temperature profile and sound velocity data were collected using CTD, XCTD, and XBT casts in the Arctic Ocean, Mediterranean Sea - Eastern Basin, North Pacific Ocean, South Pacific Ocean, and Southern Oceans from April 11, 1975 to August 31, 1998. Data were collected by the US Naval Oceanographic Office as part of the Master Oceanographic Observation Data Set (MOODS) project.