57 datasets found

n
Data from: Exploring deep learning techniques for wild animal behaviour...
data.niaid.nih.gov
search.dataone.org
+1more
zip
Updated Feb 22, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers [Dataset]. http://doi.org/10.5061/dryad.2ngf1vhwk
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.2ngf1vhwk
Dataset updated
Feb 22, 2024
Dataset provided by
Osaka University
Nagoya University
Authors
Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features. Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.

This abstract is cited from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024).Please see README for the details of the datasets.
Z
Unlabeled AnuraSet: A dataset for leveraging unlabeled data in machine...
data.niaid.nih.gov
Updated May 27, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cañas, Juan Sebastián (2024). Unlabeled AnuraSet: A dataset for leveraging unlabeled data in machine learning models for passive acoustic monitoring [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11244813
Explore at:
Dataset updated
May 27, 2024
Dataset provided by
Juan Sebastián, Ulloa
Cañas, Juan Sebastián
Diego, Llusia
Selvino, Neckel De Oliveira
Soundclim Network
María Paula, Toro-Gómez
Franco Leandro, De Souza
Larissa Sayuri, Moreira Sugai
Rogerio, Pereira Bastos
Toledo, Luis Felipe
License
Attribution 1.0 (CC BY 1.0)https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
Description
The Unlabeled AnuraSet (U-AnuraSet) is an extension of the original AnuraSet dataset. It consists of soundscape recordings from passive acoustic monitoring conducted in Brazil. The recording sites are identical to those in the original AnuraSet. Each site comprises 2,666 one-minute raw audio files of unlabeled data. The U-AnuraSet is publicly available to encourage machine learning researchers to explore innovative methods for leveraging unlabeled data in the training of models aimed at solving problems such as anuran call identification.

If you find the Unlabeled AnuraSet useful for your research, please consider citing it as follows:

Cañas, J.S., Toro-Gómez, M.P., Sugai, L.S.M., et al. A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring. Sci Data 10, 771 (2023). https://doi.org/10.1038/s41597-023-02666-2
R
AI in Unsupervised Learning Market Market Research Report 2033
researchintelo.com
csv, pdf, pptx
Updated Jul 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Research Intelo (2025). AI in Unsupervised Learning Market Market Research Report 2033 [Dataset]. https://researchintelo.com/report/ai-in-unsupervised-learning-market-market
Explore at:
pdf, csv, pptxAvailable download formats
Dataset updated
Jul 24, 2025
Dataset authored and provided by
Research Intelo
License
https://researchintelo.com/privacy-and-policyhttps://researchintelo.com/privacy-and-policy
Time period covered
2024 - 2033
Area covered
Global
Description
AI in Unsupervised Learning Market Outlook

According to our latest research, the AI in Unsupervised Learning market size reached USD 3.8 billion globally in 2024, demonstrating robust expansion as organizations increasingly leverage unsupervised techniques for extracting actionable insights from unlabelled data. The market is forecasted to grow at a CAGR of 28.2% from 2025 to 2033, propelling the industry to an estimated USD 36.7 billion by 2033. This remarkable growth trajectory is primarily fueled by the escalating adoption of artificial intelligence across diverse sectors, an exponential surge in data generation, and the pressing need for advanced analytics that can operate without manual data labeling.

One of the key growth factors driving the AI in Unsupervised Learning market is the rising complexity and volume of data generated by enterprises in the digital era. Organizations are inundated with unstructured and unlabelled data from sources such as social media, IoT devices, and transactional systems. Traditional supervised learning methods are often impractical due to the time and cost associated with manual labeling. Unsupervised learning algorithms, such as clustering and dimensionality reduction, offer a scalable solution by autonomously identifying patterns, anomalies, and hidden structures within vast datasets. This capability is increasingly vital for industries aiming to enhance decision-making, streamline operations, and gain a competitive edge through advanced analytics.

Another significant driver is the rapid advancement in computational power and AI infrastructure, which has made it feasible to implement sophisticated unsupervised learning models at scale. The proliferation of cloud computing and specialized AI hardware has reduced barriers to entry, enabling even small and medium enterprises to deploy unsupervised learning solutions. Additionally, the evolution of neural networks and deep learning architectures has expanded the scope of unsupervised algorithms, allowing for more complex tasks such as image recognition, natural language processing, and anomaly detection. These technological advancements are not only accelerating adoption but also fostering innovation across sectors including healthcare, finance, manufacturing, and retail.

Furthermore, regulatory compliance and the growing emphasis on data privacy are pushing organizations to adopt unsupervised learning methods. Unlike supervised approaches that require sensitive data labeling, unsupervised algorithms can process data without explicit human intervention, thereby reducing the risk of privacy breaches. This is particularly relevant in sectors such as healthcare and BFSI, where stringent data protection regulations are in place. The ability to derive insights from unlabelled data while maintaining compliance is a compelling value proposition, further propelling the market forward.

Regionally, North America continues to dominate the AI in Unsupervised Learning market owing to its advanced technological ecosystem, significant investments in AI research, and strong presence of leading market players. Europe follows closely, driven by robust regulatory frameworks and a focus on ethical AI deployment. The Asia Pacific region is exhibiting the fastest growth, fueled by rapid digital transformation, government initiatives, and increasing adoption of AI across industries. Latin America and the Middle East & Africa are also witnessing steady growth, albeit at a slower pace, as awareness and infrastructure continue to develop.

Component Analysis

The Component segment of the AI in Unsupervised Learning market is categorized into Software, Hardware, and Services, each playing a pivotal role in the overall ecosystem. The software segment, comprising machine learning frameworks, data analytics platforms, and AI development tools, holds the largest market share. This dominance is attributed to the continuous evolution of AI algorithms and the increasing availability of open-source and proprietary solutions tailored for unsupervised learning. Enterprises are investing heavily in software that can facilitate the seamless integration of unsupervised learning capabilities into existing workflows, enabling automation, predictive analytics, and pattern recognition without the need for labeled data.

The hardware segment, while smaller in comparison to software, is experiencing significant growth due to the escalating demand for high-perf
f
Sentiment140 tweet statistics.
plos.figshare.com
xls
Updated Apr 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maha Ijaz; Naveed Anwar; Mejdl Safran; Sultan Alfarhood; Tariq Sadad; Imran (2024). Sentiment140 tweet statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0297028.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0297028.t004
Dataset updated
Apr 1, 2024
Dataset provided by
PLOS ONE
Authors
Maha Ijaz; Naveed Anwar; Mejdl Safran; Sultan Alfarhood; Tariq Sadad; Imran
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Machine learning techniques that rely on textual features or sentiment lexicons can lead to erroneous sentiment analysis. These techniques are especially vulnerable to domain-related difficulties, especially when dealing in Big data. In addition, labeling is time-consuming and supervised machine learning algorithms often lack labeled data. Transfer learning can help save time and obtain high performance with fewer datasets in this field. To cope this, we used a transfer learning-based Multi-Domain Sentiment Classification (MDSC) technique. We are able to identify the sentiment polarity of text in a target domain that is unlabeled by looking at reviews in a labelled source domain. This research aims to evaluate the impact of domain adaptation and measure the extent to which transfer learning enhances sentiment analysis outcomes. We employed transfer learning models BERT, RoBERTa, ELECTRA, and ULMFiT to improve the performance in sentiment analysis. We analyzed sentiment through various transformer models and compared the performance of LSTM and CNN. The experiments are carried on five publicly available sentiment analysis datasets, namely Hotel Reviews (HR), Movie Reviews (MR), Sentiment140 Tweets (ST), Citation Sentiment Corpus (CSC), and Bioinformatics Citation Corpus (BCC), to adapt multi-target domains. The performance of numerous models employing transfer learning from diverse datasets demonstrating how various factors influence the outputs.
d
Data from: Performance of unmarked abundance models with data from...
dataone.org
search.dataone.org
+4more
Updated Aug 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cameron Fiss; Samuel Lapp; Jonathan Cohen; Halie Parker; Jeffery T. Larkin; Jeffery L. Larkin; Justin Kitzes (2024). Performance of unmarked abundance models with data from machine-learning classification of passive acoustic recordings [Dataset]. http://doi.org/10.5061/dryad.4j0zpc8k0
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.4j0zpc8k0
Dataset updated
Aug 28, 2024
Dataset provided by
Dryad Digital Repository
Authors
Cameron Fiss; Samuel Lapp; Jonathan Cohen; Halie Parker; Jeffery T. Larkin; Jeffery L. Larkin; Justin Kitzes
Time period covered
Jul 11, 2024
Description
The ability to conduct cost-effective wildlife monitoring at scale is rapidly increasing due to availability of inexpensive autonomous recording units (ARUs) and automated species recognition, presenting a variety of advantages over human-based surveys. However, estimating abundance with such data collection techniques remains challenging because most abundance models require data that are difficult for low-cost monoaural ARUs to gather (e.g., counts of individuals, distance to individuals), especially when using the output of automated species recognition. Statistical models that do not require counting or measuring distances to target individuals in combination with low-cost ARUs provide a promising way of obtaining abundance estimates for large-scale wildlife monitoring projects but remain untested. We present a case study using avian field data collected in forests of Pennsylvania during the Spring of 2020 and 2021 using both traditional point counts and passive acoustic monitoring ..., , , # Data and code for ARU and point-count abundance estimates

https://doi.org/10.5061/dryad.4j0zpc8k0

Â Description of the data and file structure

point_count_data_dryad.csv file contains Wood Thrush (WOTH) and Cerulean Warbler (CERW) point-count data. Columns are formatted as such: "Species Code_ Data type Visit number. For example, WOTH_count1 contains the number of Wood Thrush counted during visit 1. Rows=sites.

CERW_det_hist_2020-2021_dryad.csv and WOTH_det_hist_2020-2021_dryad.csv contain detection histories for Cerulean Warbler (CERW) and Wood Thrush (WOTH) from passive acoustic data processed with a machine-learning classifier and verified by human listener. The "det1", "det2", "det3", etc, columns represent whether the species occurred on day 1, 2, or 3, etc. of the survey day. The "ttd" column indicates which day the species was first detected over the 10-day window.Â

Code/Software

.R file contains code for four differe...
f
Data from: DeepMoney: Counterfeit Money Detection Using Generative...
figshare.com
application/x-rar
Updated Aug 8, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Toqeer Ali; Salman Jan (2019). DeepMoney: Counterfeit Money Detection Using Generative Adversarial Networks [Dataset]. http://doi.org/10.6084/m9.figshare.9164510.v3
Explore at:
application/x-rarAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.9164510.v3
Dataset updated
Aug 8, 2019
Dataset provided by
figshare
Authors
Toqeer Ali; Salman Jan
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Conventional paper currency and modern electronic currency are two important modes of transactions. In several parts of the world, conventional methodology has clear precedence over its electronic counterpart. However, the identification of forged currency paper notes is now becoming an increasingly crucial problem because of the new and improved tactics employed by counterfeiters. In this paper, a machine assisted system – dubbed DeepMoney– is proposed which has been developed to discriminate fake notes from genuine ones. For this purpose, state-of-the-art models of machine learning called Generative Adversarial Networks (GANs) are employed. GANs use an unsupervised learning to train a model that can then be used to perform supervised predictions. This flexibility provides the best of both worlds by allowing unlabelled data to be trained on whilst still making concrete predictions. This technique was applied to Pakistani banknotes. State-of-the-art image processing and feature recognition techniques were used to design the overall approach of a valid input. Augmented samples of images were used in the experiments which show that a high-precision machine can be developed to recognize genuine paper money. An accuracy of 80% has been achieved. The code is available as an open source to allow others to reproduce and build upon the efforts already made.
f
Explanations for each cluster in Adult dataset.
plos.figshare.com
xls
Updated Oct 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Liang Chen; Caiming Zhong; Zehua Zhang (2023). Explanations for each cluster in Adult dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0292960.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0292960.t004
Dataset updated
Oct 27, 2023
Dataset provided by
PLOS ONE
Authors
Liang Chen; Caiming Zhong; Zehua Zhang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clustering is an unsupervised machine learning technique whose goal is to cluster unlabeled data. But traditional clustering methods only output a set of results and do not provide any explanations of the results. Although in the literature a number of methods based on decision tree have been proposed to explain the clustering results, most of them have some disadvantages, such as too many branches and too deep leaves, which lead to complex explanations and make it difficult for users to understand. In this paper, a hypercube overlay model based on multi-objective optimization is proposed to achieve succinct explanations of clustering results. The model designs two objective functions based on the number of hypercubes and the compactness of instances and then uses multi-objective optimization to find a set of nondominated solutions. Finally, an Utopia point is defined to determine the most suitable solution, in which each cluster can be covered by as few hypercubes as possible. Based on these hypercubes, an explanations of each cluster is provided. Upon verification on synthetic and real datasets respectively, it shows that the model can provide a concise and understandable explanations to users.
f
Explanations for each cluster in Iris dataset.
plos.figshare.com
xls
Updated Oct 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Liang Chen; Caiming Zhong; Zehua Zhang (2023). Explanations for each cluster in Iris dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0292960.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0292960.t003
Dataset updated
Oct 27, 2023
Dataset provided by
PLOS ONE
Authors
Liang Chen; Caiming Zhong; Zehua Zhang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clustering is an unsupervised machine learning technique whose goal is to cluster unlabeled data. But traditional clustering methods only output a set of results and do not provide any explanations of the results. Although in the literature a number of methods based on decision tree have been proposed to explain the clustering results, most of them have some disadvantages, such as too many branches and too deep leaves, which lead to complex explanations and make it difficult for users to understand. In this paper, a hypercube overlay model based on multi-objective optimization is proposed to achieve succinct explanations of clustering results. The model designs two objective functions based on the number of hypercubes and the compactness of instances and then uses multi-objective optimization to find a set of nondominated solutions. Finally, an Utopia point is defined to determine the most suitable solution, in which each cluster can be covered by as few hypercubes as possible. Based on these hypercubes, an explanations of each cluster is provided. Upon verification on synthetic and real datasets respectively, it shows that the model can provide a concise and understandable explanations to users.
G
Generative AI Market Report
marketresearchforecast.com
doc, pdf, ppt
Updated Jan 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Generative AI Market Report [Dataset]. https://www.marketresearchforecast.com/reports/generative-ai-market-1667
Explore at:
doc, pdf, pptAvailable download formats
Dataset updated
Jan 2, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Generative AI Market size was valued at USD 43.87 USD Billion in 2023 and is projected to reach USD 453.28 USD Billion by 2032, exhibiting a CAGR of 39.6 % during the forecast period. The market's expansion is driven by the increasing adoption of AI in various industries, the growing demand for personalized experiences, and the advancement of machine learning and deep learning technologies. Generative AI is a form of AI technology that come with the capability to generate content in several of forms such us that include text, images, audio data, and artificial data. In the latest trend of the use of generative AI, fingertip friendly interfaces that allow for the creation of top-quality text design, and videos in a brief time of only seconds have been the leading cause of the hype around it. The AI technology called Generative AI employs a variety of techniques that its development is still being improved. Fundamentally, AI foundation models are based on training on a wide spate of unlabelled data that can be used for many tasks; working primarily on specific areas where additional fine-tuning finds its place. Over-simplifying the process, huge amounts of maths and computer power get used to develop AI models. Nevertheless, at its core, it is the predictions amplified. Generative AI relies on deep learning models – sophisticated machine learning models that work as neural networks and learn and take decisions just the human minds do. Such models are based on the detection and emission of codes of complex relationships or patterns in huge information volumes and that data is used to respond to users' original speech requests or questions with native language replies or new content. Recent developments include: June 2023: Salesforce launched two generative artificial intelligence (AI) products for commerce experience and customized consumers –Commerce GPT and Marketing GPT. The Marketing GPT model leverages data from Salesforce's real-time data cloud platform to generate more innovative audience segments, personalized emails, and marketing strategies., June 2023: Accenture and Microsoft are teaming up to help companies primarily transform their businesses by harnessing the power of generative AI accelerated by the cloud. It helps customers find the right way to build and extend technology in their business responsibly., May 2023: SAP SE partnered with Microsoft to help customers solve their fundamental business challenges with the latest enterprise-ready innovations. This integration will enable new experiences to improve how businesses attract, retain and qualify their employees. , April 2023: Amazon Web Services, Inc. launched a global generative AI accelerator for startups. The company’s Generative AI Accelerator offers access to impactful AI tools and models, machine learning stack optimization, customized go-to-market strategies, and more., March 2023: Adobe and NVIDIA have partnered to join the growth of generative AI and additional advanced creative workflows. Adobe and NVIDIA will innovate advanced AI models with new generations aiming at tight integration into the applications that significant developers and marketers use. . Key drivers for this market are: Growing Necessity to Create a Virtual World in the Metaverse to Drive the Market. Potential restraints include: Risks Related to Data Breaches and Sensitive Information to Hinder Market Growth . Notable trends are: Rising Awareness about Conversational AI to Transform the Market Outlook .
D
Machine Learning Courses Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2025). Machine Learning Courses Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/machine-learning-courses-market
Explore at:
csv, pptx, pdfAvailable download formats
Dataset updated
Jan 7, 2025
Authors
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Machine Learning Courses Market Outlook

The global market size of Machine Learning (ML) courses is witnessing substantial growth, with market valuation expected to reach $3.1 billion in 2023 and projected to soar to $12.6 billion by 2032, exhibiting a robust CAGR of 16.5% over the forecast period. This rapid expansion is fueled by the increasing adoption of artificial intelligence (AI) and machine learning technologies across various industries, the rising need for upskilling and reskilling in the workforce, and the growing penetration of online education platforms.

One of the most significant growth factors driving the ML courses market is the escalating demand for AI and ML expertise in the job market. As industries increasingly integrate AI and machine learning into their operations to enhance efficiency and innovation, there is a burgeoning need for professionals with relevant skills. Companies across sectors such as finance, healthcare, retail, and manufacturing are investing heavily in training programs to bridge the skills gap, thus driving the demand for ML courses. Additionally, the rapid evolution of technology necessitates continuous learning, further bolstering market growth.

Another crucial factor contributing to the market's expansion is the proliferation of online education platforms that offer flexible and affordable ML courses. Platforms like Coursera, Udacity, edX, and Khan Academy have made high-quality education accessible to a global audience. These platforms offer an array of courses tailored to different skill levels, from beginners to advanced learners, making it easier for individuals to pursue continuous learning and career advancement. The convenience and flexibility of online learning are particularly appealing to working professionals and students, thereby driving the market's growth.

The increasing collaboration between educational institutions and technology companies is also playing a pivotal role in the growth of the ML courses market. Many universities and colleges are partnering with leading tech firms to develop specialized curricula that align with industry requirements. These collaborations help ensure that the courses offered are up-to-date with the latest technological advancements and industry standards. As a result, students and professionals are better equipped with the skills needed to thrive in a technology-driven job market, further propelling the demand for ML courses.

On a regional level, North America holds a significant share of the ML courses market, driven by the presence of numerous leading tech companies and educational institutions, as well as a highly skilled workforce. The region's strong emphasis on innovation and technological advancement is a key driver of market growth. Additionally, Asia Pacific is emerging as a lucrative market for ML courses, with countries like China, India, and Japan witnessing increased investments in AI and ML education and training. The rising internet penetration, growing popularity of online education, and government initiatives to promote digital literacy are some of the factors contributing to the market's growth in this region.

Self-Supervised Learning, a cutting-edge approach in the realm of machine learning, is gaining traction as a pivotal element in the development of more autonomous AI systems. Unlike traditional supervised learning, which relies heavily on labeled data, self-supervised learning leverages unlabeled data to train models, significantly reducing the dependency on human intervention for data annotation. This method is particularly advantageous in scenarios where acquiring labeled data is costly or impractical. By enabling models to learn from vast amounts of unlabeled data, self-supervised learning enhances the ability of AI systems to generalize from limited labeled examples, thereby improving their performance in real-world applications. The integration of self-supervised learning techniques into machine learning courses is becoming increasingly important, as it equips learners with the knowledge to tackle complex AI challenges and develop more robust models.

Course Type Analysis

The Machine Learning Courses market is segmented by course type into online courses, offline courses, bootcamps, and workshops. Online courses dominate the segment due to their accessibility, flexibility, and cost-effectiveness. Platforms like Coursera and Udacity have democratized access to high-quality ML education, enabling lear
f
Setting of parameters.
plos.figshare.com
xls
Updated Oct 27, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Liang Chen; Caiming Zhong; Zehua Zhang (2023). Setting of parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0292960.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0292960.t002
Dataset updated
Oct 27, 2023
Dataset provided by
PLOS ONE
Authors
Liang Chen; Caiming Zhong; Zehua Zhang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clustering is an unsupervised machine learning technique whose goal is to cluster unlabeled data. But traditional clustering methods only output a set of results and do not provide any explanations of the results. Although in the literature a number of methods based on decision tree have been proposed to explain the clustering results, most of them have some disadvantages, such as too many branches and too deep leaves, which lead to complex explanations and make it difficult for users to understand. In this paper, a hypercube overlay model based on multi-objective optimization is proposed to achieve succinct explanations of clustering results. The model designs two objective functions based on the number of hypercubes and the compactness of instances and then uses multi-objective optimization to find a set of nondominated solutions. Finally, an Utopia point is defined to determine the most suitable solution, in which each cluster can be covered by as few hypercubes as possible. Based on these hypercubes, an explanations of each cluster is provided. Upon verification on synthetic and real datasets respectively, it shows that the model can provide a concise and understandable explanations to users.
o
Amos: A large-scale abdominal multi-organ benchmark for versatile medical...
explore.openaire.eu
zenodo.org
Updated Oct 29, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
YuanfengJi (2022). Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation (Unlabeled Data Part III) [Dataset]. http://doi.org/10.5281/zenodo.7295816
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7295816
Dataset updated
Oct 29, 2022
Authors
YuanfengJi
Description
Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constraint by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate the limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of the existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. The paper can be found at https://arxiv.org/pdf/2206.08023.pdf In addition to providing the labeled 600 CT and MRI scans, we expect to provide 2000 CT and 1200 MRI scans without labels to support more learning tasks (semi-supervised, un-supervised, domain adaption, ...). The link can be found in: labeled data (500CT+100MRI) unlabeled data Part I (900CT) unlabeled data Part II (1100CT) (Now there are 1000CT, we will replenish to 1100CT) unlabeled data Part III (1200MRI) if you found this dataset useful for your research, please cite: @article{ji2022amos, title={AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation}, author={Ji, Yuanfeng and Bai, Haotian and Yang, Jie and Ge, Chongjian and Zhu, Ye and Zhang, Ruimao and Li, Zhen and Zhang, Lingyan and Ma, Wanling and Wan, Xiang and others}, journal={arXiv preprint arXiv:2206.08023}, year={2022} }
n
Data and code from: Learning a deep language model for microbiomes: The...
data.niaid.nih.gov
dataone.org
+1more
zip
Updated Feb 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Quintin Pope; Rohan Varma; Christine Tataru; Maude David; Xiaoli Fern (2025). Data and code from: Learning a deep language model for microbiomes: The power of large scale unlabeled microbiome data [Dataset]. http://doi.org/10.5061/dryad.tb2rbp08p
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.tb2rbp08p
Dataset updated
Feb 20, 2025
Dataset provided by
University of Michigan
Oregon State University
Authors
Quintin Pope; Rohan Varma; Christine Tataru; Maude David; Xiaoli Fern
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
We use open source human gut microbiome data to learn a microbial “language” model by adapting techniques from Natural Language Processing (NLP). Our microbial “language” model is trained in a self-supervised fashion (i.e., without additional external labels) to capture the interactions among different microbial taxa and the common compositional patterns in microbial communities. The learned model produces contextualized taxon representations that allow a single microbial taxon to be represented differently according to the specific microbial environment in which it appears. The model further provides a sample representation by collectively interpreting different microbial taxa in the sample and their interactions as a whole. We demonstrate that, while our sample representation performs comparably to baseline models in in-domain prediction tasks such as predicting Irritable Bowel Disease (IBD) and diet patterns, it significantly outperforms them when generalizing to test data from independent studies, even in the presence of substantial distribution shifts. Through a variety of analyses, we further show that the pre-trained, context-sensitive embedding captures meaningful biological information, including taxonomic relationships, correlations with biological pathways, and relevance to IBD expression, despite the model never being explicitly exposed to such signals. Methods No additional raw data was collected for this project. All inputs are available publicly. American Gut Project, Halfvarson, and Schirmer raw data are available from the NCBI database (accession numbers PRJEB11419, PRJEB18471, and PRJNA398089, respectively). We used the curated data produced by Tataru and David, 2020.
f
Data from: Benchmarking Machine Learning Models for Polymer Informatics: An...
acs.figshare.com
xlsx
Updated Jun 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lei Tao; Vikas Varshney; Ying Li (2023). Benchmarking Machine Learning Models for Polymer Informatics: An Example of Glass Transition Temperature [Dataset]. http://doi.org/10.1021/acs.jcim.1c01031.s002
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.1021/acs.jcim.1c01031.s002
Dataset updated
Jun 4, 2023
Dataset provided by
ACS Publications
Authors
Lei Tao; Vikas Varshney; Ying Li
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
In the field of polymer informatics, utilizing machine learning (ML) techniques to evaluate the glass transition temperature Tg and other properties of polymers has attracted extensive attention. This data-centric approach is much more efficient and practical than the laborious experimental measurements when encountered a daunting number of polymer structures. Various ML models are demonstrated to perform well for Tg prediction. Nevertheless, they are trained on different data sets, using different structure representations, and based on different feature engineering methods. Thus, the critical question arises on selecting a proper ML model to better handle the Tg prediction with generalization ability. To provide a fair comparison of different ML techniques and examine the key factors that affect the model performance, we carry out a systematic benchmark study by compiling 79 different ML models and training them on a large and diverse data set. The three major components in setting up an ML model are structure representations, feature representations, and ML algorithms. In terms of polymer structure representation, we consider the polymer monomer, repeat unit, and oligomer with longer chain structure. Based on that feature, representation is calculated, including Morgan fingerprinting with or without substructure frequency, RDKit descriptors, molecular embedding, molecular graph, etc. Afterward, the obtained feature input is trained using different ML algorithms, such as deep neural networks, convolutional neural networks, random forest, support vector machine, LASSO regression, and Gaussian process regression. We evaluate the performance of these ML models using a holdout test set and an extra unlabeled data set from high-throughput molecular dynamics simulation. The ML model’s generalization ability on an unlabeled data set is especially focused, and the model’s sensitivity to topology and the molecular weight of polymers is also taken into consideration. This benchmark study provides not only a guideline for the Tg prediction task but also a useful reference for other polymer informatics tasks.
D
Self-Supervised Learning Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2024). Self-Supervised Learning Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-self-supervised-learning-market
Explore at:
pdf, pptx, csvAvailable download formats
Dataset updated
Sep 23, 2024
Authors
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Self-Supervised Learning Market Outlook

As of 2023, the global self-supervised learning market size is valued at approximately USD 1.5 billion and is expected to escalate to around USD 10.8 billion by 2032, reflecting a compound annual growth rate (CAGR) of 24.1% during the forecast period. This robust growth is driven by the increasing demand for advanced AI models that can learn from large volumes of unlabeled data, significantly reducing the dependency on labeled datasets, thereby making AI training more cost-effective and scalable.

The growth of the self-supervised learning market is fueled by several factors, one of which is the exponential increase in data generation. With the proliferation of digital devices, IoT technologies, and social media platforms, there is an unprecedented amount of data being created every second. Self-supervised learning models leverage this vast amount of unlabeled data to train themselves, making them particularly valuable in industries where data labeling is time-consuming and expensive. This capability is especially pertinent in fields like healthcare, finance, and retail, where the rapid analysis of extensive datasets can lead to significant advancements in predictive analytics and customer insights.

Another critical driver is the advancement in computational technologies that support more sophisticated machine learning models. The development of more powerful GPUs and cloud-based AI platforms has enabled the efficient training and deployment of self-supervised learning models. These technological advancements not only reduce the time required for training but also enhance the accuracy and performance of the models. Furthermore, the integration of self-supervised learning with other AI paradigms such as reinforcement learning and deep learning is opening new avenues for research and application, further propelling market growth.

The increasing adoption of AI across various industries is also a significant growth factor. Businesses are increasingly recognizing the potential of AI to optimize operations, enhance customer experiences, and drive innovation. Self-supervised learning, with its ability to make sense of large, unstructured datasets, is becoming a cornerstone of AI strategies across sectors. For instance, in the healthcare sector, self-supervised learning is being used to develop predictive models for disease diagnosis and treatment planning, while in the finance sector, it aids in fraud detection and risk management.

Regionally, North America is expected to dominate the self-supervised learning market, owing to the presence of leading technology companies and extensive R&D activities in AI. However, the Asia Pacific region is anticipated to witness the fastest growth during the forecast period, driven by rapid digital transformation, increasing investment in AI technologies, and supportive government initiatives. Europe also presents a significant market opportunity, with a strong focus on AI research and development, particularly in countries like Germany, the UK, and France.

Component Analysis

The self-supervised learning market is segmented by component into software, hardware, and services. The software segment is expected to hold the largest market share, driven by the development and adoption of advanced AI algorithms and platforms. These software solutions are designed to leverage the vast amounts of unlabeled data available, making them highly valuable for various applications such as natural language processing, computer vision, and predictive analytics. Furthermore, continuous advancements in software capabilities, such as improved model training techniques and enhanced data preprocessing tools, are expected to fuel the growth of this segment.

The hardware segment, while smaller in comparison to software, is crucial for the efficient deployment of self-supervised learning models. This includes high-performance computing systems, GPUs, and specialized AI accelerators that provide the necessary computational power to train and run complex AI models. Innovations in hardware technology, such as the development of more energy-efficient and powerful processing units, are expected to drive growth in this segment. Additionally, the increasing adoption of edge computing devices that can perform AI tasks locally, thereby reducing latency and bandwidth usage, is also contributing to the expansion of the hardware segment.

Services are another vital component of the self-supervised learning market. This segment encompasses various professional services such as consulting, int
2020 US election Tweets - Unlabeled
kaggle.com
Updated Nov 11, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bauyrjan (2020). 2020 US election Tweets - Unlabeled [Dataset]. https://www.kaggle.com/datasets/bauyrjanj/2020-us-election-tweets-unlabeled/data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 11, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Bauyrjan
Area covered
United States
Description
Context

At the time of doing my capstone project, the US 2020 election was just around the corner and it made sense to do sentiment analysis of tweets related to the upcoming election to learn about the kind of opinions and topics being discussed in the Twitter just about 2 weeks prior to the election day. Twitter is a great source for unfiltered opinions as opposed to the typical filtered news we see from the major media outlets.

Content

439,999 tweets were collected from Twitter via Twitter API and Tweepy python package. For the details of how I collected the data, please check out my github repo where you will find my jupyter notebook with the code. https://github.com/bauyrjanj/NLP-TwitterData/blob/master/TwitterData%20-%20Problem%20Statement%20%26%20Data%20Collection.ipynb

Acknowledgements

Thanks for Twitter making it easy to collect unfiltered public opinion via their very useful Twitter API!

Inspiration

My primary interest for creating this dataset was to understand the topics of discussing by the user of Twitter and potentially identify so called October surprises that typically emerge publicly just weeks before the election day. Other ideas that might be interesting to investigate could include:

Can we detect if there are or were any attempts to manipulate the election.

Can we possible predict potential winner by just analyzing the tweets.

Can we predict sentiments in each state, particularly in swing states.
a
Stanford STL-10 Image Dataset
academictorrents.com
bittorrent
Updated Nov 26, 2015
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Adam Coates and Honglak Lee and Andrew Y. Ng (2015). Stanford STL-10 Image Dataset [Dataset]. https://academictorrents.com/details/a799a2845ac29a66c07cf74e2a2838b6c5698a6a
Explore at:
bittorrent(2640397119)Available download formats
Dataset updated
Nov 26, 2015
Dataset authored and provided by
Adam Coates and Honglak Lee and Andrew Y. Ng
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
![]() The STL-10 dataset is an image recognition dataset for developing unsupervised feature learning, deep learning, self-taught learning algorithms. It is inspired by the CIFAR-10 dataset but with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided to learn image models prior to supervised training. The primary challenge is to make use of the unlabeled data (which comes from a similar but different distribution from the labeled data) to build a useful prior. We also expect that the higher resolution of this dataset (96x96) will make it a challenging benchmark for developing more scalable unsupervised learning methods. Overview 10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck. Images are 96x96 pixels, color. 500 training images (10 pre-defined folds), 800 test images per class. 100000 unlabeled images for uns
A
‘BLE RSSI Dataset for Indoor localization’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 26, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘BLE RSSI Dataset for Indoor localization’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-ble-rssi-dataset-for-indoor-localization-85fd/latest
Explore at:
Dataset updated
Jan 26, 2018
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘BLE RSSI Dataset for Indoor localization’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mehdimka/ble-rssi-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Content

The dataset was created using the RSSI readings of an array of 13 ibeacons in the first floor of Waldo Library, Western Michigan University. Data was collected using iPhone 6S. The dataset contains two sub-datasets: a labeled dataset (1420 instances) and an unlabeled dataset (5191 instances). The recording was performed during the operational hours of the library. For the labeled dataset, the input data contains the location (label column), a timestamp, followed by RSSI readings of 13 iBeacons. RSSI measurements are negative values. Bigger RSSI values indicate closer proximity to a given iBeacon (e.g., RSSI of -65 represent a closer distance to a given iBeacon compared to RSSI of -85). For out-of-range iBeacons, the RSSI is indicated by -200. The locations related to RSSI readings are combined in one column consisting a letter for the column and a number for the row of the position. The following figure depicts the layout of the iBeacons as well as the arrange of locations.

https://www.kaggle.com/mehdimka/ble-rssi-dataset/downloads/iBeacon_Layout.jpg" alt="iBeacons Layout">

Attribute Information

location: The location of receiving RSSIs from ibeacons b3001 to b3013; symbolic values showing the column and row of the location on the map (e.g., A01 stands for column A, row 1).

date: Datetime in the format of ‘d-m-yyyy hh:mm:ss’

b3001 - b3013: RSSI readings corresponding to the iBeacons; numeric, integers only.

Acknowledgements

Provider: Mehdi Mohammadi and Ala Al-Fuqaha, {mehdi.mohammadi, ala-alfuqaha}@wmich.edu, Department of Computer Science, Western Michigan University

Citation Request:

M. Mohammadi, A. Al-Fuqaha, M. Guizani, J. Oh, “Semi-supervised Deep Reinforcement Learning in Support of IoT and Smart City Services,” IEEE Internet of Things Journal, Vol. PP, No. 99, 2017.

Inspiration

How unlabeled data can help for an improved learning system. How a GAN model can synthesizes viable paths based on the little labeled data and larger set of unlabeled data.

--- Original source retains full ownership of the source dataset ---
h
Internet-background-noise
huggingface.co
Updated Jan 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
橙子酱 (2025). Internet-background-noise [Dataset]. https://huggingface.co/datasets/burpheart/Internet-background-noise
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 5, 2025
Authors
橙子酱
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Internet Background Noise Dataset (Unlabeled Raw Data)

This dataset contains HTTP internet noise data collected by an internet honeypot. It consists of raw, unlabeled network packets, including metadata, payloads, and header information. This data is suitable for training and evaluating machine learning models for network intrusion detection, cybersecurity, and traffic analysis. HoneyPot repository: hachimi on GitHub.

Dataset Overview

The Internet Background Noise… See the full description on the dataset page: https://huggingface.co/datasets/burpheart/Internet-background-noise.
f
Data from: Leveraging Unlabeled Data for Superior ROC Curve Estimation via a...
tandf.figshare.com
bin
Updated Feb 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Menghua Zhang; Mengjiao Peng; Yong Zhou (2025). Leveraging Unlabeled Data for Superior ROC Curve Estimation via a Semiparametric Approach [Dataset]. http://doi.org/10.6084/m9.figshare.28156199.v2
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.28156199.v2
Dataset updated
Feb 26, 2025
Dataset provided by
Taylor & Francis
Authors
Menghua Zhang; Mengjiao Peng; Yong Zhou
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The receiver operating characteristic (ROC) curve is a widely used tool in various fields, including economics, medicine, and machine learning, for evaluating classification performance and comparing treatment effect. The absence of clear and readily labels is a frequent phenomenon in estimating ROC owing to various reasons like labeling cost, time constraints, data privacy and information asymmetry. Traditional supervised estimators commonly rely solely on labeled data, where each sample is associated with a fully observed response variable. We propose a new set of semi-supervised (SS) estimators to exploit available unlabeled data (samples lack of observations for responses) to enhance the estimation precision under the semi-parametric setting assuming that the distribution of the response variable for one group is known up to unknown parameters. The newly proposed SS estimators have attractive properties such as adaptability and efficiency by leveraging the flexibility of kernel smoothing method. We establish the large sample properties of the SS estimators, which demonstrate that the SS estimators outperform the supervised estimator consistently under mild assumptions. Numeric experiments provide empirical evidence to support our theoretical findings. Finally, we showcase the practical applicability of our proposed methodology by applying it to two real datasets.

Facebook

Twitter

Click to copy link

Link copied

Cite

Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa (2024). Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers [Dataset]. http://doi.org/10.5061/dryad.2ngf1vhwk

Data from: Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5061/dryad.2ngf1vhwk

Dataset updated

Feb 22, 2024

Dataset provided by

Osaka University
Nagoya University

Authors

Ryoma Otsuka; Naoya Yoshimura; Kei Tanigaki; Shiho Koyama; Yuichi Mizutani; Ken Yoda; Takuya Maekawa

License

https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

Description

Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms as well as related training techniques have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance problems, various types of noise in data due to differences in individual behaviour and where the loggers were attached and complexity in data due to complex animal‐specific behaviours, which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved the overall model performance when one of the various techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each data during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with shortcut connection, showed better performance among other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features. Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.

This abstract is cited from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024).Please see README for the details of the datasets.

Clear search

Close search

Google apps

Main menu

Data from: Exploring deep learning techniques for wild animal behaviour...

Unlabeled AnuraSet: A dataset for leveraging unlabeled data in machine...

AI in Unsupervised Learning Market Market Research Report 2033

AI in Unsupervised Learning Market Outlook

Component Analysis

Sentiment140 tweet statistics.

Data from: Performance of unmarked abundance models with data from...

Â Description of the data and file structure

Code/Software

Data from: DeepMoney: Counterfeit Money Detection Using Generative...

Explanations for each cluster in Adult dataset.

Explanations for each cluster in Iris dataset.

Generative AI Market Report

Machine Learning Courses Market Report | Global Forecast From 2025 To 2033

Machine Learning Courses Market Outlook

Course Type Analysis

Setting of parameters.

Amos: A large-scale abdominal multi-organ benchmark for versatile medical...

Data and code from: Learning a deep language model for microbiomes: The...

Data from: Benchmarking Machine Learning Models for Polymer Informatics: An...

Self-Supervised Learning Market Report | Global Forecast From 2025 To 2033

Self-Supervised Learning Market Outlook

Component Analysis

2020 US election Tweets - Unlabeled

Context

Content

Acknowledgements

Inspiration

Stanford STL-10 Image Dataset

‘BLE RSSI Dataset for Indoor localization’ analyzed by Analyst-2

Content

Attribute Information

Acknowledgements

Inspiration

How unlabeled data can help for an improved learning system. How a GAN model can synthesizes viable paths based on the little labeled data and larger set of unlabeled data.

Internet-background-noise

Data from: Leveraging Unlabeled Data for Superior ROC Curve Estimation via a...

Data from: Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers