Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The description section is crucial for helping users understand the purpose, context, and potential applications of your dataset. It should include the following details:
This section provides details about the files included in your dataset, helping users navigate and use them efficiently. Key points to include:
- mars_rover_dataset.csv (CSV file containing metadata of images)
- mars_images.zip (compressed folder containing all images)

The img_src column in mars_rover_dataset.csv corresponds to the images stored in mars_images.zip. Users should extract the images before using the dataset for model training:

```bash
unzip mars_images.zip
```

This section explains the meaning of each column in the dataset, ensuring users can analyze and interpret the data correctly. A well-structured table format is often useful:
| Column Name | Description |
|---|---|
| id | Unique identifier for each image. |
| sol | Martian sol (day) when the image was captured. |
| camera_name | Abbreviated name of the rover's camera (e.g., "FHAZ" for Front Hazard Camera). |
| camera_full_name | Full descriptive name of the camera. |
| img_src | URL link to the image. Users can download images using this link. |
| earth_date | The Earth date corresponding to the Martian sol. |
| rover_name | Name of the rover that captured the image (e.g., "Curiosity"). |
| rover_status | Current operational status of the rover (e.g., "Active" or "Complete"). |
| landing_date | Date when the rover landed on Mars. |
| launch_date | Date when the rover was launched from Earth. |
earth_date is in YYYY-MM-DD format. This section helps users quickly understand the dataset's structure, making it easier for them to work with the data effectively.
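As an illustration, here is a minimal Python sketch for loading the metadata and matching each row to its extracted image. It assumes pandas is installed and that the extracted files keep the basename of their img_src URL; both details are assumptions rather than part of the dataset description.

```python
import os
import pandas as pd

# Load the image metadata (columns as described in the table above).
df = pd.read_csv("mars_rover_dataset.csv")

# After `unzip mars_images.zip`, map each metadata row to a local image file.
# Assumption: extracted files are named after the last path segment of img_src.
df["local_path"] = df["img_src"].apply(
    lambda url: os.path.join("mars_images", os.path.basename(url))
)

# Keep only the rows whose image file is actually present on disk.
available = df[df["local_path"].apply(os.path.exists)]
print(f"{len(available)} of {len(df)} images found locally")
```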
A supervised learning task involves constructing a mapping from an input data space (normally described by several features) to an output space. A set of training examples---examples with known output values---is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output consists of one or more classes to which the corresponding input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. In this chapter, we explain several basic classification algorithms.
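As a concrete illustration of this workflow, the following minimal sketch uses scikit-learn and its bundled iris data (an illustrative stand-in, not the sunspot data discussed above) to train a classifier on labeled examples and predict classes for held-out inputs:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# A labeled dataset: each row of X is a set of measurements (features),
# and each entry of y is the known class of the corresponding example.
X, y = load_iris(return_X_y=True)

# Hold out some examples to stand in for inputs not seen during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The learning algorithm builds a model approximating the feature-to-class mapping.
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# The model then predicts classes for previously unseen inputs.
print(model.predict(X_test[:5]))
print("held-out accuracy:", model.score(X_test, y_test))
```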
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
N.B. This is not real data. Only here for an example for project templates.
Project Title: Add title here
Project Team: Add contact information for research project team members
Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.
Relevant publications/outputs: When available, add links to the related publications/outputs from this data.
Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (e.g., Open Science Framework, GitHub, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.
Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?
Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.
Research Project Materials: What materials are necessary to fully reproduce the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.
List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.
Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, the verbatim question associated with each response, the response options, and details of any post-collection coding that has been done on the raw responses (and whether that is encoded in a separate column).
Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and R code associated with the following publication: Badiane et al. (2022), Environmental conditions and male quality traits simultaneously explain variation of multiple colour signals in male lizards. Journal of Animal Ecology, in press.
This dataset includes the following files:
- An Excel file containing the reflectance spectra of all individuals from all the study populations
- An Excel file containing the variables collected at the individual and population levels
- Two R scripts corresponding to the analyses performed in the publication
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Describe Art is a dataset for vision language (multimodal) tasks - it contains Art Images annotations for 6,402 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset includes hibernation phenology (immergence date, emergence date, and hibernation duration) for northern Idaho ground squirrels (Urocitellus brunneus), along with data used as predictor variables in linear mixed-effects models designed to explain intraspecific variation in the three hibernation behaviors. Code for that lme analysis is also included. Also included in this dataset are body mass data for northern Idaho ground squirrels, the code used to generate predicted squirrel body mass curves, NDVI data for northern Idaho ground squirrel study sites, and the code used to generate predicted NDVI curves for those sites. The data files include metadata sheets to better explain the data and its collection.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Pill Define is a dataset for object detection tasks - it contains Hydralazine annotations for 800 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This data set consists of digitized aquifer boundaries for the Rush Springs aquifer in western Oklahoma. This area encompasses all or part of Blaine, Caddo, Canadian, Comanche, Custer, Dewey, Grady, Stephens, and Washita Counties. Mark F. Becker (U.S. Geological Survey, written commun., 1997) created an aquifer boundary data set that represented hydrologic boundaries needed to simulate the ground-water flow in the Rush Springs aquifer with a computer model. In the ground-water flow model, Mark F. Becker defined the Rush Springs aquifer to include the Rush Springs Formation, alluvial and terrace deposits along major streams, and parts of the Marlow Formations, particularly in the eastern part of the aquifer boundary area.
The Permian-age Rush Springs Formation consists of highly cross-bedded sandstone with some interbedded dolomite and gypsum. The Rush Springs Formation is overlain by Quaternary-age alluvial and terrace deposits that consist of unconsolidated clay, silt, sand, a ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Sicec Define is a dataset for classification tasks - it contains General Stuff annotations for 4,722 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains part-of-speech tags, the impact of each review, and story descriptions for movies and TV series.
- MOVIES: the name of the movie
- SENTENCE: which sentence each word belongs to
- TAG: part of speech
- WORD: each individual word in the row
- REVIEW: impact on the audience
The dataset was created from my previous dataset, movie.csv.
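A minimal sketch for working with these columns in pandas; the file name movie_pos.csv is a placeholder, and only the column names come from the description above:

```python
import pandas as pd

# Load the token-level table (placeholder file name).
df = pd.read_csv("movie_pos.csv")  # columns: MOVIES, SENTENCE, TAG, WORD, REVIEW

# Reconstruct each sentence as a list of (word, part-of-speech) pairs.
tagged_sentences = df.groupby(["MOVIES", "SENTENCE"]).apply(
    lambda g: list(zip(g["WORD"], g["TAG"]))
)
print(tagged_sentences.head())
```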
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
Few datasets openly share the kind of information included here, so it is well worth taking advantage of the fact that this data exists for public use.
This dataset includes the names, salaries, and position titles of employees of the City of Chicago, Illinois.
This data was found at https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w.
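As an illustration, the table can be pulled directly into pandas through the city's Socrata export endpoint. The dataset ID (xzkq-xp2w) comes from the URL above; the endpoint pattern and the row limit are assumptions based on standard Socrata behavior.

```python
import pandas as pd

# Socrata datasets expose a CSV export at /resource/<dataset-id>.csv;
# $limit is a standard Socrata query parameter.
URL = "https://data.cityofchicago.org/resource/xzkq-xp2w.csv?$limit=50000"

df = pd.read_csv(URL)
print(df.shape)
print(df.columns.tolist())  # inspect the name, title, and salary columns
```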
Dataset of the manuscript "What is local research? Towards a multidimensional framework linking theory and methods". In this research article we propose a theoretical and empirical framework of local research, a concept of growing importance due to its far-reaching implications for public policy. Our motivation stems from the lack of clarity surrounding the increasing yet uncritical use of the term in both scientific publications and policy documents, where local research is conceptualized and measured in many ways. A clear understanding of it is crucial for informed decision-making when setting research agendas, allocating funds, and evaluating and rewarding scientists. Our twofold aim is (1) to compare the existing approaches that define and measure local research, and (2) to assess the implications of applying one over another. We first review the perspectives and measures used since the 1970s. Drawing on spatial scientometrics and proximities, we then build a framework that splits the concept into several dimensions: locally informed research, locally situated research, locally relevant research, locally bound research, and locally governed research. Each dimension is composed of a definition and a methodological approach, which we test in 10 million publications from the Dimensions database. Our findings reveal that these approaches measure distinct and sometimes unaligned aspects of local research, with varying effectiveness across countries and disciplines. This study highlights the complex, multifaceted nature of local research. We provide a flexible framework that facilitates the analysis of these dimensions and their intersections, in an attempt to contribute to the understanding and assessment of local research and its role within the production, dissemination, and impact of scientific knowledge.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Dataset Name
Dataset Summary
A corpus for testing whether your LLM can explain jokes well. This is a rather small dataset; pointers to larger ones would be very welcome.
Languages
English
Dataset Structure
Data Fields
url : link to the explanation
joke : the original joke
explaination : the explanation of the joke
Data Splits
Since the dataset is so small, there are no splits, just like gsm8k.
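A minimal usage sketch with the Hugging Face datasets library; the repository id below is a placeholder since the card does not name it, and the field names follow the Data Fields section above (including the explaination spelling):

```python
from datasets import load_dataset

# Placeholder repository id; substitute the actual dataset path.
ds = load_dataset("user/joke-explanation-corpus", split="train")

example = ds[0]
print("JOKE:", example["joke"])
print("EXPLANATION:", example["explaination"])  # field name as listed in the card
print("SOURCE:", example["url"])
```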
This data set consists of digitized polygons of constant recharge values for the Vamoosa-Ada aquifer, in east-central Oklahoma. The Vamoosa-Ada aquifer is an important source of water that underlies about 2,320-square miles of parts of Osage, Pawnee, Payne, Creek, Lincoln, Okfuskee, and Seminole Counties. Approximately 75 percent of the water withdrawn from the Vamoosa-Ada aquifer is for municipal use. Rural domestic use and water for stock animals account for most of the remaining water withdrawn. The Vamoosa-Ada aquifer is defined in a ground-water report as consisting principally of the rocks of the Late Pennsylvanian-age Vamoosa Formation and overlying Ada Group. The Vamoosa-Ada aquifer consists of a complex sequence of fine- to very fine-grained sandstone, siltstone, shale, and conglomerate interbedded with very thin limestones. The water-yielding capabilities of the aquifer are generally controlled by lateral and vertical distribution of the sandstone beds and their physical characteristics. The Vamoosa-Ada aquifer is unconfined where it outcrops in about an 1,700-square-mile area. The recharge rate of the Vamoosa-Ada aquifer was estimated as 1.52 inches per year from base-flow measurements and precipitation records published in a ground-water report. Most of the recharge polygons were extracted from published digital geology data sets. The lines in the digital geology data sets were scanned or digitized from maps published at a scale of 1:250,000 and represent geologic contacts. Some of the lines in the data set were interpolated in areas where the Vamoosa-Ada aquifer is overlain by alluvial and terrace deposits near streams and rivers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 4 rows and is filtered to the book "What is MS?". It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
PlantPopNet (www.plantpopnet.com) collaborators collect demographic information on 65 naturally occurring populations of P. lanceolata across three continents. The present study included 55 populations that had at least two consecutive yearly censuses, presented here. Each population consists of an initial 100 individuals marked in naturally occurring populations and re-visited yearly at the peak of the flowering season. New recruits within the original plots were recorded and followed in subsequent years. The number of rosettes, number of leaves per rosette, length of the longest leaf, and width of the longest leaf for each rosette, flowering status (flowered, not flowered), reproductive output, and survival or death of each individual were recorded at each annual census. For further information on the PlantPopNet protocol, see Buckley et al. (2019). This data is presented as it was used to perform a study on a subset of the plantpopnet data. For said study, we used the first transitio...
This data set consists of digitized polygons of constant recharge values for the Antlers aquifer in southeastern Oklahoma. The Early Cretaceous-age Antlers Sandstone is an important source of water in an area that underlies about 4,400-square miles of all or part of Atoka, Bryan, Carter, Choctaw, Johnston, Love, Marshall, McCurtain, and Pushmataha Counties. The Antlers aquifer consists of sand, clay, conglomerate, and limestone in the outcrop area. The upper part of the Antlers aquifer consists of beds of sand, poorly cemented sandstone, sandy shale, silt, and clay. The Antlers aquifer is unconfined where it outcrops in about an 1,800-square-mile area. The recharge polygons were developed from recharge rates used as input into a ground-water flow model and from published digital data sets of the surficial geology of the Antlers Sandstone except in areas overlain by alluvial and terrace deposits near streams. Some of the lines were interpolated where the Antlers aquifer is overlain by alluvial and terrace deposits. The interpolated lines are very similar to the aquifer boundaries shown on maps published in a ground-water modeling report for the Antlers aquifer. The constant recharge rates used as input to the ground-water flow model were 0.32 inches per year for the western portion of the aquifer and 0.96 inches per year for the eastern portion of the aquifer. Ground-water flow models are numerical representations that simplify and aggregate natural systems. Models are not unique; different combinations of aquifer characteristics may produce similar results. Therefore, values of recharge used in the model and presented in this data set are not precise, but are within a reasonable range when compared to independently collected data.
Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition".
This dataset is associated with the following publication: Golden, H., C. Lane, A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is the "additional training dataset" for the DCASE 2024 Challenge Task 2.
The data consists of the normal/anomalous operating sounds of nine types of real/toy machines. Each recording is a single-channel audio clip that includes both a machine's operating sound and environmental noise. The duration of the recordings varies from 6 to 10 seconds. The following nine types of real/toy machines are used in this task:
3DPrinter
AirCompressor
BrushlessMotor
HairDryer
HoveringDrone
RoboticArm
Scanner
ToothBrush
ToyCircuit
Overview of the task
Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.
This task is the follow-up to the series running from DCASE 2020 Task 2 to DCASE 2023 Task 2. The task this year is to develop an ASD system that meets the following five requirements.
1. Train a model using only normal sound (unsupervised learning scenario). Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data. This is the same requirement as in the previous tasks.
2. Detect anomalies regardless of domain shifts (domain generalization task). In real-world cases, the operational states of a machine or the environmental noise can change, causing domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard to notice. In this task, the system is required to use domain-generalization techniques to handle these domain shifts. This requirement is the same as in DCASE 2022 Task 2 and DCASE 2023 Task 2.
3. Train a model for a completely new machine type. For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should be able to train models without additional hyperparameter tuning. This requirement is the same as in DCASE 2023 Task 2.
4. Train a model using a limited number of machines from its machine type. While sounds from multiple machines of the same machine type can be used to enhance detection performance, it is often the case that only a limited number of machines are available for a machine type. In such a case, the system should be able to train models using a few machines of a machine type. This requirement is the same as in DCASE 2023 Task 2.
5. Train a model both with and without attribute information. While additional attribute information can help enhance detection performance, we cannot always obtain such information. Therefore, the system must work well both when attribute information is available and when it is not.
The last requirement is newly introduced in DCASE 2024 Task 2.
Definition
We first define key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."
"Machine type" indicates the type of machine, which in the additional training dataset is one of nine: 3D-printer, air compressor, brushless motor, hair dryer, hovering drone, robotic arm, document scanner (scanner), toothbrush, and Toy circuit.
A section is defined as a subset of the dataset for calculating performance metrics.
The source domain is the domain under which most of the training data and some of the test data were recorded, and the target domain is a different set of domains under which some of the training data and some of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, signal-to-noise ratio, etc.
Attributes are parameters that define states of machines or types of noise. For several machine types, the attributes are hidden.
Dataset
This dataset consists of nine machine types. For each machine type, one section is provided, and the section is a complete set of training data. A set of test data corresponding to this training data will be provided on a separate Zenodo page as an "evaluation dataset" for the DCASE 2024 Challenge Task 2. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training and (ii) ten clips of normal sounds in the target domain for training. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.
File names and attribute csv files
File names and attribute csv files provide reference labels for each clip. The given reference labels for each training clip include machine type, section index, normal/anomaly information, and attributes regarding conditions other than normal/anomaly. The machine type is given by the directory name. The section index is given by the respective file names. For the datasets other than the evaluation dataset, the normal/anomaly information and the attributes are given by the respective file names. Note that for machine types that have their attribute information hidden, the attribute information in each file name is labeled only as "noAttributes". Attribute csv files are provided for easy access to the attributes that cause domain shifts. In these files, the file names, the names of parameters that cause domain shifts (domain shift parameter, dp), and the values or types of these parameters (domain shift value, dv) are listed. Each row takes the following format:
[filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
For machine types that have their attribute information hidden, all columns except the filename column are left blank for each row.
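A minimal sketch, assuming Python, for reading an attribute csv in the row format shown above into a per-file dictionary of domain-shift parameters (the file path in the usage comment is a placeholder):

```python
import csv

def load_attributes(csv_path):
    """Map each filename to a dict of {domain-shift parameter: value}."""
    attributes = {}
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            filename, rest = row[0], row[1:]
            # Remaining cells alternate parameter name / value; they are blank
            # for machine types whose attribute information is hidden.
            pairs = zip(rest[0::2], rest[1::2])
            attributes[filename] = {p: v for p, v in pairs if p}
    return attributes

# Usage (placeholder path):
# attrs = load_attributes("attributes_00.csv")
```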
Recording procedure
Normal/anomalous operating sounds of machines and their related equipment were recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings from a fixed microphone. We mixed a target machine sound with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset that explain the details of the recording procedure by the submission deadline.
Directory structure
/eval_data
Baseline system
The baseline system is available on the GitHub repository. The baseline systems provide a simple entry-level approach that gives reasonable performance on the Task 2 dataset. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
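For orientation only, here is a minimal sketch of the kind of unsupervised approach such entry-level systems take: compute log-mel frame features from normal training clips and score test clips by reconstruction error. PCA is used purely for illustration and is not the official baseline; librosa, numpy, and scikit-learn are assumed to be installed, and the training-clip paths are placeholders.

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA

def log_mel_features(path, sr=16000, n_mels=128):
    """Frame-level log-mel features for one single-channel clip."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel).T  # shape: (frames, n_mels)

# Placeholder: paths to normal training clips of one machine type.
train_paths = ["machine/train/normal_0001.wav", "machine/train/normal_0002.wav"]

# Fit on frames from normal clips only (unsupervised learning scenario).
train_frames = np.vstack([log_mel_features(p) for p in train_paths])
pca = PCA(n_components=32).fit(train_frames)

def anomaly_score(path):
    """Mean reconstruction error over frames; higher means more anomalous."""
    frames = log_mel_features(path)
    recon = pca.inverse_transform(pca.transform(frames))
    return float(np.mean((frames - recon) ** 2))
```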
Condition of use
This dataset was created jointly by Hitachi, Ltd., NTT Corporation and STMicroelectronics and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Citation
Contact
If there is any problem, please contact us:
Tomoya Nishida, tomoya.nishida.ax@hitachi.com
Keisuke Imoto, keisuke.imoto@ieee.org
Noboru Harada, noboru@ieee.org
Daisuke Niizumi, daisuke.niizumi.dt@hco.ntt.co.jp
Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com