100+ datasets found
  1. Mars rover dataset

    • kaggle.com
    zip
    Updated Mar 1, 2025
    Cite
    Gaurav Kumar (2025). Mars rover dataset [Dataset]. https://www.kaggle.com/datasets/gauravkumar2525/mars-rover-dataset
    Explore at:
    Available download formats: zip (101820038 bytes)
    Dataset updated
    Mar 1, 2025
    Authors
    Gaurav Kumar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    1. Description

    The description section is crucial for helping users understand the purpose, context, and potential applications of your dataset. It should include the following details:

    • Dataset Overview: Provide a clear summary of what the dataset contains. For example, "This dataset includes images captured by NASA’s Curiosity Rover on Mars, along with metadata such as the camera used, the Martian sol (day) when the photo was taken, and the corresponding Earth date."
    • Source of the Data: Explain where the data comes from. If it’s obtained via an API, mention the source (e.g., "The images and metadata are retrieved from NASA's Mars Rover Photos API."). If the dataset is curated from multiple sources, list them.
    • Purpose and Use Cases: Describe why this dataset was created and how it can be used. For example:
      • Machine Learning: Train models for image classification, object detection, and anomaly detection.
      • Scientific Research: Analyze Martian surface patterns, study terrain features, or examine rover camera performance.
      • Space Exploration: Understand Mars' environmental conditions and assist in future exploration planning.
    • Data Format and Organization: Briefly mention the format of the files (e.g., CSV file for metadata, ZIP file containing images) and how they are structured.
    • Licensing and Permissions: Specify if the dataset has any restrictions on usage. Since NASA data is typically public domain, state that users are free to use it for research and development.
    • Limitations or Considerations: Mention any potential challenges, such as missing data, limited coverage, or resolution constraints.

    2. File Information

    This section provides details about the files included in your dataset, helping users navigate and use them efficiently. Key points to include:

    • List of Files: Clearly mention all the files and their formats, such as:
      • mars_rover_dataset.csv (CSV file containing metadata of images)
      • mars_images.zip (Compressed folder containing all images)
    • Purpose of Each File:
      • CSV File: Contains structured data, including image IDs, timestamps, camera details, and URLs.
      • ZIP File: Stores actual Mars images, which can be extracted and used for ML training or visualization.
    • File Dependencies: Explain how files relate to each other. For example, "The img_src column in mars_rover_dataset.csv corresponds to the images stored in mars_images.zip. Users should extract the images before using the dataset for model training."
    • How to Access the Files: Provide instructions on downloading and extracting files. Example:
      unzip mars_images.zip
      This ensures that users can quickly set up the dataset in their working environment.
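The file relationship described above can also be sketched in Python using only the standard library. This is a minimal sketch, not the dataset author's method: it uses the file names given above (mars_rover_dataset.csv, mars_images.zip) and assumes, without confirmation from the dataset page, that each extracted image sits directly in the output folder and keeps the base file name of its img_src URL. The helper names are hypothetical.

```python
import csv
import zipfile
from pathlib import Path
from urllib.parse import urlparse


def extract_images(zip_path, out_dir):
    """Extract every member of the ZIP archive into out_dir (like `unzip`)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
    return out


def local_image_path(img_src, image_dir):
    """Map an img_src URL to the expected local file.

    Assumption: the extracted image keeps the URL's base file name.
    """
    name = Path(urlparse(img_src).path).name
    return Path(image_dir) / name


def rows_with_local_images(csv_path, image_dir):
    """Yield (row, path) pairs for CSV rows whose image exists locally."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            path = local_image_path(row["img_src"], image_dir)
            if path.exists():
                yield row, path
```

A typical call would be `extract_images("mars_images.zip", "images")` followed by iterating over `rows_with_local_images("mars_rover_dataset.csv", "images")`. Rows whose image is missing are skipped silently, which may or may not be what a given workflow wants.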

    3. Column Descriptions

    This section explains the meaning of each column in the dataset, ensuring users can analyze and interpret the data correctly. A well-structured table format is often useful:

    | Column Name | Description |
    |---|---|
    | id | Unique identifier for each image. |
    | sol | Martian sol (day) when the image was captured. |
    | camera_name | Abbreviated name of the rover's camera (e.g., "FHAZ" for Front Hazard Camera). |
    | camera_full_name | Full descriptive name of the camera. |
    | img_src | URL link to the image. Users can download images using this link. |
    | earth_date | The Earth date corresponding to the Martian sol. |
    | rover_name | Name of the rover that captured the image (e.g., "Curiosity"). |
    | rover_status | Current operational status of the rover (e.g., "Active" or "Complete"). |
    | landing_date | Date when the rover landed on Mars. |
    | launch_date | Date when the rover was launched from Earth. |

    Additional Details:

    • Data Types: Indicate whether a column contains numbers, text, or dates.
    • Data Format: Example: earth_date is in YYYY-MM-DD format.
    • Special Notes: If any column has missing values or requires preprocessing, mention it.

    This section helps users quickly understand the dataset's structure, making it easier for them to work with the data effectively.
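As an illustration of the column list and the data-type notes above, the following minimal sketch reads the metadata with Python's standard csv module and converts a few columns to typed values. Several points are assumptions rather than facts from the dataset page: that id and sol hold integers, and that landing_date and launch_date use the same YYYY-MM-DD format stated for earth_date. The sample row and its URL are hypothetical, as are the function names.

```python
import csv
import io
from datetime import datetime

DATE_COLUMNS = ("earth_date", "landing_date", "launch_date")


def parse_date(text):
    """Parse a YYYY-MM-DD string; return None for an empty value."""
    return datetime.strptime(text, "%Y-%m-%d").date() if text else None


def parse_row(row):
    """Return a copy of one CSV row with numeric and date columns typed."""
    typed = dict(row)
    typed["id"] = int(row["id"])    # assumed to be an integer
    typed["sol"] = int(row["sol"])  # assumed to be an integer
    for column in DATE_COLUMNS:
        typed[column] = parse_date(row.get(column, ""))
    return typed


def load_metadata(handle):
    """Read every row from an open CSV file handle."""
    return [parse_row(row) for row in csv.DictReader(handle)]


# Hypothetical one-row sample that follows the column list above.
SAMPLE = (
    "id,sol,camera_name,camera_full_name,img_src,earth_date,"
    "rover_name,rover_status,landing_date,launch_date\n"
    "1,1000,FHAZ,Front Hazard Camera,https://example.com/img1.jpg,"
    "2015-05-30,Curiosity,Active,2012-08-06,2011-11-26\n"
)
rows = load_metadata(io.StringIO(SAMPLE))
print(rows[0]["sol"], rows[0]["earth_date"].isoformat())
```

For the real file, `load_metadata(open("mars_rover_dataset.csv", newline="", encoding="utf-8"))` would be the equivalent call. Empty date cells become None, so missing values can be counted before any preprocessing.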

  2. Classification - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Classification - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/classification
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    A supervised learning task involves constructing a mapping from an input data space (normally described by several features) to an output space. A set of training examples (examples with known output values) is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output consists of one or more classes to which the corresponding input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. In this chapter, we explain several basic classification algorithms.
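The train-then-predict loop described above can be illustrated with a very small classifier. The description does not say which algorithms the chapter covers, so the nearest-centroid rule below is chosen only because it is short. The two measurements per example and the labels type_a and type_b are invented toy values, not real sunspot data.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one centroid (mean feature vector) per class label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the input."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical toy data: two made-up measurements per candidate,
# each paired with a made-up type label.
training = [
    ((1.0, 0.9), "type_a"),
    ((1.2, 0.8), "type_a"),
    ((4.0, 0.3), "type_b"),
    ((4.4, 0.2), "type_b"),
]
model = train(training)
print(predict(model, (1.1, 0.7)))  # an input not seen during training
```

Here the model is just a dictionary of centroids, which plays the role of the learned mapping from measurements to type. Any other classification algorithm would slot into the same two-step shape.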

  3. Dataset #1: Cross-sectional survey data

    • figshare.com
    txt
    Updated Jul 19, 2023
    Cite
    Adam Baimel (2023). Dataset #1: Cross-sectional survey data [Dataset]. http://doi.org/10.6084/m9.figshare.23708730.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Adam Baimel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    N.B. This is not real data. Only here for an example for project templates.

    Project Title: Add title here

    Project Team: Add contact information for research project team members

    Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.

    Relevant publications/outputs: When available, add links to the related publications/outputs from this data.

    Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (i.e., Open Science Framework, Github, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.

    Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?

    Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.

    Research Project Materials: What materials are necessary to fully reproduce the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.

    List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.

    Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, verbatim question associated with each response, response options, details of any post-collection coding that has been done on the raw-response (and whether that's encoded in a separate column).

    Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14

  4. Dataset: Environmental conditions and male quality traits simultaneously...

    • data.europa.eu
    • data.niaid.nih.gov
    unknown
    Updated Jun 21, 2022
    Cite
    Zenodo (2022). Dataset: Environmental conditions and male quality traits simultaneously explain variation of multiple colour signals in male lizards [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-6683661?locale=de
    Explore at:
    Available download formats: unknown (3063441)
    Dataset updated
    Jun 21, 2022
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset and R code associated with the following publication: Badiane et al. (2022), Environmental conditions and male quality traits simultaneously explain variation of multiple colour signals in male lizards. Journal of Animal Ecology, in press.

    This dataset includes the following files:
    • An Excel file containing the reflectance spectra of all individuals from all the study populations
    • An Excel file containing the variables collected at the individual and population levels
    • Two R scripts corresponding to the analyses performed in the publication

  5. Describe Art Dataset

    • universe.roboflow.com
    zip
    Updated Aug 2, 2024
    Cite
    Roboflow Project (2024). Describe Art Dataset [Dataset]. https://universe.roboflow.com/roboflow-project-erv3h/describe-art
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Roboflow (https://roboflow.com/)
    Authors
    Roboflow Project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Art Images Descriptions
    Description

    Describe Art

    ## Overview
    
    Describe Art is a dataset for vision language (multimodal) tasks - it contains Art Images annotations for 6,402 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. Data from: Why hibernate? Tests of four hypotheses to explain intraspecific...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Apr 24, 2023
    Cite
    Morris, Alice; Allison, Austin; Conway, Courtney (2023). Data from: Why hibernate? Tests of four hypotheses to explain intraspecific variation in hibernation phenology [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001007672
    Explore at:
    Dataset updated
    Apr 24, 2023
    Authors
    Morris, Alice; Allison, Austin; Conway, Courtney
    Description

    This dataset includes hibernation phenology (immergence date, emergence date, and hibernation duration) for northern Idaho ground squirrels (Urocitellus brunneus), along with data used as predictor variables in linear mixed-effects models designed to explain intraspecific variation in the three hibernation behaviors. Code for that lme analysis is also included. Also included in this dataset are body mass data for northern Idaho ground squirrels, the code used to generate predicted squirrel body mass curves, NDVI data for northern Idaho ground squirrel study sites, and the code used to generate predicted NDVI curves for those sites. The data files include metadata sheets to better explain the data and its collection.

  7. Pill Define Dataset

    • universe.roboflow.com
    zip
    Updated Oct 2, 2024
    Cite
    Jutamas (2024). Pill Define Dataset [Dataset]. https://universe.roboflow.com/jutamas-hn069/pill-define/dataset/2
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 2, 2024
    Dataset authored and provided by
    Jutamas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Hydralazine Bounding Boxes
    Description

    Pill Define

    ## Overview
    
    Pill Define is a dataset for object detection tasks - it contains Hydralazine annotations for 800 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  8. Classification

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Classification [Dataset]. https://catalog.data.gov/dataset/classification
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    A supervised learning task involves constructing a mapping from an input data space (normally described by several features) to an output space. A set of training examples (examples with known output values) is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output consists of one or more classes to which the corresponding input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. In this chapter, we explain several basic classification algorithms.

  9. Digital data sets that describe aquifer characteristics of the Rush Springs...

    • data.usgs.gov
    • dataone.org
    • +2 more
    Updated Aug 24, 2024
    + more versions
    Cite
    United States Geological Survey (2024). Digital data sets that describe aquifer characteristics of the Rush Springs aquifer in western Oklahoma [Dataset]. http://doi.org/10.5066/P9684FW1
    Explore at:
    Dataset updated
    Aug 24, 2024
    Dataset authored and provided by
    United States Geological Survey (http://www.usgs.gov/)
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    1997
    Area covered
    Western Oklahoma, Rush Springs, Oklahoma
    Description

    This data set consists of digitized aquifer boundaries for the Rush Springs aquifer in western Oklahoma. This area encompasses all or part of Blaine, Caddo, Canadian, Comanche, Custer, Dewey, Grady, Stephens, and Washita Counties. Mark F. Becker (U.S. Geological Survey, written commun., 1997) created an aquifer boundary data set that represented hydrologic boundaries needed to simulate the ground-water flow in the Rush Springs aquifer with a computer model. In the ground-water flow model, Mark F. Becker defined the Rush Springs aquifer to include the Rush Springs Formation, alluvial and terrace deposits along major streams, and parts of the Marlow Formations, particularly in the eastern part of the aquifer boundary area.

    The Permian-age Rush Springs Formation consists of highly cross-bedded sandstone with some interbedded dolomite and gypsum. The Rush Springs Formation is overlain by Quaternary-age alluvial and terrace deposits that consist of unconsolidated clay, silt, sand, a ...

  10. Sicec Define Dataset

    • universe.roboflow.com
    zip
    Updated Apr 6, 2023
    Cite
    SICECs (2023). Sicec Define Dataset [Dataset]. https://universe.roboflow.com/sicecs/sicec-define/model/3
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 6, 2023
    Dataset authored and provided by
    SICECs
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    General Stuff
    Description

    Sicec Define

    ## Overview
    
    Sicec Define is a dataset for classification tasks - it contains General Stuff annotations for 4,722 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  11. movie_review_nltk

    • kaggle.com
    zip
    Updated Aug 11, 2021
    Cite
    Bharat Natrayn (2021). movie_review_nltk [Dataset]. https://www.kaggle.com/bharatnatrayn/movie-review-nltk
    Explore at:
    Available download formats: zip (961871 bytes)
    Dataset updated
    Aug 11, 2021
    Authors
    Bharat Natrayn
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset contains part-of-speech tags, the impact of each review, and story descriptions for movies and TV series.

    Content

    • MOVIES: the movie name.
    • SENTENCE: the sentence to which each word belongs.
    • TAG: the part of speech.
    • WORD: the individual word in each row.
    • REVIEW: the impact on the audience.

    Acknowledgements

    This dataset was created from my previous dataset, movie.csv.

    Inspiration

    This dataset was created from my previous dataset, movie.csv.

  12. Employee Data from the City of Chicago

    • kaggle.com
    zip
    Updated Sep 2, 2020
    Cite
    Rishi Damarla (2020). Employee Data from the City of Chicago [Dataset]. https://www.kaggle.com/datasets/rishidamarla/employee-data-from-the-city-of-chicago/code
    Explore at:
    Available download formats: zip (423030 bytes)
    Dataset updated
    Sep 2, 2020
    Authors
    Rishi Damarla
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Chicago
    Description

    Context

    Few datasets openly share the kind of information included in this one, so it is worth taking advantage of the fact that this data exists for public use.

    Content

    This dataset includes the names, salaries, and position titles of numerous employees from Chicago, Illinois, America.

    Acknowledgements

    This data was found at https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w.

  13. Data from: Dataset of the manuscript "What is local research? Towards a...

    • produccioncientifica.ugr.es
    • zenodo.org
    Updated 2024
    Cite
    Di Césare, Victoria; Robinson-Garcia, Nicolas (2024). Dataset of the manuscript "What is local research? Towards a multidimensional framework linking theory and methods" [Dataset]. https://produccioncientifica.ugr.es/documentos/67a9c7ca19544708f8c72b38?lang=gl
    Explore at:
    Dataset updated
    2024
    Authors
    Di Césare, Victoria; Robinson-Garcia, Nicolas
    Description

    Dataset of the manuscript "What is local research? Towards a multidimensional framework linking theory and methods". In this research article we propose a theoretical and empirical framework of local research, a concept of growing importance due to its far-reaching implications for public policy. Our motivation stems from the lack of clarity surrounding the increasing yet uncritical use of the term in both scientific publications and policy documents, where local research is conceptualized and measured in many ways. A clear understanding of it is crucial for informed decision-making when setting research agendas, allocating funds, and evaluating and rewarding scientists. Our twofold aim is (1) to compare the existing approaches that define and measure local research, and (2) to assess the implications of applying one over another. We first review the perspectives and measures used since the 1970s. Drawing on spatial scientometrics and proximities, we then build a framework that splits the concept into several dimensions: locally informed research, locally situated research, locally relevant research, locally bound research, and locally governed research. Each dimension is composed of a definition and a methodological approach, which we test in 10 million publications from the Dimensions database. Our findings reveal that these approaches measure distinct and sometimes unaligned aspects of local research, with varying effectiveness across countries and disciplines. This study highlights the complex, multifaceted nature of local research. We provide a flexible framework that facilitates the analysis of these dimensions and their intersections, in an attempt to contribute to the understanding and assessment of local research and its role within the production, dissemination, and impact of scientific knowledge.

  14. joke_explaination

    • huggingface.co
    Updated Aug 19, 2023
    Cite
    theblackcat102 (2023). joke_explaination [Dataset]. https://huggingface.co/datasets/theblackcat102/joke_explaination
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 19, 2023
    Authors
    theblackcat102
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Dataset Name

      Dataset Summary
    

    Corpus for testing whether your LLM can explain a joke well. This is a rather small dataset; pointers to a larger one would be very welcome.

      Languages
    

    English

      Dataset Structure
    
    
    
    
    
      Data Fields
    

    url: link to the explanation

    joke: the original joke

    explaination: the explanation of the joke

      Data Splits
    

    Since it's so small, there are no splits, just like gsm8k.

  15. Data from: Digital data sets that describe aquifer characteristics of the...

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Digital data sets that describe aquifer characteristics of the Vamoosa-Ada aquifer in east-central Oklahoma [Dataset]. https://catalog.data.gov/dataset/digital-data-sets-that-describe-aquifer-characteristics-of-the-vamoosa-ada-aquifer-in-east
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Oklahoma
    Description

    This data set consists of digitized polygons of constant recharge values for the Vamoosa-Ada aquifer, in east-central Oklahoma. The Vamoosa-Ada aquifer is an important source of water that underlies about 2,320-square miles of parts of Osage, Pawnee, Payne, Creek, Lincoln, Okfuskee, and Seminole Counties. Approximately 75 percent of the water withdrawn from the Vamoosa-Ada aquifer is for municipal use. Rural domestic use and water for stock animals account for most of the remaining water withdrawn. The Vamoosa-Ada aquifer is defined in a ground-water report as consisting principally of the rocks of the Late Pennsylvanian-age Vamoosa Formation and overlying Ada Group. The Vamoosa-Ada aquifer consists of a complex sequence of fine- to very fine-grained sandstone, siltstone, shale, and conglomerate interbedded with very thin limestones. The water-yielding capabilities of the aquifer are generally controlled by lateral and vertical distribution of the sandstone beds and their physical characteristics. The Vamoosa-Ada aquifer is unconfined where it outcrops in about an 1,700-square-mile area. The recharge rate of the Vamoosa-Ada aquifer was estimated as 1.52 inches per year from base-flow measurements and precipitation records published in a ground-water report. Most of the recharge polygons were extracted from published digital geology data sets. The lines in the digital geology data sets were scanned or digitized from maps published at a scale of 1:250,000 and represent geologic contacts. Some of the lines in the data set were interpolated in areas where the Vamoosa-Ada aquifer is overlain by alluvial and terrace deposits near streams and rivers.

  16. Dataset of book subjects that contain What is MS?

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain What is MS? [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=What+is+MS%3F&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 4 rows and is filtered where the books is What is MS?. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  17. Data from: Several candidate size metrics explain vital rates across...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Aug 19, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Maude E. A. Baudraz; Dylan Z. Childs; Ruth Kelly; Annabel L. Smith; Jesus Villellas; Martin Andrzejak; Benedicte Bachelot; Lajos Benedek; Simone P. Blomberg; Judit Bodis; Francis Q. Brearley; Anna Bucharova; Christina M. Caruso; Jane A. Catford; Matthew Coghill; Aldo Compagnoni; Anna Mária P. Csergő; Richard P. Duncan; John Dwyer; Johan Ehrlén; Bret Elderd; Alain Finn; Lauchlan Fraser; Maria B. García; Jennifer R. Gremer; Ronny Groenteman; Liv Norunn Hamre; Aveliina Helm; Mária Höhn; Lotte Korell; Lauri Laanisto; Anna-Liisa Laine; Michele Lonati; Caroline M. McKeon; Aoife Molloy; Joslin L. Moore; Melanie Morales; Sergi Munne Bosch; Zuzana Münzbergová; Siri Lie Olsen; Adrian Oprea; Meelis Pärtel; Rachel M. Penczykowski; William K. Petry; Satu Ramula; Pil U. Rasmussen; Simone Ravetto Enri; Deborah A. Roach; Anna Roeder; Christiane Roscher; Marjo Saastamoinen; Cheryl Schultz; R. Drew Sieg; Olav Skarpaas; Ayco J. M. Tack; Joachim Töpper; Peter A. Vesk; Gregory Vose; Elizabeth M. Wandrag; Glenda M. Wardle; Astrid Wingler; Yvonne M. Buckley (2025). Several candidate size metrics explain vital rates across multiple populations throughout a widespread species' range [Dataset]. http://doi.org/10.5061/dryad.mw6m9067c
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 19, 2025
    Dataset provided by
    Dryad
    Authors
    Maude E. A. Baudraz; Dylan Z. Childs; Ruth Kelly; Annabel L. Smith; Jesus Villellas; Martin Andrzejak; Benedicte Bachelot; Lajos Benedek; Simone P. Blomberg; Judit Bodis; Francis Q. Brearley; Anna Bucharova; Christina M. Caruso; Jane A. Catford; Matthew Coghill; Aldo Compagnoni; Anna Mária P. Csergő; Richard P. Duncan; John Dwyer; Johan Ehrlén; Bret Elderd; Alain Finn; Lauchlan Fraser; Maria B. García; Jennifer R. Gremer; Ronny Groenteman; Liv Norunn Hamre; Aveliina Helm; Mária Höhn; Lotte Korell; Lauri Laanisto; Anna-Liisa Laine; Michele Lonati; Caroline M. McKeon; Aoife Molloy; Joslin L. Moore; Melanie Morales; Sergi Munne Bosch; Zuzana Münzbergová; Siri Lie Olsen; Adrian Oprea; Meelis Pärtel; Rachel M. Penczykowski; William K. Petry; Satu Ramula; Pil U. Rasmussen; Simone Ravetto Enri; Deborah A. Roach; Anna Roeder; Christiane Roscher; Marjo Saastamoinen; Cheryl Schultz; R. Drew Sieg; Olav Skarpaas; Ayco J. M. Tack; Joachim Töpper; Peter A. Vesk; Gregory Vose; Elizabeth M. Wandrag; Glenda M. Wardle; Astrid Wingler; Yvonne M. Buckley
    Time period covered
    Mar 11, 2025
    Description

    PlantPopNet (www.plantpopnet.com) collaborators collect demographic information on 65 naturally occurring populations of P. lanceolata across three continents. The present study included 55 populations that had at least two consecutive yearly censuses, presented here. Each population consists of an initial 100 individuals marked in naturally occurring populations and re-visited yearly at the peak of the flowering season. New recruits within the original plots were recorded and followed in subsequent years. The number of rosettes, number of leaves per rosette, length of the longest leaf, and width of the longest leaf for each rosette, flowering status (flowered, not flowered), reproductive output, and survival or death of each individual were recorded at each annual census. For further information on the PlantPopNet protocol, see Buckley et al. (2019). This data is presented as it was used to perform a study on a subset of the plantpopnet data. For said study, we used the first transitio...

  18. Digital data sets that describe aquifer characteristics of the Antlers...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Oct 22, 2025
    + more versions
    U.S. Geological Survey (2025). Digital data sets that describe aquifer characteristics of the Antlers aquifer in southeastern Oklahoma [Dataset]. https://catalog.data.gov/dataset/digital-data-sets-that-describe-aquifer-characteristics-of-the-antlers-aquifer-in-southeas-1e230
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Oklahoma
    Description

    This data set consists of digitized polygons of constant recharge values for the Antlers aquifer in southeastern Oklahoma. The Early Cretaceous-age Antlers Sandstone is an important source of water in an area that underlies about 4,400 square miles of all or part of Atoka, Bryan, Carter, Choctaw, Johnston, Love, Marshall, McCurtain, and Pushmataha Counties. The Antlers aquifer consists of sand, clay, conglomerate, and limestone in the outcrop area. The upper part of the Antlers aquifer consists of beds of sand, poorly cemented sandstone, sandy shale, silt, and clay. The Antlers aquifer is unconfined where it outcrops in about an 1,800-square-mile area. The recharge polygons were developed from recharge rates used as input into a ground-water flow model and from published digital data sets of the surficial geology of the Antlers Sandstone except in areas overlain by alluvial and terrace deposits near streams. Some of the lines were interpolated where the Antlers aquifer is overlain by alluvial and terrace deposits. The interpolated lines are very similar to the aquifer boundaries shown on maps published in a ground-water modeling report for the Antlers aquifer. The constant recharge rates used as input to the ground-water flow model were 0.32 inches per year for the western portion of the aquifer and 0.96 inches per year for the eastern portion of the aquifer. Ground-water flow models are numerical representations that simplify and aggregate natural systems. Models are not unique; different combinations of aquifer characteristics may produce similar results. Therefore, values of recharge used in the model and presented in this data set are not precise, but are within a reasonable range when compared to independently collected data.

  19. Data to support "Boosted Regression Tree Models to Explain Watershed...

    • data.wu.ac.at
    • catalog.data.gov
    Updated Jan 15, 2017
    U.S. Environmental Protection Agency (2017). Data to support "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations & Biological Condition" [Dataset]. https://data.wu.ac.at/odso/data_gov/ZTY5NWY1ODktOTgxYy00NDJhLWFhZDEtNmI2MzJiYTJiMTEz
    Available download formats: application/x-zip-compressed
    Dataset updated
    Jan 15, 2017
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition".

    This dataset is associated with the following publication: Golden, H., C. Lane, A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).

  20. DCASE 2024 Challenge Task 2 Additional Training Dataset

    • data.niaid.nih.gov
    Updated May 15, 2024
    + more versions
    Tomoya, Nishida; Keisuke, Imoto; Noboru, Harada; Daisuke, Niizumi; Albertini, Davide; Sannino, Roberto; Pradolini, Simone; Augusti, Filippo; Kota, Dohi; Harsh, Purohit; Takashi, Endo; Yohei, Kawaguchi (2024). DCASE 2024 Challenge Task 2 Additional Training Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11183283
    Dataset updated
    May 15, 2024
    Dataset provided by
    Hitachi Ltd.
    Doshisha University
    Nippon Telegraph and Telephone (Japan)
    STMicroelectronics
    Authors
    Tomoya, Nishida; Keisuke, Imoto; Noboru, Harada; Daisuke, Niizumi; Albertini, Davide; Sannino, Roberto; Pradolini, Simone; Augusti, Filippo; Kota, Dohi; Harsh, Purohit; Takashi, Endo; Yohei, Kawaguchi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    This dataset is the "additional training dataset" for the DCASE 2024 Challenge Task 2.

    The data consists of the normal/anomalous operating sounds of nine types of real/toy machines. Each recording is a single-channel audio that includes both a machine's operating sound and environmental noise. The duration of recordings varies from 6 to 10 seconds. The following nine types of real/toy machines are used in this task:

    3DPrinter

    AirCompressor

    BrushlessMotor

    HairDryer

    HoveringDrone

    RoboticArm

    Scanner

    ToothBrush

    ToyCircuit

    Overview of the task

    Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.

    This task follows on from DCASE 2020 Task 2 through DCASE 2023 Task 2. The task this year is to develop an ASD system that meets the following five requirements.

    1. Train a model using only normal sound (unsupervised learning scenario). Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data. This is the same requirement as in the previous tasks.

    2. Detect anomalies regardless of domain shifts (domain generalization task). In real-world cases, the operational states of a machine or the environmental noise can change and cause domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard to notice. In this task, the system is required to use domain-generalization techniques for handling these domain shifts. This requirement is the same as in DCASE 2022 Task 2 and DCASE 2023 Task 2.

    3. Train a model for a completely new machine type. For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should have the ability to train models without additional hyperparameter tuning. This requirement is the same as in DCASE 2023 Task 2.

    4. Train a model using a limited number of machines from its machine type. While sounds from multiple machines of the same machine type can be used to enhance the detection performance, it is often the case that only a limited number of machines are available for a machine type. In such a case, the system should be able to train models using a few machines from a machine type. This requirement is the same as in DCASE 2023 Task 2.

    5. Train a model both with and without attribute information. While additional attribute information can help enhance the detection performance, we cannot always obtain such information. Therefore, the system must work well both when attribute information is available and when it is not.

    The last requirement is newly introduced in DCASE 2024 Task 2.

    Definition

    We first define key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."

    "Machine type" indicates the type of machine, which in the additional training dataset is one of nine: 3D printer, air compressor, brushless motor, hair dryer, hovering drone, robotic arm, document scanner (scanner), toothbrush, and toy circuit.

    A section is defined as a subset of the dataset for calculating performance metrics.

    The source domain is the domain under which most of the training data and some of the test data were recorded, and the target domain is a different set of domains under which some of the training data and some of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, signal-to-noise ratio, etc.

    Attributes are parameters that define states of machines or types of noise. For several machine types, the attributes are hidden.

    Dataset

    This dataset consists of nine machine types. For each machine type, one section is provided, and the section is a complete set of training data. A set of test data corresponding to this training data will be provided on a separate Zenodo page as an "evaluation dataset" for the DCASE 2024 Challenge Task 2. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training and (ii) ten clips of normal sounds in the target domain for training. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.

    File names and attribute csv files

    File names and attribute csv files provide reference labels for each clip. The given reference labels for each training clip include machine type, section index, normal/anomaly information, and attributes regarding the condition other than normal/anomaly. The machine type is given by the directory name. The section index is given by the respective file name. For the datasets other than the evaluation dataset, the normal/anomaly information and the attributes are given by the respective file names. Note that for machine types whose attribute information is hidden, the attribute information in each file name is labeled only as "noAttributes". Attribute csv files are for easy access to attributes that cause domain shifts. In these files, the file names, the names of parameters that cause domain shifts (domain shift parameter, dp), and the values or types of these parameters (domain shift value, dv) are listed. Each row takes the following format:

    [filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
    

    For machine types that have their attribute information hidden, all columns except the filename column are left blank for each row.
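
The row format above can be read with a few lines of Python. This is a minimal sketch, not the challenge's own tooling: the function names and the sample rows are hypothetical, and it assumes the caller has already skipped any header row, that parameter/value columns strictly alternate after the file name, and that blank cells mean the attributes are hidden.

```python
import csv
import io


def coerce(text):
    """Return text as int or float when it parses as one, else as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_attribute_row(row):
    """Split one csv row into (filename, {parameter name: value}).

    Assumes the layout filename, d1p, d1v, d2p, d2v, ... described above.
    Pairs with a blank parameter name (hidden attributes) are skipped.
    """
    filename, rest = row[0], row[1:]
    attributes = {}
    for name, value in zip(rest[0::2], rest[1::2]):
        name = name.strip()
        if name:
            attributes[name] = coerce(value.strip())
    return filename, attributes


# Hypothetical rows for illustration only; not taken from the dataset.
SAMPLE = "clip_a.wav,speed,3,load,0.5\nclip_b.wav,,,,\n"

if __name__ == "__main__":
    for row in csv.reader(io.StringIO(SAMPLE)):
        print(parse_attribute_row(row))
```

Returning a dictionary keyed by parameter name keeps the code independent of how many dp/dv pairs a given machine type uses.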

    Recording procedure

    Normal/anomalous operating sounds of machines and their related equipment were recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings of a fixed microphone. We mixed a target machine sound with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset to explain the details of the recording procedure by the submission deadline.

    Directory structure

    • /eval_data
      • /raw
        • /3DPrinter
          • /train (only normal clips)
            • /section_00_source_train_normal_0001_.wav
            • ...
            • /section_00_source_train_normal_0990_.wav
            • /section_00_target_train_normal_0001_.wav
            • ...
            • /section_00_target_train_normal_0010_.wav
          • attributes_00.csv (attribute csv for section 00)
        • /AirCompressor (The other machine types have the same directory structure as 3DPrinter.)
        • /BrushlessMotor
        • /HairDryer
        • /HoveringDrone
        • /RoboticArm
        • /Scanner
        • /ToothBrush
        • /ToyCircuit
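
The file names in the listing above carry the section index, domain, split, and normal/anomaly label, so they can be parsed to check a section against the stated 990 source and 10 target training clips. The sketch below is an illustration under assumptions: the regular expression is inferred from the listing, the text between the clip index and ".wav" is treated as a free-form attribute string (the listing leaves it blank), and all function names are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the directory listing; the trailing attribute part
# is an assumption and may be empty or "noAttributes".
CLIP_NAME = re.compile(
    r"section_(?P<section>\d+)_(?P<domain>source|target)_"
    r"(?P<split>train|test)_(?P<label>normal|anomaly)_"
    r"(?P<index>\d+)_(?P<attributes>.*)\.wav"
)


def parse_clip_name(name):
    """Return the labels encoded in one clip file name as a dict."""
    match = CLIP_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected clip name: {name!r}")
    info = match.groupdict()
    info["index"] = int(info["index"])
    return info


def count_domains(names):
    """Count clips per domain (source/target) for a list of file names."""
    return Counter(parse_clip_name(name)["domain"] for name in names)


def has_expected_training_counts(names):
    """Check the 990 source / 10 target split stated in the description."""
    counts = count_domains(names)
    return counts["source"] == 990 and counts["target"] == 10


if __name__ == "__main__":
    print(parse_clip_name("section_00_target_train_normal_0010_noAttributes.wav"))
```

In practice the list of names would come from listing one machine type's train directory.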

    Baseline system

    The baseline system is available in the GitHub repository. The baseline systems provide a simple entry-level approach that gives reasonable performance on the Task 2 dataset. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.

    Condition of use

    This dataset was created jointly by Hitachi, Ltd., NTT Corporation and STMicroelectronics and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

    Citation

    Contact

    If there is any problem, please contact us:

    Tomoya Nishida, tomoya.nishida.ax@hitachi.com

    Keisuke Imoto, keisuke.imoto@ieee.org

    Noboru Harada, noboru@ieee.org

    Daisuke Niizumi, daisuke.niizumi.dt@hco.ntt.co.jp

    Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com

Gaurav Kumar (2025). Mars rover dataset [Dataset]. https://www.kaggle.com/datasets/gauravkumar2525/mars-rover-dataset
Mars rover dataset

Mars Rover Image Dataset for AI & ML | Curiosity Rover Photos with Metadata

Available download formats: zip (101820038 bytes)
Dataset updated
Mar 1, 2025
Authors
Gaurav Kumar
License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

1. Description

The description section is crucial for helping users understand the purpose, context, and potential applications of your dataset. It should include the following details:

  • Dataset Overview: Provide a clear summary of what the dataset contains. For example, "This dataset includes images captured by NASA’s Curiosity Rover on Mars, along with metadata such as the camera used, the Martian sol (day) when the photo was taken, and the corresponding Earth date."
  • Source of the Data: Explain where the data comes from. If it’s obtained via an API, mention the source (e.g., "The images and metadata are retrieved from NASA's Mars Rover Photos API."). If the dataset is curated from multiple sources, list them.
  • Purpose and Use Cases: Describe why this dataset was created and how it can be used. For example:
    • Machine Learning: Train models for image classification, object detection, and anomaly detection.
    • Scientific Research: Analyze Martian surface patterns, study terrain features, or examine rover camera performance.
    • Space Exploration: Understand Mars' environmental conditions and assist in future exploration planning.
  • Data Format and Organization: Briefly mention the format of the files (e.g., CSV file for metadata, ZIP file containing images) and how they are structured.
  • Licensing and Permissions: Specify if the dataset has any restrictions on usage. Since NASA data is typically public domain, state that users are free to use it for research and development.
  • Limitations or Considerations: Mention any potential challenges, such as missing data, limited coverage, or resolution constraints.

2. File Information

This section provides details about the files included in your dataset, helping users navigate and use them efficiently. Key points to include:

  • List of Files: Clearly mention all the files and their formats, such as:
    • mars_rover_dataset.csv (CSV file containing metadata of images)
    • mars_images.zip (Compressed folder containing all images)
  • Purpose of Each File:
    • CSV File: Contains structured data, including image IDs, timestamps, camera details, and URLs.
    • ZIP File: Stores actual Mars images, which can be extracted and used for ML training or visualization.
  • File Dependencies: Explain how files relate to each other. For example, "The img_src column in mars_rover_dataset.csv corresponds to the images stored in mars_images.zip. Users should extract the images before using the dataset for model training."
  • How to Access the Files: Provide instructions on downloading and extracting files. Example (bash):
    unzip mars_images.zip
    This ensures that users can quickly set up the dataset in their working environment.
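
For users who prefer to stay in Python, the extraction step can be sketched with the standard library's zipfile module. This is only an illustration: the function name is hypothetical, and the archive name mars_images.zip is taken from the example file list above, not verified against the actual download.

```python
import zipfile
from pathlib import Path


def extract_images(archive_path, dest_dir):
    """Extract every member of the archive into dest_dir.

    Returns the names of the extracted files (directory entries excluded).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return [name for name in archive.namelist() if not name.endswith("/")]


if __name__ == "__main__":
    # Assumes mars_images.zip sits in the current working directory.
    files = extract_images("mars_images.zip", "mars_images")
    print(f"extracted {len(files)} files")
```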

3. Column Descriptions

This section explains the meaning of each column in the dataset, ensuring users can analyze and interpret the data correctly. A well-structured table format is often useful:

| Column Name | Description |
| --- | --- |
| id | Unique identifier for each image. |
| sol | Martian sol (day) when the image was captured. |
| camera_name | Abbreviated name of the rover's camera (e.g., "FHAZ" for Front Hazard Camera). |
| camera_full_name | Full descriptive name of the camera. |
| img_src | URL link to the image. Users can download images using this link. |
| earth_date | The Earth date corresponding to the Martian sol. |
| rover_name | Name of the rover that captured the image (e.g., "Curiosity"). |
| rover_status | Current operational status of the rover (e.g., "Active" or "Complete"). |
| landing_date | Date when the rover landed on Mars. |
| launch_date | Date when the rover was launched from Earth. |

Additional Details:

  • Data Types: Indicate whether a column contains numbers, text, or dates.
  • Data Format: Example: earth_date is in YYYY-MM-DD format.
  • Special Notes: If any column has missing values or requires preprocessing, mention it.

This section helps users quickly understand the dataset's structure, making it easier for them to work with the data effectively.
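
As an illustration of the column descriptions above, the metadata can be loaded with typed columns using only the standard library. This is a sketch under assumptions: it treats sol as an integer and earth_date as a YYYY-MM-DD string (the format given as an example above), the function name is hypothetical, and the sample row is made up for illustration and covers only a subset of the columns.

```python
import csv
import io
from datetime import date


def load_rows(handle):
    """Read metadata rows, converting sol to int and earth_date to a date.

    All other columns are kept as strings.
    """
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        row["sol"] = int(row["sol"])
        row["earth_date"] = date.fromisoformat(row["earth_date"])
        rows.append(row)
    return rows


# Made-up sample row; values are illustrative, not taken from the dataset.
SAMPLE = (
    "id,sol,camera_name,earth_date,rover_name\n"
    "1,1000,FHAZ,2015-05-30,Curiosity\n"
)

if __name__ == "__main__":
    for row in load_rows(io.StringIO(SAMPLE)):
        print(row["camera_name"], row["sol"], row["earth_date"].isoformat())
```

To read the real file, open mars_rover_dataset.csv with newline="" and pass the handle to load_rows.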
