Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The description section is crucial for helping users understand the purpose, context, and potential applications of your dataset. It should include the following details:
This section provides details about the files included in your dataset, helping users navigate and use them efficiently. Key points to include:
- mars_rover_dataset.csv (CSV file containing metadata of images)
- mars_images.zip (compressed folder containing all images)

The img_src column in mars_rover_dataset.csv corresponds to the images stored in mars_images.zip. Users should extract the images before using the dataset for model training:

```bash
unzip mars_images.zip
```

This section explains the meaning of each column in the dataset, ensuring users can analyze and interpret the data correctly. A well-structured table format is often useful:
| Column Name | Description |
|---|---|
| id | Unique identifier for each image. |
| sol | Martian sol (day) when the image was captured. |
| camera_name | Abbreviated name of the rover's camera (e.g., "FHAZ" for Front Hazard Camera). |
| camera_full_name | Full descriptive name of the camera. |
| img_src | URL link to the image. Users can download images using this link. |
| earth_date | The Earth date corresponding to the Martian sol. |
| rover_name | Name of the rover that captured the image (e.g., "Curiosity"). |
| rover_status | Current operational status of the rover (e.g., "Active" or "Complete"). |
| landing_date | Date when the rover landed on Mars. |
| launch_date | Date when the rover was launched from Earth. |
earth_date is in YYYY-MM-DD format. This section helps users quickly understand the dataset's structure, making it easier for them to work with the data effectively.
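As an illustration, here is a minimal Python sketch for loading the metadata and matching each row to its extracted image. It assumes pandas is installed and that the extracted files keep the basename of their img_src URL; both details are assumptions rather than part of the dataset description.

```python
import os
import pandas as pd

# Load the image metadata (columns as described in the table above).
df = pd.read_csv("mars_rover_dataset.csv")

# After `unzip mars_images.zip`, map each metadata row to a local image file.
# Assumption: extracted files are named after the last path segment of img_src.
df["local_path"] = df["img_src"].apply(
    lambda url: os.path.join("mars_images", os.path.basename(url))
)

# Keep only the rows whose image file is actually present on disk.
available = df[df["local_path"].apply(os.path.exists)]
print(f"{len(available)} of {len(df)} images found locally")
```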
A supervised learning task involves constructing a mapping from an input data space (normally described by several features) to an output space. A set of training examples---examples with known output values---is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output consists of one or more classes to which the corresponding input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. In this chapter, we explain several basic classification algorithms.
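As a concrete illustration of this workflow, the following minimal sketch uses scikit-learn and its bundled iris data (an illustrative stand-in, not the sunspot data discussed above) to train a classifier on labeled examples and predict classes for held-out inputs:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# A labeled dataset: each row of X is a set of measurements (features),
# and each entry of y is the known class of the corresponding example.
X, y = load_iris(return_X_y=True)

# Hold out some examples to stand in for inputs not seen during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The learning algorithm builds a model approximating the feature-to-class mapping.
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# The model then predicts classes for previously unseen inputs.
print(model.predict(X_test[:5]))
print("held-out accuracy:", model.score(X_test, y_test))
```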
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
N.B. This is not real data. Only here for an example for project templates.
Project Title: Add title here
Project Team: Add contact information for research project team members
Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.
Relevant publications/outputs: When available, add links to the related publications/outputs from this data.
Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (e.g., Open Science Framework, GitHub, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.
Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?
Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.
Research Project Materials: What materials are necessary to fully reproduce the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.
List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.
Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, the verbatim question associated with each response, the response options, and details of any post-collection coding that has been done on the raw responses (and whether that is encoded in a separate column).
Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and R code associated with the following publication: Badiane et al. (2022), Environmental conditions and male quality traits simultaneously explain variation of multiple colour signals in male lizards. Journal of Animal Ecology, in press.
This dataset includes the following files:
- An Excel file containing the reflectance spectra of all individuals from all the study populations
- An Excel file containing the variables collected at the individual and population levels
- Two R scripts corresponding to the analyses performed in the publication
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Describe Art is a dataset for vision language (multimodal) tasks - it contains Art Images annotations for 6,402 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset includes hibernation phenology (immergence date, emergence date, and hibernation duration) for northern Idaho ground squirrels (Urocitellus brunneus), along with data used as predictor variables in linear mixed-effects models designed to explain intraspecific variation in the three hibernation behaviors. Code for that lme analysis is also included. Also included in this dataset are body mass data for northern Idaho ground squirrels, the code used to generate predicted squirrel body mass curves, NDVI data for northern Idaho ground squirrel study sites, and the code used to generate predicted NDVI curves for those sites. The data files include metadata sheets to better explain the data and its collection.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Pill Define is a dataset for object detection tasks - it contains Hydralazine annotations for 800 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This data set consists of digitized aquifer boundaries for the Rush Springs aquifer in western Oklahoma. This area encompasses all or part of Blaine, Caddo, Canadian, Comanche, Custer, Dewey, Grady, Stephens, and Washita Counties. Mark F. Becker (U.S. Geological Survey, written commun., 1997) created an aquifer boundary data set that represented hydrologic boundaries needed to simulate the ground-water flow in the Rush Springs aquifer with a computer model. In the ground-water flow model, Mark F. Becker defined the Rush Springs aquifer to include the Rush Springs Formation, alluvial and terrace deposits along major streams, and parts of the Marlow Formations, particularly in the eastern part of the aquifer boundary area.
The Permian-age Rush Springs Formation consists of highly cross-bedded sandstone with some interbedded dolomite and gypsum. The Rush Springs Formation is overlain by Quaternary-age alluvial and terrace deposits that consist of unconsolidated clay, silt, sand, a ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Sicec Define is a dataset for classification tasks - it contains General Stuff annotations for 4,722 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains part-of-speech tags, the impact of each review, and story descriptions for movies and TV series.
- MOVIES: the name of the movie
- SENTENCE: which sentence each word belongs to
- TAG: part of speech
- WORD: each individual word in the row
- REVIEW: impact on the audience
The dataset was created from my previous dataset, movie.csv.
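A minimal sketch for working with these columns in pandas; the file name movie_pos.csv is a placeholder, and only the column names come from the description above:

```python
import pandas as pd

# Load the token-level table (placeholder file name).
df = pd.read_csv("movie_pos.csv")  # columns: MOVIES, SENTENCE, TAG, WORD, REVIEW

# Reconstruct each sentence as a list of (word, part-of-speech) pairs.
tagged_sentences = df.groupby(["MOVIES", "SENTENCE"]).apply(
    lambda g: list(zip(g["WORD"], g["TAG"]))
)
print(tagged_sentences.head())
```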
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
Few datasets openly share the kind of information included here, so it is well worth taking advantage of the fact that this data exists for public use.
This dataset includes the names, salaries, and position titles of employees of the City of Chicago, Illinois.
This data was found at https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w.
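As an illustration, the table can be pulled directly into pandas through the city's Socrata export endpoint. The dataset ID (xzkq-xp2w) comes from the URL above; the endpoint pattern and the row limit are assumptions based on standard Socrata behavior.

```python
import pandas as pd

# Socrata datasets expose a CSV export at /resource/<dataset-id>.csv;
# $limit is a standard Socrata query parameter.
URL = "https://data.cityofchicago.org/resource/xzkq-xp2w.csv?$limit=50000"

df = pd.read_csv(URL)
print(df.shape)
print(df.columns.tolist())  # inspect the name, title, and salary columns
```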
Dataset of the manuscript "What is local research? Towards a multidimensional framework linking theory and methods". In this research article we propose a theoretical and empirical framework of local research, a concept of growing importance due to its far-reaching implications for public policy. Our motivation stems from the lack of clarity surrounding the increasing yet uncritical use of the term in both scientific publications and policy documents, where local research is conceptualized and measured in many ways. A clear understanding of it is crucial for informed decision-making when setting research agendas, allocating funds, and evaluating and rewarding scientists. Our twofold aim is (1) to compare the existing approaches that define and measure local research, and (2) to assess the implications of applying one over another. We first review the perspectives and measures used since the 1970s. Drawing on spatial scientometrics and proximities, we then build a framework that splits the concept into several dimensions: locally informed research, locally situated research, locally relevant research, locally bound research, and locally governed research. Each dimension is composed of a definition and a methodological approach, which we test in 10 million publications from the Dimensions database. Our findings reveal that these approaches measure distinct and sometimes unaligned aspects of local research, with varying effectiveness across countries and disciplines. This study highlights the complex, multifaceted nature of local research. We provide a flexible framework that facilitates the analysis of these dimensions and their intersections, in an attempt to contribute to the understanding and assessment of local research and its role within the production, dissemination, and impact of scientific knowledge.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Dataset Name
Dataset Summary
A corpus for testing whether your LLM can explain jokes well. This is a rather small dataset; pointers to larger ones would be very welcome.
Languages
English
Dataset Structure
Data Fields
url : link to the explanation
joke : the original joke
explaination : the explanation of the joke
Data Splits
Since the dataset is so small, there are no splits, just like gsm8k.
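A minimal usage sketch with the Hugging Face datasets library; the repository id below is a placeholder since the card does not name it, and the field names follow the Data Fields section above (including the explaination spelling):

```python
from datasets import load_dataset

# Placeholder repository id; substitute the actual dataset path.
ds = load_dataset("user/joke-explanation-corpus", split="train")

example = ds[0]
print("JOKE:", example["joke"])
print("EXPLANATION:", example["explaination"])  # field name as listed in the card
print("SOURCE:", example["url"])
```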
This data set consists of digitized polygons of constant recharge values for the Vamoosa-Ada aquifer, in east-central Oklahoma. The Vamoosa-Ada aquifer is an important source of water that underlies about 2,320-square miles of parts of Osage, Pawnee, Payne, Creek, Lincoln, Okfuskee, and Seminole Counties. Approximately 75 percent of the water withdrawn from the Vamoosa-Ada aquifer is for municipal use. Rural domestic use and water for stock animals account for most of the remaining water withdrawn. The Vamoosa-Ada aquifer is defined in a ground-water report as consisting principally of the rocks of the Late Pennsylvanian-age Vamoosa Formation and overlying Ada Group. The Vamoosa-Ada aquifer consists of a complex sequence of fine- to very fine-grained sandstone, siltstone, shale, and conglomerate interbedded with very thin limestones. The water-yielding capabilities of the aquifer are generally controlled by lateral and vertical distribution of the sandstone beds and their physical characteristics. The Vamoosa-Ada aquifer is unconfined where it outcrops in about an 1,700-square-mile area. The recharge rate of the Vamoosa-Ada aquifer was estimated as 1.52 inches per year from base-flow measurements and precipitation records published in a ground-water report. Most of the recharge polygons were extracted from published digital geology data sets. The lines in the digital geology data sets were scanned or digitized from maps published at a scale of 1:250,000 and represent geologic contacts. Some of the lines in the data set were interpolated in areas where the Vamoosa-Ada aquifer is overlain by alluvial and terrace deposits near streams and rivers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 4 rows and is filtered to the book "What is MS?". It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
PlantPopNet (www.plantpopnet.com) collaborators collect demographic information on 65 naturally occurring populations of P. lanceolata across three continents. The present study included 55 populations that had at least two consecutive yearly censuses, presented here. Each population consists of an initial 100 individuals marked in naturally occurring populations and re-visited yearly at the peak of the flowering season. New recruits within the original plots were recorded and followed in subsequent years. The number of rosettes, number of leaves per rosette, length of the longest leaf, and width of the longest leaf for each rosette, flowering status (flowered, not flowered), reproductive output, and survival or death of each individual were recorded at each annual census. For further information on the PlantPopNet protocol, see Buckley et al. (2019). This data is presented as it was used to perform a study on a subset of the plantpopnet data. For said study, we used the first transitio...
This data set consists of digitized polygons of constant recharge values for the Antlers aquifer in southeastern Oklahoma. The Early Cretaceous-age Antlers Sandstone is an important source of water in an area that underlies about 4,400-square miles of all or part of Atoka, Bryan, Carter, Choctaw, Johnston, Love, Marshall, McCurtain, and Pushmataha Counties. The Antlers aquifer consists of sand, clay, conglomerate, and limestone in the outcrop area. The upper part of the Antlers aquifer consists of beds of sand, poorly cemented sandstone, sandy shale, silt, and clay. The Antlers aquifer is unconfined where it outcrops in about an 1,800-square-mile area. The recharge polygons were developed from recharge rates used as input into a ground-water flow model and from published digital data sets of the surficial geology of the Antlers Sandstone except in areas overlain by alluvial and terrace deposits near streams. Some of the lines were interpolated where the Antlers aquifer is overlain by alluvial and terrace deposits. The interpolated lines are very similar to the aquifer boundaries shown on maps published in a ground-water modeling report for the Antlers aquifer. The constant recharge rates used as input to the ground-water flow model were 0.32 inches per year for the western portion of the aquifer and 0.96 inches per year for the eastern portion of the aquifer. Ground-water flow models are numerical representations that simplify and aggregate natural systems. Models are not unique; different combinations of aquifer characteristics may produce similar results. Therefore, values of recharge used in the model and presented in this data set are not precise, but are within a reasonable range when compared to independently collected data.
Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition".
This dataset is associated with the following publication: Golden, H., C. Lane, A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is the "additional training dataset" for the DCASE 2024 Challenge Task 2.
The data consists of the normal/anomalous operating sounds of nine types of real/toy machines. Each recording is a single-channel audio clip that includes both a machine's operating sound and environmental noise. The duration of the recordings varies from 6 to 10 seconds. The following nine types of real/toy machines are used in this task:
3DPrinter
AirCompressor
BrushlessMotor
HairDryer
HoveringDrone
RoboticArm
Scanner
ToothBrush
ToyCircuit
Overview of the task
Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.
This task is the follow-up to the series running from DCASE 2020 Task 2 to DCASE 2023 Task 2. The task this year is to develop an ASD system that meets the following five requirements.
1. Train a model using only normal sound (unsupervised learning scenario). Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data. This is the same requirement as in the previous tasks.
2. Detect anomalies regardless of domain shifts (domain generalization task). In real-world cases, the operational states of a machine or the environmental noise can change, causing domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard to notice. In this task, the system is required to use domain-generalization techniques to handle these domain shifts. This requirement is the same as in DCASE 2022 Task 2 and DCASE 2023 Task 2.
3. Train a model for a completely new machine type. For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should be able to train models without additional hyperparameter tuning. This requirement is the same as in DCASE 2023 Task 2.
4. Train a model using a limited number of machines from its machine type. While sounds from multiple machines of the same machine type can be used to enhance detection performance, it is often the case that only a limited number of machines are available for a machine type. In such a case, the system should be able to train models using a few machines of a machine type. This requirement is the same as in DCASE 2023 Task 2.
5. Train a model both with and without attribute information. While additional attribute information can help enhance detection performance, we cannot always obtain such information. Therefore, the system must work well both when attribute information is available and when it is not.
The last requirement is newly introduced in DCASE 2024 Task 2.
Definition
We first define key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."
"Machine type" indicates the type of machine, which in the additional training dataset is one of nine: 3D-printer, air compressor, brushless motor, hair dryer, hovering drone, robotic arm, document scanner (scanner), toothbrush, and Toy circuit.
A section is defined as a subset of the dataset for calculating performance metrics.
The source domain is the domain under which most of the training data and some of the test data were recorded, and the target domain is a different set of domains under which some of the training data and some of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, signal-to-noise ratio, etc.
Attributes are parameters that define states of machines or types of noise. For several machine types, the attributes are hidden.
Dataset
This dataset consists of nine machine types. For each machine type, one section is provided, and the section is a complete set of training data. A set of test data corresponding to this training data will be provided on a separate Zenodo page as an "evaluation dataset" for the DCASE 2024 Challenge Task 2. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training and (ii) ten clips of normal sounds in the target domain for training. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.
File names and attribute csv files
File names and attribute csv files provide reference labels for each clip. The given reference labels for each training clip include machine type, section index, normal/anomaly information, and attributes regarding conditions other than normal/anomaly. The machine type is given by the directory name. The section index is given by the respective file names. For the datasets other than the evaluation dataset, the normal/anomaly information and the attributes are given by the respective file names. Note that for machine types that have their attribute information hidden, the attribute information in each file name is labeled only as "noAttributes". Attribute csv files are provided for easy access to the attributes that cause domain shifts. In these files, the file names, the names of parameters that cause domain shifts (domain shift parameter, dp), and the values or types of these parameters (domain shift value, dv) are listed. Each row takes the following format:
[filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
For machine types that have their attribute information hidden, all columns except the filename column are left blank for each row.
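A minimal sketch, assuming Python, for reading an attribute csv in the row format shown above into a per-file dictionary of domain-shift parameters (the file path in the usage comment is a placeholder):

```python
import csv

def load_attributes(csv_path):
    """Map each filename to a dict of {domain-shift parameter: value}."""
    attributes = {}
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            filename, rest = row[0], row[1:]
            # Remaining cells alternate parameter name / value; they are blank
            # for machine types whose attribute information is hidden.
            pairs = zip(rest[0::2], rest[1::2])
            attributes[filename] = {p: v for p, v in pairs if p}
    return attributes

# Usage (placeholder path):
# attrs = load_attributes("attributes_00.csv")
```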
Recording procedure
Normal/anomalous operating sounds of machines and their related equipment were recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings from a fixed microphone. We mixed a target machine sound with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset that explain the details of the recording procedure by the submission deadline.
Directory structure
/eval_data
Baseline system
The baseline system is available on the GitHub repository. The baseline systems provide a simple entry-level approach that gives reasonable performance on the Task 2 dataset. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
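For orientation only, here is a minimal sketch of the kind of unsupervised approach such entry-level systems take: compute log-mel frame features from normal training clips and score test clips by reconstruction error. PCA is used purely for illustration and is not the official baseline; librosa, numpy, and scikit-learn are assumed to be installed, and the training-clip paths are placeholders.

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA

def log_mel_features(path, sr=16000, n_mels=128):
    """Frame-level log-mel features for one single-channel clip."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel).T  # shape: (frames, n_mels)

# Placeholder: paths to normal training clips of one machine type.
train_paths = ["machine/train/normal_0001.wav", "machine/train/normal_0002.wav"]

# Fit on frames from normal clips only (unsupervised learning scenario).
train_frames = np.vstack([log_mel_features(p) for p in train_paths])
pca = PCA(n_components=32).fit(train_frames)

def anomaly_score(path):
    """Mean reconstruction error over frames; higher means more anomalous."""
    frames = log_mel_features(path)
    recon = pca.inverse_transform(pca.transform(frames))
    return float(np.mean((frames - recon) ** 2))
```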
Condition of use
This dataset was created jointly by Hitachi, Ltd., NTT Corporation and STMicroelectronics and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Citation
Contact
If there is any problem, please contact us:
Tomoya Nishida, tomoya.nishida.ax@hitachi.com
Keisuke Imoto, keisuke.imoto@ieee.org
Noboru Harada, noboru@ieee.org
Daisuke Niizumi, daisuke.niizumi.dt@hco.ntt.co.jp
Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com