The dataset consists of public domain acute and chronic toxicity and chemistry data for algal species. Data are accessible at https://envirotoxdatabase.org/. The data include algal species, chemical identification, and the concentrations that do and do not affect algal growth.
License: https://choosealicense.com/licenses/other/
Dataset Card for The Stack Metadata
Changelog
Release | Description
v1.1 | This is the first release of the metadata. It is for The Stack v1.1.
v1.2 | Metadata dataset matching The Stack v1.2.
Dataset Summary
This is a set of additional information for the repositories used for The Stack. It contains file paths, detected licenses, and other information about the repositories.
Supported Tasks and Leaderboards
The main task is to recreate… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/the-stack-metadata.
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Title: IMDB & TMDB Movie Metadata Big Dataset (>1M)
Subtitle: A Comprehensive Dataset Featuring Detailed Metadata of Movies (IMDB, TMDB). Over 1M Rows & 42 Features: Metadata, Ratings, Genres, Cast, Crew, Sentiment Analysis and many more...
Detailed Description:
Overview: This comprehensive dataset merges the extensive film data available from both IMDB and TMDB, offering a rich resource for movie enthusiasts, data scientists, and researchers. With over 1 million rows and 42 detailed features, this dataset provides in-depth information about a wide variety of movies, spanning different genres, periods, and production backgrounds.
File Information:
1. File Size: ≈ 1 GB
2. Format: CSV (Comma-Separated Values)
Column Descriptors/Key Features:
1. ID: Unique identifier for each movie.
2. Title: The official title of the movie.
3. Vote Average: Average rating received by the movie.
4. Vote Count: Number of votes the movie has received.
5. Status: Current status of the movie (e.g., Released, Post-Production).
6. Release Date: Official release date of the movie.
7. Revenue: Box office revenue generated by the movie.
8. Runtime: Duration of the movie in minutes.
9. Adult: Indicates whether the movie is for adults.
10. Genres: List of genres the movie belongs to.
11. Overview Sentiment: Sentiment analysis of the movie's overview text.
12. Cast: List of main actors in the movie.
13. Crew: List of key crew members, including directors, producers, and writers.
14. Genres List: Detailed genres in list format.
15. Keywords: List of relevant keywords associated with the movie.
16. Director of Photography: Name of the cinematographer.
17. Producers: Names of the producers.
18. Music Composer: Name of the music composer.
Additional Features:
Potential Use Cases:
- Sentiment Analysis: Analyze audience sentiment towards movies based on reviews and ratings.
- Recommendation Systems: Build models to recommend movies based on user preferences and viewing history.
- Market Analysis: Study trends in the movie industry, including genre popularity and revenue patterns.
- Content Analysis: Investigate the thematic content and diversity of movies over time.
- Data Visualization: Create visual representations of movie data to uncover hidden insights.
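As a rough illustration of the recommendation use case, the sketch below ranks movies by the Jaccard similarity of their genre sets, one simple content-based approach. It assumes the Genres column has already been parsed into Python sets (the card does not say how lists are encoded in the CSV), and the titles and genres are hypothetical.

```python
# Content-based recommendation sketch: rank movies by Jaccard similarity
# of their genre sets. Titles and genres below are hypothetical.


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(title: str, genres_by_title: dict[str, set[str]], k: int = 3) -> list[tuple[str, float]]:
    """Return up to k other titles ranked by genre similarity to `title`."""
    target = genres_by_title[title]
    scores = [
        (other, jaccard(target, genres))
        for other, genres in genres_by_title.items()
        if other != title
    ]
    # Highest similarity first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    movies = {
        "Movie A": {"Horror", "Thriller"},
        "Movie B": {"Horror", "Mystery"},
        "Movie C": {"Comedy", "Romance"},
        "Movie D": {"Horror", "Thriller", "Mystery"},
    }
    # Movie D shares two of three genres with Movie A; Movie B shares one of three.
    print(recommend("Movie A", movies, k=2))
```

The Keywords column could be added to the same sets to make the similarity less coarse.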
License: https://crawlfeeds.com/privacy_policy
This comprehensive dataset features detailed metadata for over 190,000 movies and TV shows, with a strong concentration in the Horror genre. It is ideal for entertainment research, machine learning models, genre-specific trend analysis, and content recommendation systems.
Each record contains rich information, making it perfect for streaming platforms, film industry analysts, or academic media researchers.
Primary Genre Focus: Horror
Build movie recommendation systems or genre classifiers
Train NLP models on movie descriptions
Analyze Horror content trends over time
Explore box office vs. rating correlations
Enrich entertainment datasets with directorial and cast metadata
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information on samples submitted for RNAseq
Rows are individual samples
Columns are: ID; Sample Name; Date sampled; Species; Sex; Tissue; Geographic location; Date extracted; Extracted by; Nanodrop Conc. (ng/µl); 260/280; 260/230; RIN; Plate ID; Position; Index name; Index Seq; Qubit BR kit Conc. (ng/µl); BioAnalyzer Conc. (ng/µl); BioAnalyzer bp (region 200-1200); Submission reference; Date submitted; Conc. (nM); Volume provided; PE/SE; Number of reads; Read length.
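The sheet reports both a mass concentration (ng/µl) with a fragment size (bp) and a molar concentration (nM). The sketch below shows the common conversion between them, assuming double-stranded DNA at roughly 660 g/mol per base pair; the card does not state that the "Conc. (nM)" column was actually derived this way, and the sample values are hypothetical.

```python
# Sketch: convert a library mass concentration (ng/µl) and mean fragment
# length (bp) into molarity (nM), assuming double-stranded DNA with an
# average mass of 660 g/mol per base pair.

DSDNA_G_PER_MOL_PER_BP = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Molarity in nM: conc * 1e6 / (660 * length)."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    # 1 ng/µl = 1e-3 g/L; dividing by molar mass (g/mol) gives mol/L;
    # multiplying by 1e9 gives nmol/L, hence the net factor 1e6.
    return conc_ng_per_ul * 1e6 / (DSDNA_G_PER_MOL_PER_BP * mean_fragment_bp)


if __name__ == "__main__":
    # Hypothetical sample: 2.0 ng/µl BioAnalyzer concentration, 400 bp mean size.
    print(round(library_molarity_nm(2.0, 400), 2))  # 7.58
```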
The sumuks/arxiv-metadata-dataset dataset, hosted on Hugging Face and contributed by the HF Datasets community.
For each sample, the number of reads and the mean coverage are indicated. The metadata for each sample also include the percentage of the genome covered at a sequencing depth of at least 1X (% > 1X), 5X (% > 5X), and 10X (% > 10X). When available, the latitude and longitude are specified. NA: information not available. (XLSX)
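The coverage columns can be reproduced from per-position sequencing depths (for example, the all-positions output of a tool such as `samtools depth -a`). The sketch below is a minimal version under the assumption that "covered by at least tX" means depth ≥ t; the column labels use ">", so the exact boundary convention is not confirmed by the source. The toy depths are hypothetical.

```python
# Sketch: mean coverage and breadth of coverage from per-position depths.
# "Covered by at least tX" is interpreted as depth >= t (an assumption;
# the column labels are written as "% > 1X" etc.).


def mean_coverage(depths: list[int]) -> float:
    """Average sequencing depth across all positions."""
    if not depths:
        raise ValueError("depths must be non-empty")
    return sum(depths) / len(depths)


def breadth_of_coverage(depths: list[int], thresholds: tuple[int, ...] = (1, 5, 10)) -> dict[int, float]:
    """Percentage of positions whose depth is at least each threshold."""
    if not depths:
        raise ValueError("depths must be non-empty")
    n = len(depths)
    return {t: 100.0 * sum(1 for d in depths if d >= t) / n for t in thresholds}


if __name__ == "__main__":
    # Hypothetical depths for a 10-position toy "genome".
    depths = [0, 1, 3, 5, 8, 10, 12, 0, 2, 20]
    print(mean_coverage(depths))        # 6.1
    print(breadth_of_coverage(depths))  # {1: 80.0, 5: 50.0, 10: 30.0}
```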
The dataset includes CMAQ-predicted results. This dataset is not publicly accessible because it was created by Shanghai Jiao Tong University; EPA does not have the dataset. It can be accessed by contacting Ping Liu, School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China, email: ping_liu@sjtu.edu.cn. Format: the dataset includes CMAQ output files in NetCDF format. This dataset is associated with the following publication: Chen, H., P. Liu, Q. Wang, R. Huang, and G. Sarwar. Impact and pathway of halogens on atmospheric oxidants in coastal city clusters in the Yangtze River Delta region in China. Atmospheric Pollution Research. Turkish National Committee for Air Pollution Research and Control, Izmir, TURKEY, 15(2): N/A, (2024).
Stores physical and logical information about relational databases and record structures to assist in data identification and management.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health (NIH). It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these repositories were identified, mapped to existing data-specific metadata standards and to the schemas of two multidisciplinary data repositories, DataCite and Dryad, and compared with the metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From these mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.
Metadata for the data collected at the NEES@UCSB Garner Valley Downhole Array field site on September 10-12, 2013 as part of the larger PoroTomo project.
The ada-datadruids/movie-metadata dataset, hosted on Hugging Face and contributed by the HF Datasets community.
The OpenScience Slovenia metadata dataset contains metadata entries for Slovenian public domain academic documents which include undergraduate and postgraduate theses, research and professional articles, along with other academic document types. The data within the dataset was collected as a part of the establishment of the Slovenian Open-Access Infrastructure which defined a unified document collection process and cataloguing for universities in Slovenia within the infrastructure repositories. The data was collected from several already established but separate library systems in Slovenia and merged into a single metadata scheme using metadata deduplication and merging techniques. It consists of text and numerical fields, representing attributes that describe documents. These attributes include document titles, keywords, abstracts, typologies, authors, issue years and other identifiers such as URL and UDC. The potential of this dataset lies especially in text mining and text classification tasks and can also be used in development or benchmarking of content-based recommender systems on real-world data.
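The card mentions deduplication and merging of records from separate library systems but does not describe the procedure. The sketch below shows one simple, generic approach, not the one used for this dataset: records are keyed by a normalized title (lowercased, diacritics such as č/š/ž folded, punctuation removed) plus issue year, and duplicates are merged by keeping the first non-empty value per field. The field names `title`, `year`, `url`, and `udc` are hypothetical.

```python
# Generic metadata deduplication sketch (not the dataset's actual method).
# Field names such as "title", "year", "url", "udc" are hypothetical.
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def merge_records(records: list[dict]) -> list[dict]:
    """Group records by (normalized title, year) and merge each group.

    For each field, the first non-empty value encountered wins.
    """
    merged: dict[tuple[str, object], dict] = {}
    for rec in records:
        key = (normalize_title(rec.get("title", "")), rec.get("year"))
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if value not in (None, "") and target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())


if __name__ == "__main__":
    records = [
        {"title": "Analiza učinkovitosti", "year": 2015, "url": None, "udc": "004"},
        {"title": "ANALIZA UCINKOVITOSTI.", "year": 2015, "url": "https://example.org/doc/1", "udc": ""},
        {"title": "Drug naslov", "year": 2016},
    ]
    for rec in merge_records(records):
        print(rec)
```

Exact-key matching like this misses near-duplicates with typos; fuzzy string similarity or additional fields such as authors would be needed for that.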
This dataset contains the metadata extracted from the DICOM files of the UNIFESP X-ray Body Part Competition, in CSV format.
Competition and original dataset:
https://www.kaggle.com/competitions/unifesp-x-ray-body-part-classifier/
Acknowledgements: We thank Sarah Lustosa Haiek, Julia Tagliaferri, Lucas Diniz, and Rogerio Jadjiski for annotating this dataset. We thank the PI, Nitamar Abdala, MD, PhD, for supporting this work. We thank Ernandez, our PACS admin, and Jefferson, our IT manager. We thank MD.ai for providing the annotation platform.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metadata form template for Tempe Open Data.
This dataset was created by Nicole Wong98
A dataset containing the metadata for all openly published datasets on the SP Energy Networks Open Data Portal. All metadata conforms to the Dublin Core metadata standard, a set of 15 "core" elements.
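To make the "15 core elements" concrete, the sketch below lists the Dublin Core Metadata Element Set (version 1.1) and checks which keys of a metadata record fall outside it. Stripping a `dc:` prefix from keys, and the sample record itself, are assumptions; the portal's actual JSON layout is not described here.

```python
# Sketch: compare a metadata record's keys against the 15 Dublin Core
# Metadata Element Set (DCMES 1.1) elements. The "dc:" prefix handling and
# the sample record are assumptions, not the portal's actual JSON layout.

DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def check_dublin_core(record: dict) -> tuple[set[str], set[str]]:
    """Return (unknown_keys, absent_elements) for one metadata record.

    All 15 elements are optional in Dublin Core, so absent elements are
    informational rather than errors.
    """
    keys = {key.lower().removeprefix("dc:") for key in record}  # Python 3.9+
    unknown = keys - DUBLIN_CORE_ELEMENTS
    absent = set(DUBLIN_CORE_ELEMENTS - keys)
    return unknown, absent


if __name__ == "__main__":
    sample = {"dc:title": "Example dataset", "Creator": "Example Org", "date": "2023-01-01", "owner": "x"}
    unknown, absent = check_dublin_core(sample)
    print(sorted(unknown))  # ['owner']
    print(len(absent))      # 12
```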
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This master metadata spreadsheet documents all of the Gede ruins heritage items published by the Zamani Project. The information in this site description is provided for contextual purposes only and should not be regarded as a primary source.
Gede is a Swahili archaeological site comprising coral stone structures, including mosques, houses, and tombs arranged within a walled town layout. Architectural features such as mihrabs, water cisterns, and decorative niches reflect Islamic influence and urban planning. Excavations have revealed trade goods and domestic artifacts, indicating participation in Indian Ocean commerce. Gede provides insights into Swahili cultural identity, religious practice, and economic networks. Gede is listed as a UNESCO World Heritage Site, 'The Historic Town and Archaeological Site of Gedi'.
The Zamani Project seeks to increase awareness and knowledge of tangible cultural heritage in Africa and internationally by creating metrically accurate digital representations of historical sites. Digital spatial data of cultural heritage sites can be used for research and education, for restoration and conservation, and as a record for future generations. The Zamani Project operates as a non-profit organisation within the University of Cape Town.
Special thanks to the Saville Foundation and the Andrew W. Mellon Foundation, among others, for their contributions to the digital documentation of this heritage site.
If you believe any information in this description is incorrect, please contact the repository administrators.
The GOLD (Genomes OnLine Database) is a resource for centralized monitoring of genome and metagenome projects worldwide. It stores information on complete and ongoing projects, along with their associated metadata. This collection references the metadata associated with samples.
This file describes where to find the dataset used for this paper (PurpleAir and AQS) and the data fields used in the analysis. Contact the corresponding author for access to the code used to generate the dataset. This dataset is associated with the following publication: deSouza, P., K. Barkjohn, A. Clements, J. Lee, R. Kahn, and B. Crawford. An analysis of degradation in low-cost particulate matter sensors. Environmental Science: Atmospheres. Royal Society of Chemistry, Cambridge, UK, NA, (2023).