MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Metadata Database for Danbooru2023
Danbooru 2023 dataset: https://huggingface.co/datasets/nyanko7/danbooru2023 The latest entry in this database is id 7,866,491, which is newer than nyanko7's dataset. This dataset contains a SQLite database file that holds all the tag and post metadata. The Peewee ORM config file is provided too; please check it for more information, especially on how I link posts and tags together. The original data is from the official dump of the posts info.… See the full description on the dataset page: https://huggingface.co/datasets/KBlueLeaf/danbooru2023-metadata-database.
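Before reading the Peewee config, it can help to inspect a SQLite file directly with Python's built-in sqlite3 module. The following is a minimal sketch only: the `post`, `tag`, and `post_tag` tables are hypothetical stand-ins created in memory, and the real database may link posts and tags differently.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical stand-in schema; the real file's tables may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (id INTEGER PRIMARY KEY, rating TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER);
    INSERT INTO post VALUES (1, 'g');
    INSERT INTO tag VALUES (10, 'landscape'), (11, 'sky');
    INSERT INTO post_tag VALUES (1, 10), (1, 11);
    """
)

tables = list_tables(conn)

# Many-to-many lookup through the link table: tag names attached to post 1.
tags_for_post = [
    name
    for (name,) in conn.execute(
        "SELECT tag.name FROM tag "
        "JOIN post_tag ON post_tag.tag_id = tag.id "
        "WHERE post_tag.post_id = ? ORDER BY tag.name",
        (1,),
    )
]
print(tables, tags_for_post)
```

To try this against the downloaded file, replace `":memory:"` with the path to the database and skip the `executescript` call.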
The Tethys database houses the metadata associated with the acoustic data collection efforts of the Passive Acoustic Group. These metadata include dates, locations, and sampling rates, among other things. The database platform itself was developed by colleagues at San Diego State University and is freely available and open source. See citation details for website link.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Searchable Index of Metadata Aggregators is a database that stores general information about metadata aggregators. It is accompanied by “A WDS guide to Metadata Aggregators for Repository Managers”. The Searchable Index of Metadata Aggregators is an up-to-date catalogue of Dataset Metadata Aggregators (DMAs), implemented as an Access database. It was designed to fill a gap identified by members of the Harvestable Metadata Services Working Group (HMetS-WG) of the World Data System’s International Technology Office (WDS-ITO): the lack of up-to-date resources giving an overview of the current infrastructures used to syndicate dataset metadata. The database contains information on each DMA's supported metadata standards and software interfaces, as well as documentation on how to be aggregated by each.
The WDS Guide to Metadata Aggregators is a guidance document for the associated Searchable Index of Metadata Aggregators. We have defined DMAs as federated service infrastructures that foster the findability and accessibility of data products by enabling access to multiple, distributed metadata records via a single search interface. This guide describes the catalogue and gives general guidance on how to use it. In the sections that follow, we give a short background to the Harvestable Metadata Services Working Group project. Then, we outline the project's research methodology and the properties of the searchable index. Finally, we discuss this project's limitations, as well as its future development. Providing metadata to aggregators can significantly improve the findability of research data products.
Together, this guidance document and dataset package are designed to provide research data repository managers with options for participation in federated research data systems, and support institutional repositories' harvestable metadata service implementation strategies. In addition, as developers in the global research data management community seek to create pathways and workflows across data, software and compute resources, we anticipate that they're likely to prioritize connecting sites, organizations and services that have already done a lot of work harmonizing content from disparate providers. In this context, this resource will be helpful for creating roadmaps and implementation plans for integration across science clouds.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is part of the database compiled as an outcome of Work Area 1 in project OrganicYieldsUP. This Excel file describes the content of the OYUP relational database for each table and column.
The main sheet "table_schema_oyup" contains:
| Column | Description |
|---|---|
| ORDINAL_POSITION_per_table | Row number, unique within each table. Can be used for sorting. |
| TABLE_NAME | Name of a table in the relational database. |
| COLUMN_NAME | Name of the column in the respective table. |
| COLUMN DESCRIPTION | Description of the column content. |
| DATA_TYPE | SQLite data type of the column. |
| TABLE_COLUMN_ID | Letter-based ID for the column that was used during data upload into the database. Can be used to link gap-filling information to the gap-filled indicator. |
The additional sheet "quality_indicator_description" contains:
| Column | Description |
|---|---|
| PARAMETER | Quality indicator (unit) reported |
| CROP | Crop that the quality indicator refers to |
| DESCRIPTION | Description of the quality indicator (unit) |
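A schema description with columns like those above can be derived from any SQLite database by combining `sqlite_master` with `PRAGMA table_info`. This is a sketch under assumptions, not the project's actual procedure: the `yield_data` table and its columns are hypothetical, and the free-text descriptions and letter-based IDs would still have to be added by hand.

```python
import sqlite3


def describe_schema(conn):
    """Build one dict per column, mirroring the schema sheet's layout."""
    described = []
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for cid, column, data_type, *_ in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            described.append(
                {
                    "ORDINAL_POSITION_per_table": cid + 1,  # cid is zero-based
                    "TABLE_NAME": table,
                    "COLUMN_NAME": column,
                    "DATA_TYPE": data_type,
                }
            )
    return described


# Hypothetical table, used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yield_data (crop TEXT, yield_t_ha REAL)")

schema_rows = describe_schema(conn)
for row in schema_rows:
    print(row)
```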
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Unique values and counts of metadata subject fields.
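A table of unique values and counts of this kind could be produced along the following lines. This is only an illustrative sketch: the record layout and the `subject` field name are assumptions, since the source does not say how the metadata is stored.

```python
from collections import Counter


def count_field_values(records, field):
    """Count how often each distinct value of `field` occurs across records.

    A field may hold a single string or a list of strings; missing or empty
    values are skipped.
    """
    counts = Counter()
    for record in records:
        value = record.get(field)
        values = value if isinstance(value, list) else [value]
        for item in values:
            if item:
                counts[item.strip()] += 1
    return counts


# Hypothetical metadata records.
records = [
    {"subject": ["Maps", "History"]},
    {"subject": "Maps"},
    {"title": "Record without a subject"},
]

subject_counts = count_field_values(records, "subject")
for value, count in subject_counts.most_common():
    print(value, count)
```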
Stores physical and logical information about relational databases and record structures to assist in data identification and management.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SEABORNE (Sustainable UsE And Benefits fOR mariNE) project has consolidated and synthesised existing information about who is using the Reef, how it is being used, and what the benefits are from this use. SEABORNE began in November 2021, and initially we were provided with a spreadsheet listing potential datasets relevant to our project. We then continued to search various online data portals and added further relevant datasets, particularly those focusing on the Great Barrier Reef. We recorded these initially in an Excel spreadsheet, then transferred them to an MS Access database and developed a more user-friendly entry form. Within the MS Access database, one table stores all the metadata records entered and another table stores the static preview images. There are 58 fields (which have been described in a data dictionary), some of which are mandatory. At the moment there are 3 metadata records entered, and we expect this to grow to 50-100 records by the completion of the project. Lineage: Data was produced by examining each dataset's metadata and documenting various features of the individual datasets and how useful they were for examining ecosystem services. Data was initially entered in Excel, then migrated to an MS Access database, and then imported or read in by a Shiny R app.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metadata for the 16,012 microbial samples included in this database. Metadata was collated from the originally published studies, available supplementary information, and online databases.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A manually curated registry of standards, split into three types - Terminology Artifacts (ontologies, e.g. Gene Ontology), Models and Formats (conceptual schema, formats, data models, e.g. FASTA), and Reporting Guidelines (e.g. the ARRIVE guidelines for in vivo animal testing). These are linked to the databases that implement them and the funder and journal publisher data policies that recommend or endorse their use.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is related to the manuscript "An empirical meta-analysis of the life sciences linked open data on the web", published in Nature Scientific Data. If you use the dataset, please cite the manuscript as follows:
Kamdar, M.R., Musen, M.A. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8, 24 (2021). https://doi.org/10.1038/s41597-021-00797-y
We have extracted schemas from more than 80 publicly available biomedical linked data graphs in the Life Sciences Linked Open Data (LSLOD) cloud into an LSLOD schema graph and conducted an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. The dataset published here contains the following files:
- The set of Linked Data Graphs from the LSLOD cloud from which schemas are extracted.
- Refined sets of extracted classes, object properties, data properties, and datatypes, shared across the Linked Data Graphs on the LSLOD cloud. Where a schema element is reused from a Linked Open Vocabulary or an ontology, it is explicitly indicated.
- The LSLOD Schema Graph, which contains all the above extracted schema elements interlinked with each other based on the underlying content. Sample instances and sample assertions are also provided, along with broad-level characteristics of the modeled content.
The LSLOD Schema Graph is saved as a JSON Pickle file. To read the JSON object in this Pickle file, use the following Python commands:
    import pickle
    with open('LSLOD-Schema-Graph.json.pickle', 'rb') as infile:
        x = pickle.load(infile, encoding='iso-8859-1')
Check the Referenced Link for more details on this research, raw data files, and code references.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Catalogue of data, metadata and databases that PoliS-Lombardia manages on behalf of the Region pursuant to Article 52(1) of the Digital Administration Code.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The database was developed as part of a research project investigating the use and adoption of metadata standards for UAV (Uncrewed Aerial Vehicle) data. It compiles a list of published datasets containing UAV data, or products generated from UAV data, identified through a systematic search of public data repositories. The search covered established data platforms, including DANS, 4TU.ResearchData, DataONE, Science Data Bank, DRYAD, Figshare, and Zenodo. In addition, a broader internet search using search engines such as Google, DuckDuckGo, Bing, and Perplexity was conducted to identify other publicly accessible UAV datasets. Only datasets with a persistent identifier, such as a DOI (Digital Object Identifier), were included.
These data describe the abundance of individual lichen species across the U.S. as recorded in the Forest Health and Monitoring dataset of the Forest Inventory and Analysis program (i.e. Phase 3 plots). This dataset is not publicly accessible because these data are already housed on the USFS Forest Inventory and Analysis site (see below). It can be accessed through the following means: The lichen data for this product are from the USDA Forest Service (USFS) Forest Inventory and Analysis (FIA) Phase 3 (P3) dataset, Forest Health and Monitoring. The metadata and database description for the FIA-P3 are available at https://www.fia.fs.fed.us/library/database-documentation/. The data themselves are located at the USFS Data Mart (https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html) in two files: “LICHEN_PLOT_SUMMARY.zip” and “LICHEN_VISIT.zip”. Point of contact: Linda Geiser, lgeiser@fs.fed.us. Format: The data are in .csv format.
http://data.europa.eu/eli/dec/2011/833/oj
Each year the Council publishes an annual report on the implementation of regulation 1049/2001 on access to documents. The annual report contains statistical information on the requests for public access received by the Council. With the exception of personal data, information on such requests is public.
This dataset contains the following information on the requests for public access to documents received by the Council:
General information on the applicant (anonymous): professional activity of applicant; geographic origin
General Information on the request: request number; type of request (initial request, confirmatory application); date of request; deadline to reply; extended deadline to reply; date of reply; effort spent; follow-up; policy area(s).
Information on the requested documents: publication status (public or not); type of reply; document category; document number
A description of biological and ecological data of the Danube delta lakes and channels is presented. The biological indicators refer to aquatic macrophytes, fish, zooplankton, and macro-invertebrates. Environmental data include physico-chemical data as well as hydrological parameters. More information on this dataset can be found in the Freshwater Metadatabase - MARS_12 (http://www.freshwatermetadata.eu/metadb/bf_mdb_view.php?entryID=MARS_12).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time-based metadata formatted for TimelineJS or other applications.
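As an illustration of what "formatted for TimelineJS" can involve, the sketch below converts dated records into the JSON layout that TimelineJS accepts: a top-level `events` list whose items carry a `start_date` and a `text` object. The sample record and the `to_timeline_event` helper are hypothetical; the dataset's actual fields are not described in the source.

```python
import json
from datetime import date


def to_timeline_event(headline, when, description=""):
    """Convert one dated record into a TimelineJS-style event dict."""
    return {
        "start_date": {"year": when.year, "month": when.month, "day": when.day},
        "text": {"headline": headline, "text": description},
    }


# Hypothetical record: (headline, date, description).
records = [
    ("Collection digitized", date(2019, 5, 1), "Illustrative entry only."),
]

timeline = {"events": [to_timeline_event(*record) for record in records]}
timeline_json = json.dumps(timeline, indent=2)
print(timeline_json)
```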
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file is the metadata associated with all the genomic annotation curation in the Wormbiome collection.
The Wormbiome collection is an online database dedicated to centralizing all the information related to bacteria associated with C. elegans. More information is available at wormbiome.org.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In most democracies, the public record of legislative votes in national and local parliaments is an important basis for holding elected officials accountable. In political science, that record is also an important source of data on legislator and party behavior. In practice, many legislatures create a public record of the votes cast by individual legislators for only a fraction of the issues on which votes occur. These recorded votes often are not a representative sample of all votes cast and may exhibit systematic biases that have implications for political accountability and for the science of political behavior. Therefore, understanding the characteristics of the issues that receive a publicly recorded vote (a roll-call vote) is essential to our understanding of democratic processes and evaluating the limits of scientific inferences that can be drawn from roll-call data. This data set advances our understanding of the voting record through examination of national parliamentary bodies around the world.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Unique values and counts of metadata facet fields.
https://creativecommons.org/publicdomain/zero/1.0/
🎬 Overview
This dataset contains a cleaned and structured collection of movie metadata sourced from The Movie Database (TMDB), covering films released between 1900 and 2025. It includes over 946,000 movies with detailed information such as genres, production companies, budgets, revenues, popularity, ratings, and more.
This dataset is ideal for data science, analytics, and machine learning projects related to the film industry — including trend analysis, box office prediction, and recommendation systems.
| Column | Description |
|---|---|
| id | Unique movie identifier |
| title | Official movie title |
| adult | Boolean flag indicating adult content |
| original_language | Original spoken language (ISO 639-1 code) |
| origin_country | List of production countries |
| release_date | Movie release date |
| genre_names | List of genres associated with the movie |
| production_company_names | Names of involved production companies |
| budget | Reported production budget (USD) |
| revenue | Worldwide gross revenue (USD) |
| runtime | Duration in minutes |
| popularity | Popularity score (as provided by TMDB) |
| vote_average | Average user rating |
| vote_count | Number of votes received |
🧠 Potential Use Cases
🎥 Movie trend analysis across decades
💰 Budget vs. revenue ROI exploration
⭐ Predictive modeling for ratings or popularity
🌍 Cross-cultural film analysis by countries and languages
🧩 Recommender systems and content-based filtering projects
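The budget vs. revenue use case above can be sketched as follows, using the `title`, `budget`, and `revenue` columns from the table. The sample rows are made up for illustration, and skipping rows with a budget of 0 assumes that 0 marks a missing value, which should be checked against the actual data.

```python
import csv
import io

# Made-up sample rows in the dataset's column layout (not real movie data).
SAMPLE_CSV = """title,budget,revenue
Film A,100,250
Film B,0,500
Film C,200,150
"""


def roi_by_title(csv_text):
    """Return {title: ROI}, where ROI = (revenue - budget) / budget.

    Rows with a non-positive budget are skipped, on the assumption that
    0 means "not reported" rather than a genuinely free production.
    """
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        budget = float(row["budget"])
        revenue = float(row["revenue"])
        if budget <= 0:
            continue
        result[row["title"]] = (revenue - budget) / budget
    return result


roi = roi_by_title(SAMPLE_CSV)
for title, value in roi.items():
    print(f"{title}: {value:+.2f}")
```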
⚙️ Data Source & Attribution
The data in this dataset was collected and preprocessed using the TMDB API. All movie information is © TMDB and is provided under their Terms of Use.
This dataset is not endorsed or certified by TMDB. Users must comply with TMDB’s attribution and API usage policies when using this data.
🙌 Acknowledgements
Special thanks to The Movie Database (TMDB) for providing open access to their rich movie metadata. Dataset cleaned, organized, and published by Mustafa Sayed Said 🧑‍💻.
🏷️ Tags
movies film cinema tmdb data-cleaning machine-learning dataset EDA entertainment analytics