86 datasets found

BLM NV PLSS CADNSDI Version 2 Metadata Glance Polygon
catalog.data.gov
data.amerigeoss.org
+2more
Updated Jul 9, 2025
Cite
Bureau of Land Management (2025). BLM NV PLSS CADNSDI Version 2 Metadata Glance Polygon [Dataset]. https://catalog.data.gov/dataset/blm-nv-plss-cadnsdi-version-2-metadata-glance-polygon-b3b5c
Explore at:
Dataset updated
Jul 9, 2025
Dataset provided by
Bureau of Land Management (http://www.blm.gov/)
Description
BLM NV PLSS Metadata Glance: MetadataGlance provides PLSS data steward content for individual PLSS units. This dataset represents the GIS Version of the Public Land Survey System including both rectangular and non-rectangular surveys. The primary source for the data is cadastral survey records housed by the BLM, supplemented with local records and geographic control coordinates from states and counties as well as other federal agencies such as the USGS and USFS. The data has been converted from source documents to digital form and transferred into a GIS format that is compliant with FGDC Cadastral Data Content Standards and Guidelines for publication. This data is optimized for data publication and sharing rather than for specific "production" or operation and maintenance. This data set includes the following: PLSS Fully Intersected (all of the PLSS features at the atomic or smallest polygon level); PLSS Townships, First Divisions and Second Divisions (the hierarchical breakdown of the PLSS Rectangular surveys); PLSS Special surveys (non-rectangular components of the PLSS); and Meandered Water, Corners and Conflicted Areas (known areas of gaps or overlaps between Townships or state boundaries). The Entity-Attribute section of this metadata describes these components in greater detail.
Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov"
figshare.com
zip
Updated Jun 1, 2023
Cite
Laura Miron; Rafael Gonçalves; Mark A. Musen (2023). Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov" [Dataset]. http://doi.org/10.6084/m9.figshare.12743939.v2
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.12743939.v2
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Authors
Laura Miron; Rafael Gonçalves; Mark A. Musen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This fileset provides supporting data and corpora for the empirical study described in: Laura Miron, Rafael S. Goncalves and Mark A. Musen. Obstacles to the Reuse of Metadata in ClinicalTrials.gov

Description of files

Original data files:
- AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. Set contains 302,091 records downloaded on April 3, 2019.
- public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate records in AllPublicXML.

BioPortal API Query Results:
- condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns={filename, condition, url, bioportal term, cuis, tuis}.
- intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns={filename, intervention, url, bioportal term, cuis, tuis}.

Data Element Definitions:
- supplementary_table_1.xlsx: Mapping of element names, element types, and whether elements are required in ClinicalTrials.gov data dictionaries, the ClinicalTrials.gov XML schema declaration for records (public.XSD), the Protocol Registration System (PRS), FDAAA801, and the WHO required data elements for clinical trial registrations.

Column and value definitions:
- CT.gov Data Dictionary Section: Section heading for a group of data elements in the ClinicalTrials.gov data dictionary (https://prsinfo.clinicaltrials.gov/definitions.html)
- CT.gov Data Dictionary Element Name: Name of an element/field according to the ClinicalTrials.gov data dictionaries (https://prsinfo.clinicaltrials.gov/definitions.html) and (https://prsinfo.clinicaltrials.gov/expanded_access_definitions.html)
- CT.gov Data Dictionary Element Type: "Data" if the element is a field for which the user provides a value, "Group Heading" if the element is a group heading for several sub-fields, but is not in itself associated with a user-provided value.
- Required for CT.gov for Interventional Records: "Required" if the element is required for interventional records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" if the element is not applicable to interventional records (only observational or expanded access)
- Required for CT.gov for Observational Records: "Required" if the element is required for observational records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" if the element is not applicable to observational records (only interventional or expanded access)
- Required in CT.gov for Expanded Access Records?: "Required" if the element is required for expanded access records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" if the element is not applicable to expanded access records (only interventional or observational)
- CT.gov XSD Element Definition: abbreviated xpath to the corresponding element in the ClinicalTrials.gov XSD (public.XSD). The full xpath includes 'clinical_study/' as a prefix to every element. (There is a single top-level element called "clinical_study" for all other elements.)
- Required in XSD?: "Yes" if the element is required according to public.XSD, "No" if the element is optional, "-" if the element is not made public or included in the XSD
- Type in XSD: "text" if the XSD type was "xs:string" or "textblock", name of enum given if type was enum, "integer" if type was "xs:integer" or "xs:integer" extended with the "type" attribute, "struct" if the type was a struct defined in the XSD
- PRS Element Name: Name of the corresponding entry field in the PRS system
- PRS Entry Type: Entry type in the PRS system. This column contains some free text explanations/observations
- FDAAA801 Final Rule Field Name: Name of the corresponding required field in the FDAAA801 Final Rule (https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission). This column contains many empty values where elements in ClinicalTrials.gov do not correspond to a field required by the FDA
- WHO Field Name: Name of the corresponding field required by the WHO Trial Registration Data Set (v 1.3.1) (https://prsinfo.clinicaltrials.gov/trainTrainer/WHO-ICMJE-ClinTrialsgov-Cross-Ref.pdf)

Analytical Results:
- EC_human_review.csv contains the results of a manual review of a random sample of eligibility criteria from 400 CT.gov records. Table gives filename, criteria, and whether manual review determined the criteria to contain criteria for "multiple subgroups" of participants.
- completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA801 and its Final Rule.
- industry_completeness.xlsx contains percentages of interventional records missing required fields, broken up by agency class of trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other"), and before and after the effective date of the Final Rule
- location_completeness.xlsx contains percentages of interventional records missing required fields, broken up by whether record listed at least one location in the United States and records with only international location (excluding trials with no listed location), and before and after the effective date of the Final Rule

Intermediate Results:
- cache.zip contains pickle and csv files of pandas dataframes with values scraped from the XML records in AllPublicXML. Downloading these files greatly speeds up running analysis steps from jupyter notebooks in our github repository.
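The note on the "CT.gov XSD Element Definition" column says each abbreviated xpath needs the 'clinical_study/' prefix to become a full xpath. A minimal Python sketch of that expansion and of looking a value up in a record could look as follows. The helper names and the sample record are hypothetical and for illustration only; they are not taken from the fileset or its repository.

```python
import xml.etree.ElementTree as ET

# Top-level element named in the description above.
ROOT_TAG = "clinical_study"


def full_xpath(abbreviated: str) -> str:
    """Expand an abbreviated xpath by adding the 'clinical_study/' prefix."""
    return f"{ROOT_TAG}/{abbreviated.strip('/')}"


def find_values(xml_text: str, abbreviated: str) -> list[str]:
    """Return the text of every element matching the abbreviated xpath.

    ElementTree paths are evaluated relative to the root element, so the
    abbreviated form is what gets passed to findall() once the root tag
    has been checked.
    """
    root = ET.fromstring(xml_text)
    if root.tag != ROOT_TAG:
        raise ValueError(f"expected <{ROOT_TAG}> root, got <{root.tag}>")
    return [(el.text or "").strip() for el in root.findall(abbreviated.strip("/"))]


# Hypothetical, heavily trimmed record used only to exercise the helpers.
SAMPLE = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <condition>Example condition A</condition>
  <condition>Example condition B</condition>
</clinical_study>"""

if __name__ == "__main__":
    print(full_xpath("id_info/nct_id"))
    print(find_values(SAMPLE, "condition"))
```

Checking the root tag first keeps the abbreviated and full forms consistent: a record whose top-level element is not "clinical_study" is rejected rather than silently returning no matches.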
Movies & TV Shows Metadata Dataset (190K+ Records, Horror-Heavy Collection)
crawlfeeds.com
csv, zip
Updated Jun 22, 2025
Cite
Crawl Feeds (2025). Movies & TV Shows Metadata Dataset (190K+ Records, Horror-Heavy Collection) [Dataset]. https://crawlfeeds.com/datasets/movies-tv-shows-metadata-dataset-190k-records-horror-heavy-collection
Explore at:
Available download formats: zip, csv
Dataset updated
Jun 22, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
This comprehensive dataset features detailed metadata for over 190,000 movies and TV shows, with a strong concentration in the Horror genre. It is ideal for entertainment research, machine learning models, genre-specific trend analysis, and content recommendation systems.

Each record contains rich information, making it perfect for streaming platforms, film industry analysts, or academic media researchers.

Primary Genre Focus: Horror

Use Cases:

Build movie recommendation systems or genre classifiers

Train NLP models on movie descriptions

Analyze Horror content trends over time

Explore box office vs. rating correlations

Enrich entertainment datasets with directorial and cast metadata
Data from: Metadata capital in a data repository
zenodo.org
data.niaid.nih.gov
+1more
csv, txt
Updated May 30, 2022
Cite
Jane Greenberg; Shea Swauger; Elena M. Feinstein (2022). Data from: Metadata capital in a data repository [Dataset]. http://doi.org/10.5061/dryad.8c1p6
Explore at:
Available download formats: txt, csv
Unique identifier
https://doi.org/10.5061/dryad.8c1p6
Dataset updated
May 30, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jane Greenberg; Shea Swauger; Elena M. Feinstein
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This paper reports on a study exploring 'metadata capital' acquired via metadata reuse. Collaborative modeling and content analysis methods were used to study metadata capital in the Dryad data repository. A sample of 20 cases for two Dryad metadata workflows (Case A and Case B) consisting of 100 instantiations (60 metadata objects, 40 metadata activities) was analyzed. Results indicate that Dryad's overall workflow builds metadata capital, with the total metadata reuse at 50% or greater for 8 of 12 metadata properties, and 5 of these 8 properties showing reuse at 80% or higher. Metadata reuse is frequent for basic bibliographic properties (e.g., author, title, subject), although it is limited or absent for more complex scientific properties (e.g., taxon, spatial, and temporal information). This paper provides background context, reports the research approach and findings, and considers research implications and system design priorities that may contribute to metadata capital—long term.
Open Data Portal Catalogue
open.canada.ca
datasets.ai
+1more
csv, json, jsonl, png +2
Updated Jul 13, 2025
Cite
Treasury Board of Canada Secretariat (2025). Open Data Portal Catalogue [Dataset]. https://open.canada.ca/data/en/dataset/c4c5c7f1-bfa6-4ff6-b4a0-c164cb2060f7
Explore at:
Available download formats: csv, sqlite, json, png, jsonl, xlsx
Dataset updated
Jul 13, 2025
Dataset provided by
Treasury Board of Canada (https://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html)
Treasury Board of Canada Secretariat (http://www.tbs-sct.gc.ca/)
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool. Resources 2 - 8 are generated using the Flatterer utility.

Description of resources:
1. Dataset is a JSON Lines file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. The file is heavily nested and recommended for users familiar with working with nested JSON.
2. Catalogue is an XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata.
3. datasets metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
4. Resources Metadata contains the metadata for the resources contained within each dataset.
5. resource views metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
6. datastore fields metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore enabled CSVs.
7. Data Package Fields contains a description of the fields available in each of the tables within the Catalogue, as well as the count of the number of records each table contains.
8. data package entity relation diagram displays the title and format for each column, in each table in the Data Package, in the form of an ERD. The Data Package resource offers a text based version.
9. SQLite Database is a .db database, similar in structure to Catalogue. This can be queried with database or analytical software tools for doing analysis.
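Because resource 1 is described as a GZip-compressed JSON Lines file with heavily nested records, it can be read one record at a time and flattened as needed. The Python sketch below shows one way to do that with the standard library only. The function names, the dotted-key convention and any file path passed in are assumptions made for illustration; they do not describe the portal's own tooling or the Flatterer utility.

```python
import gzip
import json
from typing import Any, Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzipped file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def flatten(obj: Any, prefix: str = "") -> dict:
    """Flatten nested dicts and lists into a single dict with dotted keys.

    List items are addressed by their index, e.g. 'resources.0.format'.
    Empty dicts and lists contribute no keys.
    """
    flat: dict = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = obj
    return flat


if __name__ == "__main__":
    # Hypothetical record shape, used only to show the flattening.
    record = {"title": "Example", "resources": [{"format": "CSV"}]}
    print(flatten(record))
```

Streaming with a generator means only one record is held in memory at a time, which matters for a catalogue-sized file.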
Making the case for FAIR Data Points
explore.openaire.eu
Updated Apr 8, 2022
Cite
Angus Whyte; Ryan O'Connor; Josefine Nordling (2022). Making the case for FAIR Data Points [Dataset]. http://doi.org/10.5281/zenodo.6256839
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.6256839
Dataset updated
Apr 8, 2022
Authors
Angus Whyte; Ryan O'Connor; Josefine Nordling
Description
As a service manager how may I assist my organisation to make research data we hold both FAIR and “as open as possible, as closed as necessary”? The FAIR Data Point is a protocol for (meta)data provision championed by GO-FAIR as a solution to this need. In this story we describe how two organisations have applied the FAIR Data Point (FDP) to provide FAIR data or metadata in two contexts. In Leiden University Medical Centre the FDP is used to make metadata about COVID patient data as open as possible in the interest of research, while the data is necessarily closed and held in a variety of different systems. By contrast, Dutch data service provider SURF is applying the FDP to improve the FAIRness of an extensive dataset repository that is openly accessible by default. Based on interviews with the lead protagonists in both organisations' FDP implementations we compare their rationales and approaches, and how they expect this FAIR-enabling technology to benefit their user communities.
excell file with metadata sheet and data from the PFQ and BPC paper
catalog.data.gov
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). excell file with metadata sheet and data from the PFQ and BPC paper [Dataset]. https://catalog.data.gov/dataset/excell-file-with-metadata-sheet-and-data-from-the-pfq-and-bpc-paper
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
This file has the metadata sheet and the data used for the figures and tables in the PFQ vs BPC manuscript. This dataset is associated with the following publication: Gray, E., J. Furr, J. Conley, C. Lambright, N. Evans, M. Cardon, V. Wilson, P. Foster, and P. Hartig. A Conflicted Tale of Two Novel AR Antagonists In vitro and In vivo: Pyrifluquinazon versus Bisphenol C. TOXICOLOGICAL SCIENCES. Society of Toxicology, RESTON, VA, 632-643, (2019).
Standard Sample Description V2 Structural Metadata
data.niaid.nih.gov
zenodo.org
Updated Feb 3, 2020
Cite
European Food Safety Authority (2020). Standard Sample Description V2 Structural Metadata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1215986
Explore at:
Dataset updated
Feb 3, 2020
Dataset authored and provided by
European Food Safety Authority
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Standard Sample Description V2 is a specification aimed at harmonising the collection of analytical measurement data for the presence of harmful or beneficial chemical substances in food, feed and water. The specification is a list of standardised data elements (items describing characteristics of samples or analytical results such as country of origin, product, analytical method, limit of detection, result, etc.), linked to controlled terminologies. This specification uses EFSA FoodEx2 to describe sampled foods.

This file has been prepared to support the publication of data and interoperability. This file indicates which data elements from the specification will not be published to ensure full protection of confidential/sensitive information, for example personal data in accordance with Regulation (EC) No 45/2001 and to protect commercial interests, including intellectual property as specified in Article 4(2), first indent, of Regulation (EC) No 1049/2001.

The Excel table contains information about the structural metadata elements of the data collection and their fact tables.

The column name shows the name of the element (e.g. localOrg). The column description describes how the content has to be interpreted. The column code gives the corresponding code of the structural metadata element. The column optional says whether the structural metadata element is optional (otherwise it is mandatory). The column dataType contains the type that can be used to fill the structural metadata element and the maximum possible length of the field. The possible types are: text or number. The column catalogue contains the name of the catalogue from which the content of the structural metadata element has to be picked (e.g. COUNTRY). The column data protection states whether the structural metadata element will be published (yes = will not be published, no = will be published).
IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage)
crawlfeeds.com
csv, zip
Updated Jul 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2025). IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage) [Dataset]. https://crawlfeeds.com/datasets/imdb-movies-metadata-dataset-4-5m-records-global-coverage
Explore at:
Available download formats: csv, zip
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.

This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.

Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.

What’s Included:

Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more

Delivery: Direct download

Use Cases:

Train LLMs or chatbots on cinematic language and metadata

Build or enrich movie recommendation engines

Run cross-lingual or multi-region film analytics

Benchmark genre popularity across time periods

Power academic studies or entertainment dashboards

Feed into knowledge graphs, search engines, or NLP pipelines
Meta-data for data.gov.uk datasets
data.wu.ac.at
api, csv, html, json +1
Updated May 31, 2018
Cite
Government Digital Service (2018). Meta-data for data.gov.uk datasets [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/YjVlNGJlN2UtNmMzNi00MWI2LTlkNDgtY2FlMTk1YzMyZTM0
Explore at:
Available download formats: html, json, api, csv, xml
Dataset updated
May 31, 2018
Dataset provided by
Government Digital Service
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
A dataset of all the meta-data for all of the datasets available through the data.gov.uk service. This is provided as a zipped CSV or JSON file. It is published nightly.

Updates: 27 Sep 2017: we've moved all the previous dumps to an S3 bucket at https://dgu-ckan-metadata-dumps.s3-eu-west-1.amazonaws.com/ - This link is now listed here as a data file.

From 13/10/16 we added a .v2.jsonl dump, which is set to replace the .json dump (which will be discontinued after a 3 month transition). This is produced using 'ckanapi dump'. It provides an enhanced version of each dataset ('validated', or what you get from package_show in CKAN API v3 - the old json was the unvalidated version). This now includes full details of the organization the dataset is in, rather than just the owner_id. It also includes the results of the archival & qa for each dataset and resource, showing whether the link is broken, the detected format and the stars of openness. It also benefits from being in JSON Lines format (http://jsonlines.org/), so you don't need to load the whole thing into memory to parse the JSON - just a line at a time.

On 12/1/2015 the organization of the CSV was changed:

Before this date, each dataset was one line, and resources were added as numbered columns. Since a dataset may have up to 300 resources, it ends up with 1025 columns, which is wider than many versions of Excel and LibreOffice will open. And the uncompressed size of 170 MB is more than most will deal with too. It is suggested you load it into a database, handle it with a Python or Ruby script, or use tools such as Refine or Google Fusion Tables.

After this date, the datasets are provided in one CSV and resources in another. On occasions that you want to join them, you can join them using the (dataset) "Name" column. These are now manageable in spreadsheet software.
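The join described above, matching each resource to its dataset through the dataset "Name" column, can be sketched in Python with the standard csv module. The resource-side column name "Dataset Name", the other column names and the sample rows are assumptions made for illustration; check the actual CSV headers before relying on them.

```python
import csv
import io
from collections import defaultdict
from typing import IO


def join_resources(
    datasets_csv: IO[str],
    resources_csv: IO[str],
    dataset_key: str = "Name",
    resource_key: str = "Dataset Name",  # assumed column name, not confirmed
) -> list[dict]:
    """Inner-join resources to datasets on the dataset name."""
    # Index resources by the dataset name they point at.
    resources_by_name = defaultdict(list)
    for row in csv.DictReader(resources_csv):
        resources_by_name[row[resource_key]].append(row)

    joined = []
    for dataset in csv.DictReader(datasets_csv):
        for resource in resources_by_name.get(dataset[dataset_key], []):
            joined.append({"dataset": dataset, "resource": resource})
    return joined


# Hypothetical sample data used only to exercise the join.
DATASETS = "Name,Title\nroad-works,Road works\nair-quality,Air quality\n"
RESOURCES = (
    "Dataset Name,URL,Format\n"
    "road-works,http://example.org/a.csv,CSV\n"
    "road-works,http://example.org/a.json,JSON\n"
)

if __name__ == "__main__":
    for pair in join_resources(io.StringIO(DATASETS), io.StringIO(RESOURCES)):
        print(pair["dataset"]["Title"], pair["resource"]["Format"])
```

This is an inner join, so datasets without any resources are dropped from the result; keeping them would need an explicit empty-resource row.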

You can also use the standard CKAN API if you want to search or get a small section of the data. Please respect the traffic limits in the API: http://data.gov.uk/terms-and-conditions
Hazardous Waste Portal Manifest Metadata
catalog.data.gov
data.ct.gov
+2more
Updated Jan 26, 2024
Cite
data.ct.gov (2024). Hazardous Waste Portal Manifest Metadata [Dataset]. https://catalog.data.gov/dataset/hazardous-waste-portal-manifest-metadata
Explore at:
Dataset updated
Jan 26, 2024
Dataset provided by
data.ct.gov
Description
Note: Please use the following view to be able to see the entire Dataset Description: https://data.ct.gov/Environment-and-Natural-Resources/Hazardous-Waste-Portal-Manifest-Metadata/x2z6-swxe

Dataset Description Outline (5 sections)
• INTRODUCTION
• WHY USE THE CONNECTICUT OPEN DATA PORTAL MANIFEST METADATA DATASET INSTEAD OF THE DEEP DOCUMENT ONLINE SEARCH PORTAL ITSELF?
• WHAT MANIFESTS ARE INCLUDED IN DEEP’S MANIFEST PERMANENT RECORDS ARE ALSO AVAILABLE VIA THE DEEP DOCUMENT SEARCH PORTAL AND CT OPEN DATA?
• HOW DOES THE PORTAL MANIFEST METADATA DATASET RELATE TO THE OTHER TWO MANIFEST DATASETS PUBLISHED IN CT OPEN DATA?
• IMPORTANT NOTES

INTRODUCTION
• All of DEEP’s paper hazardous waste manifest records were recently scanned and “indexed”.
• Indexing consisted of 6 basic pieces of information or “metadata” taken from each manifest about the Generator and stored with the scanned image. The metadata enables searches by: Site Town, Site Address, Generator Name, Generator ID Number, Manifest ID Number and Date of Shipment.
• All of the metadata and scanned images are available electronically via DEEP’s Document Online Search Portal at: https://filings.deep.ct.gov/DEEPDocumentSearchPortal/
• Therefore, it is no longer necessary to visit the DEEP Records Center in Hartford for manifest records or information.
• This CT Data dataset “Hazardous Waste Portal Manifest Metadata” (or “Portal Manifest Metadata”) was copied from the DEEP Document Online Search Portal, and includes only the metadata – no images.

WHY USE THE CONNECTICUT OPEN DATA PORTAL MANIFEST METADATA DATASET INSTEAD OF THE DEEP DOCUMENT ONLINE SEARCH PORTAL ITSELF?
The Portal Manifest Metadata is a good search tool to use along with the Portal. Searching the Portal Manifest Metadata can provide the following advantages over searching the Portal:
• faster searches, especially for “large searches” - those with a large number of search returns
• unlimited number of search returns (Portal is limited to 500);
• larger display of search returns;
• search returns can be sorted and filtered online in CT Data; and
• search returns and the entire dataset can be downloaded from CT Data and used offline (e.g. download to Excel format)
• metadata from searches can be copied from CT Data and pasted into the Portal search fields to quickly find single scanned images.
The main advantages of the Portal are:
• it provides access to scanned images of manifest documents (CT Data does not); and
• images can be downloaded one or multiple at a time.

WHAT MANIFESTS ARE INCLUDED IN DEEP’S MANIFEST PERMANENT RECORDS ARE ALSO AVAILABLE VIA THE DEEP DOCUMENT SEARCH PORTAL AND CT OPEN DATA?
All hazardous waste manifest records received and maintained by the DEEP Manifest Program; including:
• manifests originating from a Connecticut Generator or sent to a Connecticut Destination Facility including manifests accompanying an exported shipment
• manifests with RCRA hazardous waste listed on them (such manifests may also have non-RCRA hazardous waste listed)
• manifests from a Generator with a Connecticut Generator ID number (permanent or temporary number)
• manifests with sufficient quantities of RCRA hazardous waste listed for DEEP to consider the Generator to be a Small or Large Quantity Generator
• manifests with PCBs listed on them from 2016 to 6-29-2018.
• Note: manifests sent to a CT Destination Facility were indexed by the Connecticut or Out of State Generator. Searches by CT Designated Facility are not possible unless such facility is the Generator for the purposes of manifesting.
All other manifests were considered “non-hazardous” manifests and not scanned. They were discarded after 2 years in accord with DEEP records retention schedule. Non-hazardous manifests include:
• Manifests with only non-RCRA hazardous waste listed
• Manifests from generators that did not have a permanent or temporary Generator ID number
• Sometimes non-hazardous manifests were considered “Hazar
Rotten Tomatoes Movie Dataset – Clean Movie Metadata
crawlfeeds.com
csv, zip
Updated Jul 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2025). Rotten Tomatoes Movie Dataset – Clean Movie Metadata [Dataset]. https://crawlfeeds.com/datasets/rotten-tomatoes-movie-dataset-clean-movie-metadata
Explore at:
Available download formats: csv, zip
Dataset updated
Jul 17, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
We provide a high-quality Rotten Tomatoes movie dataset that includes key metadata for thousands of movies. This dataset is ideal for anyone working with movie-related platforms, entertainment analytics, content curation, or movie discovery tools.

Our collection is structured, clean, and designed to support real-time apps, dashboards, and research use cases.

What the Dataset Includes

Each record in the dataset contains core information pulled directly from Rotten Tomatoes, including:

Movie Name – The official title of the movie.

Poster URL – High-resolution image link to the movie poster.

Trailer URL – Direct link to the official trailer (when available).

Genre – One or more genres associated with the movie, such as Action, Drama, Comedy, or Horror.

Release Date – The date the movie was released to the public.

Actors – Main cast members listed on Rotten Tomatoes.

Directors – Director(s) responsible for the movie.

Rating – Audience or critic scores, where available.

Broad Coverage

This dataset spans a wide range of movies across all major genres and decades. From modern releases to timeless classics, from Hollywood blockbusters to independent films — we’ve included movies of all types with relevant data points.

You can expect data on:

U.S. theatrical releases

Netflix, Amazon, and other streaming exclusives

Festival films and limited releases

Animated and documentary films

Use Cases

Here are just a few ways this dataset can be useful:

Movie Recommendation Engines – Use metadata and genre info to power personalized movie suggestions.

Entertainment Search Tools – Build searchable movie listings with visual poster previews and trailer links.

Data Visualization Projects – Create dashboards showing trends by genre, release periods, or actor participation.

AI/ML Training – Use metadata to train classification models or sentiment prediction tools.

Research & Academic Use – Analyze patterns in movie releases, cast dynamics, and genre evolution.

Why Use Our Dataset?

Clean & ready-to-use: No raw HTML, just clean structured data.

Minimal but meaningful fields: Focused on useful movie attributes without clutter.

Updated info: Covers both classic and current titles.

Simple integration: Easy to use for developers, analysts, and product teams.

If you're working on a movie-based product or looking for reliable film metadata for your project, this dataset offers an ideal foundation.

Let us know if you’d like to explore it further.
US Restaurant POI dataset with metadata
datarade.ai
.csv
Updated Jul 30, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Geolytica (2022). US Restaurant POI dataset with metadata [Dataset]. https://datarade.ai/data-products/us-restaurant-poi-dataset-with-metadata-geolytica
Explore at:
Available download formats: .csv
Dataset updated
Jul 30, 2022
Dataset authored and provided by
Geolytica
Area covered
United States of America
Description
A Point of Interest (POI) is an entity (such as a business) at a ground location (point) that may be of interest. We provide high-quality POI data that is fresh, consistent, customizable, and easy to use, with high-density coverage for all countries of the world.

This is our process flow:

Our machine learning systems continuously crawl for new POI data

Our geoparsing and geocoding systems calculate their geolocations

Our categorization systems clean up and standardize the datasets

Our data pipeline API publishes the datasets on our data store

A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, and so on. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.

POI Data is in constant flux. Every minute worldwide over 200 businesses will move, over 600 new businesses will open their doors and over 400 businesses will cease to exist. And over 94% of all businesses have a public online presence of some kind tracking such changes. When a business changes, their website and social media presence will change too. We'll then extract and merge the new information, thus creating the most accurate and up-to-date business information dataset across the globe.

We offer our customers perpetual data licenses for any dataset representing this ever changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service - DaaS Industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one time snapshot, or via our data update pipeline.

Customers requiring regularly updated datasets may subscribe to our annual subscription plans. Our data is continuously refreshed, so subscription plans are recommended for those who need the most up-to-date data. The main differentiators between us and the competition are our flexible licensing terms and our data freshness.

Data samples may be downloaded at https://store.poidata.xyz/us
Data from: A general purpose tool-set for representing data relationships:...
dataverse.harvard.edu
Updated May 4, 2018
Joshua Stillerman, Thomas Fredian, Martin Greenwald, John Wright (2018). A general purpose tool-set for representing data relationships: Converting data into knowledge [Dataset]. http://doi.org/10.7910/DVN/SHYWLB
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/SHYWLB
Dataset updated
May 4, 2018
Dataset provided by
Harvard Dataverse
Authors
Joshua Stillerman, Thomas Fredian, Martin Greenwald, John Wright
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/SHYWLB
Description
Rich metadata is required to find and understand the recorded measurements from modern experiments with their immense and complex data stores. Systems to store and manage these metadata have improved over time, but in most cases are ad hoc collections of data relationships, often represented in domain- or site-specific application code. We are developing a general set of tools to store, manage, and retrieve data relationship metadata. These tools will be agnostic to the underlying data storage mechanisms, and to the data stored in them, making the system applicable across a wide range of science domains. Data management tools typically represent at least one relationship paradigm through implicit or explicit metadata. The addition of these metadata allows the data to be searched and understood by larger groups of users over longer periods of time. Using these systems, researchers are less dependent on one-on-one communication with the scientists involved in running the experiments, and need not rely on their ability to remember the details of their data.

In the magnetic fusion research community, the MDSplus system is widely used to record raw and processed data from experiments. Users create a hierarchical relationship tree for each instance of their experiment, allowing them to record the meanings of what is recorded. Most users of this system add to this a set of ad hoc tools to help users locate specific experiment runs, which they can then access via this hierarchical organization. However, the MDSplus tree is only one possible organization of the records, and these additional applications that relate the experiment 'shots' into run days, experimental proposals, logbook entries, run summaries, analysis workflows, publications, etc. have, up until now, been implemented on an experiment-by-experiment basis. The Metadata Provenance Ontology project, MPO, is a system built to record data provenance information about computed results. It allows users to record the inputs and outputs from each step of their computational workflows, in particular, what raw and processed data were used as inputs, what codes were run and what results were produced. The resulting collections of provenance graphs can be annotated, grouped, searched, filtered and browsed. This provides a powerful tool to record, understand, and locate computed results. However, this can be understood as one more specific data relationship, which can be construed as an instance of something more general.

Building on concepts developed in these projects, we are developing a general system that could be used to represent all of these kinds of data relationships as mathematical graphs. Just as MDSplus and MPO were generalizations of data management needs for a collection of users, this new system will generalize the storage, location, and retrieval of the relationships between data. The system will store data relationships as data, not encoded in a set of application-specific programs or ad hoc data structures. Stored data would be referred to by URIs, allowing the system to be agnostic to the underlying data representations. Users can then traverse these graphs. The system will allow users to construct a collection of graphs describing any or all of the relationships between data items, locate interesting data, see what other graphs these data are members of, and navigate into and through them.
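
The idea of storing data relationships as graphs whose nodes are URIs can be illustrated with a minimal Python sketch. This is not the authors' system or API: the `RelationshipGraph` class, the relation names, and the `example://` URIs are all hypothetical, chosen only to show relationships held as data and then traversed.

```python
from collections import defaultdict, deque


class RelationshipGraph:
    """Hypothetical store of typed, directed relationships between URIs."""

    def __init__(self):
        # Maps a source URI to a list of (relation, target URI) pairs.
        self._edges = defaultdict(list)

    def relate(self, source, relation, target):
        """Record that `source` is linked to `target` by `relation`."""
        self._edges[source].append((relation, target))

    def neighbors(self, uri, relation=None):
        """Return direct targets of `uri`, optionally filtered by relation."""
        return [t for r, t in self._edges.get(uri, []) if relation is None or r == relation]

    def reachable(self, start):
        """Breadth-first traversal: every URI reachable from `start`."""
        seen, queue = {start}, deque([start])
        while queue:
            for target in self.neighbors(queue.popleft()):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        seen.discard(start)
        return seen


# Made-up URIs: the graph never inspects what they point to.
graph = RelationshipGraph()
graph.relate("example://shot/1001", "part_of", "example://run-day/a")
graph.relate("example://shot/1001", "used_by", "example://analysis/fit-1")
graph.relate("example://analysis/fit-1", "cited_in", "example://publication/p1")

print(graph.neighbors("example://shot/1001", relation="part_of"))
print(sorted(graph.reachable("example://shot/1001")))
```

Because the graph only holds URIs and relation labels, it stays independent of how the underlying records are stored, which is the property the description emphasizes.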
Grab vs Composite metadata
cloud.csiss.gmu.edu
catalog.data.gov
Updated Mar 8, 2021
United States (2021). Grab vs Composite metadata [Dataset]. http://doi.org/10.23719/1520144
Unique identifier
https://doi.org/10.23719/1520144
Dataset updated
Mar 8, 2021
Dataset provided by
United States
License
https://pasteur.epa.gov/license/sciencehub-license-non-epa-generated.html
Description
The data describe concentrations of human adenovirus, crAssphage and pepper mild mottle virus in 1-hour composite wastewater samples and 24-hour composite wastewater samples. This dataset is not publicly accessible because the data is the property of CSIRO. It can be accessed by contacting Warish Ahmed, Warish.Ahmed@csiro.au. Format: the data is in Excel format.

This dataset is associated with the following publication: Ahmed, W., A. Bivins, P.M. Bertsch, K. Bibby, P. Gyawali, S.P. Sherchan, S.L. Simpson, K.V. Thomas, R. Verhagen, M. Kitajima, J.F. Mueller, and A. Korajkic. Intraday variability of indicator and pathogenic viruses in 1-h and 24-h composite wastewater samples: Implications for wastewater-based epidemiology. ENVIRONMENTAL RESEARCH. Elsevier B.V., Amsterdam, NETHERLANDS, 193: 110531, (2021).
dataset: Create interoperable and well-documented data frames
explore.openaire.eu
Updated Jun 23, 2022
Daniel Antal (2022). dataset: Create interoperable and well-documented data frames [Dataset]. http://doi.org/10.5281/zenodo.6854273
Unique identifier
https://doi.org/10.5281/zenodo.6854273
Dataset updated
Jun 23, 2022
Authors
Daniel Antal
Description
See the package documentation website at dataset.dataobservatory.eu. Report bugs and suggestions on GitHub: https://github.com/dataobservatory-eu/dataset/issues

The primary aim of dataset is to build well-documented data.frames, tibbles or data.tables that follow the W3C Data Cube Vocabulary, which is based on the statistical SDMX data cube model. Such standard R objects (data.frame, data.table, tibble, or well-structured lists like JSON) become highly interoperable and can be placed into relational databases, semantic web applications, archives, and repositories. They follow the FAIR principles: they are findable, accessible, interoperable and reusable.

Our datasets:

Contain Dublin Core or DataCite (or both) metadata that makes them findable and more easily accessible via online libraries. See the vignette article Datasets With FAIR Metadata.

Have dimensions that can be easily and unambiguously reduced to triples for RDF applications; they can be easily serialized to, or synchronized with, semantic web applications. See the vignette article From dataset To RDF.

Contain processing metadata that greatly enhances the reproducibility of the results and the reviewability of the contents of the dataset, including metadata defined by the DDI Alliance, which is particularly helpful for not yet processed data.

Follow the datacube model of the Statistical Data and Metadata eXchange, therefore allowing easy refreshing with new data from the source of the analytical work; this is particularly useful for datasets containing results of statistical operations in R.

Export correctly with FAIR metadata to the most used file formats and allow straightforward publication to open science repositories with correct bibliographical and use metadata. See Export And Publish a dataset.

Are relatively lightweight in dependencies and work easily with data.frame, tibble or data.table R objects.
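
To make the "reduced to triples" idea concrete, here is a small Python sketch of turning a tidy table into subject-predicate-object triples. It is not the dataset package's API (the package is written in R), and the function name `rows_to_triples`, the `https://example.org/obs/` base URI, the column names, and the cell values are all placeholders invented for illustration.

```python
def rows_to_triples(rows, key, base="https://example.org/obs/"):
    """Turn each non-key cell of a tidy table into a (subject, predicate, object) triple.

    `rows` is a list of dicts (one per observation); `key` names the column
    that uniquely identifies a row and is used to build the subject URI.
    """
    triples = []
    for row in rows:
        subject = f"{base}{row[key]}"
        for column, value in row.items():
            if column != key:
                triples.append((subject, column, value))
    return triples


# Placeholder observations: two rows, three non-key columns each.
observations = [
    {"id": "obs1", "geo": "AA", "year": 2000, "value": 1.0},
    {"id": "obs2", "geo": "BB", "year": 2000, "value": 2.0},
]

triples = rows_to_triples(observations, key="id")
for triple in triples:
    print(triple)
```

Each row identifier becomes a subject, each column name a predicate, and each cell an object, so a table with unambiguous dimensions maps onto triples without loss. A real RDF export would also map column names to vocabulary URIs, which this sketch leaves out.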
Bear Lake Data Repository
hydroshare.org
zip
Updated Sep 9, 2024
Jeff Nielson; Katie Wadsworth (2024). Bear Lake Data Repository [Dataset]. https://www.hydroshare.org/resource/444e4bd2940e47e6bcab5e7966a929fe
Available download formats: zip (154.6 MB)
Dataset updated
Sep 9, 2024
Dataset provided by
HydroShare
Authors
Jeff Nielson; Katie Wadsworth
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bear Lake
Description
The Bear Lake Data Repository (BLDR) is an active archive, containing a growing compilation of biological, chemical, and physical datasets collected from Bear Lake and its surrounding watershed. The datasets herein have been digitized from historical records and reports, extracted from papers and theses, and obtained from public and private entities, including the United States Geological Survey, PacifiCorp, and, inter alia, Ecosystems Research Institute.

Contributions are welcome. The BLDR accepts biological, chemical, or physical datasets obtained at Bear Lake, irrespective of funding source. There is no submission size limit at present; workarounds will be found if submissions exceed HydroShare limits (20 GB). Contributions are published with an open access license and will serve many use cases. The current repository steward, Bear Lake Watch, will advise on submissions and make accepted contributions available promptly.

Metadata files are provided for each dataset; however, contact with the original contributor(s) is encouraged for questions and additional details prior to data usage. The BLDR and its contributors shall not be liable for any damages resulting from misinterpretation or misuse of the data or metadata.
Free Dataset: Beauty Product Price & Metadata Snapshot
crawlfeeds.com
csv, zip
Updated May 26, 2025
Crawl Feeds (2025). Free Dataset: Beauty Product Price & Metadata Snapshot [Dataset]. https://crawlfeeds.com/datasets/free-dataset-beauty-product-price-metadata-snapshot
Available download formats: csv, zip
Dataset updated
May 26, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Explore a curated dataset of beauty and personal care products, extracted from global online retailers. This free download includes structured data that mirrors what powers the BeautyFeeds live product tracking platform.

Whether you're building an eCommerce dashboard, researching market trends, or prototyping beauty intelligence tools — this dataset is a perfect place to start.

Interested in Live Tracking or API?

This dataset represents just a snapshot of what we track in real time at BeautyFeeds (https://beautyfeeds.io/):

Monitor price & stock changes daily or weekly

Track products from major retailers like Sephora, Ulta, Nykaa, Amazon, and more

Access via export or live API

Filter by brand, country, or category

Assign custom URLs for targeted scraping

👉 Learn more and get 500 free credits at BeautyFeeds.io (https://beautyfeeds.io/)
BLM ES OH PLSS Metadata Glance Polygon
data.amerigeoss.org
datadiscoverystudio.org
zip
Updated Jul 30, 2019
+ more versions
United States[old] (2019). BLM ES OH PLSS Metadata Glance Polygon [Dataset]. https://data.amerigeoss.org/ko_KR/dataset/blm-es-oh-plss-metadata-glance-polygon
Available download formats: zip
Dataset updated
Jul 30, 2019
Dataset provided by
United States[old]
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
This data represents the GIS Version of the Public Land Survey System including both rectangular and non-rectangular survey data. The rectangular survey data are a reference system for land tenure based upon meridian, township/range, section, section subdivision and government lots. The non-rectangular survey data represent surveys that were largely performed to protect and/or convey title on specific parcels of land such as mineral surveys and tracts. The data are largely complete in reference to the rectangular survey data at the level of first division. However, the data vary in the granularity of their spatial representation as well as in their content below the first division. Therefore, depending upon the data source and steward, accurate subdivision of the rectangular data may not be available below the first division and the non-rectangular mineral surveys may not be present. At times, the complexity of surveys rendered the collection of data cost-prohibitive, such as in areas characterized by numerous, overlapping mineral surveys. In these situations, the data were often not abstracted or were only partially abstracted and incorporated into the data set. These PLSS data were compiled from a broad spectrum of sources including federal, county, and private survey records such as field notes and plats as well as map sources such as USGS 7½-minute quadrangles. The metadata in each data set describes the production methods for the data content. This data is optimized for data publication and sharing rather than for specific "production" or operation and maintenance. A complete PLSS data set includes the following: PLSS Townships, First Divisions and Second Divisions (the hierarchical breakdown of the PLSS rectangular surveys); PLSS Special Surveys (non-rectangular components of the PLSS); Meandered Water; Corners; Metadata at a Glance (which identifies the last revised date and data steward); and Conflicted Areas (known areas of gaps, overlaps, or inconsistencies).

The Entity-Attribute section of this metadata describes these components in greater detail. This is a graphic representation of the data stewards based on PLSS Townships in PLSS areas. In non-PLSS areas, the metadata at a glance is based on data-steward-defined polygons such as a city, county, or other unit. The identification of the data steward is a general indication of the agency that will be responsible for updates and providing the authoritative data sources. In other implementations this may have been termed the alternate source, meaning alternate to the BLM. But in the shared environment of the NSDI, the data steward for an area is the primary coordinator or agency responsible for making updates or causing updates to be made. The data stewardship polygons are defined and provided by the data steward.
BLM AK Metadata Glance
gbp-blm-egis.hub.arcgis.com
gis.data.alaska.gov
+1more
Updated Apr 23, 2025
+ more versions
Bureau of Land Management (2025). BLM AK Metadata Glance [Dataset]. https://gbp-blm-egis.hub.arcgis.com/datasets/blm-ak-metadata-glance/about
Dataset updated
Apr 23, 2025
Dataset authored and provided by
Bureau of Land Management (http://www.blm.gov/)
Area covered

Description
This data represents the GIS Version of the Public Land Survey System including both rectangular and non-rectangular survey data. The rectangular survey data are a reference system for land tenure based upon meridian, township/range, section, section subdivision and government lots. The non-rectangular survey data represent surveys that were largely performed to protect and/or convey title on specific parcels of land such as mineral surveys and tracts. The data are largely complete in reference to the rectangular survey data at the level of first division. However, the data vary in the granularity of their spatial representation as well as in their content below the first division. Therefore, depending upon the data source and steward, accurate subdivision of the rectangular data may not be available below the first division and the non-rectangular mineral surveys may not be present. At times, the complexity of surveys rendered the collection of data cost-prohibitive, such as in areas characterized by numerous, overlapping mineral surveys. In these situations, the data were often not abstracted or were only partially abstracted and incorporated into the data set. These PLSS data were compiled from a broad spectrum of sources including federal, county, and private survey records such as field notes and plats as well as map sources such as USGS 7½-minute quadrangles. The metadata in each data set describes the production methods for the data content. This data is optimized for data publication and sharing rather than for specific "production" or operation and maintenance. A complete PLSS data set includes the following: PLSS Townships, First Divisions and Second Divisions (the hierarchical breakdown of the PLSS rectangular surveys); PLSS Special Surveys (non-rectangular components of the PLSS); Meandered Water; Corners; Metadata at a Glance (which identifies the last revised date and data steward); and Conflicted Areas (known areas of gaps, overlaps, or inconsistencies).

The Entity-Attribute section of this metadata describes these components in greater detail. This is a graphic representation of the data stewards based on PLSS Townships in PLSS areas. In non-PLSS areas, the metadata at a glance is based on data-steward-defined polygons such as a city, county, or other unit. The identification of the data steward is a general indication of the agency that will be responsible for updates and providing the authoritative data sources. In other implementations this may have been termed the alternate source, meaning alternate to the BLM. But in the shared environment of the NSDI, the data steward for an area is the primary coordinator or agency responsible for making updates or causing updates to be made. The data stewardship polygons are defined and provided by the data steward.
