91 datasets found
  1. Movies & TV Shows Metadata Dataset (190K+ Records, Horror-Heavy Collection)

    • crawlfeeds.com
    csv, zip
    Updated Jun 22, 2025
    Cite
    Crawl Feeds (2025). Movies & TV Shows Metadata Dataset (190K+ Records, Horror-Heavy Collection) [Dataset]. https://crawlfeeds.com/datasets/movies-tv-shows-metadata-dataset-190k-records-horror-heavy-collection
    Available download formats: zip, csv
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    This comprehensive dataset features detailed metadata for over 190,000 movies and TV shows, with a strong concentration in the Horror genre. It is ideal for entertainment research, machine learning models, genre-specific trend analysis, and content recommendation systems.

    Each record contains rich information, making it perfect for streaming platforms, film industry analysts, or academic media researchers.

    Primary Genre Focus: Horror

    Use Cases:

    • Build movie recommendation systems or genre classifiers

    • Train NLP models on movie descriptions

    • Analyze Horror content trends over time

    • Explore box office vs. rating correlations

    • Enrich entertainment datasets with directorial and cast metadata

  2. Data from: Metadata capital in a data repository

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    csv, txt
    Updated May 30, 2022
    Cite
    Jane Greenberg; Shea Swauger; Elena M. Feinstein (2022). Data from: Metadata capital in a data repository [Dataset]. http://doi.org/10.5061/dryad.8c1p6
    Available download formats: txt, csv
    Dataset updated
    May 30, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Jane Greenberg; Shea Swauger; Elena M. Feinstein
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This paper reports on a study exploring 'metadata capital' acquired via metadata reuse. Collaborative modeling and content analysis methods were used to study metadata capital in the Dryad data repository. A sample of 20 cases for two Dryad metadata workflows (Case A and Case B) consisting of 100 instantiations (60 metadata objects, 40 metadata activities) was analyzed. Results indicate that Dryad's overall workflow builds metadata capital, with the total metadata reuse at 50% or greater for 8 of 12 metadata properties, and 5 of these 8 properties showing reuse at 80% or higher. Metadata reuse is frequent for basic bibliographic properties (e.g., author, title, subject), although it is limited or absent for more complex scientific properties (e.g., taxon, spatial, and temporal information). This paper provides background context, reports the research approach and findings, and considers research implications and system design priorities that may contribute to metadata capital—long term.

  3. Making the case for FAIR Data Points

    • explore.openaire.eu
    Updated Apr 8, 2022
    Cite
    Angus Whyte; Ryan O'Connor; Josefine Nordling (2022). Making the case for FAIR Data Points [Dataset]. http://doi.org/10.5281/zenodo.6256839
    Dataset updated
    Apr 8, 2022
    Authors
    Angus Whyte; Ryan O'Connor; Josefine Nordling
    Description

    As a service manager how may I assist my organisation to make research data we hold both FAIR and “as open as possible, as closed as necessary”? The FAIR Data Point is a protocol for (meta)data provision championed by GO-FAIR as a solution to this need. In this story we describe how two organisations have applied the FAIR Data Point (FDP) to provide FAIR data or metadata in two contexts. In Leiden University Medical Centre the FDP is used to make metadata about COVID patient data as open as possible in the interest of research, while the data is necessarily closed and held in a variety of different systems. By contrast, Dutch data service provider SURF is applying the FDP to improve the FAIRness of an extensive dataset repository that is openly accessible by default. Based on interviews with the lead protagonists in both organisations' FDP implementations we compare their rationales and approaches, and how they expect this FAIR-enabling technology to benefit their user communities.

  4. BLM NV PLSS CADNSDI Version 2 Metadata Glance Polygon

    • catalog.data.gov
    • data.amerigeoss.org
    • +2 more
    Updated Jul 9, 2025
    + more versions
    Cite
    Bureau of Land Management (2025). BLM NV PLSS CADNSDI Version 2 Metadata Glance Polygon [Dataset]. https://catalog.data.gov/dataset/blm-nv-plss-cadnsdi-version-2-metadata-glance-polygon-b3b5c
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Bureau of Land Management: http://www.blm.gov/
    Description

    BLM NV PLSS Metadata Glance: MetadataGlance provides PLSS data steward content for individual PLSS units. This dataset represents the GIS version of the Public Land Survey System, including both rectangular and non-rectangular surveys. The primary source for the data is cadastral survey records housed by the BLM, supplemented with local records and geographic control coordinates from states and counties as well as other federal agencies such as the USGS and USFS. The data has been converted from source documents to digital form and transferred into a GIS format that is compliant with FGDC Cadastral Data Content Standards and Guidelines for publication. This data is optimized for data publication and sharing rather than for specific "production" or operation and maintenance. This data set includes the following: PLSS Fully Intersected (all of the PLSS features at the atomic or smallest polygon level); PLSS Townships, First Divisions and Second Divisions (the hierarchical breakdown of the PLSS rectangular surveys); PLSS Special Surveys (non-rectangular components of the PLSS); and Meandered Water, Corners and Conflicted Areas (known areas of gaps or overlaps between Townships or state boundaries). The Entity-Attribute section of this metadata describes these components in greater detail.

  5. excell file with metadata sheet and data from the PFQ and BPC paper

    • catalog.data.gov
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). excell file with metadata sheet and data from the PFQ and BPC paper [Dataset]. https://catalog.data.gov/dataset/excell-file-with-metadata-sheet-and-data-from-the-pfq-and-bpc-paper
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Description

    This file has the metadata sheet and the data used for the figures and tables in the PFQ vs BPC manuscript. This dataset is associated with the following publication: Gray, E., J. Furr, J. Conley, C. Lambright, N. Evans, M. Cardon, V. Wilson, P. Foster, and P. Hartig. A Conflicted Tale of Two Novel AR Antagonists In vitro and In vivo: Pyrifluquinazon versus Bisphenol C. TOXICOLOGICAL SCIENCES. Society of Toxicology, RESTON, VA, 632-643, (2019).

  6. Grab vs Composite metadata

    • cloud.csiss.gmu.edu
    • catalog.data.gov
    Updated Mar 8, 2021
    Cite
    United States (2021). Grab vs Composite metadata [Dataset]. http://doi.org/10.23719/1520144
    Dataset updated
    Mar 8, 2021
    Dataset provided by
    United States
    License

    https://pasteur.epa.gov/license/sciencehub-license-non-epa-generated.html

    Description

    The data describe concentrations of human adenovirus, crAssphage and Pepper Mild Mottle virus in 1-hour composite wastewater samples and 24-hour composite wastewater samples. This dataset is not publicly accessible because the data is the property of CSIRO. It can be accessed by contacting Warish Ahmed, Warish.Ahmed@csiro.au. Format: the data is in Excel format.

    This dataset is associated with the following publication: Ahmed, W., A. Bivins, P.M. Bertsch, K. Bibby, P. Gyawali, S.P. Sherchan, S.L. Simpson, K.V. Thomas, R. Verhagen, M. Kitajima, J.F. Mueller, and A. Korajkic. Intraday variability of indicator and pathogenic viruses in 1-h and 24-h composite wastewater samples: Implications for wastewater-based epidemiology. ENVIRONMENTAL RESEARCH. Elsevier B.V., Amsterdam, NETHERLANDS, 193: 110531, (2021).

  7. NOPD In-Car Camera Metadata

    • data.nola.gov
    • gimi9.com
    • +1 more
    application/rdfxml +5
    Updated Apr 10, 2023
    + more versions
    Cite
    City of New Orleans Police Department (2023). NOPD In-Car Camera Metadata [Dataset]. https://data.nola.gov/Public-Safety-and-Preparedness/NOPD-In-Car-Camera-Metadata/md3v-ph3u
    Available download formats: application/rdfxml, application/rssxml, csv, json, xml, tsv
    Dataset updated
    Apr 10, 2023
    Dataset provided by
    New Orleans Police Department: http://nola.gov/nopd
    Authors
    City of New Orleans Police Department
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset represents the metadata describing the process of transferring in-car camera videos recorded by the New Orleans Police Department from the server to DVDs in order to free up storage space. This dataset is updated quarterly through a manual spreadsheet transfer and upsert. Disclaimer: The New Orleans Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information. The New Orleans Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The New Orleans Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of New Orleans or New Orleans Police Department web page. The user specifically acknowledges that the New Orleans Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. Any use of the information for commercial purposes is strictly prohibited. The unauthorized use of the words "New Orleans Police Department," "NOPD," or any colorable imitation of these words or the unauthorized use of the New Orleans Police Department logo is unlawful. This web page does not, in any way, authorize such use.

  8. Popular TMDB Films Metadata Dataset

    • opendatabay.com
    Updated Jul 2, 2025
    Cite
    Datasimple (2025). Popular TMDB Films Metadata Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/4dde40e9-76eb-4270-983d-c1ba4b8fe72d
    Dataset updated
    Jul 2, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset contains metadata for the top 10,000 most popular movies available on The Movie Database (TMDB). TMDB is a widely used online platform and community providing extensive details on films, TV shows, and related content. Users can browse and search for titles, accessing information such as cast, crew, synopses, and ratings. This dataset is designed for data analysts, researchers, and developers keen on examining movie popularity and attributes. It is a valuable resource for various analyses, including exploring trends in movie genres over time, identifying patterns in budget versus revenue, and evaluating the impact of different attributes on a film's popularity. The data was gathered from TMDB's public API and has undergone thorough cleaning and preprocessing to enhance its quality and usability.

    Columns

    • id: A unique identifier for each movie within the TMDB database.
    • title: The title of the movie.
    • release_date: The date on which the movie was released.
    • vote_average: The average rating given to the movie by TMDB users.
    • vote_count: The total number of votes cast for the movie on TMDB.
    • popularity: A score assigned to the movie by TMDB, based on user engagement metrics.
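    The column list above could be consumed roughly as follows. This is a minimal sketch using only the Python standard library: the sample rows are invented for illustration, the ISO date format is an assumption about the file, and the helper names `load_movies` and `top_rated` are hypothetical. Only the six column names come from the listing.

```python
import csv
import io
from datetime import date

# Invented sample rows; only the header names are taken from the listing.
SAMPLE_CSV = """id,title,release_date,vote_average,vote_count,popularity
1,Example Film A,2001-05-04,7.9,1200,35.2
2,Example Film B,2015-11-20,8.4,90,12.7
3,Example Film C,2019-02-14,6.1,4300,88.0
"""


def load_movies(handle):
    """Parse CSV rows into dicts with typed values (assumes ISO dates)."""
    movies = []
    for row in csv.DictReader(handle):
        movies.append({
            "id": int(row["id"]),
            "title": row["title"],
            "release_date": date.fromisoformat(row["release_date"]),
            "vote_average": float(row["vote_average"]),
            "vote_count": int(row["vote_count"]),
            "popularity": float(row["popularity"]),
        })
    return movies


def top_rated(movies, min_votes):
    """Rank by vote_average, ignoring titles with fewer than min_votes votes."""
    eligible = [m for m in movies if m["vote_count"] >= min_votes]
    return sorted(eligible, key=lambda m: m["vote_average"], reverse=True)


movies = load_movies(io.StringIO(SAMPLE_CSV))
ranked = top_rated(movies, min_votes=100)
for movie in ranked:
    print(movie["title"], movie["vote_average"], movie["vote_count"])
```

    Filtering on vote_count before ranking by vote_average is one common way to keep titles with very few votes from dominating a rating-based ranking; the threshold of 100 here is arbitrary.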

    Distribution

    This dataset comprises metadata for the top 10,000 most popular movies from The Movie Database. Specific numbers for rows or records beyond this top count are not available. The data has been meticulously crafted from raw information obtained via TMDB's public API and subsequently cleaned and preprocessed.

    Usage

    Ideal applications for this dataset include:

    • Analysing trends in movie genres over time.
    • Identifying correlations between movie budget, revenue, and popularity.
    • Developing and testing movie recommendation systems.
    • Exploring the impact of different attributes on a movie's success.
    • Academic research into film industry dynamics and audience reception.

    Coverage

    The dataset's geographic coverage is Global, reflecting the worldwide reach of movies and TMDB's user base. It focuses on the top 10,000 most popular movies, implying a snapshot of current or recent popularity without a specific historical time range for the films themselves. No specific demographic scope for the data is provided, but it reflects engagement from TMDB users generally.

    License

    CC0

    Who Can Use It

    This dataset is primarily intended for:

    • Data Analysts: To scrutinise and analyse movie popularity and attributes.
    • Researchers: For academic studies on film trends, audience behaviour, and industry patterns.
    • Developers: To build and test applications such as movie recommendation engines or data visualisations.

    Dataset Name Suggestions

    • TMDB Top Movies
    • Popular TMDB Films Metadata
    • Movie Popularity Dataset
    • TMDB Film Attributes

    Attributes

    Original Data Source: TMDB_top_rated_movies

  9. Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov"

    • figshare.com
    zip
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Laura Miron; Rafael Gonçalves; Mark A. Musen (2023). Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov" [Dataset]. http://doi.org/10.6084/m9.figshare.12743939.v2
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    Laura Miron; Rafael Gonçalves; Mark A. Musen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This fileset provides supporting data and corpora for the empirical study described in: Laura Miron, Rafael S. Goncalves and Mark A. Musen. Obstacles to the Reuse of Metadata in ClinicalTrials.gov

    Description of files

    Original data files:

    • AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. Set contains 302,091 records downloaded on April 3, 2019.
    • public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate records in AllPublicXML.

    BioPortal API Query Results:

    • condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns={filename, condition, url, bioportal term, cuis, tuis}.
    • intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns={filename, intervention, url, bioportal term, cuis, tuis}.

    Data Element Definitions:

    • supplementary_table_1.xlsx Mapping of element names, element types, and whether elements are required in ClinicalTrials.gov data dictionaries, the ClinicalTrials.gov XML schema declaration for records (public.XSD), the Protocol Registration System (PRS), FDAAA801, and the WHO required data elements for clinical trial registrations.

    Column and value definitions:

    - CT.gov Data Dictionary Section: Section heading for a group of data elements in the ClinicalTrials.gov data dictionary (https://prsinfo.clinicaltrials.gov/definitions.html)
    - CT.gov Data Dictionary Element Name: Name of an element/field according to the ClinicalTrials.gov data dictionaries (https://prsinfo.clinicaltrials.gov/definitions.html) and (https://prsinfo.clinicaltrials.gov/expanded_access_definitions.html)
    - CT.gov Data Dictionary Element Type: "Data" if the element is a field for which the user provides a value, "Group Heading" if the element is a group heading for several sub-fields, but is not in itself associated with a user-provided value.
    - Required for CT.gov for Interventional Records: "Required" if the element is required for interventional records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" if the element is not applicable to interventional records (only observational or expanded access)
    - Required for CT.gov for Observational Records: "Required" if the element is required for observational records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" if the element is not applicable to observational records (only interventional or expanded access)
    - Required in CT.gov for Expanded Access Records?: "Required" if the element is required for expanded access records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" if the element is not applicable to expanded access records (only interventional or observational)
    - CT.gov XSD Element Definition: abbreviated xpath to the corresponding element in the ClinicalTrials.gov XSD (public.XSD). The full xpath includes 'clinical_study/' as a prefix to every element. (There is a single top-level element called "clinical_study" for all other elements.)
    - Required in XSD?: "Yes" if the element is required according to public.XSD, "No" if the element is optional, "-" if the element is not made public or included in the XSD
    - Type in XSD: "text" if the XSD type was "xs:string" or "textblock", name of enum given if type was enum, "integer" if type was "xs:integer" or "xs:integer" extended with the "type" attribute, "struct" if the type was a struct defined in the XSD
    - PRS Element Name: Name of the corresponding entry field in the PRS system
    - PRS Entry Type: Entry type in the PRS system. This column contains some free text explanations/observations
    - FDAAA801 Final Rule Field Name: Name of the corresponding required field in the FDAAA801 Final Rule (https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission). This column contains many empty values where elements in ClinicalTrials.gov do not correspond to a field required by the FDA
    - WHO Field Name: Name of the corresponding field required by the WHO Trial Registration Data Set (v 1.3.1) (https://prsinfo.clinicaltrials.gov/trainTrainer/WHO-ICMJE-ClinTrialsgov-Cross-Ref.pdf)

    Analytical Results:

    • EC_human_review.csv contains the results of a manual review of a random sample of eligibility criteria from 400 CT.gov records. Table gives filename, criteria, and whether manual review determined the criteria to contain criteria for "multiple subgroups" of participants.
    • completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA801 and its Final Rule.
    • industry_completeness.xlsx contains percentages of interventional records missing required fields, broken up by agency class of trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other"), and before and after the effective date of the Final Rule.
    • location_completeness.xlsx contains percentages of interventional records missing required fields, broken up by whether record listed at least one location in the United States and records with only international location (excluding trials with no listed location), and before and after the effective date of the Final Rule.

    Intermediate Results:

    • cache.zip contains pickle and csv files of pandas dataframes with values scraped from the XML records in AllPublicXML. Downloading these files greatly speeds up running analysis steps from jupyter notebooks in our github repository.
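    The "CT.gov XSD Element Definition" column is described as an abbreviated xpath whose full form is prefixed with 'clinical_study/'. The sketch below illustrates that relationship with Python's standard xml.etree.ElementTree. The sample record, the element names inside it, and the helpers `full_xpath` and `lookup` are placeholders invented for illustration, not taken from the fileset or the authors' notebooks.

```python
import xml.etree.ElementTree as ET

# The single top-level element named in the description above.
ROOT_TAG = "clinical_study"

# Invented miniature record; real records are far larger.
SAMPLE_RECORD = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>Example study</brief_title>
</clinical_study>"""


def full_xpath(abbreviated):
    """Expand an abbreviated path into the full path with the root prefix."""
    return f"{ROOT_TAG}/{abbreviated}"


def lookup(root, abbreviated):
    """Return the text of the first matching element, or None if absent.

    ElementTree resolves paths relative to the element they are called on,
    so the abbreviated path can be used directly on the root element.
    """
    if root.tag != ROOT_TAG:
        raise ValueError(f"unexpected root element: {root.tag}")
    node = root.find(abbreviated)
    return None if node is None else node.text


root = ET.fromstring(SAMPLE_RECORD)
nct_id = lookup(root, "id_info/nct_id")
missing = lookup(root, "overall_status")
print(full_xpath("id_info/nct_id"), nct_id, missing)
```

    Returning None for an absent element keeps a missing optional field distinguishable from an empty one, which matters when counting records that lack required fields.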

  10. Hazardous Waste Portal Manifest Metadata

    • catalog.data.gov
    • data.ct.gov
    • +2 more
    Updated Jan 26, 2024
    + more versions
    Cite
    data.ct.gov (2024). Hazardous Waste Portal Manifest Metadata [Dataset]. https://catalog.data.gov/dataset/hazardous-waste-portal-manifest-metadata
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    data.ct.gov
    Description

    Note: Please use the following view to be able to see the entire Dataset Description: https://data.ct.gov/Environment-and-Natural-Resources/Hazardous-Waste-Portal-Manifest-Metadata/x2z6-swxe

    Dataset Description Outline (5 sections)

    • INTRODUCTION
    • WHY USE THE CONNECTICUT OPEN DATA PORTAL MANIFEST METADATA DATASET INSTEAD OF THE DEEP DOCUMENT ONLINE SEARCH PORTAL ITSELF?
    • WHAT MANIFESTS ARE INCLUDED IN DEEP’S MANIFEST PERMANENT RECORDS ARE ALSO AVAILABLE VIA THE DEEP DOCUMENT SEARCH PORTAL AND CT OPEN DATA?
    • HOW DOES THE PORTAL MANIFEST METADATA DATASET RELATE TO THE OTHER TWO MANIFEST DATASETS PUBLISHED IN CT OPEN DATA?
    • IMPORTANT NOTES

    INTRODUCTION

    • All of DEEP’s paper hazardous waste manifest records were recently scanned and “indexed”.
    • Indexing consisted of 6 basic pieces of information or “metadata” taken from each manifest about the Generator and stored with the scanned image. The metadata enables searches by: Site Town, Site Address, Generator Name, Generator ID Number, Manifest ID Number and Date of Shipment.
    • All of the metadata and scanned images are available electronically via DEEP’s Document Online Search Portal at: https://filings.deep.ct.gov/DEEPDocumentSearchPortal/
    • Therefore, it is no longer necessary to visit the DEEP Records Center in Hartford for manifest records or information.
    • This CT Data dataset “Hazardous Waste Portal Manifest Metadata” (or “Portal Manifest Metadata”) was copied from the DEEP Document Online Search Portal, and includes only the metadata – no images.

    WHY USE THE CONNECTICUT OPEN DATA PORTAL MANIFEST METADATA DATASET INSTEAD OF THE DEEP DOCUMENT ONLINE SEARCH PORTAL ITSELF?

    The Portal Manifest Metadata is a good search tool to use along with the Portal. Searching the Portal Manifest Metadata can provide the following advantages over searching the Portal:

    • faster searches, especially for “large searches” - those with a large number of search returns;
    • unlimited number of search returns (Portal is limited to 500);
    • larger display of search returns;
    • search returns can be sorted and filtered online in CT Data;
    • search returns and the entire dataset can be downloaded from CT Data and used offline (e.g. download to Excel format); and
    • metadata from searches can be copied from CT Data and pasted into the Portal search fields to quickly find single scanned images.

    The main advantages of the Portal are:

    • it provides access to scanned images of manifest documents (CT Data does not); and
    • images can be downloaded one or multiple at a time.

    WHAT MANIFESTS ARE INCLUDED IN DEEP’S MANIFEST PERMANENT RECORDS ARE ALSO AVAILABLE VIA THE DEEP DOCUMENT SEARCH PORTAL AND CT OPEN DATA?

    All hazardous waste manifest records received and maintained by the DEEP Manifest Program; including:

    • manifests originating from a Connecticut Generator or sent to a Connecticut Destination Facility including manifests accompanying an exported shipment
    • manifests with RCRA hazardous waste listed on them (such manifests may also have non-RCRA hazardous waste listed)
    • manifests from a Generator with a Connecticut Generator ID number (permanent or temporary number)
    • manifests with sufficient quantities of RCRA hazardous waste listed for DEEP to consider the Generator to be a Small or Large Quantity Generator
    • manifests with PCBs listed on them from 2016 to 6-29-2018.

    Note: manifests sent to a CT Destination Facility were indexed by the Connecticut or Out of State Generator. Searches by CT Designated Facility are not possible unless such facility is the Generator for the purposes of manifesting.

    All other manifests were considered “non-hazardous” manifests and not scanned. They were discarded after 2 years in accord with DEEP records retention schedule. Non-hazardous manifests include:

    • Manifests with only non-RCRA hazardous waste listed
    • Manifests from generators that did not have a permanent or temporary Generator ID number
    • Sometimes non-hazardous manifests were considered “Hazar

  11. Open Data Portal Catalogue

    • open.canada.ca
    • datasets.ai
    • +1 more
    csv, json, jsonl, png +2
    Updated Jul 13, 2025
    Cite
    Treasury Board of Canada Secretariat (2025). Open Data Portal Catalogue [Dataset]. https://open.canada.ca/data/en/dataset/c4c5c7f1-bfa6-4ff6-b4a0-c164cb2060f7
    Available download formats: csv, sqlite, json, png, jsonl, xlsx
    Dataset updated
    Jul 13, 2025
    Dataset provided by
    Treasury Board of Canada Secretariat: http://www.tbs-sct.gc.ca/
    Treasury Board of Canada: https://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool. Resources 2 - 8 are generated using the Flatterer utility.

    Description of resources:

    1. Dataset is a JSON Lines file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. The file is heavily nested and recommended for users familiar with working with nested JSON.
    2. Catalogue is an XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata.
    3. datasets metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
    4. Resources Metadata contains the metadata for the resources contained within each dataset.
    5. resource views metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
    6. datastore fields metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore enabled CSVs.
    7. Data Package Fields contains a description of the fields available in each of the tables within the Catalogue, as well as the count of the number of records each table contains.
    8. data package entity relation diagram displays the title and format of each column, in each table in the Data Package, in the form of an ERD diagram. The Data Package resource offers a text based version.
    9. SQLite Database is a .db database, similar in structure to Catalogue. This can be queried with database or analytical software tools for doing analysis.
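    The Dataset resource is described as a GZip-compressed JSON Lines file with one nested record per line. A minimal sketch of streaming such a file with the Python standard library follows. The record fields (`id`, `title`, `resources`, `format`) and the helper name `iter_records` are assumptions made for illustration; the actual catalogue schema is not given in this listing.

```python
import gzip
import io
import json


def iter_records(fileobj):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines stream."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny in-memory stand-in for the real download (hypothetical fields).
sample = [
    {"id": "a1", "title": "Example A",
     "resources": [{"format": "CSV"}, {"format": "JSON"}]},
    {"id": "b2", "title": "Example B", "resources": []},
]
buffer = io.BytesIO()
with gzip.open(buffer, mode="wt", encoding="utf-8") as out:
    for record in sample:
        out.write(json.dumps(record) + "\n")
buffer.seek(0)

# Stream the records back and flatten one nested level.
records = list(iter_records(buffer))
formats = [res["format"] for rec in records for res in rec["resources"]]
print(len(records), formats)
```

    Reading line by line avoids loading the whole file into memory, which is the usual reason to publish large nested catalogues as JSON Lines. For a real download, a file path could be passed to `gzip.open` in place of the in-memory buffer.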

  12. IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage)

    • crawlfeeds.com
    csv, zip
    Updated Jul 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage) [Dataset]. https://crawlfeeds.com/datasets/imdb-movies-metadata-dataset-4-5m-records-global-coverage
    Available download formats: csv, zip
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.

    This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.

    Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.

    What’s Included:

    • Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more

    • Delivery: Direct download

    Use Cases:

    • Train LLMs or chatbots on cinematic language and metadata

    • Build or enrich movie recommendation engines

    • Run cross-lingual or multi-region film analytics

    • Benchmark genre popularity across time periods

    • Power academic studies or entertainment dashboards

    • Feed into knowledge graphs, search engines, or NLP pipelines

  13. Metadata At A Glance

    • montana-state-library-2022-floods-gis-data-hub-montana.hub.arcgis.com
    • geoenabled-elections-montana.hub.arcgis.com
    Updated Mar 16, 2016
    + more versions
    Cite
    Montana Geographic Information (2016). Metadata At A Glance [Dataset]. https://montana-state-library-2022-floods-gis-data-hub-montana.hub.arcgis.com/datasets/metadata-at-a-glance
    Explore at:
    Dataset updated
    Mar 16, 2016
    Dataset authored and provided by
    Montana Geographic Information
    Area covered
    Description

    This is a graphic representation of the data stewards based on PLSS Townships in PLSS areas. In non-PLSS areas, the metadata at a glance is based on data-steward-defined polygons such as a city, county, or other unit. The identification of the data steward is a general indication of the agency that will be responsible for updates and for providing the authoritative data sources. In other implementations this may have been termed the alternate source, meaning alternate to the BLM. In the shared environment of the NSDI, however, the data steward for an area is the primary coordinator or agency responsible for making updates or causing updates to be made. The data stewardship polygons are defined and provided by the data steward.

  14. Standard Sample Description V2 Structural Metadata

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 3, 2020
    + more versions
    Cite
    European Food Safety Authority (2020). Standard Sample Description V2 Structural Metadata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1215986
    Explore at:
    Dataset updated
    Feb 3, 2020
    Dataset authored and provided by
    European Food Safety Authority
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Standard Sample Description V2 is a specification aimed at harmonising the collection of analytical measurement data for the presence of harmful or beneficial chemical substances in food, feed and water. The specification is a list of standardised data elements (items describing characteristics of samples or analytical results such as country of origin, product, analytical method, limit of detection, result, etc.), linked to controlled terminologies. This specification uses EFSA FoodEx2 to describe sampled foods.

    This file has been prepared to support the publication of data and interoperability. This file indicates which data elements from the specification will not be published to ensure full protection of confidential/sensitive information, for example personal data in accordance with Regulation (EC) No 45/2001 and to protect commercial interests, including intellectual property as specified in Article 4(2), first indent, of Regulation (EC) No 1049/2001.

    The Excel table contains information about the structural metadata elements of the data collection and their fact tables.

    The column name shows the name of the element (e.g. localOrg).
    The column description describes how the content has to be interpreted.
    The column code expresses the corresponding code of the structural metadata element.
    The column optional says whether the structural metadata element is optional or not (then it is mandatory).
    The column dataType contains the type which can be used to fill the structural metadata element and the possible maximal length of the field. The possible types are: text or number.
    The column catalogue contains the name of the catalogue where the content of the structural metadata element has to be picked from (e.g. COUNTRY).
    The column data protection contains whether the structural metadata element will be published or not (yes = will not be published, no = will be published).

  15. Standard Sample Description V1 Structural Metadata

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Feb 3, 2020
    + more versions
    Cite
    European Food Safety Authority; European Food Safety Authority (2020). Standard Sample Description V1 Structural Metadata [Dataset]. http://doi.org/10.5281/zenodo.1215888
    Explore at:
    Available download formats: csv
    Dataset updated
    Feb 3, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    European Food Safety Authority; European Food Safety Authority
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Standard Sample Description V1 is a specification aimed at harmonising the collection of analytical measurement data for the presence of harmful or beneficial chemical substances in food, feed and water. The specification is a list of standardised data elements (items describing characteristics of samples or analytical results such as country of origin, product, analytical method, limit of detection, result, etc.), linked to controlled terminologies. This file has been prepared to support the publication of data and interoperability. This file indicates which data elements from the specification will not be published to ensure full protection of confidential/sensitive information, for example personal data in accordance with Regulation (EC) No 45/2001 and to protect commercial interests, including intellectual property as specified in Article 4(2), first indent, of Regulation (EC) No 1049/2001.

    The Excel table contains information about the structural metadata elements of the data collection and their fact tables.

    The column name shows the name of the element (e.g. localOrg).
    The column description describes how the content has to be interpreted.
    The column code expresses the corresponding code of the structural metadata element.
    The column optional says whether the structural metadata element is optional or not (then it is mandatory).
    The column dataType contains the type which can be used to fill the structural metadata element and the possible maximal length of the field. The possible types are: text or number.
    The column catalogue contains the name of the catalogue where the content of the structural metadata element has to be picked from (e.g. COUNTRY).
    The column data protection contains whether the structural metadata element will be published or not (yes = will not be published, no = will be published).
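
    As a rough illustration of how the columns described above might be consumed, the sketch below parses a few rows that use those column names and lists which elements will be published and which are mandatory. Everything in the sample is an assumption made for illustration: the codes, the data types, the element names other than localOrg, the yes/no values, and the helper names publishable_elements and mandatory_elements. The real file is an Excel table whose exact header spelling and value casing are not shown here.

```python
import csv
import io

# Hypothetical sample rows. The real table is an Excel file; its exact
# headers, codes, and values may differ from what is shown here.
SAMPLE = """name,code,optional,dataType,catalogue,data protection
localOrg,X.01,yes,text(100),,yes
origCountry,X.02,no,text(2),COUNTRY,no
resVal,X.03,yes,number(20),,no
"""


def publishable_elements(table_text):
    """Return names of elements whose 'data protection' value is 'no'.

    Per the description, 'yes' means the element will NOT be published
    and 'no' means it will be published.
    """
    rows = csv.DictReader(io.StringIO(table_text))
    return [r["name"] for r in rows
            if r["data protection"].strip().lower() == "no"]


def mandatory_elements(table_text):
    """Return names of elements whose 'optional' value is 'no'."""
    rows = csv.DictReader(io.StringIO(table_text))
    return [r["name"] for r in rows
            if r["optional"].strip().lower() == "no"]


print(publishable_elements(SAMPLE))
print(mandatory_elements(SAMPLE))
```

    The inverted meaning of the data protection column (yes = withheld) is the easiest detail to get wrong, so the sketch isolates it in one function.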

  16. Libraries.io Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Libraries.io (2019). Libraries.io Data [Dataset]. https://www.kaggle.com/librariesdotio/libraries-io
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    Libraries.io (https://libraries.io/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    In this release you will find data about software distributed and/or crafted publicly on the Internet. You will find information about its development, its distribution and its relationship with other software included as a dependency. You will not find any information about the individuals who create and maintain these projects.

    Content

    Libraries.io gathers data on open source software from 33 package managers and 3 source code repositories. We track over 2.4m unique open source projects, 25m repositories and 121m interdependencies between them. This gives Libraries.io a unique understanding of open source software.

    https://libraries.io/data

    Fork this kernel to get started with this dataset.

    Acknowledgements

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — https://libraries.io/data — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    https://libraries.io/data

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:libraries_io?_ga=2.42277601.-577194880.1523455401

    https://console.cloud.google.com/marketplace/details/libraries-io/librariesio

    Banner Photo by Caspar Rubin from Unsplash.

    Inspiration

    What are the repositories, avg project size, and avg # of stars?

    What are the top dependencies per platform?

    What are the top unmaintained or deprecated projects?

  17. Data from: A general purpose tool-set for representing data relationships:...

    • dataverse.harvard.edu
    Updated May 4, 2018
    Cite
    Joshua Stillerman, Thomas Fredian, Martin Greenwald, John Wright (2018). A general purpose tool-set for representing data relationships: Converting data into knowledge [Dataset]. http://doi.org/10.7910/DVN/SHYWLB
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 4, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Joshua Stillerman, Thomas Fredian, Martin Greenwald, John Wright
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/SHYWLB

    Description

    Rich metadata is required to find and understand the recorded measurements from modern experiments with their immense and complex data stores. Systems to store and manage these metadata have improved over time, but in most cases they are ad hoc collections of data relationships, often represented in domain- or site-specific application code. We are developing a general set of tools to store, manage, and retrieve data relationship metadata. These tools will be agnostic to the underlying data storage mechanisms and to the data stored in them, making the system applicable across a wide range of science domains. Data management tools typically represent at least one relationship paradigm through implicit or explicit metadata. The addition of these metadata allows the data to be searched and understood by larger groups of users over longer periods of time. Using these systems, researchers are less dependent on one-on-one communication with the scientists involved in running the experiments, and less reliant on their ability to remember the details of their data.

    In the magnetic fusion research community, the MDSplus system is widely used to record raw and processed data from experiments. Users create a hierarchical relationship tree for each instance of their experiment, allowing them to record the meanings of what is recorded. Most users of this system add to it a set of ad hoc tools to help users locate specific experiment runs, which they can then access via this hierarchical organization. However, the MDSplus tree is only one possible organization of the records, and the additional applications that relate the experiment 'shots' to run days, experimental proposals, logbook entries, run summaries, analysis workflows, publications, etc. have, up until now, been implemented on an experiment-by-experiment basis. The Metadata Provenance Ontology project, MPO, is a system built to record data provenance information about computed results. It allows users to record the inputs and outputs from each step of their computational workflows: in particular, what raw and processed data were used as inputs, what codes were run, and what results were produced. The resulting collections of provenance graphs can be annotated, grouped, searched, filtered and browsed. This provides a powerful tool to record, understand, and locate computed results. However, this can be understood as one more specific data relationship, which can be construed as an instance of something more general.

    Building on concepts developed in these projects, we are developing a general system that could be used to represent all of these kinds of data relationships as mathematical graphs. Just as MDSplus and MPO were generalizations of data management needs for a collection of users, this new system will generalize the storage, location, and retrieval of the relationships between data. The system will store data relationships as data, not encoded in a set of application-specific programs or ad hoc data structures. Stored data would be referred to by URIs, allowing the system to be agnostic to the underlying data representations. Users can then traverse these graphs. The system will allow users to construct a collection of graphs describing any or all of the relationships between data items, locate interesting data, see what other graphs these data are members of, and navigate into and through them.
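
    The idea of storing relationships as data and referring to items by URI can be pictured with a small sketch. This is not the MPO or MDSplus API; it is a minimal illustration that assumes relationships are kept as (subject URI, relation, object URI) triples. The URIs, the relation names, and the upstream helper are all hypothetical.

```python
from collections import defaultdict

# Each relationship is stored as data: (subject URI, relation, object URI).
# All URIs and relation names below are made up for illustration.
EDGES = [
    ("urn:example:raw/shot-1", "input_to", "urn:example:step/fit"),
    ("urn:example:code/fit-v2", "run_by", "urn:example:step/fit"),
    ("urn:example:step/fit", "produced", "urn:example:result/profile"),
    ("urn:example:result/profile", "input_to", "urn:example:step/plot"),
    ("urn:example:step/plot", "produced", "urn:example:result/figure"),
]


def upstream(target, edges):
    """Return every URI from which `target` can be reached along edges."""
    # Index edges by their object so we can walk backwards from the target.
    parents = defaultdict(set)
    for subject, _relation, obj in edges:
        parents[obj].add(subject)

    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(upstream("urn:example:result/figure", EDGES)))
```

    Because the traversal only ever handles URIs, it stays agnostic to how the underlying data are stored, which is the property the description emphasizes.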

  18. BLM ES OH PLSS Metadata Glance Polygon.

    • datadiscoverystudio.org
    • data.amerigeoss.org
    Updated Jun 8, 2018
    + more versions
    Cite
    (2018). BLM ES OH PLSS Metadata Glance Polygon. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/982fa385e79c42db9c0221bd0764c70e/html
    Explore at:
    Dataset updated
    Jun 8, 2018
    Description

    This data represents the GIS version of the Public Land Survey System, including both rectangular and non-rectangular survey data. The rectangular survey data are a reference system for land tenure based upon meridian, township/range, section, section subdivision and government lots. The non-rectangular survey data represent surveys that were largely performed to protect and/or convey title on specific parcels of land, such as mineral surveys and tracts. The data are largely complete in reference to the rectangular survey data at the level of first division. However, the data vary in the granularity of their spatial representation as well as in their content below the first division. Therefore, depending upon the data source and steward, accurate subdivision of the rectangular data may not be available below the first division, and the non-rectangular mineral surveys may not be present. At times, the complexity of surveys rendered the collection of data cost prohibitive, such as in areas characterized by numerous, overlapping mineral surveys. In these situations, the data were often not abstracted or were only partially abstracted and incorporated into the data set. These PLSS data were compiled from a broad spectrum of sources, including federal, county, and private survey records such as field notes and plats, as well as map sources such as USGS 7 minute quadrangles. The metadata in each data set describes the production methods for the data content. This data is optimized for data publication and sharing rather than for specific "production" or operation and maintenance.

    A complete PLSS data set includes the following: PLSS Townships, First Divisions and Second Divisions (the hierarchical breakdown of the PLSS rectangular surveys), PLSS Special Surveys (non-rectangular components of the PLSS), Meandered Water, Corners, Metadata at a Glance (which identifies last revised date and data steward), and Conflicted Areas (known areas of gaps, overlaps, or inconsistencies). The Entity-Attribute section of this metadata describes these components in greater detail.

    This is a graphic representation of the data stewards based on PLSS Townships in PLSS areas. In non-PLSS areas, the metadata at a glance is based on data-steward-defined polygons such as a city, county, or other unit. The identification of the data steward is a general indication of the agency that will be responsible for updates and for providing the authoritative data sources. In other implementations this may have been termed the alternate source, meaning alternate to the BLM. In the shared environment of the NSDI, however, the data steward for an area is the primary coordinator or agency responsible for making updates or causing updates to be made. The data stewardship polygons are defined and provided by the data steward.

  19. dataset: Create interoperable and well-documented data frames

    • explore.openaire.eu
    Updated Jun 23, 2022
    Cite
    Daniel Antal (2022). dataset: Create interoperable and well-documented data frames [Dataset]. http://doi.org/10.5281/zenodo.6854273
    Explore at:
    Dataset updated
    Jun 23, 2022
    Authors
    Daniel Antal
    Description

    See the package documentation website on dataset.dataobservatory.eu. Report bugs and suggestions on GitHub: https://github.com/dataobservatory-eu/dataset/issues

    The primary aim of dataset is to build well-documented data.frames, tibbles or data.tables that follow the W3C Data Cube Vocabulary based on the statistical SDMX data cube model. Such standard R objects (data.frame, data.table, tibble, or well-structured lists like json) become highly interoperable and can be placed into relational databases, semantic web applications, archives, and repositories. They follow the FAIR principles: they are findable, accessible, interoperable and reusable.

    Our datasets:

    • Contain Dublin Core or DataCite (or both) metadata that makes them findable and more easily accessible via online libraries. See vignette article Datasets With FAIR Metadata.

    • Have dimensions that can be easily and unambiguously reduced to triples for RDF applications; they can be easily serialized to, or synchronized with, semantic web applications. See vignette article From dataset To RDF.

    • Contain processing metadata that greatly enhance the reproducibility of the results and the reviewability of the contents of the dataset, including metadata defined by the DDI Alliance, which is particularly helpful for not yet processed data.

    • Follow the datacube model of the Statistical Data and Metadata eXchange, therefore allowing easy refreshing with new data from the source of the analytical work; this is particularly useful for datasets containing results of statistical operations in R.

    • Export correctly with FAIR metadata to the most used file formats and allow straightforward publication to open science repositories with correct bibliographical and use metadata. See Export And Publish a dataset.

    • Are relatively lightweight in dependencies and work easily with data.frame, tibble or data.table R objects.

  20. Bear Lake Data Repository

    • hydroshare.org
    zip
    Updated Sep 9, 2024
    Cite
    Jeff Nielson; Katie Wadsworth (2024). Bear Lake Data Repository [Dataset]. https://www.hydroshare.org/resource/444e4bd2940e47e6bcab5e7966a929fe
    Explore at:
    Available download formats: zip (154.6 MB)
    Dataset updated
    Sep 9, 2024
    Dataset provided by
    HydroShare
    Authors
    Jeff Nielson; Katie Wadsworth
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bear Lake
    Description

    The Bear Lake Data Repository (BLDR) is an active archive containing a growing compilation of biological, chemical, and physical datasets collected from Bear Lake and its surrounding watershed. The datasets herein have been digitized from historical records and reports, extracted from papers and theses, and obtained from public and private entities, including, inter alia, the United States Geological Survey, PacifiCorp, and Ecosystems Research Institute.

    Contributions are welcome. The BLDR accepts biological, chemical, or physical datasets obtained at Bear Lake, irrespective of funding source. There is no submission size limit at present; workarounds will be found if submissions exceed HydroShare limits (20 GB). Contributions are published with an open access license and will serve many use cases. The current repository steward, Bear Lake Watch, will advise on submissions and make accepted contributions available promptly.

    Metadata files are provided for each dataset; however, contact with the original contributor(s) is encouraged for questions and additional details prior to data usage. The BLDR and its contributors shall not be liable for any damages resulting from misinterpretation or misuse of the data or metadata.
