The table metadata is part of the dataset Wikipedia Change Metadata, available at https://redivis.com/datasets/1ky2-8b1pvrv76. It contains 583596685 rows across 11 variables.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The LiLaH-HAG dataset (HAG is short for hate-age-gender) consists of metadata on Facebook comments to Facebook posts of mainstream media in Great Britain, Flanders, Slovenia and Croatia. The metadata available in the dataset are the hatefulness of the comment (0 is acceptable, 1 is hateful), age of the commenter (0-25, 26-30, 36-65, 65-), gender of the commenter (M or F), and the language in which the comment was written (EN, NL, SL, HR). The hatefulness of the comment was assigned by multiple well-trained annotators by reading comments in the order of appearance in a discussion thread, while the age and gender variables were estimated from the Facebook profile of a specific user by a single annotator.
CORRESPONDENCE AND OTHER WRITINGS OF SIX MAJOR SHAPERS OF THE UNITED STATES: George Washington, Benjamin Franklin, John Adams (and family), Thomas Jefferson, Alexander Hamilton, and James Madison. Over 180,000 searchable documents, fully annotated, from the authoritative Founding Fathers Papers projects.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Metadata of Machine Learning videos on YouTube.
This dataset contains metadata for 500 videos on machine learning: simply the first 500 results returned when searching for "machine learning" on YouTube.
Data scraped from https://wiki.digitalmethods.net/Dmi/ToolDatabase . Cover Photo: Photo by Rachit Tank on Unsplash.
Motivation : Dataset by Gabriel Preda
Using this dataset, analyse the popularity of machine learning videos and channels based on their like and dislike counts.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A special dataset that contains metadata for all the published datasets. Dataset profile fields conform to Dublin Core standard.
You can download metadata for individual datasets, via the links provided in descriptions.
Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/
The Active Marine Station Metadata is a daily metadata report for active marine buoy and C-MAN (Coastal Marine Automated Network) platforms from the National Data Buoy Center (NDBC). Metadata includes the station id, latitude/longitude (resolution to thousandths of a degree), the station name, the station owner, the program the station is associated with (e.g., TAO, NDBC, tsunami, NOS, etc.), the station type (e.g., buoy, fixed, oil rig, etc.), and flags indicating whether the station observes meteorology, currents, and water quality (signified by 'y' for yes and 'n' for no). If there is a 'y' associated with one of these tags, then the station has reported data in that category within the last 8 hours (or 24 hours for DART stations, i.e. Deep-Ocean Assessment and Reporting of Tsunamis). If there is an 'n', data has not been received within those times. Stations are removed from the list when they are dismantled. The metadata information is written to a daily XML-formatted file.
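The daily XML report described above could be read along the following lines. This is only a sketch: the element name `station`, the attribute names (`id`, `lat`, `lon`, `met`, `currents`, `waterquality`, and so on) and every sample value are assumptions made for illustration and have not been checked against the actual NDBC file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element and attribute names are assumed, not taken from NDBC.
SAMPLE_XML = """<stations>
  <station id="st001" lat="30.000" lon="-90.000" name="Example Buoy"
           owner="NDBC" pgm="NDBC" type="buoy"
           met="y" currents="n" waterquality="n"/>
  <station id="st002" lat="10.500" lon="-150.250" name="Example DART"
           owner="NDBC" pgm="tsunami" type="dart"
           met="n" currents="n" waterquality="n"/>
</stations>"""


def parse_stations(xml_text):
    """Return one dict per <station>, turning 'y'/'n' flags into booleans."""
    root = ET.fromstring(xml_text)
    stations = []
    for node in root.iter("station"):
        record = dict(node.attrib)
        record["lat"] = float(record["lat"])
        record["lon"] = float(record["lon"])
        for flag in ("met", "currents", "waterquality"):
            # A missing flag is treated as 'n' (no recent data).
            record[flag] = record.get(flag, "n").lower() == "y"
        stations.append(record)
    return stations


def reporting(stations, category):
    """IDs of stations flagged 'y' (recent data) in the given category."""
    return [s["id"] for s in stations if s[category]]
```

Calling `reporting(parse_stations(SAMPLE_XML), "met")` lists the stations whose meteorology flag is 'y' in the sample.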
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This paper reports on a study exploring ‘metadata capital’ acquired via metadata reuse. Collaborative modeling and content analysis methods were used to study metadata capital in the Dryad data repository. A sample of 20 cases for two Dryad metadata workflows (Case A and Case B) consisting of 100 instantiations (60 metadata objects, 40 metadata activities) was analyzed. Results indicate that Dryad’s overall workflow builds metadata capital, with the total metadata reuse at 50% or greater for 8 of 12 metadata properties, and 5 of these 8 properties showing reuse at 80% or higher. Metadata reuse is frequent for basic bibliographic properties (e.g., author, title, subject), although it is limited or absent for more complex scientific properties (e.g., taxon, spatial, and temporal information). This paper provides background context, reports the research approach and findings, and considers research implications and system design priorities that may contribute to metadata capital—long term.
https://cdla.io/permissive-1-0/
Repository for SPASE description for resources under NASA Community Coordinated Modeling Center (CCMC) authority.
Note: Please use the following view to be able to see the entire Dataset Description: https://data.ct.gov/Environment-and-Natural-Resources/Hazardous-Waste-Portal-Manifest-Metadata/x2z6-swxe
Dataset Description Outline (5 sections)
• INTRODUCTION
• WHY USE THE CONNECTICUT OPEN DATA PORTAL MANIFEST METADATA DATASET INSTEAD OF THE DEEP DOCUMENT ONLINE SEARCH PORTAL ITSELF?
• WHAT MANIFESTS ARE INCLUDED IN DEEP’S MANIFEST PERMANENT RECORDS AND ARE ALSO AVAILABLE VIA THE DEEP DOCUMENT SEARCH PORTAL AND CT OPEN DATA?
• HOW DOES THE PORTAL MANIFEST METADATA DATASET RELATE TO THE OTHER TWO MANIFEST DATASETS PUBLISHED IN CT OPEN DATA?
• IMPORTANT NOTES
INTRODUCTION
• All of DEEP’s paper hazardous waste manifest records were recently scanned and “indexed”.
• Indexing consisted of 6 basic pieces of information or “metadata” taken from each manifest about the Generator and stored with the scanned image. The metadata enables searches by: Site Town, Site Address, Generator Name, Generator ID Number, Manifest ID Number and Date of Shipment.
• All of the metadata and scanned images are available electronically via DEEP’s Document Online Search Portal at: https://filings.deep.ct.gov/DEEPDocumentSearchPortal/
• Therefore, it is no longer necessary to visit the DEEP Records Center in Hartford for manifest records or information.
• This CT Data dataset “Hazardous Waste Portal Manifest Metadata” (or “Portal Manifest Metadata”) was copied from the DEEP Document Online Search Portal, and includes only the metadata – no images.
WHY USE THE CONNECTICUT OPEN DATA PORTAL MANIFEST METADATA DATASET INSTEAD OF THE DEEP DOCUMENT ONLINE SEARCH PORTAL ITSELF?
The Portal Manifest Metadata is a good search tool to use along with the Portal. Searching the Portal Manifest Metadata can provide the following advantages over searching the Portal:
• faster searches, especially for “large searches” - those with a large number of search returns;
• unlimited number of search returns (Portal is limited to 500);
• larger display of search returns;
• search returns can be sorted and filtered online in CT Data;
• search returns and the entire dataset can be downloaded from CT Data and used offline (e.g. download to Excel format); and
• metadata from searches can be copied from CT Data and pasted into the Portal search fields to quickly find single scanned images.
The main advantages of the Portal are:
• it provides access to scanned images of manifest documents (CT Data does not); and
• images can be downloaded one or multiple at a time.
WHAT MANIFESTS ARE INCLUDED IN DEEP’S MANIFEST PERMANENT RECORDS AND ARE ALSO AVAILABLE VIA THE DEEP DOCUMENT SEARCH PORTAL AND CT OPEN DATA?
All hazardous waste manifest records received and maintained by the DEEP Manifest Program, including:
• manifests originating from a Connecticut Generator or sent to a Connecticut Destination Facility, including manifests accompanying an exported shipment
• manifests with RCRA hazardous waste listed on them (such manifests may also have non-RCRA hazardous waste listed)
• manifests from a Generator with a Connecticut Generator ID number (permanent or temporary number)
• manifests with sufficient quantities of RCRA hazardous waste listed for DEEP to consider the Generator to be a Small or Large Quantity Generator
• manifests with PCBs listed on them from 2016 to 6-29-2018.
Note: manifests sent to a CT Destination Facility were indexed by the Connecticut or Out of State Generator. Searches by CT Designated Facility are not possible unless such facility is the Generator for the purposes of manifesting.
All other manifests were considered “non-hazardous” manifests and not scanned. They were discarded after 2 years in accordance with DEEP’s records retention schedule. Non-hazardous manifests include:
• Manifests with only non-RCRA hazardous waste listed
• Manifests from generators that did not have a permanent or temporary Generator ID number
Sometimes non-hazardous manifests were considered “Hazardous Manifests” and kept on file if DEEP had reason to believe the generator should have had a permanent or temporary Generator ID number. These manifests were scanned and included in the Portal.
Dates included: manifests with shipment dates from 1980 to present
• States were the primary keepers of manifest records until June 29, 2018. Any manifest regarding a Connecticut Generator or Destination Facility should have been sent to DEEP, and should be present in the Portal and CT Data.
• June 30, 2018 was the start of the EPA e-Manifest program. Most manifests with a shipment date on and after this date are sent to, and maintained by, the EPA.
• For information from EPA regarding these newer manifests:
  • Overview: https://rcrapublic.epa.gov/rcrainfoweb/action/modules/em/emoverview
  • To search by site, use EPA’s Sites List: https://rcrapublic.epa.gov/rcrainfoweb/action/modules/hd/handlerindex (Tip: Change the Location field from “National” to “Connecticut”)
• Manifests still sent to DEEP on or after 6-30-2018 include:
  • manifests from exported shipments; and
  • manifest copies submitted pursuant to discrepancy reports and unmanifested shipments.
HOW DOES THE PORTAL MANIFEST METADATA RELATE TO THE OTHER TWO MANIFEST DATASETS PUBLISHED IN CT DATA?
• DEEP has posted in CT Data two other datasets about the same hazardous waste documents which are the subject of the Portal and the Portal Manifest Metadata Copy.
• There are likely some differences in the metadata between the Portal Manifest Metadata and the two others. DEEP recommends using all data sources for a complete search.
• These two datasets were the best search tool DEEP had available to the public prior to the Portal and the Metadata Copy:
• “Hazardous Waste Manifest Data (CT) 1984 – 2008”
https://data.ct.gov/Environment-and-Natural-Resources/Hazardous-Waste-Manifest-Data-CT-1984-2008/h6d8-qiar; and
• “Hazardous Waste Manifest Data (CT) 1984 – 2008: Generator Summary View”
https://data.ct.gov/Environment-and-Natural-Resources/Hazardous-Waste-Manifest-Data-CT-1984-2008-Generat/72mi-3f82.
• The only difference between these two datasets is:
• the first dataset includes all of the metadata transcribed from the manifests.
• the second “Generator Summary View” dataset is a smaller subset of the first, requested for convenience by the public.
Both of these datasets:
• Are copies of metadata from a manifest database maintained by DEEP. No scanned images are available as a companion to these datasets.
• Cover manifests dated from 1984 to approximately 2008.
IMPORTANT NOTES (4):
NOTE 1: Some manifest images are effectively unavailable via the Portal and the Portal Metadata due to incomplete or incorrect metadata. Such errors may be the result of unintentional data entry error, errors on the manifests or illegible manifests.
• Incomplete or incorrect metadata may prevent a manifest from being found by a search. DEEP is currently working to complete the metadata as best it can.
• Please report errors to the DEEP Manifest Program at deep.manifests@ct.gov.
• DEEP will publish updates regarding this work here and through the DEEP Hazardous Waste Advisory Committee listserv. To sign up for this listserv, visit this webpage: https://portal.ct.gov/DEEP/Waste-Management-and-Disposal/Hazardous-Waste-Advisory-Committee/HWAC-Home.
NOTE 2: This dataset does not replace the potential need for a full review of other files publicly available online and/or at CT DEEP’s Records Center. For a complete review of agency records for this or other agency programs, you can perform your own search in our DEEP public file room located at 79 Elm Street, Hartford CT or at our DEEP Online Search Portal at: https://filings.deep.ct.gov/DEEPDocumentSearchPortal/Home.
NOTE 3: Other DEEP programs or state and federal agencies may maintain manifest records (e.g., DEEP Emergency Response, US Environmental Protection Agency, etc.). These other manifests were not scanned along with those from the Manifest Program files. However, most likely these other manifests are duplicate copies of manifests available via the Portal.
NOTE 4: Search tips for using the Portal and CT Data:
• If your search will yield a small number of search returns, try using the Portal for your search. “Small” means fewer than the 500 maximum search returns allowed using the Portal.
• Start your search as broadly as possible – try entering just the town and the street name, or a portion of the street name that is likely to be spelled correctly.
• For searches yielding a large number of search returns, try using the Portal Manifest Metadata in CT Data first.
• Try downloading the metadata and sorting, filtering, etc. the data to look for related spellings, etc.
• Once you narrow down your search, copy the manifest number of a manifest you are interested in, and paste it into the Agency ID field of the Portal search page.
• If you are working from older information sources, you may want to search the two datasets copied from the older DEEP Manifest Database for consistency.
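The search tips above (search broadly on town and a partial street name, sort the downloaded metadata, then copy a manifest number into the Portal) might be sketched offline as follows. The column headers reuse the six indexed fields named in the introduction, but the headers of the real CT Data export may differ; every row in the sample is made up.

```python
import csv
import io

# Hypothetical export; headers follow the six indexed fields named in the
# introduction, but the real CT Data column names may differ. Rows are made up.
SAMPLE_CSV = """Site Town,Site Address,Generator Name,Generator ID Number,Manifest ID Number,Date of Shipment
Hartford,10 Main St,Example Co,CTD000000001,000000001ABC,2005-03-14
Hartford,12 Maine Street,Example Company,CTD000000001,000000002ABC,2006-07-01
New Haven,5 Elm St,Other LLC,CTD000000002,000000003ABC,2010-11-30
"""


def search(rows, town, street_fragment):
    """Broad search: exact town, case-insensitive partial street match."""
    town = town.strip().lower()
    fragment = street_fragment.strip().lower()
    return [
        r for r in rows
        if r["Site Town"].strip().lower() == town
        and fragment in r["Site Address"].lower()
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# A short fragment ("main") also catches the related spelling "Maine Street".
hits = sorted(search(rows, "Hartford", "main"), key=lambda r: r["Date of Shipment"])

# Manifest numbers to paste into the Agency ID field of the Portal search page.
manifest_ids = [r["Manifest ID Number"] for r in hits]
```

Matching on a short fragment rather than the full address is deliberate, since misspelled or inconsistently entered addresses are the case the tips warn about.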
https://choosealicense.com/licenses/afl-3.0/
abross/channel-metadata dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metadata for the sequencing analysis
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The datasets containing metadata in MODS for the entire BHL collection (both hosted and externally linked content) can be downloaded from the following locations:
bhlitem.mods.xml bhlitem.mods.xml.zip bhlpart.mods.xml bhlpart.mods.xml.zip bhltitle.mods.xml bhltitle.mods.xml.zip
For contextual information and key definitions about this dataset see the Biodiversity Heritage Library Open Data Collection.
Data Dictionary: https://www.loc.gov/standards/mods/v3/mods-3-8.xsd
Release Date: First of the month
Frequency: Monthly
bureauCode: 452:11
Access Level: public
Rights: http://rightsstatements.org/vocab/NoC-US/
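Once downloaded and unzipped, a MODS file of this kind could be read record by record, for example as sketched below. The namespace `http://www.loc.gov/mods/v3` is the standard MODS version 3 namespace; the `modsCollection` wrapper and the two sample records are assumptions for illustration and do not reflect the actual layout or content of the BHL files.

```python
import io
import xml.etree.ElementTree as ET

MODS_NS = "{http://www.loc.gov/mods/v3}"

# Made-up two-record sample; the real files are far larger.
SAMPLE = b"""<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods><titleInfo><title>Example flora</title></titleInfo>
        <identifier type="uri">https://example.org/item/1</identifier></mods>
  <mods><titleInfo><title>Example fauna</title></titleInfo></mods>
</modsCollection>"""


def iter_titles(source):
    """Yield (title, first identifier or None) per <mods> record, streaming."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MODS_NS + "mods":
            continue
        title = elem.findtext(f"{MODS_NS}titleInfo/{MODS_NS}title")
        ident = elem.findtext(f"{MODS_NS}identifier")
        yield title, ident
        elem.clear()  # drop the record's children before reading the next one


records = list(iter_titles(io.BytesIO(SAMPLE)))
```

`ET.iterparse` is used instead of loading the whole tree because collection-wide metadata dumps can be large; for a real file, pass an open binary file object instead of `io.BytesIO(SAMPLE)`.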
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This simple shell script loops through all .pdf files in your current working directory and exports metadata extracted with exiftool to a plaintext file with the same name as the original, but with the extension changed from .pdf to .meta. You will need to have exiftool downloaded and installed properly beforehand for this to work.
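The shell script itself is not reproduced here. The behaviour described above could be approximated in Python roughly as follows; this is a sketch under assumptions, not the author's script. The helper names `run_exiftool` and `export_metadata` are hypothetical, and the only exiftool usage assumed is the plain `exiftool FILE` invocation, which prints a text report for one file.

```python
import subprocess
from pathlib import Path


def run_exiftool(pdf_path):
    """Return exiftool's plain-text report for one file (exiftool must be on PATH)."""
    result = subprocess.run(
        ["exiftool", str(pdf_path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def export_metadata(directory=".", extract=run_exiftool):
    """Write <name>.meta next to every <name>.pdf found directly in `directory`.

    `extract` is injectable so the loop can be exercised without exiftool.
    """
    written = []
    for pdf in sorted(Path(directory).glob("*.pdf")):
        target = pdf.with_suffix(".meta")  # same name, .pdf replaced by .meta
        target.write_text(extract(pdf), encoding="utf-8")
        written.append(target)
    return written
```

Calling `export_metadata()` with no arguments processes the current working directory, as the description says, and returns the list of `.meta` paths it wrote.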
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A list of the data sets that will be made available in the sub heading
This is a summary of the community input to the Fond du Lac Band of Lake Superior Chippewa's health impact assessment to inform their baseline health assessment. This dataset is not publicly accessible because it is data that belongs to the Fond du Lac Band of Lake Superior Chippewa. It can be accessed through the following means: contact Nancy Schuldt, Fond du Lac Band Natural Resources, NancySchuldt@FDLREZ.COM. Format: site of data collection (community meeting or survey) and individual quotes (this is text data).
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Additional composition metadata aligned with IHE-XDS which is not already available from the Reference Model COMPOSITION class.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Metadata presented include high-resolution respiration data from Janthinobacterium sp. CG3 for three dissolved organic matter samples: Cotton Glacier supraglacial stream, Pony Lake fulvic acid, and Suwannee River Natural Organic Matter (NOM).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The purpose of VAEM is to provide, by import, a foundation for commonly needed resources when building an ontology.
This dataset supports a paper being written about metadata standard use by geoscience data repositories. The study is being done to better understand which metadata standards and keyword vocabularies are prominent within the geoscience data repository landscape. The findings should be useful for NCAR's evaluation of metadata standards within our own systems, as well as by external data repository staff. The guiding questions of the study are as follows: 1. What metadata standards are geoscience data repositories using? 2. What keyword / subject term vocabularies are they using? 3. What interoperability challenges are present in the use of metadata and keyword vocabulary standards within the geoscience repository community?
DataONE has consistently focused on interoperability among data repositories to enable seamless access to well-described data on the Earth and the environment. Our existing services promote data discovery and access through harmonization of the diverse metadata specifications used across communities, and through our integrated data search portal and services. In terms of the FAIR principles, we have done a good job at Findable and Accessible, while as a community we have placed less emphasis on Interoperable and Reusable. We present new DataONE services for quantitatively assessing metadata completeness and effectiveness relative to the FAIR principles. The services produce guidance for FAIRness at both the level of an individual data set and trends through time for repository, user, and funder data collections. These analytical results regarding conformance to FAIR principles are preliminary and based on proposed quantitative assessment metrics for FAIR which will be changed with input from the community. Thus, these results should not be viewed as conclusive about the data sets presented, but rather illustrate the types of quantitative comparisons that will be able to be made when the FAIR metrics at DataONE have been finalized.