66 datasets found
  1. BIDS Phenotype Aggregation Example Dataset

    • openneuro.org
    Updated Jun 4, 2022
    Cite
    Samuel Guay; Eric Earl; Hao-Ting Wang; Remi Gau; Dorota Jarecka; David Keator; Melissa Kline Struhl; Satra Ghosh; Louis De Beaumont; Adam G. Thomas (2022). BIDS Phenotype Aggregation Example Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds004130.v1.0.0
    Dataset updated
    Jun 4, 2022
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Samuel Guay; Eric Earl; Hao-Ting Wang; Remi Gau; Dorota Jarecka; David Keator; Melissa Kline Struhl; Satra Ghosh; Louis De Beaumont; Adam G. Thomas
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    BIDS Phenotype Aggregation Example COPY OF "The NIMH Healthy Research Volunteer Dataset" (ds003982)

    Modality-agnostic files were copied over and the CHANGES file was updated. Data was aggregated using:

    python phenotype.py aggregate subject -i segregated_subject -o aggregated_subject

    phenotype.py came from the GitHub repository: https://github.com/ericearl/bids-phenotype
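
    The aggregation step can be pictured with a short sketch. This is not the code of phenotype.py, whose internals are not shown here. It assumes a hypothetical layout in which each subject folder under segregated_subject holds one TSV per measure (for example sub-01/audit.tsv), and it writes one TSV per measure with a participant_id column. The function name and the "n/a" fill value for missing cells are illustrative choices.

```python
import csv
from pathlib import Path


def aggregate_by_subject(segregated_dir, aggregated_dir):
    """Merge per-subject TSVs into one TSV per measure (hypothetical layout)."""
    src, dst = Path(segregated_dir), Path(aggregated_dir)
    dst.mkdir(parents=True, exist_ok=True)
    rows_by_measure = {}
    for tsv in sorted(src.glob("sub-*/*.tsv")):
        with tsv.open(newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                # Tag every row with the subject folder it came from.
                merged = {"participant_id": tsv.parent.name, **row}
                rows_by_measure.setdefault(tsv.stem, []).append(merged)
    for measure, rows in rows_by_measure.items():
        # Keep first-seen column order across all subjects.
        fields = list(dict.fromkeys(key for row in rows for key in row))
        with (dst / f"{measure}.tsv").open("w", newline="") as handle:
            writer = csv.DictWriter(
                handle, fieldnames=fields, delimiter="\t", restval="n/a"
            )
            writer.writeheader()
            writer.writerows(rows)
    return sorted(rows_by_measure)
```

    Under these assumptions, calling aggregate_by_subject("segregated_subject", "aggregated_subject") mirrors the -i and -o arguments of the command above.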

    THE ORIGINAL DATASET ds003982 README FOLLOWS

    A comprehensive clinical, MRI, and MEG collection characterizing healthy research volunteers collected at the National Institute of Mental Health (NIMH) Intramural Research Program (IRP) in Bethesda, Maryland using medical and mental health assessments, diagnostic and dimensional measures of mental health, cognitive and neuropsychological functioning, structural and functional magnetic resonance imaging (MRI), along with diffusion tensor imaging (DTI), and a comprehensive magnetoencephalography battery (MEG).

    In addition, blood samples are currently banked for future genetic analysis. All data collected in this protocol are broadly shared in the OpenNeuro repository, in the Brain Imaging Data Structure (BIDS) format. In addition, task paradigms and basic pre-processing scripts are shared on GitHub. This dataset is unique in its depth of characterization of a healthy population in terms of brain health and will contribute to a wide array of secondary investigations of non-clinical and clinical research questions.

    This dataset is licensed under the Creative Commons Zero (CC0) v1.0 License.

    Recruitment

    Inclusion criteria for the study require that participants are adults at or over 18 years of age in good health with the ability to read, speak, understand, and provide consent in English. All participants provided electronic informed consent for online screening and written informed consent for all other procedures. Exclusion criteria include:

    • A history of significant or unstable medical or mental health condition requiring treatment
    • Current self-injury, suicidal thoughts or behavior
    • Current illicit drug use by history or urine drug screen
    • Abnormal physical exam or laboratory result at the time of in-person assessment
    • Less than an 8th grade education or IQ below 70
    • Current employees, or first-degree relatives of NIMH employees

    Study participants are recruited through direct mailings, bulletin boards and listservs, outreach exhibits, print advertisements, and electronic media.

    Clinical Measures

    All potential volunteers first visit the study website (https://nimhresearchvolunteer.ctss.nih.gov), check a box indicating consent, and complete preliminary self-report screening questionnaires. The study website is HIPAA compliant and therefore does not collect PII; instead, participants are instructed to contact the study team to provide their identity and contact information. The questionnaires include demographics, clinical history including medications, disability status (WHODAS 2.0), mental health symptoms (modified DSM-5 Self-Rated Level 1 Cross-Cutting Symptom Measure), substance use survey (DSM-5 Level 2), alcohol use (AUDIT), handedness (Edinburgh Handedness Inventory), and perceived health ratings. At the conclusion of the questionnaires, participants are again prompted to send an email to the study team. Survey results, supplemented by NIH medical records review (if present), are reviewed by the study team, who determine if the participant is likely eligible for the protocol. These participants are then scheduled for an in-person assessment. Follow-up phone screenings were also used to determine if participants were eligible for in-person screening.

    In-person Assessments

    At this visit, participants undergo a comprehensive clinical evaluation to determine final eligibility to be included as a healthy research volunteer. The mental health evaluation consists of a psychiatric diagnostic interview (Structured Clinical Interview for DSM-5 Disorders, SCID-5), along with self-report surveys of mood (Beck Depression Inventory-II, BDI-II) and anxiety (Beck Anxiety Inventory, BAI) symptoms. An intelligence quotient (IQ) estimation is determined with the Kaufman Brief Intelligence Test, Second Edition (KBIT-2). The KBIT-2 is a brief (20-30 minute) assessment of intellectual functioning administered by a trained examiner. There are three subtests, including verbal knowledge, riddles, and matrices.

    Medical Evaluation

    Medical evaluation includes medical history elicitation and systematic review of systems. Biological and physiological measures include vital signs (blood pressure, pulse), as well as weight, height, and BMI. Blood and urine samples are taken and a complete blood count, acute care panel, hepatic panel, thyroid stimulating hormone, viral markers (HCV, HBV, HIV), C-reactive protein, creatine kinase, urine drug screen and urine pregnancy tests are performed. In addition, blood samples that can be used for future genomic analysis, development of lymphoblastic cell lines or other biomarker measures are collected and banked with the NIMH Repository and Genomics Resource (Infinity BiologiX). The Family Interview for Genetic Studies (FIGS) was later added to the assessment in order to provide better pedigree information; the Adverse Childhood Events (ACEs) survey was also added to better characterize potential risk factors for psychopathology. The entirety of the in-person assessment not only collects information relevant for eligibility determination, but it also provides a comprehensive set of standardized clinical measures of volunteer health that can be used for secondary research.

    MRI Scan

    Participants are given the option to consent for a magnetic resonance imaging (MRI) scan, which can serve as a baseline clinical scan to determine normative brain structure, and also as a research scan with the addition of functional sequences (resting state and diffusion tensor imaging). The MR protocol used was initially based on the ADNI-3 basic protocol, but was later modified to include portions of the ABCD protocol in the following manner:

    1. The T1 scan from ADNI3 was replaced by the T1 scan from the ABCD protocol.
    2. The Axial T2 2D FLAIR acquisition from ADNI2 was added, and fat saturation turned on.
    3. Fat saturation was turned on for the pCASL acquisition.
    4. The high-resolution in-plane hippocampal 2D T2 scan was removed and replaced with the whole brain 3D T2 scan from the ABCD protocol (which is resolution and bandwidth matched to the T1 scan).
    5. The slice-select gradient reversal method was turned on for DTI acquisition, and reconstruction interpolation turned off.
    6. Scans for distortion correction were added (reversed-blip scans for DTI and resting state scans).
    7. The 3D FLAIR sequence was made optional and replaced by one where the prescription and other acquisition parameters provide resolution and geometric correspondence between the T1 and T2 scans.

    At the time of the MRI scan, volunteers are administered a subset of tasks from the NIH Toolbox Cognition Battery. The four tasks include:

    1. Flanker inhibitory control and attention task assesses the constructs of attention and executive functioning.
    2. Executive functioning is also assessed using a dimensional change card sort test.
    3. Episodic memory is evaluated using a picture sequence memory test.
    4. Working memory is evaluated using a list sorting test.

    MEG

    An optional MEG study was added to the protocol approximately one year after the study was initiated; thus there are relatively fewer MEG recordings in comparison to the MRI dataset. MEG studies are performed on a 275 channel CTF MEG system (CTF MEG, Coquitlam BC, Canada). The position of the head was localized at the beginning and end of each recording using three fiducial coils. These coils were placed 1.5 cm above the nasion, and at each ear, 1.5 cm from the tragus on a line between the tragus and the outer canthus of the eye. For 48 participants (as of 2/1/2022), photographs were taken of the three coils and used to mark the points on the T1 weighted structural MRI scan for co-registration. For the remainder of the participants (n=16 as of 2/1/2022), a Brainsight neuronavigation system (Rogue Research, Montréal, Québec, Canada) was used to coregister the MRI and fiducial localizer coils in realtime prior to MEG data acquisition.

    Specific Measures within Dataset

    Online and In-person behavioral and clinical measures, along with the corresponding phenotype file name, sorted first by measurement location and then by file name.

    Location | Measure                                            | File Name
    Online   | Alcohol Use Disorders Identification Test (AUDIT)  | audit
             | Demographics                                       | demographics
             | DSM-5 Level 2 Substance Use - Adult                | drug_use
             | Edinburgh Handedness Inventory (EHI)               | ehi
             | Health History Form                                | health_history_questions
             | Perceived Health Rating - self                     | health_rating
  2. Patent Data

    • kaggle.com
    zip
    Updated Apr 11, 2022
    Cite
    Dushyant Rathore (2022). Patent Data [Dataset]. https://www.kaggle.com/datasets/dushyantrathore/patent-data
    Available download formats: zip (5314449 bytes)
    Dataset updated
    Apr 11, 2022
    Authors
    Dushyant Rathore
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset contains the details of Patent Litigation Cases in the United States from 2000 to 2021. The team collected the litigation data in two phases. The first phase looked at data from 2010, specifically within Texas's Western and Eastern Districts. Unified Patent's Portal includes litigation data in which each plaintiff has been marked as NPE (Patent Assertion Entity), NPE (Small Company), or NPE (Individual).

    Using the definitions, Unified first focused on identifying what NPEs were aggregators and then if they involved third-party financing. NPE aggregators were defined as NPEs with more than one affiliated subsidiary bringing patent litigation. An example of this would be IP Edge and the various limited liability companies underneath IP Edge's control that have brought numerous litigations against operating companies. Third-party financing was defined as evidence of any third party with a financial interest other than the assertors.
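
    The aggregator definition above reduces to a simple counting rule, sketched here. The record layout (a controlling NPE paired with the plaintiff entity named in each case) and all entity names are made up for illustration; they are not taken from the dataset or from Unified's methodology.

```python
from collections import defaultdict


def find_aggregators(cases):
    """Return NPE parents with more than one affiliated subsidiary that filed suit."""
    subsidiaries = defaultdict(set)
    for parent, plaintiff in cases:
        # Only count plaintiffs that are distinct entities under the parent.
        if plaintiff != parent:
            subsidiaries[parent].add(plaintiff)
    return sorted(p for p, subs in subsidiaries.items() if len(subs) > 1)


# Made-up records: (controlling NPE, plaintiff entity named in the case).
cases = [
    ("Parent A", "A Holdings LLC"),
    ("Parent A", "A Licensing LLC"),
    ("Parent B", "B Patents LLC"),
    ("Parent B", "B Patents LLC"),
]
print(find_aggregators(cases))  # ['Parent A']
```

    Parent B is not flagged because its two cases come from the same subsidiary; the rule counts distinct subsidiaries, not cases.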

    With a narrow focus on the Western and Eastern District of Texas, Unified then used several public databases, such as Edgar, USPTO Assignment Records, the NPE Stanford Database, press releases, and its database of NPEs to identify any aggregator and any third-party financial interest, as well as various secretary of state corporate filings or court-ordered disclosures. After these two districts were identified, Unified expanded the data to cover the top five most litigious venues for patents, including the Western and Eastern Districts of Texas, Delaware, and the North and Central Districts of California. (On average, over the past five years, these districts have seen about 70% of all patent litigation.) Once that was completed, that dataset was then expanded to include all jurisdictions from 2010 and on.

    The final step was to complete the data set from 2000 to 2009. The team followed a similar data collection process using Lex Machina, the NPE Stanford Database, and Unified's Portal. Unified identified all of the litigation known to be NPE-related. Using the top five jurisdictions' aggregation and financing data, aggregator entities—such as Intellectual Ventures—were identified using the same methodology. The current dataset covers 2000-2021, determines who is an NPE, notes which NPEs are aggregators, and identifies which aggregators are known to have third-party financing.

    Note: there are currently no reporting requirements Federally, at the state level, or in the courts to publicly disclose the financing details of nonpublic entities. Thus, any data analysis of which litigations are funded or financed is incomplete, as many of these arrangements are closely held, private, and unknown even to the courts and the parties to the actions. This data set describes the minimum known amount of third-party-funded patent litigation. It is necessarily underinclusive of all nonpublic deals for which there is no available evidence or insight. For further generalized industry information on the size and scope of litigation funding for patent litigations, private sources often report on the size and scope of the burgeoning industry in the aggregate. For example, see Westfleet Advisor's 2021 Litigation Finance Report, available at https://www.westfleetadvisors.com/publications/2021-litigation-finance-report/.

  3. Genome Aggregation Database (gnomAD) - Data Lakehouse Ready

    • registry.opendata.aws
    Updated Sep 13, 2021
    Cite
    Amazon Web Services (2021). Genome Aggregation Database (gnomAD) - Data Lakehouse Ready [Dataset]. https://registry.opendata.aws/gnomad-data-lakehouse-ready/
    Dataset updated
    Sep 13, 2021
    Dataset provided by
    Amazon Web Services (http://aws.amazon.com/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-scale human sequencing projects. This dataset was derived from summary data from gnomAD release 3.1, available on the Registry of Open Data on AWS for ready enrollment into the Data Lake as Code.

  4. Data from: Meta-analysis of aggregate data on medical events

    • kaggle.com
    zip
    Updated Nov 18, 2024
    Cite
    mahdieh hajian (2024). Meta-analysis of aggregate data on medical events [Dataset]. https://www.kaggle.com/datasets/mahdiehhajian/meta-analysis-of-aggregate-data-on-medical-events/code
    Available download formats: zip (1957 bytes)
    Dataset updated
    Nov 18, 2024
    Authors
    mahdieh hajian
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset provided by: Björn Holzhauer

    Dataset description: Meta-analyses of clinical trials often treat the number of patients experiencing a medical event as binomially distributed when individual patient data for fitting standard time-to-event models are unavailable. Assuming identical drop-out time distributions across arms, random censorship and low proportions of patients with an event, a binomial approach results in a valid test of the null hypothesis of no treatment effect with minimal loss in efficiency compared to time-to-event methods. To deal with differences in follow-up - at the cost of assuming specific distributions for event and drop-out times - we propose a hierarchical multivariate meta-analysis model using the aggregate data likelihood based on the number of cases, fatal cases and discontinuations in each group, as well as the planned trial duration and group sizes. Such a model also enables exchangeability assumptions about parameters of survival distributions, for which they are more appropriate than for the expected proportion of patients with an event across trials of substantially different length. Borrowing information from other trials within a meta-analysis or from historical data is particularly useful for rare events data. Prior information or exchangeability assumptions also avoid the parameter identifiability problems that arise when using more flexible event and drop-out time distributions than the exponential one. We discuss the derivation of robust historical priors and illustrate the discussed methods using an example. We also compare the proposed approach against other aggregate data meta-analysis methods in a simulation study.
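
    To make the idea of an aggregate data likelihood concrete, here is a deliberately simplified sketch, not the authors' hierarchical model. It assumes independent exponential event and drop-out times with constant, strictly positive rates, ignores fatal cases, and classifies each patient by whichever of the two happens first before the planned trial duration. The function and parameter names are illustrative.

```python
import math


def outcome_probabilities(event_rate, dropout_rate, duration):
    """Probabilities of (event first, drop-out first, neither) by `duration`."""
    total = event_rate + dropout_rate
    # Chance that either an event or a drop-out occurs before the trial ends.
    any_by_end = 1.0 - math.exp(-total * duration)
    return (
        event_rate / total * any_by_end,
        dropout_rate / total * any_by_end,
        1.0 - any_by_end,
    )


def log_likelihood(event_rate, dropout_rate, duration, n_events, n_dropouts, n_total):
    """Multinomial log-likelihood (up to a constant) of one arm's aggregate counts."""
    p_event, p_drop, p_none = outcome_probabilities(event_rate, dropout_rate, duration)
    n_none = n_total - n_events - n_dropouts
    return (
        n_events * math.log(p_event)
        + n_dropouts * math.log(p_drop)
        + n_none * math.log(p_none)
    )
```

    Because the planned duration enters the probabilities directly, trials of different length can share the same rate parameters, which is the motivation the description gives for modelling survival parameters instead of event proportions.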

  5. FHV Base Aggregate Report

    • catalog.data.gov
    • data.cityofnewyork.us
    Updated Nov 29, 2025
    Cite
    data.cityofnewyork.us (2025). FHV Base Aggregate Report [Dataset]. https://catalog.data.gov/dataset/fhv-base-aggregate-report
    Dataset updated
    Nov 29, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    Monthly report including total dispatched trips, total dispatched shared trips, and unique dispatched vehicles aggregated by FHV (For-Hire Vehicle) base. These have been tabulated from raw trip record submissions made by bases to the NYC Taxi and Limousine Commission (TLC). This dataset is typically updated monthly on a two-month lag, as bases have until the conclusion of the following month to submit a month of trip records to the TLC. For example, a base has until Feb 28 to submit complete trip records for January. Therefore, the January base aggregates will appear in March at the earliest. The TLC may elect to defer updates to the FHV Base Aggregate Report if a large number of bases have failed to submit trip records by the due date. Note: The TLC publishes base trip record data as submitted by the bases, and we cannot guarantee or confirm their accuracy or completeness. Therefore, this may not represent the total amount of trips dispatched by all TLC-licensed bases. The TLC performs routine reviews of the records and takes enforcement actions when necessary to ensure, to the extent possible, complete and accurate information.
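
    The two-month lag described above is simple calendar arithmetic, sketched below. The helper names are illustrative, and the result is only the earliest possible month, since the TLC may defer an update.

```python
from datetime import date


def add_months(year, month, offset):
    """Shift a (year, month) pair by `offset` months."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def earliest_publication(year, month):
    """First month in which a trip month's aggregates can appear (two-month lag)."""
    pub_year, pub_month = add_months(year, month, 2)
    return date(pub_year, pub_month, 1)


print(earliest_publication(2025, 1))   # 2025-03-01: January trips appear in March
print(earliest_publication(2025, 11))  # 2026-01-01: November trips roll into the next year
```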

  6. Jurisdictional Unit (Public) - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    Cite
    (2024). Jurisdictional Unit (Public) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/jurisdictional-unit-public
    Dataset updated
    Feb 28, 2024
    Description

    Jurisdictional Unit, 2022-05-21. For use with WFDSS, IFTDSS, IRWIN, and InFORM.

    This is a feature service which provides Identify and Copy Feature capabilities. If fast-drawing at coarse zoom levels is a requirement, consider using the tile (map) service layer located at https://nifc.maps.arcgis.com/home/item.html?id=3b2c5daad00742cd9f9b676c09d03d13.

    Overview

    The Jurisdictional Agencies dataset is developed as a national land management geospatial layer, focused on representing wildland fire jurisdictional responsibility, for interagency wildland fire applications, including WFDSS (Wildland Fire Decision Support System), IFTDSS (Interagency Fuels Treatment Decision Support System), IRWIN (Interagency Reporting of Wildland Fire Information), and InFORM (Interagency Fire Occurrence Reporting Modules). It is intended to provide federal wildland fire jurisdictional boundaries on a national scale. The agency and unit names are an indication of the primary manager name and unit name, respectively, recognizing that:

    • There may be multiple owner names.
    • Jurisdiction may be held jointly by agencies at different levels of government (i.e. State and Local), especially on private lands.
    • Some owner names may be blocked for security reasons.
    • Some jurisdictions may not allow the distribution of owner names.

    Private ownerships are shown in this layer with JurisdictionalUnitIdentifier=null, JurisdictionalUnitAgency=null, JurisdictionalUnitKind=null, and LandownerKind="Private", LandownerCategory="Private". All land inside the US country boundary is covered by a polygon.

    Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and is not available in a national dataset in most cases.

    For publicly held lands the agency name is the surface managing agency, such as Bureau of Land Management, United States Forest Service, etc. The unit name refers to the descriptive name of the polygon (i.e. Northern California District, Boise National Forest, etc.).

    These data are used to automatically populate fields on the WFDSS Incident Information page.

    This data layer implements the NWCG Jurisdictional Unit Polygon Geospatial Data Layer Standard.

    Relevant NWCG Definitions and Standards

    Unit
    2. A generic term that represents an organizational entity that only has meaning when it is contextualized by a descriptor, e.g. jurisdictional.
    Definition Extension: When referring to an organizational entity, a unit refers to the smallest area or lowest level. Higher levels of an organization (region, agency, department, etc) can be derived from a unit based on organization hierarchy.

    Unit, Jurisdictional
    The governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law.
    Definition Extension: 1) Ultimately responsible for the fire report to account for statistical fire occurrence; 2) Responsible for setting fire management objectives; 3) Jurisdiction cannot be re-assigned by agreement; 4) The nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) Responsible for signing a Delegation of Authority to the Incident Commander.
    See also: Unit, Protecting; Landowner

    Unit Identifier
    This data standard specifies the standard format and rules for Unit Identifier, a code used within the wildland fire community to uniquely identify a particular government organizational unit.

    Landowner Kind & Category
    This data standard provides a two-tier classification (kind and category) of landownership.

    Attribute Fields

    • JurisdictionalAgencyKind: Describes the type of unit Jurisdiction using the NWCG Landowner Kind data standard. There are two valid values: Federal and Other. A value may not be populated for all polygons.
    • JurisdictionalAgencyCategory: Describes the type of unit Jurisdiction using the NWCG Landowner Category data standard. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State. A value may not be populated for all polygons.
    • JurisdictionalUnitName: The name of the Jurisdictional Unit. Where an NWCG Unit ID exists for a polygon, this is the name used in the Name field from the NWCG Unit ID database. Where no NWCG Unit ID exists, this is the “Unit Name” or other specific, descriptive unit name field from the source dataset. A value is populated for all polygons.
    • JurisdictionalUnitID: Where it could be determined, this is the NWCG Standard Unit Identifier (Unit ID). Where it is unknown, the value is ‘Null’. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available from the Unit ID standard, linked above.
    • LandownerKind: The landowner kind value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. There are three valid values: Federal, Private, or Other.
    • LandownerCategory: The landowner category value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State, Private.
    • DataSource: The database from which the polygon originated. Be as specific as possible, identify the geodatabase name and feature class in which the polygon originated.
    • SecondaryDataSource: If the Data Source is an aggregation from other sources, use this field to specify the source that supplied data to the aggregation. For example, if Data Source is "PAD-US 2.1", then for a USDA Forest Service polygon, the Secondary Data Source would be "USDA FS Automated Lands Program (ALP)". For a BLM polygon in the same dataset, Secondary Source would be "Surface Management Agency (SMA)."
    • SourceUniqueID: Identifier (GUID or ObjectID) in the data source. Used to trace the polygon back to its authoritative source.
    • MapMethod: Controlled vocabulary to define how the geospatial feature was derived. Map method may help define data quality. MapMethod will be Mixed Method by default for this layer as the data are from mixed sources. Valid Values include: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; Digitized-Topo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; Other
    • DateCurrent: The last edit or update of this GIS record. Date should follow the assigned NWCG Date Time data standard, using 24 hour clock, YYYY-MM-DDhh.mm.ssZ, ISO8601 Standard.
    • Comments: Additional information describing the feature.
    • GeometryID: Primary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature.
    • JurisdictionalUnitID_sansUS: NWCG Unit ID with the "US" characters removed from the beginning. Provided for backwards compatibility.
    • JoinMethod: Additional information on how the polygon was matched to information in the NWCG Unit ID database.
    • LocalName: LocalName for the polygon provided from PADUS or other source.
    • LegendJurisdictionalAgency: Jurisdictional Agency, but smaller landholding agencies, or agencies of indeterminate status, are grouped for more intuitive use in a map legend or summary table.
    • LegendLandownerAgency: Landowner Agency, but smaller landholding agencies, or agencies of indeterminate status, are grouped for more intuitive use in a map legend or summary table.
    • DataSourceYear: Year that the source data for the polygon were acquired.

    Data Input

    This dataset is based on an aggregation of 4 spatial data sources: Protected Areas Database US (PAD-US 2.1), data from Bureau of Indian Affairs regional offices, the BLM Alaska Fire Service/State of Alaska, and Census Block-Group Geometry. NWCG Unit ID and Agency Kind/Category data are tabular and sourced from UnitIDActive.txt, in the WFMI Unit ID application (https://wfmi.nifc.gov/unit_id/Publish.html). Areas with unknown Landowner Kind/Category and Jurisdictional Agency Kind/Category are assigned LandownerKind and LandownerCategory values of "Private" by use of the non-water polygons from the Census Block-Group geometry.

    PAD-US 2.1: This dataset is based in large part on the USGS Protected Areas Database of the United States - PAD-US 2.1. PAD-US is a compilation of authoritative protected areas data between agencies and organizations that ultimately results in a comprehensive and accurate inventory of protected areas for the United States to meet a variety of needs (e.g. conservation, recreation, public health, transportation, energy siting, ecological, or watershed assessments and planning). Extensive documentation on PAD-US processes and data sources is available.

    How these data were aggregated: Boundaries, and their descriptors, available in spatial databases (i.e. shapefiles or geodatabase feature classes) from land management agencies are the desired and primary data sources in PAD-US. If these authoritative sources are unavailable, or the agency recommends another source, data may be incorporated by other aggregators such as non-governmental organizations. Data sources are tracked for each record in the PAD-US geodatabase (see below).

    BIA and Tribal Data: BIA and Tribal land management data are not available in PAD-US. As such, data were aggregated from BIA regional offices. These data date from 2012 and were substantially updated in 2022. Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: These data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data. The

  7. Data Collaborations Across Boundaries (Slides)

    • data.depositar.io
    pdf
    Updated Jun 27, 2025
    Cite
    depositar (2025). Data Collaborations Across Boundaries (Slides) [Dataset]. https://data.depositar.io/dataset/data-collaborations-across-boundaries
    Available download formats: pdf (3112569), pdf (4440122), pdf (1792282), pdf (1296859), pdf (10713394)
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    depositar
    Description

    This dataset collects the slides that were presented at the Data Collaborations Across Boundaries session in SciDataCon 2022, part of the International Data Week.

    The following session proposal was prepared by Tyng-Ruey Chuang and submitted to SciDataCon 2022 organizers for consideration on 2022-02-28. The proposal was accepted on 2022-03-28. Six abstracts were submitted and accepted to this session. Five presentations were delivered online in a virtual session on 2022-06-21.

    Data Collaborations Across Boundaries

    There are many good stories about data collaborations across boundaries. We need more. We also need to share the lessons each of us has learned from collaborating with parties and communities not in our familiar circles.

    By boundaries, we mean not just the regulatory borders between nation states about data sharing but the various barriers, readily conceivable or not, that hinder collaboration in aggregating, sharing, and reusing data for social good. These barriers to collaboration exist between the academic disciplines, between the economic players, and between the many user communities, just to name a few. There are also cross-domain barriers, for example those that lie among data practitioners, public administrators, and policy makers when they are articulating the why, what, and how of "open data" and debating its economic significance and fair distribution. This session aims to bring together experiences and thoughts on good data practices in facilitating collaborations across boundaries and domains.

    The success of Wikipedia proves that collaborative content production and service, by ways of copyleft licenses, can be sustainable when coordinated by a non-profit and funded by the general public. Collaborative code repositories like GitHub and GitLab demonstrate the enormous value and mass scale of systems-facilitated integration of user contributions that run across multiple programming languages and developer communities. Research data aggregators and repositories such as GBIF, GISAID, and Zenodo have served numerous researchers across academic disciplines. Citizen science projects and platforms, for instance eBird, Galaxy Zoo, and Taiwan Roadkill Observation Network (TaiRON), not only collect data from diverse communities but also manage and release datasets for research use and public benefit (e.g. TaiRON datasets being used to improve road design and reduce animal mortality). At the same time large scale data collaborations depend on standards, protocols, and tools for building registries (e.g. Archival Resource Key), ontologies (e.g. Wikidata and schema.org), repositories (e.g. CKAN and Omeka), and computing services (e.g. Jupyter Notebook). There are many types of data collaborations. The above lists only a few.

    This session proposal calls for contributions that bring forward lessons learned from collaborative data projects and platforms, especially those that involve multiple communities and/or cross organizational boundaries. Presentations focusing on the following (non-exclusive) topics are sought:

    1. Support mechanisms and governance structures for data collaborations across organizations/communities.

    2. Data policies --- such as data sharing agreements, memoranda of understanding, terms of use, privacy policies, etc. --- for facilitating collaborations across organizations/communities.

    3. Traditional and non-traditional funding sources for data collaborations across multiple parties; sustainability of data collaboration projects, platforms, and communities.

    4. Data workflows --- collection, processing, aggregation, archiving, and publishing, etc. --- designed with considerations of (external) collaboration.

    5. Collaborative web platforms for data acquisition, curation, analysis, visualization, and education.

    6. Examples and insights from data trusts, data coops, as well as other formal and informal forms of data stewardship.

    7. Debates on the pros and cons of centralized, distributed, and/or federated data services.

    8. Practical lessons learned from data collaboration stories: failures, successes, incidents, unexpected turns of events, aftermath, etc. (no story is too small!).

  8. 4

    Dataset of Particle Size Distribution of Fine Aggregate sourced from Goain...

    • data.4tu.nl
    zip
    Updated Sep 6, 2023
    Cite
    Minhaz Uddin; Aziz Ahmed; Khondaker Sakil Ahmed (2023). Dataset of Particle Size Distribution of Fine Aggregate sourced from Goain River (Bangladesh) and Dawki River (India) as utilized in a Batch Mixing Plant [Dataset]. http://doi.org/10.4121/50989a7d-2452-4f66-b29a-5b485709328f.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Minhaz Uddin; Aziz Ahmed; Khondaker Sakil Ahmed
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    India, Dawki River, Bangladesh
    Description

    This data paper presents the Fine Aggregate (FA) Profile of an important river, called the Goain in Bangladesh and the Dawki in India, which is a major source of natural FA for construction activities in Bangladesh. The FA Profiles were analyzed using sieve analysis and Sand Equivalent (SE) Value of Soils and FA tests over a period of more than two years, with samples collected three times a month from Jaflong, Sylhet, Bangladesh. The sampling method followed standard guidelines, and the sieve analysis test reports satisfied size distribution requirements, despite some fluctuations in the test results. The primary purpose of this data is to document sand availability throughout the year, which will be valuable for researchers, engineers, policymakers, and stakeholders involved in planning and designing construction projects that use river sand. This data paper provides a comprehensive dataset on the FA Profile of the Goain (Dawki) River, which can be reused in various ways, including developing predictive models, monitoring the effects of climate change, and identifying areas for sustainable sand extraction.


  9. d

    Protected Areas Database of the United States (PAD-US) 3.0 (ver. 2.0, March...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 3.0 (ver. 2.0, March 2023) [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-ver-2-0-march-2023
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme ( https://communities.geoplatform.gov/ngda-cadastre/ ). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all open space public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, permanent and long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of U.S. public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using thirty-six attributes and five separate feature classes representing the U.S. protected areas network: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. An additional Combined feature class includes the full PAD-US inventory to support data management, queries, web mapping services, and analyses. The Feature Class (FeatClass) field in the Combined layer allows users to extract data types as needed. 
A Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) facilitates the extraction of authoritative federal data provided or recommended by managing agencies from the Combined PAD-US inventory. This PAD-US Version 3.0 dataset includes a variety of updates from the previous Version 2.1 dataset (USGS, 2020, https://doi.org/10.5066/P92QM3NT ), achieving goals to: 1) Annually update and improve spatial data representing the federal estate for PAD-US applications; 2) Update state and local lands data as state data-steward and PAD-US Team resources allow; and 3) Automate data translation efforts to increase PAD-US update efficiency. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in the PAD-US (other data were transferred from PAD-US 2.1). Federal updates - The USGS remains committed to updating federal fee owned lands data and major designation changes in annual PAD-US updates, where authoritative data provided directly by managing agencies are available or alternative data sources are recommended. The following is a list of updates or revisions associated with the federal estate: 1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations where available), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census Bureau), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), and National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/ ). 
2) Improved the representation (boundaries and attributes) of the National Park Service, U.S. Forest Service, Bureau of Land Management, and U.S. Fish and Wildlife Service lands, in collaboration with agency data-stewards, in response to feedback from the PAD-US Team and stakeholders. 3) Added a Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) to the PAD-US 3.0 geodatabase to facilitate the extraction (by Data Provider, Dataset Name, and/or Aggregator Source) of authoritative data provided directly (or recommended) by federal managing agencies from the full PAD-US inventory. A summary of the number of records (Frequency) and calculated GIS Acres (vs Documented Acres) associated with features provided by each Aggregator Source is included; however, the number of records may vary from source data as the "State Name" standard is applied to national files. The Feature Class (FeatClass) field in the table and geodatabase describe the data type to highlight overlapping features in the full inventory (e.g. Designation features often overlap Fee features) and to assist users in building queries for applications as needed. 4) Scripted the translation of the Department of Defense, Census Bureau, and Natural Resource Conservation Service source data into the PAD-US format to increase update efficiency. 5) Revised conservation measures (GAP Status Code, IUCN Category) to more accurately represent protected and conserved areas. For example, Fish and Wildlife Service (FWS) Waterfowl Production Area Wetland Easements changed from GAP Status Code 2 to 4 as spatial data currently represents the complete parcel (about 10.54 million acres primarily in North Dakota and South Dakota). Only aliquot parts of these parcels are documented under wetland easement (1.64 million acres). These acreages are provided by the U.S. Fish and Wildlife Service and are referenced in the PAD-US geodatabase Easement feature class 'Comments' field. 
State updates - The USGS is committed to building capacity in the state data-steward network and the PAD-US Team to increase the frequency of state land updates, as resources allow. The USGS supported efforts to significantly increase state inventory completeness with the integration of local parks data in the PAD-US 2.1, and developed a state-to-PAD-US data translation script during PAD-US 3.0 development to pilot in future updates. Additional efforts are in progress to support the technical and organizational strategies needed to increase the frequency of state updates. The PAD-US 3.0 included major updates to the following three states: 1) California - added or updated state, regional, local, and nonprofit lands data from the California Protected Areas Database (CPAD), managed by GreenInfo Network, and integrated conservation and recreation measure changes following review coordinated by the data-steward with state managing agencies. Developed a data translation Python script (see Process Step 2 Source Data Documentation) in collaboration with the data-steward to increase the accuracy and efficiency of future PAD-US updates from CPAD. 2) Virginia - added or updated state, local, and nonprofit protected areas data (and removed legacy data) from the Virginia Conservation Lands Database, provided by the Virginia Department of Conservation and Recreation's Natural Heritage Program, and integrated conservation and recreation measure changes following review by the data-steward. 3) West Virginia - added or updated state, local, and nonprofit protected areas data provided by the West Virginia University, GIS Technical Center. For more information regarding the PAD-US dataset please visit, https://www.usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation please review the PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual . 
A version history of PAD-US updates is summarized below (see https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-history for more information):
1) First posted - April 2009 (Version 1.0 - available from the PAD-US Team: pad-us@usgs.gov).
2) Revised - May 2010 (Version 1.1 - available from the PAD-US Team: pad-us@usgs.gov).
3) Revised - April 2011 (Version 1.2 - available from the PAD-US Team: pad-us@usgs.gov).
4) Revised - November 2012 (Version 1.3) https://doi.org/10.5066/F79Z92XD
5) Revised - May 2016 (Version 1.4) https://doi.org/10.5066/F7G73BSZ
6) Revised - September 2018 (Version 2.0) https://doi.org/10.5066/P955KPLE
7) Revised - September 2020 (Version 2.1) https://doi.org/10.5066/P92QM3NT
8) Revised - January 2022 (Version 3.0) https://doi.org/10.5066/P9Q9LQ4B
Comparing protected area trends between PAD-US versions is not recommended without consultation with USGS as many changes reflect improvements to agency and organization GIS systems, or conservation and recreation measure classification, rather than actual changes in protected area acquisition on the ground.

  10. f

    Datasets: Crowd Data Center (CDC)

    • unisa.figshare.com
    xlsx
    Updated Nov 29, 2025
    Cite
    Lenny Mamaro (2025). Datasets: Crowd Data Center (CDC) [Dataset]. http://doi.org/10.25399/UnisaData.30656291.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 29, 2025
    Dataset provided by
    University of South Africa
    Authors
    Lenny Mamaro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Crowd Data Center is an online, international aggregator of openly available data from various global crowdfunding platforms. It provides standardised, structured, and downloadable datasets containing campaign-level information such as project descriptions, funding targets, amounts raised, backer counts, campaign duration, and project categories. The Crowd Data Center is one of the most widely utilized datasets in academia due to its high-quality, cleaned data, which is captured directly from original crowdfunding platforms through automated data extraction protocols. Available at https://thecrowddatacenter.com/

    Type of Data Collection Method

    Secondary Data Collection or Archival Data.

    Description of the Data Collection Method

    The research design depends on secondary, archival data obtained from the CDC. The data were collected from previously run campaigns recorded across crowdfunding platforms, such as Kickstarter or Indiegogo, and stored within the CDC database. The Crowd Data Center utilises automated web-scraping and API-based data extraction methods for continuous gathering, verification, and updating of data from campaigns. Researchers download datasets from the data centre in CSV or Excel format for analysis.

    Short Paragraph Example for Your Proposal

    The study will utilize secondary data sourced from the Crowd Data Center, an international database that aggregates structured archival data from large crowdfunding platforms. The CDC contains extensive information on campaign characteristics, funding models, creator profiles, and project outcomes. Data presented by the CDC is gathered through automated web scraping and API extraction techniques, ensuring accuracy and comparability across crowdfunding platforms. This approach of collecting secondary data facilitates large-sample empirical analysis and allows one to avoid collecting primary data.

  11. US Births by County and State

    • kaggle.com
    zip
    Updated Jan 22, 2023
    Cite
    The Devastator (2023). US Births by County and State [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-births-by-county-and-state
    Explore at:
    Available download formats: zip (3159011 bytes)
    Dataset updated
    Jan 22, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    US Births by County and State

    1985-2015 Aggregated Data

    By data.world's Admin [source]

    About this dataset

    This dataset contains an aggregation of birth data from the United States between 1985 and 2015. It consists of information on mothers' locations by state (including District of Columbia) and county, as well as information such as the month they gave birth, and aggregates giving the sum of births during that month. This data has been provided by both the National Bureau for Economic Research and National Center for Health Statistics, whose shared mission is to understand how life works in order to aid individuals in making decisions about their health and wellbeing. This dataset provides valuable insight into population trends across time and location - for example, which states have higher or lower birthrates than others? Which counties experience dramatic fluctuations over time? Given its scope, this dataset could be used in a number of contexts, from epidemiology research to population forecasting. Be sure to check out our other datasets related to births while you're here!

    How to use the dataset

    This dataset could be used to examine local trends in birth rates over time or analyze births at different geographical locations. In order to maximize your use of this dataset, it is important that you understand what information the various columns contain.

    The main columns are: State (including District of Columbia), County (coded using the FIPS county code number), Month (numbered from 1 for January through 12 for December), Year (4-digit year), countyBirths (calculated sum of births that occurred to mothers living in a county for a given month), and stateBirths (calculated sum of births that occurred to mothers living in a state for a given month). These fields should provide enough information for you to analyze trends across geographic locations at both monthly and yearly levels. You could also consider combining variables such as Year with State or Year with Month, or any other grouping, depending on your analysis goal.
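
    As a rough illustration of such groupings, the sketch below sums monthly rows into yearly totals using only the Python standard library. The sample rows are made up for illustration, and the code assumes that stateBirths is repeated on every county row of the same state and month, which the column description suggests but does not state outright.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the column layout described above;
# the numbers are invented for illustration only.
SAMPLE = """State,County,Month,Year,countyBirths,stateBirths
1,1001,1,1990,40,100
1,1003,1,1990,60,100
1,1001,2,1990,45,110
1,1003,2,1990,65,110
"""

def yearly_totals(csv_text):
    """Return (county totals, state totals) keyed by (id, year)."""
    county_year = defaultdict(int)  # (County, Year) -> births
    state_month = {}                # (State, Year, Month) -> stateBirths
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["Year"])
        county_year[(row["County"], year)] += int(row["countyBirths"])
        # Assumption: stateBirths repeats on each county row of a state-month,
        # so store it once per (State, Year, Month) instead of summing rows.
        state_month[(row["State"], year, int(row["Month"]))] = int(row["stateBirths"])
    state_year = defaultdict(int)
    for (state, year, _month), births in state_month.items():
        state_year[(state, year)] += births
    return dict(county_year), dict(state_year)

county_totals, state_totals = yearly_totals(SAMPLE)
```

    If stateBirths were summed directly over all rows, each state-month would be counted once per county, which is why the sketch deduplicates before adding.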

    In addition, while all data were downloaded on April 5th 2017, it is worth noting that all sources used followed privacy guidelines as laid out by NCHS, so individual births occurring after 2005 are not included due to geolocation concerns.
    We hope you find this dataset useful and can benefit from its content! With proper understanding of what each field contains, we are confident you will gain valuable insights on birth rates across counties within the United States during this period.

    Research Ideas

    • Establishing county-level trends in birth rates for the US over time.
    • Analyzing the relationship between month of birth and health outcomes for US babies after they are born (e.g., infant mortality, neurological development, etc.).
    • Comparing state/county-level differences in average numbers of twins born each year

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: allBirthData.csv

    | Column name  | Description |
    |:-------------|:------------|
    | State        | The numerical order of the state where the mother lives. (Integer) |
    | Month        | The month in which the birth took place. (Integer) |
    | Year         | The year of the birth. (Integer) |
    | countyBirths | The calculated sum of births that occurred to mothers living in that county for that particular month. (Integer) |
    | stateBirths  | The aggregate number at the level of entire states for any given month-year combination. (Integer) |
    | County       | The county where the mother lives, coded using FIPS County Code. (Integer) |

    Acknowledgements

    If you use this dataset in your research, please credit data.world's Admin.

  12. s

    Facebook Deactivation Participants

    • socialmediaarchive.org
    pdf, xlsx
    Updated May 21, 2024
    Cite
    (2024). Facebook Deactivation Participants [Dataset]. https://socialmediaarchive.org/record/61?v=pdf
    Explore at:
    Available download formats: xlsx (16172), xlsx (33969), pdf (813810)
    Dataset updated
    May 21, 2024
    Description

    This table includes platform data for Facebook participants in the Deactivation experiment. Each row of the dataset corresponds to data from a participant’s Facebook user account. Each column contains a value, or set of values, that aggregates log data for this specific participant over a certain period of time.

  13. F

    Concrete Aggregate PSD Imaging Dataset

    • data.uni-hannover.de
    jpeg, png
    Updated Nov 14, 2025
    Cite
    Institut für Baustoffe (2025). Concrete Aggregate PSD Imaging Dataset [Dataset]. https://data.uni-hannover.de/dataset/concrete-aggregate-psd-imaging-dataset
    Explore at:
    Available download formats: png, jpeg
    Dataset updated
    Nov 14, 2025
    Dataset authored and provided by
    Institut für Baustoffe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Summary

    This dataset provides image data of concrete aggregate for the task of estimating particle size distributions (PSD) using computer vision. The images were captured using a controlled camera setup installed above a conveyor belt in a concrete mixing research facility. Each image has a corresponding text file containing the ground-truth PSD obtained through mechanical sieving.

    Data Acquisition

    Data was recorded at a medium-scale concrete mixing plant equipped with:

    • Two Allied Vision Alvium 1800 C-508 cameras
      • 25 mm focal length → fine aggregate (0–2 mm)
      • 12 mm focal length → coarse aggregate (>2 mm)
    • Global shutter, 1 ms exposure time
    • LED panel illumination for motion-blur-free imaging
    • A sensor mount above the conveyor belt transporting the aggregate

    This setup enabled consistent imaging conditions with sufficient resolution for particle analysis.

    Figure: Sensor setup used for data acquisition (https://data.uni-hannover.de/dataset/6f844e22-12ed-48a7-9ccb-b502f8121650/resource/adcf7049-3ad3-4e9c-bd8b-1be00d867f46/download/sensorsetup.jpg)

    Datasets

    Two datasets were created to cover common aggregate size ranges used in concrete production:

    𝑀ᶠⁱⁿᵉ — Fine Material (< 2 mm)

    • 16 material samples
    • Natural river sand
    • PSDs synthetically varied by mixing pre-fractionated material

    𝑀ᶜᵒᵃʳˢᵉ — Coarse Material (2–16 mm)

    • 26 material samples
      • 16 natural river gravel
      • 10 recycled concrete aggregate (RCA)

    Each material sample weighs 150 kg, and its PSD was systematically varied to cover a broad range of grading curves.

    Note:
    This repository contains only a subset of the data set that was used in the paper. In order to receive the full data set, please reach out to the authors. The dataset represents controlled variability. While this is ideal for benchmarking and model development, real industrial plants may exhibit additional stochastic variability.

    Reference PSD Measurement

    A 10 kg subsample from each material batch was mechanically sieved to obtain the reference PSD.

    Each PSD is represented using six particle size intervals (B = 6):

    • Fine dataset:
      0.063, 0.125, 0.25, 0.5, 1.0, 2.0 mm
    • Coarse dataset:
      0, 2, 4, 8, 11.2, 16 mm

    Each .txt reference file contains six percentile values that sum to 1.0.
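
    A minimal sketch of reading such a reference file is shown below. The on-disk layout is not documented here, so the code assumes six whitespace- or comma-separated numbers; the function names and the example values are hypothetical.

```python
def parse_psd(text, n_bins=6, tol=1e-6):
    """Parse n_bins numbers and check that they sum to 1.0.

    Assumes whitespace- or comma-separated values; the real file
    layout may differ.
    """
    values = [float(tok) for tok in text.replace(",", " ").split()]
    if len(values) != n_bins:
        raise ValueError(f"expected {n_bins} values, got {len(values)}")
    if abs(sum(values) - 1.0) > tol:
        raise ValueError(f"values sum to {sum(values)}, expected 1.0")
    return values

def cumulative(values):
    """Running sum of the bin values, i.e. a cumulative grading curve."""
    total, curve = 0.0, []
    for v in values:
        total += v
        curve.append(total)
    return curve

# Hypothetical file content, invented for illustration.
example = parse_psd("0.05 0.10 0.20 0.30 0.25 0.10")
curve = cumulative(example)
```

    Validating the sum at load time catches truncated or mis-parsed files before they reach model training.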

    Figure: Example images and grading curves of the data sets (https://data.uni-hannover.de/dataset/6f844e22-12ed-48a7-9ccb-b502f8121650/resource/2d5cae38-8ad7-4f82-9202-6ac807051c17/download/overview.png)

    Use Cases

    This dataset is intended for:

    • PSD estimation using deep learning or classical CV
    • Regression and distribution prediction tasks
    • Material characterization and granulometry research
    • Benchmarking computer vision methods on granular material datasets

    Citation

    If you use this dataset in academic or industrial research, please cite the corresponding paper:

    Coenen, M., Beyer, D., Mohammadi, S., Meyer, M., Heipke, C., and Haist, M. (2026): Towards an Automated Concrete Production Control via Computer Vision-based Characterisation of Concrete Aggregate.

  14. F

    Visual Granulometry: Image-based Granulometry of Concrete Aggregate

    • data.uni-hannover.de
    png, zip
    Updated Dec 12, 2024
    + more versions
    Cite
    Institut für Baustoffe (2024). Visual Granulometry: Image-based Granulometry of Concrete Aggregate [Dataset]. https://data.uni-hannover.de/dataset/visual-granulometry
    Explore at:
    Available download formats: png, zip
    Dataset updated
    Dec 12, 2024
    Dataset authored and provided by
    Institut für Baustoffe
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Introduction

    Concrete is one of the most used building materials worldwide. With up to 80% of volume, a large constituent of concrete consists of fine and coarse aggregate particles (normally, sizes of 0.1 mm to 32 mm) which are dispersed in a cement paste matrix. The size distribution of the aggregates (i.e. the grading curve) substantially affects the properties and quality characteristics of concrete, such as its workability in the fresh state and its mechanical properties in the hardened state. In practice, the size distribution of small samples of the aggregate is usually determined by manual mechanical sieving and is considered representative of a large amount of aggregate. However, the size distribution of the actual aggregate used for individual production batches of concrete varies, especially when, for example, recycled material is used as aggregate. As a consequence, the unknown variations of the particle size distribution have a negative effect on the robustness and the quality of the final concrete produced from the raw material.

    Towards the goal of deriving precise knowledge about the actual particle size distribution of the aggregate, thus eliminating the unknown variations in the material’s properties, we propose a data set for the image-based prediction of the size distribution of concrete aggregates. Incorporating such an approach into the production chain of concrete makes it possible to react to detected variations in the size distribution of the aggregate in real time by adapting the composition, i.e. the mixture design of the concrete, accordingly, so that the desired concrete properties are reached.

    Figure: Classical vs. image-based granulometry (https://data.uni-hannover.de/dataset/f00bdcc4-8b27-4dc4-b48d-a84d75694e18/resource/042abf8d-e87a-4940-8195-2459627f57b6/download/overview.png)

    Classification data

    In the classification data, nine different grading curves are distinguished. In this context, the normative regulations of DIN 1045 are considered. The nine grading curves differ in their maximum particle size (8, 16, or 32 mm) and in the distribution of the particle size fractions allowing a categorisation of the curves to coarse-grained (A), medium-grained (B) and fine-grained (C) curves, respectively. A quantitative description of the grain size distribution of the nine curves distinguished is shown in the following figure, where the left side shows a histogram of the particle size fractions 0-2, 2-8, 8-16, and 16-32 mm and the right side shows the cumulative histograms of the grading curves (the vertical axes represent the mass-percentages of the material).

    For each of the grading curves, two samples (S1 and S2) of aggregate particles were created. Each sample has a total mass of 5 kg of aggregate material and is carefully designed according to the grain size distribution shown in the figure: the raw material is first sieved to separate the different grain size fractions, and the samples are then composed according to the dedicated mass-percentages of the size distributions.

    Figure: Particle size distribution of the classification data (https://data.uni-hannover.de/dataset/f00bdcc4-8b27-4dc4-b48d-a84d75694e18/resource/17eb2a46-eb23-4ec2-9311-0f339e0330b4/download/statistics_classification-data.png)

    For data acquisition, a static setup was used in which the samples are placed in a measurement vessel equipped with a set of calibrated reference markers whose object coordinates are known and which are assembled so that they form a common plane with the surface of the aggregate sample. We acquired the data by taking images of the aggregate samples (and the reference markers) filled into the measurement vessel; the arrangement of the sample within the vessel is perturbed between the acquisition of each image in order to obtain variations in the sample’s visual appearance. This acquisition strategy makes it possible to record multiple different images for the individual grading curves by reusing the same sample, consequently reducing the labour-intensive part of material sieving and sample generation. In this way, we acquired a data set of 900 images in total, consisting of 50 images of each of the two samples (S1 and S2) which were created for each of the nine grading curve definitions (50 x 2 x 9 = 900). For each image, we automatically detect the reference markers, thus obtaining the image coordinates of each marker in addition to its known object coordinates. We use these correspondences to compute the homography which describes the perspective transformation of the reference markers' plane in object space (which corresponds to the surface plane of the aggregate sample) to the image plane. Using the computed homography, we transform the image in order to obtain a perspectively rectified representation of the aggregate sample with a known ground sampling distance (GSD) of 8 px/mm that is consistent across the entire image. The following figure shows example images of our data set with aggregate samples of each of the distinguished grading curve classes.
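
    The marker-based rectification described above can be sketched as follows. This is not the authors' implementation: it is a minimal pure-Python illustration that fits a homography to exactly four marker correspondences and maps a pixel of the rectified image (at 8 px/mm) back to the source image. The marker coordinates are hypothetical, and all function names are invented for this sketch.

```python
GSD_PX_PER_MM = 8.0  # ground sampling distance stated in the text

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate marker configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_homography(object_pts, image_pts):
    """Fit h11..h32 (with h33 fixed to 1) from four point correspondences."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(object_pts, image_pts):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs)

def project(h, x, y):
    """Map object-plane coordinates (mm) to image coordinates (px)."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)

def rectified_to_source(h, col, row):
    """Source-image position of a pixel in the rectified image."""
    return project(h, col / GSD_PX_PER_MM, row / GSD_PX_PER_MM)

# Hypothetical marker positions: object plane in mm, detected image in px.
OBJECT_MM = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
IMAGE_PX = [(120.0, 80.0), (910.0, 95.0), (880.0, 870.0), (100.0, 850.0)]
H = fit_homography(OBJECT_MM, IMAGE_PX)
```

    To produce the rectified image, one would loop over its pixels, call rectified_to_source for each, and sample the source image at the returned position. With more than four markers, a least-squares fit would be the usual choice instead of the exact solve used here.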

    Figure: Example images of the classification data (https://data.uni-hannover.de/dataset/f00bdcc4-8b27-4dc4-b48d-a84d75694e18/resource/59925f1d-3eef-4b50-986a-e8d2b0e14beb/download/examples_classification_data.png)

    Related publications:

    If you make use of the proposed data, please cite the publication listed below.

    • Coenen, M., Beyer, D., Heipke, C. and Haist, M., 2022: Learning to Sieve: Prediction of Grading Curves from Images of Concrete Aggregate. In: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-2-2022, pp. 227-235.
  15. Van Gogh Artworks

    • kaggle.com
    zip
    Updated Mar 11, 2022
    Cite
    alod83 (2022). Van Gogh Artworks [Dataset]. https://www.kaggle.com/datasets/alod83/van-gogh-artworks
    Explore at:
    Available download formats: zip (98626 bytes)
    Dataset updated
    Mar 11, 2022
    Authors
    alod83
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    These datasets contain all of Van Gogh's artworks provided by Europeana, an aggregator for cultural heritage. The dataset has been extracted and cleaned with Versatile Data Kit (VDK), a framework released by VMware. You can find the whole example to extract data at this link. VDK lets you ingest and process data in different formats. In this case, the data were ingested through the Europeana REST API.

    Content

    There are two versions of the same dataset:

    • assets, which contains the raw data extracted from Europeana. Since the Europeana REST API provides data as JSON, a basic split into columns has been done. Thus, each column can contain a JSON object.
    • cleaned_assets, which contains only some cleaned fields of the original dataset.

    Acknowledgements

    The extraction of this dataset has been supported by VMware.

    Inspiration

    The edmpreview column of the cleaned_assets.csv dataset contains the link to the pictures. You could use them to train some models for image recognition.
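
    As a starting point for that use, the picture links can be collected with the standard csv module. This is only a sketch: the edmpreview column name comes from the description above, while the title column and the sample rows are made up for illustration.

```python
import csv
import io


def preview_urls(csv_file):
    """Yield the non-empty edmpreview values from an open CSV file."""
    for row in csv.DictReader(csv_file):
        url = (row.get("edmpreview") or "").strip()
        if url:
            yield url


# Hypothetical stand-in for cleaned_assets.csv; only "edmpreview" is
# a column name mentioned in the dataset description.
sample = io.StringIO(
    "title,edmpreview\n"
    "Work A,https://example.org/a.jpg\n"
    "Work B,\n"
    "Work C,https://example.org/c.jpg\n"
)
urls = list(preview_urls(sample))
```

    With the real file, the io.StringIO object would be replaced by open("cleaned_assets.csv", newline="", encoding="utf-8").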

  16. SES Water Domestic Consumption

    • hub.arcgis.com
    Updated Apr 26, 2024
    + more versions
    Cite
    SESWater2 (2024). SES Water Domestic Consumption [Dataset]. https://hub.arcgis.com/maps/f2cdc1248fcf4fd289ac1d3f25e75b3b_0/about
    Explore at:
    Dataset updated
    Apr 26, 2024
    Dataset authored and provided by
    SESWater2
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset offers valuable insights into yearly domestic water consumption across various Lower Super Output Areas (LSOAs) or Data Zones, accompanied by the count of water meters within each area. It is instrumental for analysing residential water use patterns, facilitating water conservation efforts, and guiding infrastructure development and policy making at a localised level.

    Key Definitions

    • Aggregation: The process of summarising or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes.
    • AMR Meter: Automatic meter reading (AMR) is the technology of automatically collecting consumption, diagnostic, and status data from a water meter remotely and periodically.
    • Dataset: Structured and organised collection of related elements, often stored digitally, used for analysis and interpretation in various fields.
    • Data Zone: Data zones are the key geography for the dissemination of small area statistics in Scotland.
    • Dumb Meter: A dumb meter or analogue meter is read manually. It does not have any external connectivity.
    • Granularity: Data granularity is a measure of the level of detail in a data structure. In time-series data, for example, the granularity of measurement might be based on intervals of years, months, weeks, days, or hours.
    • ID: Abbreviation for Identification that refers to any means of verifying the unique identifier assigned to each asset for the purposes of tracking, management, and maintenance.
    • LSOA: Lower Layer Super Output Areas (LSOA) are a geographic hierarchy designed to improve the reporting of small area statistics in England and Wales.
    • Open Data Triage: The process carried out by a Data Custodian to determine if there is any evidence of sensitivities associated with Data Assets, their associated Metadata and Software Scripts used to process Data Assets if they are used as Open Data.
    • Schema: Structure for organising and handling data within a dataset, defining the attributes, their data types, and the relationships between different entities. It acts as a framework that ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.
    • Smart Meter: A smart meter is an electronic device that records information and communicates it to the consumer and the supplier. It differs from automatic meter reading (AMR) in that it enables two-way communication between the meter and the supplier.
    • Units: Standard measurements used to quantify and compare different physical quantities.
    • Water Meter: Water metering is the practice of measuring water use. Water meters measure the volume of water used by residential and commercial building units that are supplied with water by a public water supply system.

    Data History

    Data Origin

    Domestic consumption data is recorded using water meters. The consumption recorded is then sent back to water companies. This dataset is extracted from the water companies.

    Data Triage Considerations

    This section discusses the careful handling of data to maintain anonymity and addresses the challenges associated with data updates, such as identifying household changes or meter replacements.

    Identification of Critical Infrastructure

    This aspect is not applicable for the dataset, as the focus is on domestic water consumption and does not contain any information that reveals critical infrastructure details.

    Commercial Risks and Anonymisation

    • Individual Identification Risks: There is a potential risk of identifying individuals or households if the consumption data is updated irregularly (e.g., every 6 months) and an out-of-cycle update occurs (e.g., after 2 months), which could signal a change in occupancy or ownership. Such patterns need careful handling to avoid accidental exposure of sensitive information.
    • Meter and Property Association: Challenges arise in maintaining historical data integrity when meters are replaced but the property remains the same. Ensuring continuity in the data without revealing personal information is crucial.
    • Interpretation of Null Consumption: Instances of null consumption could be misunderstood as a lack of water use, whereas they might simply indicate missing data. Distinguishing between these scenarios is vital to prevent misleading conclusions.
    • Meter Re-reads: The dataset must account for instances where meters are read multiple times for accuracy.
    • Joint Supplies & Multiple Meters per Household: Special consideration is required for households with multiple meters as well as multiple households that share a meter as this could complicate data aggregation.

    Schema Consistency with the Energy Industry

    In formulating the schema for the domestic water consumption dataset, careful consideration was given to the potential risks to individual privacy. This evaluation included examining the frequency of data updates, the handling of property and meter associations, interpretations of null consumption, meter re-reads, joint suppliers, and the presence of multiple meters within a single household as described above. After a thorough assessment of these factors and their implications for individual privacy, it was decided to align the dataset's schema with the standards established within the energy industry. This decision was influenced by the energy sector's experience and established practices in managing similar risks associated with smart meters. This ensures a high level of data integrity and privacy protection.

    Schema

    The dataset schema is aligned with those used in the energy industry, which has encountered similar challenges with smart meters. However, it is important to note that the energy industry has a much higher density of meter distribution, especially smart meters.

    Aggregation to Mitigate Risks

    The dataset employs an elevated level of data aggregation to minimise the risk of individual identification. This approach is crucial in maintaining the utility of the dataset while ensuring individual privacy. The aggregation level is carefully chosen to remove identifiable risks without excluding valuable data, thus balancing data utility with privacy concerns.

    Data Freshness

    Users should be aware that this dataset reflects historical consumption patterns and does not represent real-time data.

    Publish Frequency

    Annually

    Data Triage Review Frequency

    An annual review is conducted to ensure the dataset's relevance and accuracy, with adjustments made based on specific requests or evolving data trends.

    Data Specifications

    For the domestic water consumption dataset, the data specifications are designed to ensure comprehensiveness and relevance, while maintaining clarity and focus. The specifications for this dataset include:

    • Each dataset encompasses recordings of domestic water consumption as measured and reported by the data publisher. It excludes commercial consumption.
    • Where it is necessary to estimate consumption, this is calculated based on actual meter readings.
    • Meters of all types (smart, dumb, AMR) are included in this dataset.
    • The dataset is updated and published annually.
    • Historical data may be made available to facilitate trend analysis and comparative studies, although it is not mandatory for each dataset release.

    Context

    Users are cautioned against using the dataset for immediate operational decisions regarding water supply management. The data should be interpreted considering potential seasonal and weather-related influences on water consumption patterns. The geographical data provided does not pinpoint locations of water meters within an LSOA. The dataset aims to cover a broad spectrum of households, from single-meter homes to those with multiple meters, to accurately reflect the diversity of water use within an LSOA.

  17. Deep Granulometry

    • data.uni-hannover.de
    png, zip
    Updated Dec 12, 2024
    Cite
    Institut für Baustoffe (2024). Deep Granulometry [Dataset]. https://data.uni-hannover.de/dataset/deep-granulometry
    Explore at:
    Available download formats: png, zip
    Dataset updated
    Dec 12, 2024
    Dataset authored and provided by
    Institut für Baustoffe
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    This repository contains the data related to the paper "Granulometry transformer: image-based granulometry of concrete aggregate for an automated concrete production control", in which a deep-learning-based method is proposed for the image-based determination of concrete aggregate grading curves (cf. video).

    Watch the video

    More specifically, the data set consists of images showing concrete aggregate particles and reference data of the particle size distribution (grading curves) associated with each image. The data are divided into the CoarseAggregateData and the FineAggregateData.

    Coarse Aggregate Data

    The coarse data consists of aggregate samples with different particle sizes ranging from 0.1 mm to 32 mm. The grading curves are designed by linear interpolation between a very fine and a very coarse distribution for three variants with maximum grain sizes of 8 mm, 16 mm, and 32 mm, respectively. For each variant, we designed eleven grading curves, resulting in a total of 33 curves, which are shown in the figure below. For each sample, we acquired 50 images with a GSD of 0.125 mm, resulting in a data set of 1650 images in total. Example images for a subset of the grading curves of this data set are shown in the following figure.

    [Figure: Example images and grading curves of the coarse data set] https://data.uni-hannover.de/dataset/ecb0bf04-84c8-45b1-8a43-044f3f80d92c/resource/8cb30616-5b24-4028-9c1d-ea250ac8ac84/download/examplecoarse.png
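
    The construction of the coarse grading curves can be sketched as follows. The sieve values of the two bounding distributions are hypothetical, and equally spaced interpolation weights are an assumption; the description only states that the curves are linearly interpolated between a very fine and a very coarse distribution.

```python
def interpolate_curves(fine, coarse, n_curves=11):
    """Linearly blend two grading curves into n_curves curves.

    fine and coarse hold cumulative passing percentages at the same
    sieve sizes; curve 0 equals fine and the last curve equals coarse.
    """
    curves = []
    for i in range(n_curves):
        t = i / (n_curves - 1)  # assumed: equally spaced weights in [0, 1]
        curves.append([(1 - t) * f + t * c for f, c in zip(fine, coarse)])
    return curves


# Hypothetical cumulative passing percentages at four sieve sizes.
fine = [30.0, 60.0, 85.0, 100.0]
coarse = [5.0, 20.0, 50.0, 100.0]
curves = interpolate_curves(fine, coarse)

# Counts stated in the description: 3 variants x 11 curves, 50 images each.
n_grading_curves = 3 * len(curves)
n_images = n_grading_curves * 50
```

    The two counts reproduce the figures given above: 3 x 11 = 33 grading curves and 33 x 50 = 1650 images.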

    Fine Aggregate Data

    Similar to the previous data set, the fine data set contains grading curves for the fine fraction of concrete aggregate of 0 to 2 mm with a GSD of 28.5 µm. We defined two base distributions of different shapes for the upper and lower bound, respectively, resulting in two interpolated grading curve sets (Set A and Set B). In total, 1700 images of 34 different particle size distributions were acquired. Example images of the data set and the corresponding grading curves are shown in the figure below.

    [Figure: Example images and grading curves of the fine data set] https://data.uni-hannover.de/dataset/ecb0bf04-84c8-45b1-8a43-044f3f80d92c/resource/c56f4298-9663-457f-aaa7-0ba113fec4c9/download/examplefine.png

    Related publications:

    If you make use of the proposed data, please cite the publication listed below.

    • Coenen, M., Beyer, D., and Haist, M., 2023: Granulometry Transformer: Image-based Granulometry of Concrete Aggregate for an automated Concrete Production Control. In: Proceedings of the European Conference on Computing in Construction (EC3), doi: 10.35490/EC3.2023.223.
  18. Data from: Mobile phone data to study the impacts of a severe local flood...

    • zenodo.org
    pdf, zip
    Updated Jan 7, 2025
    Cite
    Simone Loreti; Margreth Keiler; Andreas Paul Zischg (2025). Mobile phone data to study the impacts of a severe local flood and social events on human mobility (in Switzerland) [Dataset]. http://doi.org/10.5281/zenodo.14568954
    Explore at:
    Available download formats: pdf, zip
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Simone Loreti; Margreth Keiler; Andreas Paul Zischg
    Area covered
    Switzerland
    Description

    Introduction

    This Zenodo repository contains datasets related to the detected and inferred positions of individuals within a study area (approximately 15 km by 20 km) surrounding the city of Zofingen, Switzerland. The data provide aggregated and anonymized positions of individuals at a fine-grained level, corresponding to road segments or railway tracks, and span a 44-day period from June 17th to July 30th, 2017. This period includes a severe local flood event as well as social events. The datasets also include the postal codes of individuals' municipalities of residence who traveled to the study area. The data were acquired by the University of Bern from Swisscom, at a total cost of 32400 CHF (around 35796 USD as of December 2024). The related peer-reviewed research article will be available soon.

    Network components

    The network topology within the study area surrounding the city of Zofingen, Switzerland, is defined by two JSON files: "nodes.json" and "edges.json". These files respectively contain the attributes of all nodes and edges within the study area.

    The JSON file for the nodes is an array of nodes that contains the following information:

    • internalID: The unique ID used internally by Swisscom to identify the nodes.
    • externalID: The unique ID used by the data source from which Swisscom retrieved that particular node. For example, if the node coincides with a node from OpenStreetMap, then this will be the ID used by OpenStreetMap.
    • nodeType: The data source from which the node was retrieved. Usually OSM for OpenStreetMap.
    • lat: The latitude as per the WGS84 coordinate system.
    • lon: The longitude as per the WGS84 coordinate system.

    The JSON file for the edges is an array of edges that contains the following information:

    • start: The node internalID corresponding to the beginning of the edge. An edge has no particular direction and Swisscom arranged them such that the starting node ID is smaller than the ending node ID.
    • end: The node internalID corresponding to the end of the edge.
    • edgeType: The type of infrastructure represented by the edge, typically road or rail. Swisscom also uses a virtual type that we use to connect the road to the rail network.
    • path: A list of node internalIDs that are present along the edge. Indeed, an edge can be quite long. This is the case for example for highway portions between two entries/exits. Even though Swisscom does not compute metrics down to the intermediate nodes, they use them to display smoother paths on maps.
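
    To show how the two files fit together, the sketch below builds an undirected adjacency map from node and edge records. The field names follow the lists above; the sample records, the integer ID type, and the function name are assumptions, since the actual files are not reproduced here.

```python
import json
from collections import defaultdict

# Hypothetical excerpts shaped like "nodes.json" and "edges.json".
NODES_JSON = """[
  {"internalID": 1, "externalID": "n1", "nodeType": "OSM", "lat": 47.28, "lon": 7.94},
  {"internalID": 2, "externalID": "n2", "nodeType": "OSM", "lat": 47.29, "lon": 7.95},
  {"internalID": 3, "externalID": "n3", "nodeType": "OSM", "lat": 47.30, "lon": 7.96}
]"""
EDGES_JSON = """[
  {"start": 1, "end": 2, "edgeType": "road", "path": [1, 2]},
  {"start": 2, "end": 3, "edgeType": "rail", "path": [2, 3]}
]"""


def build_adjacency(nodes, edges):
    """Return {internalID: set of neighbouring internalIDs}.

    Edges have no direction, so each one is added both ways. Edges that
    reference an unknown node are skipped.
    """
    known = {node["internalID"] for node in nodes}
    adjacency = defaultdict(set)
    for edge in edges:
        start, end = edge["start"], edge["end"]
        if start in known and end in known:
            adjacency[start].add(end)
            adjacency[end].add(start)
    return dict(adjacency)


nodes = json.loads(NODES_JSON)
edges = json.loads(EDGES_JSON)
adjacency = build_adjacency(nodes, edges)
```

    With the published files, the two json.loads calls would be replaced by json.load on the opened "nodes.json" and "edges.json".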

    POSACT and PATHACT datasets

    These datasets provide estimated counts of people observed along road and rail networks, within a roughly 15 km by 20 km area surrounding the city of Zofingen, Switzerland. These estimates rely on Swisscom's market penetration and are calculated using two methodologies named after the datasets (i.e. POSACT and PATHACT), which are explained in the "explanation_of_POSACT_and_PATHACT.pdf" file. To safeguard user privacy, estimates for edges with fewer than 20 detected users are not included (missing data for an edge is assumed to represent fewer than 20 observations).

    Both POSACT and PATHACT datasets have the following headers:
    | edgeStart | edgeEnd | hourOfDay | estimatedCount |

    An explanation of the headers:

    • edgeStart: The node internalID of the beginning of the edge.
    • edgeEnd: The node internalID of the end of the edge. Together with the internalID of the edge start, this uniquely identifies an edge.
    • hourOfDay: The period of aggregation, which is 4 hours for POSACT and 1 hour for PATHACT. The number shown here is the beginning of the aggregation period. For example, if the aggregation period is 4 hours, the number 0 corresponds to 0h00 -> 3h59, the number 4 corresponds to 4h00 -> 7h59, and so on.
    • estimatedCount: the Swisscom estimate for the number of people detected close to, or passing by, that edge.
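
    The hourOfDay convention and the treatment of missing edges can be sketched as below. The rows are hypothetical and the helper names are made up for illustration; only the four header names and the 4-hour and 1-hour periods come from the description.

```python
def aggregation_window(hour_of_day, period_hours):
    """Describe the window that starts at hour_of_day, e.g. 0h00 -> 3h59."""
    last_hour = hour_of_day + period_hours - 1
    return f"{hour_of_day}h00 -> {last_hour}h59"


def index_counts(rows):
    """Index rows by (edgeStart, edgeEnd, hourOfDay)."""
    return {
        (row["edgeStart"], row["edgeEnd"], row["hourOfDay"]): row["estimatedCount"]
        for row in rows
    }


def lookup(counts, edge_start, edge_end, hour_of_day):
    """Return the estimate, or None when the entry is absent.

    None does not mean zero: absent entries are assumed to stand for
    fewer than 20 detected users.
    """
    return counts.get((edge_start, edge_end, hour_of_day))


# Hypothetical POSACT-style rows (4-hour aggregation).
rows = [
    {"edgeStart": 1, "edgeEnd": 2, "hourOfDay": 0, "estimatedCount": 35},
    {"edgeStart": 1, "edgeEnd": 2, "hourOfDay": 4, "estimatedCount": 120},
]
counts = index_counts(rows)
```

    Keeping None distinct from 0 matters when summing over edges, because a suppressed entry may hide up to 19 detected users.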

    PLZs dataset

    This dataset provides estimates of the number of users traveling from their municipality of residence to the local area of study.

    The "plzs" dataset has the following headers:
    | municipalityID | locationName | hourOfDay | estimatedCount |

    An explanation of the headers:

    • municipalityID: Postal code used for the municipality of residence.
    • locationName: The city name associated with the zip code, provided just for reference.
    • hourOfDay: The period of aggregation, which is 1 hour for PLZs.
    • estimatedCount: Estimate of the number of people who travel from their municipality of residence to the local area of study.

    Disclaimer

    The information presented here on Zenodo is based on the details originally provided by Swisscom in their dataset package. The dissemination of this information on Zenodo strictly adheres to the contractual agreements in place between Swisscom and the University of Bern, as well as to the explicit consent granted by Swisscom via email, dated January 31, 2024. This consent specifically authorized the publication of this dataset package, including its data descriptions and explanations.

    Contributions of the authors

    S.L. conceived the idea of using mobile phone data, initiated and established the collaboration with Swisscom, uploaded the data to Zenodo, and wrote the data descriptions (based on the information provided by Swisscom). M.K. and A.Z. secured funding to support the data acquisition efforts.

    Corresponding person

    Simone Loreti

  19. Data from: Monthly OpenET Image Collections (v2.0) Summarized by 12-Digit...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 21, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Monthly OpenET Image Collections (v2.0) Summarized by 12-Digit Hydrologic Unit Codes, 2008-2023 [Dataset]. https://catalog.data.gov/dataset/monthly-openet-image-collections-v2-0-summarized-by-12-digit-hydrologic-unit-codes-2008-20
    Explore at:
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    This dataset provides monthly summaries of evapotranspiration (ET) data from OpenET v2.0 image collections for the period 2008-2023 for all National Watershed Boundary Dataset subwatersheds (12-digit hydrologic unit codes [HUC12s]) in the US that overlap the spatial extent of OpenET datasets. For each HUC12, this dataset contains spatial aggregation statistics (minimum, mean, median, and maximum) for each of the ET variables from each of the publicly available image collections from OpenET for the six available models (DisALEXI, eeMETRIC, geeSEBAL, PT-JPL, SIMS, SSEBop) and the Ensemble image collection, which is a pixel-wise ensemble of all 6 individual models after filtering and removal of outliers according to the median absolute deviation approach (Melton and others, 2022). Data are available in this data release in two different formats: comma-separated values (CSV) and parquet, a high-performance format that is optimized for storage and processing of columnar data. CSV files containing data for each 4-digit HUC are grouped by 2-digit HUCs for easier access of regional data, and the single parquet file provides convenient access to the entire dataset.

    For each of the ET models (DisALEXI, eeMETRIC, geeSEBAL, PT-JPL, SIMS, SSEBop), variables in the model-specific CSV data files include:

    • huc12: The 12-digit hydrologic unit code
    • ET: Actual evapotranspiration (in millimeters) over the HUC12 area in the month calculated as the sum of daily ET interpolated between Landsat overpasses
    • statistic: Max, mean, median, or min. Statistic used in the spatial aggregation within each HUC12. For example, maximum ET is the maximum monthly pixel ET value occurring within the HUC12 boundary after summing daily ET in the month
    • year: 4-digit year
    • month: 2-digit month
    • count: Number of Landsat overpasses included in the ET calculation in the month
    • et_coverage_pct: Integer percentage of the HUC12 with ET data, which can be used to determine how representative the ET statistic is of the entire HUC12
    • count_coverage_pct: Integer percentage of the HUC12 with count data, which can be different than the et_coverage_pct value because the "count" band in the source image collection extends beyond the "et" band in the eastern portion of the image collection extent

    For the Ensemble data, these additional variables are included in the CSV files:

    • et_mad: Ensemble ET value, computed as the mean of the ensemble after filtering outliers using the median absolute deviation (MAD)
    • et_mad_count: The number of models used to compute the ensemble ET value after filtering for outliers using the MAD
    • et_mad_max: The maximum value in the ensemble range, after filtering for outliers using the MAD
    • et_mad_min: The minimum value in the ensemble range, after filtering for outliers using the MAD
    • et_sam: A simple arithmetic mean (across the 6 models) of actual ET average without outlier removal

    Below are the locations of each OpenET image collection used in this summary:

    • DisALEXI: https://developers.google.com/earth-engine/datasets/catalog/OpenET_DISALEXI_CONUS_GRIDMET_MONTHLY_v2_0
    • eeMETRIC: https://developers.google.com/earth-engine/datasets/catalog/OpenET_EEMETRIC_CONUS_GRIDMET_MONTHLY_v2_0
    • geeSEBAL: https://developers.google.com/earth-engine/datasets/catalog/OpenET_GEESEBAL_CONUS_GRIDMET_MONTHLY_v2_0
    • PT-JPL: https://developers.google.com/earth-engine/datasets/catalog/OpenET_PTJPL_CONUS_GRIDMET_MONTHLY_v2_0
    • SIMS: https://developers.google.com/earth-engine/datasets/catalog/OpenET_SIMS_CONUS_GRIDMET_MONTHLY_v2_0
    • SSEBop: https://developers.google.com/earth-engine/datasets/catalog/OpenET_SSEBOP_CONUS_GRIDMET_MONTHLY_v2_0
    • Ensemble: https://developers.google.com/earth-engine/datasets/catalog/OpenET_ENSEMBLE_CONUS_GRIDMET_MONTHLY_v2_0
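
    One way to work with the model-specific CSV files is sketched below: keep only the rows for the spatial mean, drop months with low ET coverage, and sum the remaining months per HUC12 and year. The column names are taken from the variable list above and are assumed to match the file headers exactly; the sample rows, the 90 percent threshold, and the function name are illustrative assumptions.

```python
import csv
import io
from collections import defaultdict


def annual_mean_et(csv_file, min_coverage_pct=90):
    """Sum monthly spatial-mean ET (mm) per (huc12, year).

    Rows for other statistics and rows whose et_coverage_pct is below
    the threshold are skipped. huc12 stays a string so that leading
    zeros are preserved.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        if row["statistic"].strip().lower() != "mean":
            continue
        if int(row["et_coverage_pct"]) < min_coverage_pct:
            continue
        totals[(row["huc12"], int(row["year"]))] += float(row["ET"])
    return dict(totals)


# Hypothetical rows using the documented column names.
sample = io.StringIO(
    "huc12,ET,statistic,year,month,count,et_coverage_pct,count_coverage_pct\n"
    "010100020101,50.0,mean,2020,06,3,100,100\n"
    "010100020101,70.0,mean,2020,07,4,95,100\n"
    "010100020101,90.0,max,2020,07,4,95,100\n"
    "010100020101,40.0,mean,2020,08,2,60,100\n"
)
totals = annual_mean_et(sample)
```

    Because low-coverage months are dropped, a total may cover fewer than twelve months; a fuller analysis would track how many months contributed to each sum.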

  20. S USA.BdyOwn PADUS Designation USGS - Metadata Review

    • usfs.hub.arcgis.com
    Updated Aug 24, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Forest Service (2022). S USA.BdyOwn PADUS Designation USGS - Metadata Review [Dataset]. https://usfs.hub.arcgis.com/documents/c60d5c38020140d19e24dc1f120aebe2
    Explore at:
    Dataset updated
    Aug 24, 2022
    Dataset provided by
    U.S. Department of Agriculture Forest Service http://fs.fed.us/
    Authors
    U.S. Forest Service
    Area covered
    United States
    Description

    The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme ( https://communities.geoplatform.gov/ngda-cadastre/ ). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all open space public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, permanent and long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of U.S. public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using thirty-six attributes and five separate feature classes representing the U.S. protected areas network: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. An additional Combined feature class includes the full PAD-US inventory to support data management, queries, web mapping services, and analyses. The Feature Class (FeatClass) field in the Combined layer allows users to extract data types as needed. 
A Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) facilitates the extraction of authoritative federal data provided or recommended by managing agencies from the Combined PAD-US inventory. This PAD-US Version 3.0 dataset includes a variety of updates from the previous Version 2.1 dataset (USGS, 2020, https://doi.org/10.5066/P92QM3NT ), achieving goals to: 1) Annually update and improve spatial data representing the federal estate for PAD-US applications; 2) Update state and local lands data as state data-steward and PAD-US Team resources allow; and 3) Automate data translation efforts to increase PAD-US update efficiency. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in the PAD-US (other data were transferred from PAD-US 2.1). Federal updates - The USGS remains committed to updating federal fee owned lands data and major designation changes in annual PAD-US updates, where authoritative data provided directly by managing agencies are available or alternative data sources are recommended. The following is a list of updates or revisions associated with the federal estate: 1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations where available), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census Bureau), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), and National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/ ). 
2) Improved the representation (boundaries and attributes) of the National Park Service, U.S. Forest Service, Bureau of Land Management, and U.S. Fish and Wildlife Service lands, in collaboration with agency data-stewards, in response to feedback from the PAD-US Team and stakeholders. 3) Added a Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) to the PAD-US 3.0 geodatabase to facilitate the extraction (by Data Provider, Dataset Name, and/or Aggregator Source) of authoritative data provided directly (or recommended) by federal managing agencies from the full PAD-US inventory. A summary of the number of records (Frequency) and calculated GIS Acres (vs Documented Acres) associated with features provided by each Aggregator Source is included; however, the number of records may vary from source data as the "State Name" standard is applied to national files. The Feature Class (FeatClass) field in the table and geodatabase describe the data type to highlight overlapping features in the full inventory (e.g. Designation features often overlap Fee features) and to assist users in building queries for applications as needed. 4) Scripted the translation of the Department of Defense, Census Bureau, and Natural Resource Conservation Service source data into the PAD-US format to increase update efficiency. 5) Revised conservation measures (GAP Status Code, IUCN Category) to more accurately represent protected and conserved areas. For example, Fish and Wildlife Service (FWS) Waterfowl Production Area Wetland Easements changed from GAP Status Code 2 to 4 as spatial data currently represents the complete parcel (about 10.54 million acres primarily in North Dakota and South Dakota). Only aliquot parts of these parcels are documented under wetland easement (1.64 million acres). These acreages are provided by the U.S. Fish and Wildlife Service and are referenced in the PAD-US geodatabase Easement feature class 'Comments' field. 
State updates - The USGS is committed to building capacity in the state data-steward network and the PAD-US Team to increase the frequency of state land updates, as resources allow. The USGS supported efforts to significantly increase state inventory completeness with the integration of local parks data in the PAD-US 2.1, and developed a state-to-PAD-US data translation script during PAD-US 3.0 development to pilot in future updates. Additional efforts are in progress to support the technical and organizational strategies needed to increase the frequency of state updates. The PAD-US 3.0 included major updates to the following three states: 1) California - added or updated state, regional, local, and nonprofit lands data from the California Protected Areas Database (CPAD), managed by GreenInfo Network, and integrated conservation and recreation measure changes following review coordinated by the data-steward with state managing agencies. Developed a data translation Python script (see Process Step 2 Source Data Documentation) in collaboration with the data-steward to increase the accuracy and efficiency of future PAD-US updates from CPAD. 2) Virginia - added or updated state, local, and nonprofit protected areas data (and removed legacy data) from the Virginia Conservation Lands Database, provided by the Virginia Department of Conservation and Recreation's Natural Heritage Program, and integrated conservation and recreation measure changes following review by the data-steward. 3) West Virginia - added or updated state, local, and nonprofit protected areas data provided by the West Virginia University, GIS Technical Center. For more information regarding the PAD-US dataset please visit, https://www.usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation please review the PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual . 
A version history of PAD-US updates is summarized below (see https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-history for more information):
1) First posted - April 2009 (Version 1.0 - available from the PAD-US Team: pad-us@usgs.gov).
2) Revised - May 2010 (Version 1.1 - available from the PAD-US Team: pad-us@usgs.gov).
3) Revised - April 2011 (Version 1.2 - available from the PAD-US Team: pad-us@usgs.gov).
4) Revised - November 2012 (Version 1.3) https://doi.org/10.5066/F79Z92XD
5) Revised - May 2016 (Version 1.4) https://doi.org/10.5066/F7G73BSZ
6) Revised - September 2018 (Version 2.0) https://doi.org/10.5066/P955KPLE
7) Revised - September 2020 (Version 2.1) https://doi.org/10.5066/P92QM3NT
8) Revised - January 2022 (Version 3.0) https://doi.org/10.5066/P9Q9LQ4B
Comparing protected area trends between PAD-US versions is not recommended without consultation with USGS, as many changes reflect improvements to agency and organization GIS systems, or conservation and recreation measure classification, rather than actual changes in protected area acquisition on the ground.

Samuel Guay; Eric Earl; Hao-Ting Wang; Remi Gau; Dorota Jarecka; David Keator; Melissa Kline Struhl; Satra Ghosh; Louis De Beaumont; Adam G. Thomas (2022). BIDS Phenotype Aggregation Example Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds004130.v1.0.0

BIDS Phenotype Aggregation Example Dataset

Dataset updated
Jun 4, 2022
Dataset provided by
OpenNeuro (https://openneuro.org/)
Authors
Samuel Guay; Eric Earl; Hao-Ting Wang; Remi Gau; Dorota Jarecka; David Keator; Melissa Kline Struhl; Satra Ghosh; Louis De Beaumont; Adam G. Thomas
License

CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically

Description

BIDS Phenotype Aggregation Example COPY OF "The NIMH Healthy Research Volunteer Dataset" (ds003982)

Modality-agnostic files were copied over and the CHANGES file was updated. Data was aggregated using:

python phenotype.py aggregate subject -i segregated_subject -o aggregated_subject

phenotype.py came from the GitHub repository: https://github.com/ericearl/bids-phenotype

THE ORIGINAL DATASET ds003982 README FOLLOWS

A comprehensive clinical, MRI, and MEG collection characterizing healthy research volunteers collected at the National Institute of Mental Health (NIMH) Intramural Research Program (IRP) in Bethesda, Maryland using medical and mental health assessments, diagnostic and dimensional measures of mental health, cognitive and neuropsychological functioning, structural and functional magnetic resonance imaging (MRI), along with diffusion tensor imaging (DTI), and a comprehensive magnetoencephalography battery (MEG).

In addition, blood samples of healthy volunteers are banked for future genetic analysis. All data collected in this protocol are broadly shared in the OpenNeuro repository, in the Brain Imaging Data Structure (BIDS) format. Task paradigms and basic pre-processing scripts are shared on GitHub. This dataset is unique in its depth of characterization of a healthy population in terms of brain health and will contribute to a wide array of secondary investigations of non-clinical and clinical research questions.

This dataset is licensed under the Creative Commons Zero (CC0) v1.0 License.

Recruitment

Inclusion criteria for the study require that participants are adults at or over 18 years of age in good health with the ability to read, speak, understand, and provide consent in English. All participants provided electronic informed consent for online screening and written informed consent for all other procedures. Exclusion criteria include:

  • A history of significant or unstable medical or mental health condition requiring treatment
  • Current self-injury, suicidal thoughts or behavior
  • Current illicit drug use by history or urine drug screen
  • Abnormal physical exam or laboratory result at the time of in-person assessment
  • Less than an 8th grade education or IQ below 70
  • Current employees, or first-degree relatives of NIMH employees
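The quantifiable parts of these criteria can be expressed as a simple screening check. The following is only a minimal sketch under stated assumptions: the `ScreeningRecord` class and all of its field names are hypothetical and do not come from the dataset, and the clinical criteria (medical history, self-injury, drug use, exam results) are left out because they depend on clinician review.

```python
from dataclasses import dataclass


@dataclass
class ScreeningRecord:
    # Hypothetical fields for illustration; not the dataset's column names.
    age_years: int
    highest_grade_completed: int
    iq_estimate: int
    nimh_employee_or_relative: bool


def failed_criteria(record):
    """Return the quantifiable criteria from the list above that a record fails."""
    reasons = []
    if record.age_years < 18:
        reasons.append("under 18 years of age")
    if record.highest_grade_completed < 8:
        reasons.append("less than an 8th grade education")
    if record.iq_estimate < 70:
        reasons.append("IQ below 70")
    if record.nimh_employee_or_relative:
        reasons.append("NIMH employee or first-degree relative")
    return reasons


eligible = ScreeningRecord(25, 12, 100, False)
ineligible = ScreeningRecord(17, 7, 65, True)
print(failed_criteria(eligible))
print(failed_criteria(ineligible))
```

An empty list means none of the modeled criteria excludes the participant; it does not imply final eligibility, which the study team determines.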

Study participants are recruited through direct mailings, bulletin boards and listservs, outreach exhibits, print advertisements, and electronic media.

Clinical Measures

All potential volunteers first visit the study website (https://nimhresearchvolunteer.ctss.nih.gov), check a box indicating consent, and complete preliminary self-report screening questionnaires. The study website is HIPAA compliant and therefore does not collect personally identifiable information (PII); instead, participants are instructed to contact the study team to provide their identity and contact information. The questionnaires include demographics, clinical history including medications, disability status (WHODAS 2.0), mental health symptoms (modified DSM-5 Self-Rated Level 1 Cross-Cutting Symptom Measure), substance use survey (DSM-5 Level 2), alcohol use (AUDIT), handedness (Edinburgh Handedness Inventory), and perceived health ratings. At the conclusion of the questionnaires, participants are again prompted to send an email to the study team. Survey results, supplemented by NIH medical records review (if present), are reviewed by the study team, who determine if the participant is likely eligible for the protocol. These participants are then scheduled for an in-person assessment. Follow-up phone screenings were also used to determine if participants were eligible for in-person screening.

In-person Assessments

At this visit, participants undergo a comprehensive clinical evaluation to determine final eligibility to be included as a healthy research volunteer. The mental health evaluation consists of a psychiatric diagnostic interview (Structured Clinical Interview for DSM-5 Disorders, SCID-5), along with self-report surveys of mood (Beck Depression Inventory-II, BDI-II) and anxiety (Beck Anxiety Inventory, BAI) symptoms. An intelligence quotient (IQ) estimation is determined with the Kaufman Brief Intelligence Test, Second Edition (KBIT-2). The KBIT-2 is a brief (20-30 minute) assessment of intellectual functioning administered by a trained examiner. There are three subtests: verbal knowledge, riddles, and matrices.

Medical Evaluation

Medical evaluation includes medical history elicitation and systematic review of systems. Biological and physiological measures include vital signs (blood pressure, pulse), as well as weight, height, and BMI. Blood and urine samples are taken and a complete blood count, acute care panel, hepatic panel, thyroid stimulating hormone, viral markers (HCV, HBV, HIV), C-reactive protein, creatine kinase, urine drug screen and urine pregnancy tests are performed. In addition, blood samples that can be used for future genomic analysis, development of lymphoblastic cell lines or other biomarker measures are collected and banked with the NIMH Repository and Genomics Resource (Infinity BiologiX). The Family Interview for Genetic Studies (FIGS) was later added to the assessment in order to provide better pedigree information; the Adverse Childhood Events (ACEs) survey was also added to better characterize potential risk factors for psychopathology. The entirety of the in-person assessment not only collects information relevant for eligibility determination, but it also provides a comprehensive set of standardized clinical measures of volunteer health that can be used for secondary research.
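BMI is derived from the weight and height measurements. The sketch below uses the standard definition (weight in kilograms divided by the square of height in meters); the units and the function name are assumptions for illustration, since the units recorded in the dataset are not stated here.

```python
def body_mass_index(weight_kg, height_m):
    """Standard BMI: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Illustrative values only, not taken from the dataset.
bmi = body_mass_index(70.0, 1.75)
print(round(bmi, 1))
```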

MRI Scan

Participants are given the option to consent for a magnetic resonance imaging (MRI) scan, which can serve as a baseline clinical scan to determine normative brain structure, and also as a research scan with the addition of functional sequences (resting state and diffusion tensor imaging). The MR protocol used was initially based on the ADNI-3 basic protocol, but was later modified to include portions of the ABCD protocol in the following manner:

  1. The T1 scan from ADNI3 was replaced by the T1 scan from the ABCD protocol.
  2. The Axial T2 2D FLAIR acquisition from ADNI2 was added, and fat saturation turned on.
  3. Fat saturation was turned on for the pCASL acquisition.
  4. The high-resolution in-plane hippocampal 2D T2 scan was removed and replaced with the whole brain 3D T2 scan from the ABCD protocol (which is resolution and bandwidth matched to the T1 scan).
  5. The slice-select gradient reversal method was turned on for DTI acquisition, and reconstruction interpolation turned off.
  6. Scans for distortion correction were added (reversed-blip scans for DTI and resting state scans).
  7. The 3D FLAIR sequence was made optional and replaced by one where the prescription and other acquisition parameters provide resolution and geometric correspondence between the T1 and T2 scans.

At the time of the MRI scan, volunteers are administered a subset of tasks from the NIH Toolbox Cognition Battery. The four tasks include:

  1. Flanker inhibitory control and attention task assesses the constructs of attention and executive functioning.
  2. Executive functioning is also assessed using a dimensional change card sort test.
  3. Episodic memory is evaluated using a picture sequence memory test.
  4. Working memory is evaluated using a list sorting test.

MEG

An optional MEG study was added to the protocol approximately one year after the study was initiated, so there are fewer MEG recordings than MRI scans in the dataset. MEG studies are performed on a 275-channel CTF MEG system (CTF MEG, Coquitlam, BC, Canada). The position of the head was localized at the beginning and end of each recording using three fiducial coils. These coils were placed 1.5 cm above the nasion, and at each ear, 1.5 cm from the tragus on a line between the tragus and the outer canthus of the eye. For 48 participants (as of 2/1/2022), photographs were taken of the three coils and used to mark the points on the T1-weighted structural MRI scan for co-registration. For the remainder of the participants (n=16 as of 2/1/2022), a Brainsight neuronavigation system (Rogue Research, Montréal, Québec, Canada) was used to co-register the MRI and fiducial localizer coils in real time prior to MEG data acquisition.

Specific Measures within Dataset

The online and in-person behavioral and clinical measures are listed here along with the corresponding phenotype file names, sorted first by measurement location and then by file name.

Location | Measure | File Name
Online | Alcohol Use Disorders Identification Test (AUDIT) | audit
Online | Demographics | demographics
Online | DSM-5 Level 2 Substance Use - Adult | drug_use
Online | Edinburgh Handedness Inventory (EHI) | ehi
Online | Health History Form | health_history_questions
Online | Perceived Health Rating - self | health_rating
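
In BIDS, phenotype measures are stored as tab-separated files under a `phenotype/` directory, keyed by a `participant_id` column, with `n/a` marking missing values. The following is a minimal sketch of mapping a file name from the listing above to its path and parsing its contents. It assumes this dataset follows that layout; the `audit_total` column and the participant IDs in the sample are hypothetical.

```python
import csv
import io
from pathlib import Path


def phenotype_path(dataset_root, file_name):
    """Build the assumed path phenotype/<file_name>.tsv under the dataset root."""
    return Path(dataset_root) / "phenotype" / f"{file_name}.tsv"


def parse_phenotype_tsv(text):
    """Parse TSV text into row dicts, mapping the BIDS 'n/a' marker to None."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {key: (None if value == "n/a" else value) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical sample content; real column names may differ.
sample = "participant_id\taudit_total\nsub-ON00001\t3\nsub-ON00002\tn/a\n"
rows = parse_phenotype_tsv(sample)
print(phenotype_path("ds004130", "audit").as_posix())
print(rows)
```

To read a real file, pass `phenotype_path(root, "audit").read_text(encoding="utf-8")` to `parse_phenotype_tsv`, after checking that the file exists in the downloaded dataset.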