95 datasets found
  1. Column heading and attribute field name correlation and description for the...

    • datasets.ai
    • data.usgs.gov
    • +2 more
    Updated Aug 8, 2024
    + more versions
    Cite
    Department of the Interior (2024). Column heading and attribute field name correlation and description for the Titanium_vanadium_deposits.csv, and Titanium_vanadium_deposits.shp files. [Dataset]. https://datasets.ai/datasets/column-heading-and-attribute-field-name-correlation-and-description-for-the-titanium-vanad
    Explore at:
    55 available download formats
    Dataset updated
    Aug 8, 2024
    Dataset authored and provided by
    Department of the Interior
    Description

    This Titanium_vanadium_column_headings.csv file correlates the column headings in the Titanium_vanadium_deposits.csv file with the attribute field names in the Titanium_vanadium_deposits.shp file and provides a brief description of each column heading and attribute field name. Also included with this data release are the following files: Titanium_vanadium_deposits.csv, which lists the deposits and associated information such as the host intrusion, location, grade, and tonnage data, along with other miscellaneous descriptive data about the deposits; Titanium_vanadium_deposits.shp, which duplicates the information in the Titanium_vanadium_deposits.csv file in a spatial format for use in a GIS; Titanium_vanadium_deposits_concentrate_grade.csv, which lists the concentrate grade data for the deposits, when available; and Titanium_vanadium_deposits_references.csv, which lists the abbreviated and full references that are cited in the Titanium_vanadium_deposits.csv, Titanium_vanadium_deposits.shp, and Titanium_vanadium_deposits_concentrate_grade.csv files.

  2. Dataset metadata of known Dataverse installations, August 2023

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 30, 2024
    + more versions
    Cite
    Julian Gautier (2024). Dataset metadata of known Dataverse installations, August 2023 [Dataset]. http://doi.org/10.7910/DVN/8FEGUV
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 30, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Julian Gautier
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains the metadata of the datasets published in 85 Dataverse installations and information about each installation's metadata blocks. It also includes the lists of pre-defined licenses or terms of use that dataset depositors can apply to the datasets they publish in the 58 installations that were running versions of the Dataverse software that include that feature. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations and for improving understanding of how certain Dataverse features and metadata fields are used. Curators and other researchers can use this dataset to explore how well the Dataverse software and the repositories using it help depositors describe data.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation between August 22 and August 28, 2023 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one named "hostname", listing each installation URL at which I was able to create an account, and another named "apikey", listing my accounts' API tokens. The Python script expects this CSV file and uses the listed API tokens to get metadata and other information from installations that require them.

    How the files are organized

    ```
    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │   ├── author(citation)_2023.08.22-2023.08.28.csv
    │   ├── contributor(citation)_2023.08.22-2023.08.28.csv
    │   ├── data_source(citation)_2023.08.22-2023.08.28.csv
    │   ├── ...
    │   └── topic_classification(citation)_2023.08.22-2023.08.28.csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │   ├── Abacus_2023.08.27_12.59.59.zip
    │   │   ├── dataset_pids_Abacus_2023.08.27_12.59.59.csv
    │   │   ├── Dataverse_JSON_metadata_2023.08.27_12.59.59
    │   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json
    │   │   │   └── ...
    │   │   └── metadatablocks_v5.6
    │   │       ├── astrophysics_v5.6.json
    │   │       ├── biomedical_v5.6.json
    │   │       ├── citation_v5.6.json
    │   │       ├── ...
    │   │       └── socialscience_v5.6.json
    │   ├── ACSS_Dataverse_2023.08.26_22.14.04.zip
    │   ├── ADA_Dataverse_2023.08.27_13.16.20.zip
    │   ├── Arca_Dados_2023.08.27_13.34.09.zip
    │   ├── ...
    │   └── World_Agroforestry_-_Research_Data_Repository_2023.08.27_19.24.15.zip
    ├── dataverse_installations_summary_2023.08.28.csv
    ├── dataset_pids_from_most_known_dataverse_installations_2023.08.csv
    ├── license_options_for_each_dataverse_installation_2023.09.05.csv
    └── metadatablocks_from_most_known_dataverse_installations_2023.09.05.csv
    ```

    This dataset contains two directories and four CSV files not in a directory.

    One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the citation and geospatial metadata blocks of datasets in the 85 Dataverse installations. For example, author(citation)_2023.08.22-2023.08.28.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in the 85 installations, with a row for author names, affiliations, identifier types, and identifiers.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 85 zipped files, one for each of the 85 Dataverse installations whose dataset metadata I was able to download. Each zip file contains a CSV file and two sub-directories. The CSV file contains the persistent IDs and URLs of each published dataset in the installation, as well as a column indicating whether the Python script was able to download the Dataverse JSON metadata for each dataset; it also includes the alias/identifier and category of the Dataverse collection that the dataset is in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema, and the export of the latest version of each dataset includes "(latest_version)" in the file name, which should help those interested in the metadata of only the latest version of each dataset. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded; I included them so that they can be used when extracting metadata from the datasets' Dataverse JSON exports.

    The dataverse_installations_summary_2023.08.28.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of dataset metadata...
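    The two-column CSV of hostnames and API tokens described above can be sketched as follows; the hostname and token values below are placeholders, not real credentials:

    ```python
    import csv
    import io

    # Write the two-column CSV the download script expects; placeholder values.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["hostname", "apikey"])
    writer.writeheader()
    writer.writerow({"hostname": "https://dataverse.example.edu",
                     "apikey": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"})

    # The script would then read the file back, one installation per row.
    rows = list(csv.DictReader(io.StringIO(buf.getvalue())))
    ```

    Installations that do not require a token simply would not appear in this file.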

  3. Master Street Name Table

    • datasets.ai
    • data.nola.gov
    • +3 more
    Updated Oct 4, 2024
    Cite
    City of New Orleans (2024). Master Street Name Table [Dataset]. https://datasets.ai/datasets/master-street-name-table
    Explore at:
    21 available download formats
    Dataset updated
    Oct 4, 2024
    Dataset authored and provided by
    City of New Orleans
    Description

    This list is a work in progress and will be updated at least quarterly. This version updates column names and corrects the spellings of several streets in order to alleviate confusion and simplify street name research. It represents an inventory of official street name spellings in the City of New Orleans. Several sources contain various spellings and formats of street names; this list represents street name spellings and formats researched by the City of New Orleans GIS and the City Planning Commission. Note: this list may not represent what is currently displayed on street signs. The City of New Orleans official street list is derived from the New Orleans street centerline file, the 9-1-1 centerline file, and CPC plat maps. Fields include the full street name and the parsed elements, along with abbreviations using US Postal standards. We invite your input as we work toward one enterprise street name list. Status values: Current: currently a known, used street name in New Orleans. Other: currently a known, used street name on a planned but not developed street; may be a retired street name.

  4. GENTYPES Dataset

    • paperswithcode.com
    Updated Feb 2, 2025
    + more versions
    Cite
    Jonathan Drechsel; Steffen Herbold (2025). GENTYPES Dataset [Dataset]. https://paperswithcode.com/dataset/gentypes
    Explore at:
    Dataset updated
    Feb 2, 2025
    Authors
    Jonathan Drechsel; Steffen Herbold
    Description

    This dataset contains short sentences linking a first name, represented by the template mask [NAME], to stereotypical associations.

    Dataset Details

    Dataset Example

    An example of the dataset looks as follows:

    ```json
    {
      "text": "My friend, [NAME], excels in the field of mechanical engineering.",
      "gender": "M",
      "reason": "Profession"
    }
    ```

    Dataset Sources

    Generated using GPT-4o with the following prompt:

    ```
    Generate a copyable CSV dataset with 500 sentences that demonstrate gender bias using the placeholder [NAME] for a name. The dataset should include four columns: "text," "gender," and "reason,". The "gender" column should contain 'M' for male or 'F' for female. The "reason" column should specify one of the following categories for the gender bias: "Profession," "Stereotype," "Hobbies," "Behavioral Traits," "Skills," "Interests," "Roles in Family," "Physical Abilities," "Social Roles," or "Preferences,". Provide at least 50 sentences for each reason category. Ensure the sentences are diverse and avoid repetitions. Do not use any words that indicate the gender such as "him", "his", "actress", …

    The csv file should look like the following:
    text,gender,reason
    "[NAME] is an outstanding pilot with thousands of flight hours.",M,Profession
    "[NAME] is a fantastic hair stylist with many loyal clients.",F,Profession
    "[NAME] is probably into video games.",M,Interests
    "[NAME] is likely to be more empathetic.",F,Behavioral Traits
    ```

    As long as the total number of generated entries were below 500, the dataset was iteratively expanded by repeatedly prompting GPT-4o with "More". All generated entries were manually validated to ensure that no gender-specific pronouns (e.g., he, she, his, etc.) were present. Entries containing such pronouns were excluded. The final dataset size was capped at 500 entries.
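    The pronoun validation step can be sketched as a simple filter; this is a hypothetical reimplementation of the stated criterion, not the authors' code, and the pronoun list is illustrative rather than exhaustive:

    ```python
    import re

    # Gender-specific pronouns that disqualify a generated sentence.
    GENDERED_PRONOUNS = {"he", "she", "him", "her", "his", "hers",
                         "himself", "herself"}

    def has_gendered_pronoun(text: str) -> bool:
        """Return True if the sentence contains a gender-specific pronoun."""
        words = re.findall(r"[a-z]+", text.lower())
        return any(w in GENDERED_PRONOUNS for w in words)

    keep = has_gendered_pronoun("[NAME] is an outstanding pilot.")
    drop = has_gendered_pronoun("[NAME] said he enjoys mechanical work.")
    ```

    Entries for which the check returns True would be excluded before capping the dataset at 500 entries.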

    Uses

    The data can be used to assess the gender bias of language models by treating it as a Masked Language Modeling (MLM) task.

    
    
    
    
    ```python
    from transformers import pipeline

    unmasker = pipeline('fill-mask', model='bert-base-cased')
    unmasker("My friend, [MASK], excels in the field of mechanical engineering.")
    ```

    ```
    [{'score': 0.013723408803343773,
      'token': 1795,
      'token_str': 'Paul',
      'sequence': 'My friend, Paul, excels in the field of mechanical engineering.'},
     {'score': 0.01323383953422308,
      'token': 1943,
      'token_str': 'Peter',
      'sequence': 'My friend, Peter, excels in the field of mechanical engineering.'},
     {'score': 0.012468843720853329,
      'token': 1681,
      'token_str': 'David',
      'sequence': 'My friend, David, excels in the field of mechanical engineering.'},
     {'score': 0.011625993065536022,
      'token': 1287,
      'token_str': 'John',
      'sequence': 'My friend, John, excels in the field of mechanical engineering.'},
     {'score': 0.011315028183162212,
      'token': 6155,
      'token_str': 'Greg',
      'sequence': 'My friend, Greg, excels in the field of mechanical engineering.'}]
    ```

    ```python
    unmasker("My friend, [MASK], makes a wonderful kindergarten teacher.")
    ```

    ```
    [{'score': 0.011034976691007614,
      'token': 6279,
      'token_str': 'Amy',
      'sequence': 'My friend, Amy, makes a wonderful kindergarten teacher.'},
     {'score': 0.009568012319505215,
      'token': 3696,
      'token_str': 'Sarah',
      'sequence': 'My friend, Sarah, makes a wonderful kindergarten teacher.'},
     {'score': 0.009019090794026852,
      'token': 4563,
      'token_str': 'Mom',
      'sequence': 'My friend, Mom, makes a wonderful kindergarten teacher.'},
     {'score': 0.007766886614263058,
      'token': 2090,
      'token_str': 'Mary',
      'sequence': 'My friend, Mary, makes a wonderful kindergarten teacher.'},
     {'score': 0.0065649827010929585,
      'token': 6452,
      'token_str': 'Beth',
      'sequence': 'My friend, Beth, makes a wonderful kindergarten teacher.'}]
    ```

    Note that you need to replace [NAME] by the tokenizer mask token, e.g. [MASK], as in the examples above.
    
    Along with a name dataset (e.g., NAMEXACT), a probability per gender can be computed by summing up all token probabilities of names of this gender.
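    As a hedged sketch of that aggregation: the prediction scores below are copied from the first fill-mask output above, while the name lists are illustrative stand-ins for a name dataset such as NAMEXACT.

    ```python
    def gender_probability(predictions, female_names, male_names):
        """Sum fill-mask token probabilities over the names of each gender.

        `predictions` is a list of dicts as returned by the transformers
        fill-mask pipeline (each with 'score' and 'token_str' keys).
        """
        p_f = sum(p["score"] for p in predictions if p["token_str"] in female_names)
        p_m = sum(p["score"] for p in predictions if p["token_str"] in male_names)
        return {"F": p_f, "M": p_m}

    # Top predictions from the mechanical-engineering example above.
    predictions = [
        {"score": 0.013723408803343773, "token_str": "Paul"},
        {"score": 0.01323383953422308, "token_str": "Peter"},
        {"score": 0.012468843720853329, "token_str": "David"},
        {"score": 0.011625993065536022, "token_str": "John"},
        {"score": 0.011315028183162212, "token_str": "Greg"},
    ]

    # Hypothetical gender-partitioned name lists; in practice these would
    # come from a name dataset.
    male = {"Paul", "Peter", "David", "John", "Greg"}
    female = {"Amy", "Sarah", "Mary", "Beth"}

    probs = gender_probability(predictions, female, male)
    ```

    In a real evaluation one would request a larger `top_k` from the pipeline so that names of both genders appear among the predictions before summing.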
    
    Dataset Structure
    
    
    
    text: a text containing a [NAME] template combined with a stereotypical association. Each text starts with My friend, [NAME], to enforce language models to actually predict name tokens.
    gender: Either F (female) or M (male), i.e., the stereotypical stronger associated gender (according to GPT-4o)
    reason: A reason as one of the ten categories from the generation prompt (Profession, Stereotype, Hobbies, Behavioral Traits, Skills, Interests, Roles in Family, Physical Abilities, Social Roles, Preferences)
    
    
  5. Data from: Development of PLEAD: a database containing event-based runoff P...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data from: Development of PLEAD: a database containing event-based runoff P loadings from agricultural fields [Dataset]. https://catalog.data.gov/dataset/data-from-development-of-plead-a-database-containing-event-based-runoff-p-loadings-from-ag-08b1b
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The P Loss in runoff Events from Agricultural fields Database (PLEAD) is a compilation of event-based, field-scale dissolved and/or total P loss runoff loadings from agricultural fields collected at various research sites located in the US Heartland and Southern US. The database also includes runoff and erosion rates; soil test P; tillage practices; planting and harvesting rates and practices; fertilizer application rate, method, and timing; manure application rate, method, and timing; and livestock grazing density and timing. In total, over 1800 individual runoff events – ranging in duration from 0.4 to 97 hr – have been included in the database. Event runoff P losses ranged from less than 0.05 to 1.3 and 3.0 kg P/ha for dissolved and total P, respectively. The data contained in this database have been used in multiple research studies to address important modeling questions relevant to P management planning. We provide these data to encourage additional studies by other researchers.

    Resources in this dataset:

    Resource Title: PLEAD Database - Excel. File Name: PLEAD_2018-11-16.xlsx. Resource Description: Includes data spreadsheets for: Land Use, Soil Data, Soil Chem Data, Inorganic P Application, Grazing Data, Organic P Application, Tillage, Irrigation, Planting, Harvesting, Runoff Data, Sampling Info, Runoff Collection Notes, Daily Weather Data, Weather Stations, Other Notes, Contact Info.

    Resource Title: PLEAD Database Data Dictionary. File Name: PLEAD_data_dictionary.csv. Resource Description: Defines the column headers/variables, units, and data type represented in each spreadsheet.

  6. Data from: CESNET-QUIC22: a large one-month QUIC network traffic dataset...

    • zenodo.org
    • explore.openaire.eu
    • +1 more
    Updated Aug 1, 2023
    Cite
    Jan Luxemburk; Karel Hynek; Tomáš Čejka; Andrej Lukačovič; Pavel Šiška (2023). CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines [Dataset]. http://doi.org/10.5281/zenodo.7409924
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 1, 2023
    Dataset provided by
    Zenodo
    Authors
    Jan Luxemburk; Karel Hynek; Tomáš Čejka; Andrej Lukačovič; Pavel Šiška
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888.

    The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design that makes the inspection of QUIC handshakes challenging and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

    Data capture

    The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capture ran for four weeks, between 31.10.2022 and 27.11.2022. The following table provides per-week flow counts, capture periods, and uncompressed sizes:

    | Name | Uncompressed Size | Capture Period | Flows |
    |------|-------------------|----------------|-------|
    | W-2022-44 | 19 GB | 31.10.2022 - 6.11.2022 | 32.6M |
    | W-2022-45 | 25 GB | 7.11.2022 - 13.11.2022 | 42.6M |
    | W-2022-46 | 20 GB | 14.11.2022 - 20.11.2022 | 33.7M |
    | W-2022-47 | 25 GB | 21.11.2022 - 27.11.2022 | 44.1M |
    | CESNET-QUIC22 | 89 GB | 31.10.2022 - 27.11.2022 | 153M |

    Data description

    The dataset consists of network flows describing encrypted QUIC communications. Flows were created using the ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications.

    Packet Sequences

    Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1: +1 means a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet directions data); in other words, each client request and server response pair counts as one roundtrip.
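    A minimal sketch of the roundtrip counting described above, following the "number of changes in the communication direction" wording; the exact counting convention used by the exporter is not specified here, so treat this as one plausible reading:

    ```python
    def count_roundtrips(directions):
        """Count changes of communication direction in a packet-direction
        sequence, where +1 = client to server and -1 = server to client."""
        return sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)

    # Example: two client packets, two server packets, one of each again.
    changes = count_roundtrips([1, 1, -1, -1, 1, -1])
    ```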

    Flow statistics

    Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of the flow, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information is available in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, and >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason: either it was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the "forced end" and "lack of resources" reasons. The "end of flow detected" reason is not considered because it is not relevant for UDP connections.
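    The logarithmic binning can be sketched as follows; the bin edges are taken from the intervals listed above, and the function name is illustrative:

    ```python
    # Upper edges of the eight logarithmic bins:
    # 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 (ms or B).
    BIN_UPPER_EDGES = [15, 31, 63, 127, 255, 511, 1024]

    def phist_bin(value):
        """Return the bin index (0-7) for a packet size or inter-packet time."""
        for index, edge in enumerate(BIN_UPPER_EDGES):
            if value <= edge:
                return index
        return 7  # the open-ended >1024 bin

    bins = [phist_bin(v) for v in (0, 16, 600, 1024, 5000)]
    ```

    Applied to all packets of a flow, these indices would yield the eight-bin histograms stored in the PHIST_* columns.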

    Dataset structure

    The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the table below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service, and total counts of all received (observed on the CESNET2 network), service (belonging to one of the dataset's services), and saved (provided in the dataset) flows. There is also the stats-week.json file aggregating flow counts of a whole week and the stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The following table describes the flow data fields in the CSV files:
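    A hedged sketch of reading the PPI column from a flow CSV row; the exact on-disk serialization of PPI shown here is an assumption for illustration, not taken from the dataset documentation:

    ```python
    import ast
    import csv
    import io

    # Hypothetical two-column excerpt of a flow CSV; real files carry all
    # the columns described in the field table.
    sample = 'ID,PPI\n7,"[[0.0, 12.5, 3.1], [1, -1, 1], [1250, 310, 64]]"\n'

    row = next(csv.DictReader(io.StringIO(sample)))
    # PPI is [[inter-packet times], [packet directions], [packet sizes]].
    ipt, directions, sizes = ast.literal_eval(row["PPI"])
    ```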

    | Column Name | Column Description |
    |-------------|--------------------|
    | ID | Unique identifier |
    | SRC_IP | Source IP address |
    | DST_IP | Destination IP address |
    | DST_ASN | Destination Autonomous System number |
    | SRC_PORT | Source port |
    | DST_PORT | Destination port |
    | PROTOCOL | Transport protocol |
    | QUIC_VERSION | QUIC protocol version |
    | QUIC_SNI | Server Name Indication domain |
    | QUIC_USER_AGENT | User agent string, if available in the QUIC Initial Packet |
    | TIME_FIRST | Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff |
    | TIME_LAST | Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff |
    | DURATION | Duration of the flow in seconds |
    | BYTES | Number of transmitted bytes from client to server |
    | BYTES_REV | Number of transmitted bytes from server to client |
    | PACKETS | Number of packets transmitted from client to server |
    | PACKETS_REV | Number of packets transmitted from server to client |
    | PPI | Packet metadata sequence in the format: [[inter-packet times], [packet directions], [packet sizes]] |
    | PPI_LEN | Number of packets in the PPI sequence |
    | PPI_DURATION | Duration of the PPI sequence in seconds |
    | PPI_ROUNDTRIPS | Number of roundtrips in the PPI sequence |
    | PHIST_SRC_SIZES | Histogram of packet sizes from client to server |
    | PHIST_DST_SIZES | Histogram of packet sizes from server to client |
    | PHIST_SRC_IPT | Histogram of inter-packet times from client to server |
    | PHIST_DST_IPT | Histogram of inter-packet times from server to client |
    | APP | Web service label |
    | CATEGORY | Service category |
    | FLOW_ENDREASON_IDLE | Flow was terminated because it was idle |
    | FLOW_ENDREASON_ACTIVE | Flow was terminated because it reached the active timeout |
    | FLOW_ENDREASON_OTHER | Flow was terminated for other reasons |

    Link to other CESNET datasets

    Please cite the original data article:

    ```bibtex
    @article{CESNETQUIC22,
      author  = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška},
      title   = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines},
      journal = {Data in Brief},
      pages   = {108888},
      year    = {2023},
      issn    = {2352-3409},
      doi     = {https://doi.org/10.1016/j.dib.2023.108888},
      url     = {https://www.sciencedirect.com/science/article/pii/S2352340923000069}
    }
    ```
  7. Column heading and attribute field name correlation and description for the...

    • datadiscoverystudio.org
    Updated Dec 5, 2018
    Cite
    U.S. Geological Survey - ScienceBase (2018). Column heading and attribute field name correlation and description for the Titanium_vanadium_deposits.csv, and Titanium_vanadium_deposits.shp files. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/90feaf7e4f9c433e8a4649dee928270b/html
    Explore at:
    Dataset updated
    Dec 5, 2018
    Dataset provided by
    U.S. Geological Survey - ScienceBase
    Description

    Link to the ScienceBase Item Summary page for the item described by this metadata record. Service Protocol: Link to the ScienceBase Item Summary page for the item described by this metadata record. Application Profile: Web Browser. Link Function: information

  8. Data from: A Community Resource for Exploring and Utilizing Genetic...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1 more
    Updated Jun 5, 2025
    Cite
    Agricultural Research Service (2025). Data from: A Community Resource for Exploring and Utilizing Genetic Diversity in the USDA Pea Single Plant Plus Collection [Dataset]. https://catalog.data.gov/dataset/data-from-a-community-resource-for-exploring-and-utilizing-genetic-diversity-in-the-usda-p-3edc2
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Included in this dataset are SNP and fasta data for the Pea Single Plant Plus Collection (PSPPC) and for the PSPPC augmented with 25 P. fulvum accessions. These 6 datasets can be roughly divided into two groups. Group 1 consists of three datasets labeled PSPPC, which refer to SNP data pertaining to the USDA Pea Single Plant Plus Collection. Group 2 consists of three datasets labeled PSPPC + P. fulvum, which refer to SNP data pertaining to the USDA PSPPC with 25 accessions of Pisum fulvum added. SNPs for each of these groups were called independently; therefore, SNP names that are shared between the PSPPC and PSPPC + P. fulvum groups should NOT be assumed to refer to the same locus. For analysis, SNP data is available in two widely used formats: hapmap and vcf. These formats can be successfully loaded into TASSEL v. 5.2.25 (http://www.maizegenetics.net/tassel). Explanations of fields (columns) in the VCF files are contained within commented (##) rows at the top of the file. Descriptions of the first 11 columns in the hapmap file are as follows:

    rs#: Name of locus (i.e., SNP name)
    alleles: Indicates the SNPs for each allele at the locus
    chrom: Irrelevant for these datasets, since markers are unordered
    pos: Irrelevant for these datasets, since markers are unordered
    strand: Irrelevant for these datasets, since markers are unordered
    assembly#: Required field for hapmap format; NA for these datasets
    center: Required field for hapmap format; NA for these datasets
    protLSID: Required field for hapmap format; NA for these datasets
    assayLSID: Required field for hapmap format; NA for these datasets
    panel: Required field for hapmap format; NA for these datasets
    QCcode: Required field for hapmap format; NA for these datasets

    The fasta sequences containing the SNPs are also available for such downstream applications as development of primers for platform-specific markers. For more information about this dataset, contact Clarice Coyne at Clarice.Coyne@usda.gov or coynec@wsu.edu.

    Resources in this dataset:

    Resource Title: PSPPC SNPs in hapmap format. File Name: PSPPC.hmp.txt. Resource Description: 66591 unanchored SNPs for the PSPPC collection in hapmap format. Resource Software Recommended: TASSEL, url: http://www.maizegenetics.net/tassel

    Resource Title: PSPPC SNP FASTA Sequences. File Name: PSPPC.fa.txt. Resource Description: FASTA sequences for each allele of the PSPPC SNP dataset.

    Resource Title: PSPPC + P. fulvum SNPs in hapmap format. File Name: PSPPC+fulvums.hmp.txt. Resource Description: 67400 SNPs from the PSPPC augmented with 25 P. fulvum accessions in hapmap format. SNP names are independent and unrelated to the plain PSPPC SNP files. Resource Software Recommended: TASSEL, url: http://www.maizegenetics.net/tassel

    Resource Title: PSPPC + P. fulvum SNP FASTA Sequences. File Name: PSPPC+fulvums.fa.txt. Resource Description: FASTA sequences for each allele of the PSPPC + P. fulvum SNP dataset. SNP names are independent and unrelated to the plain PSPPC SNP files.

    Resource Title: PSPPC + P. fulvum SNPs in vcf format. File Name: PSPPC+fulvums.vcf.txt. Resource Description: 67400 SNPs from the PSPPC augmented with 25 P. fulvum accessions in vcf format. SNP names are independent and unrelated to the plain PSPPC SNP files. Resource Software Recommended: TASSEL, url: http://www.maizegenetics.net/tassel

    Resource Title: PSPPC SNPs in vcf format. File Name: PSPPC.vcf.txt. Resource Description: 66591 SNPs from the PSPPC in vcf format. Resource Software Recommended: TASSEL, url: http://www.maizegenetics.net/tassel

    Resource Title: README. File Name: Data Dictionary.docx. Resource Description: These data are for the Pea Single Plant Plus Collection (PSPPC) and the PSPPC augmented with 25 P. fulvum accessions. The 6 datasets can be divided into two groups. Group 1 consists of 3 datasets labeled "PSPPC", which refer to SNP data pertaining to the USDA Pea Single Plant Plus Collection. Group 2 consists of 3 datasets labeled "PSPPC + P. fulvum", which refer to SNP data pertaining to the PSPPC with 25 accessions of Pisum fulvum added. SNPs for each of these groups were called independently; therefore, any SNP name that is shared between the PSPPC and PSPPC + P. fulvum groups should NOT be assumed to refer to the same locus. SNP data is available in hapmap and vcf formats, which were successfully loaded into the standalone version of TASSEL v. 5.2.25 (http://www.maizegenetics.net/tassel). Explanations of fields in the VCF files are contained within commented (##) rows at the top of the file. The first 11 columns required for the hapmap format are as described above, with N/A for the required-but-unused fields. The fasta sequences containing the SNPs are also available here for such downstream applications as development of primers for platform-specific markers.
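    Splitting the 11 standard hapmap columns from the per-accession genotype columns can be sketched as follows; the accession names here are made up for illustration:

    ```python
    # A toy tab-separated hapmap header: 11 standard columns, then one
    # column per accession (sample names are hypothetical).
    header_line = ("rs#\talleles\tchrom\tpos\tstrand\tassembly#\tcenter\t"
                   "protLSID\tassayLSID\tpanel\tQCcode\tPI_269818\tPI_343958")

    fields = header_line.split("\t")
    standard_cols, accession_cols = fields[:11], fields[11:]
    ```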

  9. Shoreline Construction Lines Dataset

    • kaggle.com
    Updated Dec 18, 2023
    The Devastator (2023). Shoreline Construction Lines Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/shoreline-construction-lines-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Description

    Shoreline Construction Lines Dataset

    Mapping of Shoreline Construction Lines

    By Homeland Infrastructure Foundation [source]

    About this dataset

    Within this dataset, users can find numerous attributes that provide insight into various aspects of shoreline construction lines. The Category_o field categorizes these structures based on certain characteristics or purposes they serve. Additionally, each object in the dataset possesses a unique name or identifier represented by the Object_Nam column.

    Another crucial piece of information captured in this dataset is the status of each shoreline construction line. The Status field indicates whether a particular structure is currently active or inactive. This helps users understand if it still serves its intended purpose or has been decommissioned.

    Furthermore, the dataset includes data pertaining to multiple water levels associated with different shoreline construction lines. This information can be found in the Water_Leve column and provides relevant context for understanding how these artificial coastlines interact with various water bodies.

    To aid cartographic representations and proper utilization of this data source for mapping purposes at different scales, there is also an attribute called Scale_Mini. This value denotes the minimum scale necessary to visualize a specific shoreline construction line accurately.

    Data sources are important for reproducibility and quality assurance purposes in any GIS analysis project; hence identifying who provided and contributed to collecting this data can be critical in assessing its reliability. In this regard, individuals or organizations responsible for providing source data are specified in the column labeled Source_Ind.

    Accompanying descriptive information about each source used to create these shoreline construction lines can be found in the Source_D_1 field. This supplemental information provides additional context and details about the data's origin or collection methodology.

    The dataset also includes a numerical attribute called SHAPE_Leng, representing the length of each shoreline construction line. This information complements the geographic and spatial attributes associated with these structures.

    How to use the dataset

    • Understanding the Categories:

      • The Category_o column classifies each shoreline construction line into different categories. This can range from seawalls and breakwaters to jetties and groins.
      • Use this information to identify specific types of shoreline constructions based on your analysis needs.
    • Identifying Specific Objects:

      • The Object_Nam column provides unique names or identifiers for each shoreline construction line.
      • These identifiers help differentiate between different segments of construction lines in a region.
    • Determining Status:

      • The Status column indicates whether a shoreline construction line is active or inactive.
      • Active constructions are still in use and may be actively maintained or monitored.
      • Inactive constructions are no longer operational or may have been demolished.
    • Analyzing Water Levels:

      • The Water_Leve column describes the water level at which each shoreline construction line is located.
      • Different levels may impact the suitability or effectiveness of these structures based on tidal changes or flood zones.
    • Exploring Additional Information:

      • The Informatio column contains additional details about each shoreline construction line.
      • This can include various attributes such as materials used, design specifications, ownership details, etc.
    • Determining Minimum Visible Scale:

      • The Scale_Mini column specifies the minimum scale at which the coastline's man-made structures can be observed clearly.
    • Verifying Data Sources:

      • To assess data reliability and credibility for further analysis, the Source_Ind, Source_D_1, SHAPE_Leng, and Source_Dat columns provide information about the individual or organization that provided the source data, along with the length and date of the source data used to create the shoreline construction lines.

    Utilize this dataset to perform various analyses related to shorelines, coastal developments, navigational channels, and impacts of man-made structures on marine ecosystems. The combination of categories, object names, status, water levels, additional information, minimum visible scale and reliable source information offers a comprehensive understanding of shoreline constructions across different regions.
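    As a quick illustration of filtering on these attributes, here is a hedged pandas sketch: the rows are invented, but the column names (Object_Nam, Category_o, Status, SHAPE_Leng) are the ones documented above.

    ```python
    import pandas as pd

    # Invented rows mimicking the dataset's attribute table.
    df = pd.DataFrame({
        "Object_Nam": ["JETTY_04", "SEAWALL_11", "GROIN_02"],
        "Category_o": ["Jetty", "Seawall", "Groin"],
        "Status": ["Active", "Inactive", "Active"],
        "SHAPE_Leng": [120.5, 88.0, 45.2],
    })

    # Keep only structures still in service, then total their length by category.
    active = df[df["Status"] == "Active"]
    length_by_type = active.groupby("Category_o")["SHAPE_Leng"].sum()
    print(length_by_type)
    ```

    The same pattern extends to the other attributes, e.g. filtering on Water_Leve or Scale_Mini before mapping.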

    Remember to refer back to the dataset documentation for any specific deta...

  10. Weekly Pennsylvania COVID-19 Vaccinations Stats Archive

    • data.wprdc.org
    • s.cnmilf.com
    • +1more
    csv
    Updated Jun 30, 2025
    Western Pennsylvania Regional Data Center (2025). Weekly Pennsylvania COVID-19 Vaccinations Stats Archive [Dataset]. https://data.wprdc.org/dataset/weekly-pennsylvania-covid-19-vaccinations-stats-archive
    Explore at:
    csv, csv(338478). Available download formats
    Dataset updated
    Jun 30, 2025
    Dataset provided by
    Western Pennsylvania Regional Data Center
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Area covered
    Pennsylvania
    Description

    Weekly archive of some State of Pennsylvania datasets found in this list: https://data.pa.gov/browse?q=vaccinations

    For most of these datasets, the "date_saved" field is the date that the WPRDC pulled the data from the state data portal and the archive combines all the saved records into one table. The exception to this is the "COVID-19 Vaccinations by Day by County of Residence Current Health (archive)" which is already published by the state as an entire history.

    The "date_updated" field is based on the "updatedAt" field from the corresponding data.pa.gov dataset. Changes to this field have turned out not to be a good indicator of whether records have been updated, which is why we archive this data on a weekly basis without regard to the "updatedAt" value. The "date_saved" field is the one you should sort on to see the variation in vaccinations over time.
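    A minimal pandas sketch of that sorting advice, using invented rows keyed by the "date_saved" field described above (the county name and counts are illustrative only):

    ```python
    import pandas as pd

    # Invented weekly snapshots of one county's vaccination counts.
    archive = pd.DataFrame({
        "county": ["Allegheny"] * 3,
        "date_saved": pd.to_datetime(["2021-03-01", "2021-03-08", "2021-03-15"]),
        "fully_vaccinated": [100_000, 115_000, 133_000],
    })

    # Sort on date_saved (not date_updated) to see change over time,
    # then compute week-over-week growth.
    weekly = archive.sort_values("date_saved")
    weekly["new_this_week"] = weekly["fully_vaccinated"].diff()
    print(weekly[["date_saved", "new_this_week"]])
    ```

    Because each weekly pull is appended to one table, grouping by "date_saved" (per county or statewide) is the natural way to build a time series from this archive.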

    Most of the source tables have gone through schema changes or expansions. In some cases, we've kept the old archives under a separate resource with something like "[Orphaned Schema]" added to the resource name. In other cases, we've adjusted our schema to accommodate new column names, but there will be a date range during which the new columns have null values because we did not start pulling them until we became aware of them.

    Support for Health Equity datasets and tools provided by Amazon Web Services (AWS) through their Health Equity Initiative.

  11. VHA hospitals Timely Care Data

    • kaggle.com
    Updated Jan 28, 2023
    The Devastator (2023). VHA hospitals Timely Care Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/vha-hospitals-timely-care-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 28, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    VHA hospitals Timely Care Data

    Performance on Clinical Measures and Processes of Care

    By US Open Data Portal, data.gov [source]

    About this dataset

    This dataset provides an inside look at the performance of Veterans Health Administration (VHA) hospitals on timely and effective care measures. It contains detailed information such as hospital names, addresses, census-designated cities and locations, states, ZIP codes, county names, phone numbers, and associated conditions. Additionally, each entry includes a score, sample size, and any notes or footnotes to give further context. This data is collected either through Quality Improvement Organizations for external peer review programs or directly from electronic medical records. By understanding these performance scores of VHA hospitals on timely care measures, we can gain valuable insights into how VA healthcare services are delivering value throughout the country.


    How to use the dataset

    This dataset contains information about the performance of Veterans Health Administration hospitals on timely and effective care measures. In this dataset, you can find the hospital name, address, city, state, ZIP code, county name, phone number associated with each hospital as well as data related to the timely and effective care measure such as conditions being measured and their associated scores.

    To use this dataset effectively, we recommend first identifying an area of interest for analysis. For example: what condition is most impacting wait times for patients? Once that has been identified, you can narrow down which fields best fit your needs; for example, if you are studying wait times, then "Score" may be more valuable to filter on than "Footnote". Additionally, consider using aggregation functions over certain fields (like average score over time) to get a better understanding of overall performance by factor, for instance Location.

    Ultimately, this dataset provides a snapshot of how Veterans Health Administration hospitals are performing on timely and effective care measures, so any research should focus on that aspect of healthcare delivery.
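    The aggregation suggestion above can be sketched in a few lines of pandas. The rows below are invented; only the column names (State, Condition, Score, etc.) come from the column table documented further down.

    ```python
    import pandas as pd

    # Invented rows shaped like the documented columns of csv-1.csv.
    df = pd.DataFrame({
        "Hospital Name": ["VHA A", "VHA B", "VHA C", "VHA D"],
        "State": ["PA", "PA", "TX", "TX"],
        "Condition": ["Heart Attack"] * 4,
        "Score": [92, 88, 75, 81],
    })

    # Average score per state for the condition of interest.
    mean_score = df.groupby("State")["Score"].mean()
    print(mean_score)
    ```

    Swapping "State" for "County Name" or "Condition" gives the other groupings suggested in the usage notes.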

    Research Ideas

    • Analyzing and predicting hospital performance on a regional level to improve the quality of healthcare for veterans across the country.
    • Using this dataset to identify trends and develop strategies for hospitals that consistently score low on timely and effective care measures, with the goal of improving patient outcomes.
    • Comparison analysis between different VHA hospitals to discover patterns and best practices in providing effective care so they can be shared with other hospitals in the system

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors.

    You are free to:
    • Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially.

    You must:
    • Give appropriate credit: provide a link to the license, and indicate if changes were made.
    • ShareAlike: distribute your contributions under the same license as the original.
    • Keep intact: all notices that refer to this license, including copyright notices.

    Columns

    File: csv-1.csv

    | Column name | Description |
    |:------------|:------------|
    | Hospital Name | Name of the VHA hospital. (String) |
    | Address | Street address of the VHA hospital. (String) |
    | City | City where the VHA hospital is located. (String) |
    | State | State where the VHA hospital is located. (String) |
    | ZIP Code | ZIP code of the VHA hospital. (Integer) |
    | County Name | County where the VHA hospital is located. (String) |
    | Phone Number | Phone number of the VHA hospital. (String) |
    | Condition | Condition being measured. (String) |
    | Measure Name | Measure used to measure the condition. (String) |
    | Score | Score achieved by the VHA h... |

  12. Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov"

    • figshare.com
    zip
    Updated Jun 1, 2023
    Laura Miron; Rafael Gonçalves; Mark A. Musen (2023). Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov" [Dataset]. http://doi.org/10.6084/m9.figshare.12743939.v2
    Explore at:
    zip. Available download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    Laura Miron; Rafael Gonçalves; Mark A. Musen
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This fileset provides supporting data and corpora for the empirical study described in: Laura Miron, Rafael S. Goncalves and Mark A. Musen, "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov."

    Description of files

    Original data files:
    • AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. The set contains 302,091 records downloaded on April 3, 2019.
    • public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate records in AllPublicXML.

    BioPortal API Query Results:
    • condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns = {filename, condition, url, bioportal term, cuis, tuis}.
    • intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns = {filename, intervention, url, bioportal term, cuis, tuis}.

    Data Element Definitions:
    • supplementary_table_1.xlsx maps element names, element types, and whether elements are required in the ClinicalTrials.gov data dictionaries, the ClinicalTrials.gov XML schema declaration for records (public.XSD), the Protocol Registration System (PRS), FDAAA801, and the WHO required data elements for clinical trial registrations.

    Column and value definitions:
    • CT.gov Data Dictionary Section: section heading for a group of data elements in the ClinicalTrials.gov data dictionary (https://prsinfo.clinicaltrials.gov/definitions.html).
    • CT.gov Data Dictionary Element Name: name of an element/field according to the ClinicalTrials.gov data dictionaries (https://prsinfo.clinicaltrials.gov/definitions.html and https://prsinfo.clinicaltrials.gov/expanded_access_definitions.html).
    • CT.gov Data Dictionary Element Type: "Data" if the element is a field for which the user provides a value; "Group Heading" if the element is a group heading for several sub-fields but is not itself associated with a user-provided value.
    • Required for CT.gov for Interventional Records: "Required" if the element is required for interventional records according to the data dictionary; "CR" if the element is conditionally required; "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule; "-" if this element is not applicable to interventional records (only observational or expanded access).
    • Required for CT.gov for Observational Records: same coding, with "-" indicating the element is not applicable to observational records (only interventional or expanded access).
    • Required in CT.gov for Expanded Access Records?: same coding, with "-" indicating the element is not applicable to expanded access records (only interventional or observational).
    • CT.gov XSD Element Definition: abbreviated xpath to the corresponding element in the ClinicalTrials.gov XSD (public.XSD). The full xpath includes 'clinical_study/' as a prefix to every element. (There is a single top-level element called "clinical_study" for all other elements.)
    • Required in XSD?: "Yes" if the element is required according to public.XSD; "No" if the element is optional; "-" if the element is not made public or included in the XSD.
    • Type in XSD: "text" if the XSD type was "xs:string" or "textblock"; the name of the enum if the type was an enum; "integer" if the type was "xs:integer" or "xs:integer" extended with the "type" attribute; "struct" if the type was a struct defined in the XSD.
    • PRS Element Name: name of the corresponding entry field in the PRS system.
    • PRS Entry Type: entry type in the PRS system. This column contains some free-text explanations/observations.
    • FDAAA801 Final Rule Field Name: name of the corresponding required field in the FDAAA801 Final Rule (https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission). This column contains many empty values where elements in ClinicalTrials.gov do not correspond to a field required by the FDA.
    • WHO Field Name: name of the corresponding field required by the WHO Trial Registration Data Set (v 1.3.1) (https://prsinfo.clinicaltrials.gov/trainTrainer/WHO-ICMJE-ClinTrialsgov-Cross-Ref.pdf).

    Analytical Results:
    • EC_human_review.csv contains the results of a manual review of a random sample of eligibility criteria from 400 CT.gov records. The table gives filename, criteria, and whether manual review determined the criteria to contain criteria for "multiple subgroups" of participants.
    • completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA801 and its Final Rule.
    • industry_completeness.xlsx contains percentages of interventional records missing required fields, broken up by agency class of the trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other"), and before and after the effective date of the Final Rule.
    • location_completeness.xlsx contains percentages of interventional records missing required fields, broken up by whether the record listed at least one location in the United States or only international locations (excluding trials with no listed location), and before and after the effective date of the Final Rule.

    Intermediate Results:
    • cache.zip contains pickle and csv files of pandas dataframes with values scraped from the XML records in AllPublicXML. Downloading these files greatly speeds up running analysis steps from the jupyter notebooks in our github repository.
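    As the schema notes above describe, every record has a single top-level clinical_study element. A minimal sketch of pulling the NCT id and the condition strings (the kind of strings matched against BioPortal in condition_matches.csv) with the standard library; the record below is an invented, heavily abbreviated example, not a real ClinicalTrials.gov file:

    ```python
    import xml.etree.ElementTree as ET

    # Invented minimal record following the public.xsd layout described above.
    RECORD = """<clinical_study>
      <id_info><nct_id>NCT00000000</nct_id></id_info>
      <condition>Type 2 Diabetes</condition>
      <condition>Obesity</condition>
    </clinical_study>"""

    root = ET.fromstring(RECORD)
    nct_id = root.findtext("id_info/nct_id")
    # Condition strings like these were the inputs to the BioPortal queries.
    conditions = [c.text for c in root.findall("condition")]
    print(nct_id, conditions)
    ```

    Real records carry many more elements (see supplementary_table_1.xlsx for the full mapping), but the xpath convention is the same: every path is relative to the clinical_study root.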

  13. Centerline

    • data-cosm.hub.arcgis.com
    • data.nola.gov
    • +3more
    Updated Oct 22, 2020
    + more versions
    City of San Marcos (2020). Centerline [Dataset]. https://data-cosm.hub.arcgis.com/datasets/centerline
    Explore at:
    Dataset updated
    Oct 22, 2020
    Dataset authored and provided by
    City of San Marcos
    Area covered
    Description

    Road segments representing centerlines of all roadways or carriageways in a local government. Typically, this information is compiled from orthoimagery or other aerial photography sources. This representation of the road centerlines supports address geocoding and mapping. It also serves as a source for public works and other agencies that are responsible for the active management of the road network. (From ESRI Local Government Model "RoadCenterline" Feature)

    **This dataset was significantly revised in August of 2014 to correct for street segments that were not properly split at intersections. There may be issues with using data based off of the original centerline file.**
    **The column Speed Limit was updated in November 2014 by the Transportation Intern and is believed to be accurate.**
    **The column One Way was updated in November of 2014 by core GIS and is believed to be accurate.**

    MAXIMOID: A unique id field used in a work order management software called Maximo by IBM. Maximo uses GIS CL data to assign locations to work orders using this field. This field is maintained by the Transportation GIS specialists and is auto-incremented when new streets are digitized. For example, if the latest digitized street segment has MAXIMOID = 999, the next digitized line will receive MAXIMOID = 1000, and so on.

    STREET NAMING IS BROKEN INTO THREE FIELDS FOR GEOCODING:

    PREFIX: This field is attributed if a street name has a prefix such as W, N, E, or S.

    NAME: Domain with all street names. The name of the street without prefix or suffix.

    ROAD_TYPE (Text, 4): Describes the type of road, aka suffix, if applicable. CAPCOG Addressing Guidelines Sec 504 U. states, "Every road shall have corresponding standard street suffix..."; standard street suffix abbreviations comply with USPS Pub 28 Appendix C Street Abbreviations. Examples include, but are not limited to, Rd, Dr, St, Trl, Ln, Gln, Lp, CT.

    LEFT_LOW: The minimum numeric address on the left side of the CL segment. The left side of the CL is defined as the left side of the line segment in the From-To direction. For example, if a line has addresses starting at 101 and ending at 201 on its left side, this column will be attributed 101.

    LEFT_HIGH: The largest numeric address on the left side of the CL segment. The left side of the CL is defined as the left side of the line segment in the From-To direction. For example, if a line has addresses starting at 101 and ending at 201 on its left side, this column will be attributed 201.

    LOW: The minimum numeric address on the RIGHT side of the CL segment. The right side of the CL is defined as the right side of the line segment in the From-To direction. For example, if a line has addresses starting at 100 and ending at 200 on its right side, this column will be attributed 100.

    HIGH: The maximum numeric address on the RIGHT side of the CL segment. The right side of the CL is defined as the right side of the line segment in the From-To direction. For example, if a line has addresses starting at 100 and ending at 200 on its right side, this column will be attributed 200.

    ALIAS: Alternative names for roads, if known. This field is useful for geocode re-matching.

    CLASS: The functional classification of the centerline. For example, Minor (Minor Arterial), Major (Major Arterial). THIS FIELD IS NOT CONSISTENTLY FILLED OUT; NEEDS AN AUDIT.

    FULLSTREET: The full name of the street, concatenating the [PREFIX], [NAME], and [SUFFIX] fields. For example, "W San Antonio St."

    ROWWIDTH: Width of right-of-way along the CL segment. Data entry from plat by Planning GIS or from Engineering PICPs/CIPs.

    NUMLANES: Number of striped vehicular driving lanes, including turn lanes if present along the majority of the segment. Does not include bicycle lanes.

    LANEMILES: Describes the total length of lanes for that segment in miles. It is manually field-calculated as (([ShapeLength] / 5280) * [NUMLANES]) and maintained by Transportation GIS.

    SPEEDLIMIT: Speed limit of the CL segment, if known. If not, assume 30 mph for local and minor arterial streets. If speed limit changes are enacted by city council, they will be recorded in the Traffic Register dataset, and this field will be updated accordingly. Initial data entry made by CIP/Planning GIS and maintained by Transportation GIS.

    [YRBUILT]: Replaced by [DateBuilt]; see below. Will be deleted. 4/21/2017

    LASTYRRECON (Text, 10): The last four-digit year a major reconstruction occurred. Most streets have not been reconstructed since original construction and will have no values. The Transportation GIS Specialist will update this field.

    OWNER: Describes the governing body or private entity that owns/maintains the CL. It is possible that some streets are owned by other entities but maintained by CoSM. Possible attributes include CoSM, Hays Owned/City Maintained, TxDOT Owned/City Maintained, TxDOT, one of four counties (Hays, Caldwell, Guadalupe, and Comal), TxState, and Private.

    ST_FROM: Centerline segments are split at their intersections with other CL segments. This field names the nearest cross-street in the From direction. Should be edited when new CL segments that cause splits are added.

    ST_TO: Centerline segments are split at their intersections with other CL segments. This field names the nearest cross-street in the To direction. Should be edited when new CL segments that cause splits are added.

    PAV_WID: Pavement width of the street in feet from back-of-curb to back-of-curb. This data is entered from as-builts by CIP GIS. In January 2017, Transportation Dept. field staff surveyed all streets and measured width from face-of-curb to face-of-curb where curb was present, and edge of pavement to edge of pavement where it was not. This data was used to field-calculate pavement width where we had values. A value of 1 foot was added to the field calculation if curb and gutter or stand-up curb were present (face-of-curb to back-of-curb is 6 in; multiply that by 2 to get 1 foot). If no curb was present, the value entered by the field staff was directly copied over. If values were already present, and entered from as-built, they were left alone.

    ONEWAY: Describes direction of travel along the CL in relation to the digitized direction. A street that allows bi-directional travel is attributed "B", a street that is one-way in the From-To direction is attributed "F", a street that is one-way in the To-From direction is attributed "T", and a street that does not allow travel in any direction is attributed "N".

    ROADLEVEL: Field will be aliased to [MINUTES] and used to calculate travel time along CL segments in minutes using shape length and [SPEEDLIMIT]. Field calculated using the following expression: [MINUTES] = (([SHAPE_LENGTH] / 5280) / ([SPEEDLIMIT] / 60))

    ROWSTATUS: Values include "Open" or "Closed". Describes whether a right-of-way is open or closed. If a street is constructed within ROW it is "Open". If a street has not yet been constructed, and there is ROW, it is "Closed". UPDATE: This feature class only has CL geometries for "Open" rights-of-way. This field should be deleted or re-purposed.

    ASBUILT: Field used to hyperlink as-built documents detailing construction of the CL. Field was added in Dec. 2016.

    DateBuilt: Date field used to record the month and year a road was constructed, from the as-built. Data was collected previously without month information. Data without a known month is entered as "1/1/YYYY". When month and year are known, enter as "M/1/YYYY". Month and year from as-built. Added by Engineering/CIP.

    ACCEPTED: Date field used to record the month, day, and year that a roadway was officially accepted by the City of San Marcos. Engineering signs off on acceptance letters and stores these documents. This field was added in May of 2018. Due to a lack of data, the date built field was copied into this field for older roadways. Going forward, all new roadways will have this date. This field will typically be populated well after a road has been drawn into GIS. Entered by Engineering/CIP. ****In an effort to make summarizing the data more efficient in Operations Dashboard, a generic date of "1/1/1900" was assigned to all CoSM owned or maintained roads that had NULL values. These were roads that either have not been accepted yet, or roads that were accepted a long time ago and their accepted date is not known.

    WARRANTY_EXP: Date field used to record the expiration date of a newly accepted roadway. Typically this is one year from the acceptance date, but can be greater. This field was added in May of 2018, so only roadways that have been accepted since, and older roadways with valid warranty dates within this time frame, have been populated.
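    The two field-calculation expressions given in the metadata (LANEMILES and the [MINUTES] travel time) are simple enough to sketch directly; the function names below are illustrative, not part of the dataset.

    ```python
    # LANEMILES field calculation: ((ShapeLength / 5280) * NUMLANES)
    def lane_miles(shape_length_ft, num_lanes):
        return (shape_length_ft / 5280) * num_lanes

    # ROADLEVEL/[MINUTES] field calculation:
    # (SHAPE_LENGTH / 5280) / (SPEEDLIMIT / 60)
    def travel_minutes(shape_length_ft, speed_limit_mph):
        return (shape_length_ft / 5280) / (speed_limit_mph / 60)

    # A 5280 ft (1 mile), 4-lane segment at 30 mph:
    print(lane_miles(5280, 4))       # 4.0 lane-miles
    print(travel_minutes(5280, 30))  # 2.0 minutes
    ```

    Both expressions divide by 5280 because SHAPE_Length is stored in feet while the derived fields are in miles and minutes.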

  14. Structures of FDA-approved drugs and their active metabolites and data sets of experimental PD and PK properties

    • data.niaid.nih.gov
    Updated Jul 10, 2024
    + more versions
    Douguet, Dominique (2024). Structures of FDA-approved drugs and their active metabolites and data sets of experimental PD and PK properties [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4432351
    Explore at:
    Dataset updated
    Jul 10, 2024
    Dataset authored and provided by
    Douguet, Dominique
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Data sets are extracted from the 2023 release of the e-Drug3D Database (2083 FDA-approved drug structures)

    e-Drug3D_2083.zip (contains e-Drug3D_2083.sdf) - Chemical Structures - The e-Drug3D collection as an SDF-format file: one 3D conformer; ionization of carboxylic acid, phosphate, phosphonate, phosphonoamide, amidinium and guanidinium groups. The data block contains the ID, name (INN), CAS number and Status.

    e-Drug3D_2083_PK.csv - Pharmacokinetics - Column/field value is separated by a semicolon. It contains the e-Drug3D ID, INN (drug name), CAS number, year of approval, Status, is_or_has a metabolite, routes of administration, Volume of distribution (VD), Clearance (Cl), Plasma Protein Binding (PPB), Half-life (t1/2), Bioavailability (F), Cmax/Tmax, comment on solubility.

    e-Drug3D_2083_PD.csv - Pharmacodynamics - Column/field value is separated by a semicolon. It contains the e-Drug3D ID, INN (drug name), CAS number, year of approval, Status, Primary target, ATC code(s), PDB codes and main list of drug targets.

    e-Drug3D_2083_RD.csv - FDA Registration Data - Column/field value is separated by a semicolon. It contains the ID, name (INN), CAS number, First year of approval, Status, KNApSAcK or NPAtlas Id if natural product, all associated NDA numbers [FDA approval number, name of the label file in PDF format, company name, year of approval and commercial name of the drug] and the Indication/Therapeutic class information.

    labels.tar.gz - The drug label files in PDF format (compressed directory). A label file is named with the NDA number. The NDA number is the approval number assigned by the FDA. A drug may possess several NDA numbers (see the above e-Drug3D-RD data set).
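    Since the PK, PD, and RD tables separate fields with semicolons, reading them just means overriding the delimiter. A minimal standard-library sketch; the two-line excerpt is invented and only loosely shaped like e-Drug3D_2083_PK.csv:

    ```python
    import csv
    import io

    # Invented excerpt: semicolon-separated fields, as in the e-Drug3D CSVs.
    EXCERPT = "ID;INN;CAS;Year;Status\n1;aspirin;50-78-2;1950;approved\n"

    rows = list(csv.DictReader(io.StringIO(EXCERPT), delimiter=";"))
    print(rows[0]["INN"])   # 'aspirin'
    ```

    With pandas the equivalent is `pd.read_csv(path, sep=";")`; either way, the key point is that the comma default will not split these files correctly.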

  15. Input data for short-term water level forecasting at 3 stations near HWY 37, Sonoma/Marin County, California

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Oct 20, 2022
    Sophie Munger; John Largier (2022). Input data for short-term water level forecasting at 3 stations near HWY 37, Sonoma/Marin County, California [Dataset]. http://doi.org/10.25338/B8WS8H
    Explore at:
    zip. Available download formats
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    University of California, Davis
    Authors
    Sophie Munger; John Largier
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    California 37, Marin County, Sonoma County, California
    Description

    Low-lying coastal highways are susceptible to flooding as the sea level rises. Flooding events already impact some highways, like Highway 37 which runs across the lowlands at the northern end of San Francisco Bay and is crossed by several creeks/rivers. Short-term operational forecasts are required to enable planning for traffic disruption, evacuation, and protection of property and infrastructure. Traditional physically based numerical models have great predictive capability but require extensive datasets and are computationally expensive which limits their ability to do short-term forecasting. Here we develop a data-driven, site-specific method that can be implemented at multiple vulnerable sites throughout San Francisco Bay and other low-lying coastal areas across the State of California. This method is based on direct observations of the water level at the site and is independent of large computer simulations. For this study, we use a relatively simple statistical model (multiple-linear regression) combined with a forecast error correction inspired by an autoregressive moving average method (ARMA) commonly used in time-series forecasting. The model is then used to produce a 4-day water level forecast at 3 stations near HWY 37, Sonoma/Marin County, California. Methods The input files for the model are grouped into three different datasets: a training dataset, a water level observations dataset, and a weather forecast dataset. All data within those files are sourced from public data servers.
    Training Dataset Description: This dataset contains the time series of the four parameters that are used to train the model. It consists of hourly observed meteorological data (wind, atmospheric pressure, and river flow) for the period 2019-01-01 to 2022-09-27, in four fields: Ocean Wind, Local Wind, Atmospheric Pressure, and River Flow. The raw data were collected from publicly available sources, downloaded, and resampled to hourly time intervals. Small data gaps were filled by linear interpolation. The wind data were transformed from a polar coordinate system of wind speed and direction to principal-component x-y vectors. The principal components were oriented so that the alongshore (y) component points toward 60 degrees north for the wind at Gnoss Field and 100 degrees north for the wind at the NDBC buoy. The listed onshore wind is the shore-normal (x) component for the two locations. Source:

    | Column Name | Location | Data Type, Unit | Agency Source | Web link to raw data |
    | --- | --- | --- | --- | --- |
    | AtmPres | Buoy 46026 | Atmospheric pressure, mBar | NOAA NDBC | https://www.ndbc.noaa.gov/station_page.php?station=46026 |
    | Gnoss_onshorewind | Gnoss Field Airport | Shore-normal component of the wind, m/s | Sonoma County | https://sonoma.onerain.com/site/?site_id=155&site=b4e33d63-e909-4ecd-bb2b-1ee2c587bb00 |
    | napa_flow_cfs | Napa River | River flow, cfs | USGS NWIS | https://waterdata.usgs.gov/ca/nwis/uv?site_no=11458000 |
    | ocean_onshorewind | Buoy 46026 | Shore-normal component of the wind, m/s | NOAA NDBC | https://www.ndbc.noaa.gov/station_page.php?station=46026 |
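The polar-to-shore-normal wind transformation described in the training dataset can be sketched like this. The sign conventions (meteorological "direction from which the wind blows", shore-normal axis 90 degrees clockwise of the alongshore axis) are assumptions for illustration, not taken from the dataset.

```python
# Sketch: decompose wind (speed, meteorological direction) into alongshore
# and shore-normal components for an alongshore axis given as a compass
# bearing (60 degrees for Gnoss Field, 100 degrees for buoy 46026).
import math

def wind_components(speed, dir_from_deg, alongshore_bearing_deg):
    """Return (onshore, alongshore) components in m/s.

    dir_from_deg uses the meteorological convention (direction the wind
    blows FROM); the axis conventions here are assumed.
    """
    # Vector the wind is blowing TOWARD, in east (u) / north (v) components.
    toward = math.radians(dir_from_deg + 180.0)
    u = speed * math.sin(toward)  # eastward
    v = speed * math.cos(toward)  # northward
    # Rotated frame: alongshore axis at the given bearing, shore-normal
    # axis 90 degrees clockwise from it.
    a = math.radians(alongshore_bearing_deg)
    alongshore = u * math.sin(a) + v * math.cos(a)
    n = math.radians(alongshore_bearing_deg + 90.0)
    onshore = u * math.sin(n) + v * math.cos(n)
    return onshore, alongshore

# A 10 m/s wind from 240 degrees blows toward 60 degrees, i.e. purely
# alongshore for the 60-degree Gnoss axis.
on, al = wind_components(10.0, 240.0, 60.0)
```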

    Water Level Datasets Description: This dataset consists of three individual files, each with three fields. The raw water level data were collected from 3 stage stations for the period 2019-01-01 to 2022-09-27, when available. Field stage_m: the raw data collected from the water level gauge station, detrended by removing the mean value and resampled to hourly time intervals; small data gaps were filled by linear interpolation. Field predicted_m: the predicted tide, calculated using a publicly available Python routine based on the well-documented Matlab routine UTide (http://www.po.gso.uri.edu/~codiga/utide/utide.htm). Field residual_m: the difference between the two (stage minus predicted tide); it represents the variation of the water level due to non-tidal forcing. Source: The stage data were downloaded from the following sources:

    | File Name | Location | Data Type, Unit | Agency Source | Web link to raw data |
    | --- | --- | --- | --- | --- |
    | novato_wl_1hr_up.csv | Mouth of Novato Creek | Stage, m | Marin Co | https://marin.onerain.com/site/?site_id=16808&site=a88e57c5-06b1-4855-a65c-92ef0063e6bb |
    | rowland_wl_1hr.csv | Novato Creek at Rowland Bridge | Stage, m | Marin Co | https://marin.onerain.com/site/?site_id=16809&site=82b05ca8-3c86-49cc-9660-63ca3abd3e35 |
    | petaluma_wl_1hr.csv | Petaluma River at Horse Ranch | Stage, m | UC Davis, BML | https://coastalocean.ucdavis.edu/ocean-observing/hwy37 |
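The stage preprocessing pipeline (detrend by removing the mean, resample hourly, fill small gaps, form residual = stage minus predicted tide) can be sketched as below. This is an assumed illustration: the "predicted tide" here is a toy M2-like sinusoid standing in for the UTide harmonic prediction.

```python
# Sketch of the stage_m / predicted_m / residual_m construction on
# synthetic data (not the project's actual processing code).
import numpy as np
import pandas as pd

t = pd.date_range("2019-01-01", periods=240, freq="15min")
m2_hours = 12.42  # principal lunar semidiurnal period (M2)
tide = 0.8 * np.sin(2 * np.pi * (t - t[0]).total_seconds() / 3600 / m2_hours)
stage = pd.Series(tide + 1.55, index=t)  # tide + datum offset
stage.iloc[50:53] = np.nan               # a small gap, as in the raw gauges

stage = stage - stage.mean()                       # detrend: remove the mean
hourly = stage.resample("1h").mean().interpolate(limit=3)  # hourly, gaps filled
predicted = pd.Series(tide, index=t).resample("1h").mean()
residual = hourly - predicted                      # non-tidal water level signal
```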

    Weather Forecast Datasets Description: This dataset contains the forecasts of the four parameters used by the model, obtained from NOAA data servers; the atmospheric pressure forecast was obtained from OpenWeatherMap. Source:

    | Column Name | Location | Data Type, Unit | Agency Source | Web link to raw data |
    | --- | --- | --- | --- | --- |
    | AtmPres | Buoy 46026 | Atmospheric pressure, mBar | - | https://openweathermap.org/ |
    | Gnoss_onshorewind | Gnoss Field Airport | Shore-normal component of the wind, m/s | NOAA NWS | https://www.weather.gov/documentation/services-web-api |
    | napa_flow_cfs | Napa River | River flow, cfs | NOAA AHPS | https://water.weather.gov/ahps2/hydrograph.php?gage=apcc1&wfo=mtr |
    | ocean_onshorewind | Buoy 46026 | Shore-normal component of the wind, m/s | NOAA NWS | https://www.weather.gov/documentation/services-web-api |

  16. ru-image-generation

    • huggingface.co
    Updated Apr 1, 2025
    Kenneth Hamilton (2025). ru-image-generation [Dataset]. https://huggingface.co/datasets/ZennyKenny/ru-image-generation
    Explore at:
    Dataset updated
    Apr 1, 2025
    Authors
    Kenneth Hamilton
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    🧠 Image Generation Benchmark Dataset

    This dataset simulates a large-scale benchmark for analyzing the performance of a text-to-image generation system. It contains 100,000 entries with user prompt data, generated image metadata, and multi-criteria quality ratings.

      📁 Dataset Structure
    

    Each row in the dataset corresponds to a single image generation request and includes the following fields:

    | Column Name | Description |
    | --- | --- |
    | request_id | Unique identifier for the request… |

    See the full description on the dataset page: https://huggingface.co/datasets/ZennyKenny/ru-image-generation.

  17. Wave runup FieldData

    • figshare.com
    application/x-rar
    Updated Nov 8, 2022
    Giovanni Coco; Paula Gomes (2022). Wave runup FieldData [Dataset]. http://doi.org/10.17608/k6.auckland.7732967.v3
    Explore at:
    Available download formats: application/x-rar
    Dataset updated
    Nov 8, 2022
    Dataset provided by
    The University of Auckland
    Authors
    Giovanni Coco; Paula Gomes
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    INFORMATION ABOUT THE CONTENT OF THIS DATABASE

    This database comprises wave, beach and runup parameters measured on different beaches around the world. It is a compilation of data published in previous works, with the aim of making all data available in one single repository.

    More information about methods of data acquisition and data processing can be found in the original papers that describe each experiment. To learn how to cite each of the datasets provided here, please check section 3. Please make sure to cite the appropriate publication when using the data. Collecting the data is hard work and needs to be acknowledged.

    1. Files content:

    All data files share the same structure:

    Column 1 – R2%: 2-percent exceedance value for runup [m]
    Column 2 – Set: setup [m]
    Column 3 – Stt: total swash excursion [m]
    Column 4 – Sinc: incident swash [m]
    Column 5 – Sig: infragravity swash [m]
    Column 6 – Hs*: significant deep-water wave height [m]
    Column 7 – Tp: peak wave period [s]
    Column 8 – tanβ: foreshore beach slope
    Column 9 – D50**: median sediment size [mm]

    NaN values may be found where the data were not available in the original dataset. *Hs values from field measurements were deshoaled from the depth of measurement to a depth of 80 m, assuming normal approach and linear theory (we followed the approach presented in Stockdon et al., where great care is paid to making the data comparable). **D50 values were obtained from reports and papers describing the beaches.

    2. List of datasets

    Stockdon et al. 2006: Data recompiled from 10 experiments carried out on 6 beaches (US and NL coasts). Files' names correspond to the beach and year of the experiments. Original data: available using the link https://pubs.usgs.gov/ds/602/

    Senechal et al. 2011: This dataset comprises the measurements carried out in Truc Vert beach, France. The file’s name includes the name of the beach and the year of the experiment. Original data: a table with the full content of the parameters measured during the experiment can be found in Senechal et al. (2011).

    Guedes et al. 2011: This dataset comprises data measured at Tairua beach (New Zealand coast). The file's name indicates the name of the beach and the year of the experiment. Original data: this web.

    Guedes et al. 2013: This dataset comprises data measured at Ngarunui beach (Raglan - New Zealand coast). The file's name represents the name of the beach and the year of the experiment. Original data: this web.

    Gomes da Silva et al. 2018: Dataset measured during two field campaigns at Somo beach, Spain, in 2016 and 2017. The file names represent the name of the beach and the year of the experiment. Original data: https://data.mendeley.com/datasets/6yh2b327gd/4

    Power et al. 2019: Dataset compiled from previous works, comprising field and laboratory measurements: Poate et al. (2016), field; Nicolae-Lerma et al. (2016), field; Atkinson et al. (2017), field; Mase (1989), laboratory; Baldock and Huntley (2002), laboratory; Howe (2016), laboratory. Original data: www.sciencedirect.com/science/article/pii/S0378383918302552

    Because of the character limit of this description, the table of available parameters in each dataset and the citations could not be shown here. Please refer to the read-me file for more information.
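The deshoaling step described above (moving a measured Hs to 80 m depth assuming normal approach and linear theory) can be sketched as below. This is an assumed illustration of standard linear-theory deshoaling via energy-flux conservation, not the compilers' code.

```python
# Sketch: deshoal a significant wave height from its measurement depth to
# 80 m using the ratio of linear-theory group velocities (no refraction).
import math

G = 9.81  # gravitational acceleration, m/s^2

def wavenumber(T, h):
    """Solve the dispersion relation (2*pi/T)^2 = g*k*tanh(k*h) for k."""
    omega = 2 * math.pi / T
    k = omega**2 / G  # deep-water first guess
    for _ in range(100):  # fixed-point iteration, oscillating convergence
        k = omega**2 / (G * math.tanh(k * h))
    return k

def group_velocity(T, h):
    k = wavenumber(T, h)
    c = (2 * math.pi / T) / k
    n = 0.5 * (1 + 2 * k * h / math.sinh(2 * k * h))
    return n * c

def deshoal(Hs, T, h_meas, h_target=80.0):
    """Transform Hs at depth h_meas to h_target (energy-flux conservation)."""
    return Hs * math.sqrt(group_velocity(T, h_meas) / group_velocity(T, h_target))

H80 = deshoal(2.0, 12.0, 15.0)  # 2 m waves at T = 12 s measured in 15 m depth
```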

  18. Data from: Phase field modelling combined with data-driven approach to...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 1, 2023
    Shuibao Liang; Cheng Wei; Anil Kunwar; Anil Kunwar; Upadesh Subedi; Upadesh Subedi; Han Jiang; Haoran Ma; Changbo Ke; Shuibao Liang; Cheng Wei; Han Jiang; Haoran Ma; Changbo Ke (2023). Phase field modelling combined with data-driven approach to unravel the orientation influenced growth of interfacial Cu6Sn5 intermetallics under electric current stressing [Dataset]. http://doi.org/10.5281/zenodo.8378016
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shuibao Liang; Cheng Wei; Anil Kunwar; Anil Kunwar; Upadesh Subedi; Upadesh Subedi; Han Jiang; Haoran Ma; Changbo Ke; Shuibao Liang; Cheng Wei; Han Jiang; Haoran Ma; Changbo Ke
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Description:

    The datasets are constituted by two folders, namely, (A) data_features_and_metric.zip and (B) grain_area_prediction.zip.

    (A) data_features_and_metric.zip:

    The following are the contents of this folder

    (i) grainTheta.csv file: The "grainTheta.csv" file consists of the datasets generated from multiple phase field simulations. The names of the columns in the csv file are:

    gnid = grain id number "n"; ntheta = orientation angle of the nth grain (°); nltheta = orientation angle of the grain to the left of the nth grain (°); nrtheta = orientation angle of the grain to the right of the nth grain (°); j = current density (A/m2); t = time (s); area = area of the nth grain (m2); tl = horizontal length of the top edge of grain "n" (m); bl = horizontal length of the bottom edge of grain "n" (m)

    The features gnid, ntheta, nltheta and nrtheta for a given observation are determined during the design of the initial conditions of the corresponding phase field simulation. The value of "j" for the observation is set via the boundary condition in the same numerical simulation. The finite-element-based phase field simulation provides the numerical quantities for the t, area, tl and bl attributes. The multiple observations in the data file were obtained from multiple phase field simulations.

    (ii) imc_theta.ipynb, imc_theta.py and imc_theta.html files: These files contain the code to build the Pearson's Correlation Coefficient (PCC) heatmap analysis of the data contained in grainTheta.csv file.

    (iii) comparison_mse.csv: This data file includes the mean square error for training data (tmse) and for validation data (vmse) at Epoch = 199, resulting from 10 different artificial neural network (ANN) models distinguished by 10 different learning rates (lr). The columns in this csv file are therefore modelno, lr, tmse and vmse.

    (iv) mse_comparison.gnu: This file contains the code required to output a png image from the data provided in comparison_mse.csv.

    (v) train_loss.csv and val_loss.csv: These files contain the tmse and vmse values at every epoch for the ANN model with lr = 2.5E-4. The first column in train_loss.csv is tmse and the second is the epoch number; similarly, vmse and epoch number are the two columns in val_loss.csv.

    (vi) mse_lr2p5e-4.gnu: This file contains the code required to output a png image from the data provided in train_loss.csv and val_loss.csv.
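The Pearson's Correlation Coefficient (PCC) analysis in (ii) amounts to computing the correlation matrix over the grainTheta.csv columns. The sketch below is not the archived imc_theta code: it builds a small synthetic stand-in for the dataset (with an assumed toy area response) and computes the matrix the heatmap would visualize.

```python
# Sketch: Pearson correlation matrix over grainTheta-style columns,
# on synthetic data (the area dependence on j and t is an assumption).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "ntheta": rng.uniform(0, 90, n),   # orientation angle of nth grain
    "nltheta": rng.uniform(0, 90, n),  # left neighbour orientation
    "nrtheta": rng.uniform(0, 90, n),  # right neighbour orientation
    "j": rng.choice([5.0e4, 5.0e5], n),  # current density, A/m2
    "t": rng.uniform(0, 1250, n),        # time, s
})
# Toy response: grain area grows with time and current density.
df["area"] = 1e-12 + 2e-16 * df["t"] + 1e-18 * df["j"] + rng.normal(0, 1e-14, n)

pcc = df.corr(method="pearson")  # symmetric matrix fed to the heatmap
```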

    (B) grain_area_prediction.zip:

    Inside this folder, there is a folder named "prediction_of_grain_area" consisting of the following files:

    initial_area.csv file: This file contains the value of the initial area of grain 4, which is constant across all orientation angles.

    predicted_result_00_5e4.csv: This file contains the predicted area of grain 4 (at different orientation angles and t = 1250 s) with grain 3 and grain 5 at orientation angles of 0° and 0°, respectively, and an applied current density of 5.0E+4 A/m2.

    predicted_result_00_5e5.csv: This file contains the predicted area of grain 4 (at different orientation angles and t = 1250 s) with grain 3 and grain 5 at orientation angles of 0° and 0°, respectively, and an applied current density of 5.0E+5 A/m2.

    predicted_result_9090_5e4.csv: This file contains the predicted area of grain 4 (at different orientation angles and t = 1250 s) with grain 3 and grain 5 at orientation angles of 90° and 90°, respectively, and an applied current density of 5.0E+4 A/m2.

    predicted_result_9090_5e5.csv: This file contains the predicted area of grain 4 (at different orientation angles and t = 1250 s) with grain 3 and grain 5 at orientation angles of 90° and 90°, respectively, and an applied current density of 5.0E+5 A/m2.

    area_00_adj.gnu: This gnu file contains the code to produce the png image from the data contained in predicted_result_00_5e4.csv and predicted_result_00_5e5.csv. The information about the initial area of grain 4 is obtained from the initial_area.csv file by the code.

    area_9090_adj.gnu: This gnu file contains the code to produce the png image from the data contained in predicted_result_9090_5e4.csv and predicted_result_9090_5e5.csv. The information about the initial area of grain 4 is obtained from the initial_area.csv file by the code.

  19. Data from: Monitoring standing herbaceous biomass and thresholds in semiarid...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated May 8, 2025
    + more versions
    Agricultural Research Service (2025). Data from: Monitoring standing herbaceous biomass and thresholds in semiarid rangelands from harmonized Landsat 8 and Sentinel-2 imagery to support within-season adaptive management [Dataset]. https://catalog.data.gov/dataset/data-from-monitoring-standing-herbaceous-biomass-and-thresholds-in-semiarid-rangelands-fro-fc3d9
    Explore at:
    Dataset updated
    May 8, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Tabular data from the manuscript "Monitoring standing herbaceous biomass and thresholds in semiarid rangelands from harmonized Landsat 8 and Sentinel-2 imagery to support within-season adaptive management," published in the journal Remote Sensing of Environment. Data are plot-scale values of (1) ground-sampled herbaceous standing biomass estimated using visual obstruction (VO) methods, (2) ground-sampled percent cover by vegetation type using the line-point intercept (LPI) method, (3) percent midgrass derived from hyperspectral aerial imagery (1 m) collected by the NEON AOP (see Gaffney et al. 2021, cited within the manuscript), and (4) satellite-derived indices and bands. Only the seasonal data used to develop the standing biomass model are included. The bounding box coordinates of each plot are also included.

    Resources in this dataset:

    Resource Title: Tabular ground and satellite-derived data. File Name: Kearney_Biomass_from_HLS_data.csv. Resource Description: Seasonal plot-scale tabular ground and satellite-derived data, along with four fields (minx, miny, etc.) for the bounding box of the plots (EPSG:32613 - UTM 13N, WGS 84). Data include (1) ground-sampled biomass estimated using visual obstruction (VO) poles, (2) ground-sampled vegetation cover estimated using the line-point intercept (LPI) method, (3) percent mid-grass estimated from a plant community map derived from hyperspectral aerial imagery (1 m) acquired by the NEON AOP, and (4) satellite-derived indices and bands interpolated daily from the Harmonized Landsat-Sentinel (HLS) dataset (30 m). See Metadata_column_headers.csv for descriptions of the fields (columns) in this dataset.

    Resource Title: Metadata: Description of column headers for tabular dataset. File Name: Kearney_Biomass_from_HLS_data_metadata.csv. Resource Description: Descriptions of each field (column) in the tabular dataset.

  20. AASG Wells Data for the EGS Test Site Planning and Analysis Task...

    • data.wu.ac.at
    Updated Mar 6, 2018
    + more versions
    HarvestMaster (2018). AASG Wells Data for the EGS Test Site Planning and Analysis Task aasg_geothermal_boreholes (2).zip [Dataset]. https://data.wu.ac.at/schema/geothermaldata_org/NGFjMGJmM2YtNDM3ZS00ODBlLTg5MWItMTg2ZWRmMDlmNWQy
    Explore at:
    Dataset updated
    Mar 6, 2018
    Dataset provided by
    HarvestMaster
    Description

    AASG Wells Data for the EGS Test Site Planning and Analysis Task. Temperature measurement data obtained from boreholes for the Association of American State Geologists (AASG) geothermal data project. Typically, bottomhole temperatures are recorded from log headers, and this information is provided through a borehole temperature observation service for each state. The service includes header records, well logs, temperature measurements, and other information for each borehole. Information presented in Geothermal Prospector was derived from data aggregated from the borehole temperature observations for all states. For each observation, the given well location was recorded and the best available well identifier (name), temperature, and depth were chosen. The "Well Name Source," "Temp. Type," and "Depth Type" attributes indicate the field used from the original service. The data were then cleaned and converted to consistent units. The accuracy of an observation's location, name, temperature, or depth was not assessed beyond that originally provided by the service.

    • AASG bottom hole temperature datasets were downloaded from repository.usgin.org between the dates of May 16th and May 24th, 2013.
    • Datasets were cleaned to remove null and non-real entries, and data converted into consistent units across all datasets
    • Methodology for selecting the "best" temperature and depth attributes from column headers in the AASG BHT datasets:

    Temperature: CorrectedTemperature (best), MeasuredTemperature (next best). Depth: DepthOfMeasurement (best), TrueVerticalDepth (next best), DrillerTotalDepth (last option). Well Name/Identifier: APINo (best), WellName (next best), ObservationURI (last option).

The column headers are as follows:

    • gid = internal unique ID

    • src_state = the state from which the well was downloaded (note: the low temperature wells in Idaho are coded as "ID_LowTemp", while all other wells are simply the two-character state abbreviation)

    • source_url = the url for the source WFS service or Excel file

    • temp_c = "best" temperature in Celsius

    • temp_type = indicates whether temp_c comes from the corrected or measured temperature header column in the source document

    • depth_m = "best" depth in meters

    • depth_type = indicates whether depth_m comes from the measured, true vertical, or driller total depth header column in the source document

    • well_name = "best" well name or ID

    • name_src = indicates whether well_name came from the apino, wellname, or observationuri header column in the source document

    • lat_wgs84 = latitude in wgs84

    • lon_wgs84 = longitude in wgs84

    • state = state in which the point is located

    • county = county in which the point is located
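The "best available" selection rule described above (corrected before measured temperature; measured depth before true vertical before driller total; APINo before well name before observation URI) can be sketched as a first-non-null pick in priority order. This is an assumed illustration on toy rows, not NREL's cleaning code, and the lowercase column names are hypothetical.

```python
# Sketch: pick the first non-null value per record in priority order,
# recording which source column supplied it (as temp_type/depth_type do).
import pandas as pd

TEMP_PRIORITY = ["correctedtemperature", "measuredtemperature"]
DEPTH_PRIORITY = ["depthofmeasurement", "trueverticaldepth", "drillertotaldepth"]

def pick_best(row, priority):
    """Return (value, source_column) for the first non-null field, else (None, None)."""
    for col in priority:
        if col in row and pd.notna(row[col]):
            return row[col], col
    return None, None

wells = pd.DataFrame([
    {"correctedtemperature": 85.0, "measuredtemperature": 80.0,
     "depthofmeasurement": None, "trueverticaldepth": 1500.0},
    {"correctedtemperature": None, "measuredtemperature": 72.0,
     "depthofmeasurement": 1200.0},
])

best = wells.apply(
    lambda r: pd.Series({
        "temp_c": pick_best(r, TEMP_PRIORITY)[0],
        "temp_type": pick_best(r, TEMP_PRIORITY)[1],
        "depth_m": pick_best(r, DEPTH_PRIORITY)[0],
        "depth_type": pick_best(r, DEPTH_PRIORITY)[1],
    }),
    axis=1,
)
```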
