7 datasets found
  1. Data from: Datasets used to train the Generative Adversarial Networks used...

    • opendata.cern.ch
    Updated 2021
    ATLAS collaboration (2021). Datasets used to train the Generative Adversarial Networks used in ATLFast3 [Dataset]. http://doi.org/10.7483/OPENDATA.ATLAS.UXKX.TXBN
    Dataset updated
    2021
    Dataset provided by
    CERN Open Data Portal
    Authors
    ATLAS collaboration
    Description

    Three datasets are available, each consisting of 15 csv files. Each file contains the voxelised shower information obtained from single particles produced at the front of the calorimeter in the |η| range (0.2-0.25) simulated in the ATLAS detector. Two datasets contain photon events with different statistics; the larger sample has about 10 times as many events as the other. The third dataset contains pions. The pion dataset and the photon dataset with the lower statistics were used to train the corresponding two GANs presented in the AtlFast3 paper SIMU-2018-04.

    The information in each file is a table; the rows correspond to the events and the columns to the voxels. The voxelisation procedure is described in the AtlFast3 paper linked above and in the dedicated PUB note ATL-SOFT-PUB-2020-006. In summary, the detailed energy deposits produced by ATLAS were converted from x,y,z coordinates to local cylindrical coordinates defined around the particle 3-momentum at the entrance of the calorimeter. The energy deposits in each layer were then grouped in voxels and for each voxel the energy was stored in the csv file. For each particle, there are 15 files corresponding to the 15 energy points used to train the GAN. The name of the csv file defines both the particle and the energy of the sample used to create the file.
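
    The table layout described above (one row per event, one column per voxel) can be read with a few lines of Python. This is a minimal sketch, not the FastCaloGAN tooling: it assumes comma-separated numeric values with no header row, and the function names and the inline sample data are hypothetical.

```python
import csv
import io


def read_voxel_table(text):
    """Parse CSV text into a list of events, each a list of voxel energies.

    Assumption: comma-separated floats, no header row.
    """
    events = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        events.append([float(cell) for cell in record])
    return events


def total_energy_per_event(events):
    """Sum the voxel energies of each event (one total per row)."""
    return [sum(voxels) for voxels in events]


# Hypothetical sample: 2 events, 3 voxels each (not real ATLAS data).
sample = "0.0,1.5,2.5\n3.0,0.0,1.0\n"
events = read_voxel_table(sample)
totals = total_energy_per_event(events)
```

    For a real file, the same functions could be applied to the file contents; the actual number of voxel columns depends on the particle type and on binning.xml.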

    The size of the voxels is described in the binning.xml file. Software tools to read the XML file and manipulate the spatial information of voxels are provided in the FastCaloGAN repository.

    Updated on February 10th 2022. A new dataset photons_samples_highStat.tgz was added to this record and the binning.xml file was updated accordingly.

    Updated on April 18th 2023. A new dataset pions_samples_highStat.tgz was added to this record.

  2. Code of Federal Regulations in XML

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 7, 2024
    Office of the Federal Register (2024). Code of Federal Regulations in XML [Dataset]. https://catalog.data.gov/dataset/code-of-federal-regulations-in-xml
    Dataset updated
    Nov 7, 2024
    Dataset provided by
    Office of the Federal Register
    Description

    The Code of Federal Regulations (CFR) is the codification of the general and permanent rules published in the Federal Register by the executive departments and agencies of the Federal Government. It is divided into 50 titles that represent broad areas subject to Federal regulation. Each print volume of the CFR is updated once each calendar year, and is issued on a quarterly basis. Bulk data downloads of Code of Federal Regulations files in XML format are available from 1996 to the present, by year, title, and volume. The current XML data set is not yet an official format of the Code of Federal Regulations. Only the PDF and Text versions have legal status as parts of the official online format of the Code of Federal Regulations. The XML-structured files are derived from SGML-tagged data and printing codes, which may produce anomalies in display. In addition, the XML data does not yet include image files. Users who require a higher level of assurance may wish to consult the official version of the Code of Federal Regulations on Govinfo.gov. The FDsys data set includes digitally signed Code of Federal Regulations PDF files, which may be relied upon as evidence in a court of law. See: https://www.govinfo.gov/app/collection/cfr/

  3. bioRxiv 10k with assets

    • data.niaid.nih.gov
    Updated Oct 29, 2021
    Ecer, Daniel (2021). bioRxiv 10k with assets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5592546
    Dataset updated
    Oct 29, 2021
    Dataset authored and provided by
    Ecer, Daniel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a superset of the bioRxiv 10k dataset. It additionally includes the assets (usually images) that are linked by the XML files.

    The assets were retrieved from bioRxiv's TDM bucket. Assets that are not linked by the XML, such as some large videos, were omitted.

    This dataset serves a similar purpose to the bioRxiv 10k dataset, but for use cases that require the assets, e.g. training and evaluation of figure image extraction.

    This dataset mirrors exactly the same documents and structure as the bioRxiv 10k dataset, but rather than containing only the PDF and XML files, it also contains the linked assets (often images, but not necessarily).

    The dataset was created as part of eLife's ScienceBeam project.

  4. Data from: Knowledge graphs for seismic data and metadata

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Sep 19, 2023
    William Davis; Cassandra Hunt (2023). Knowledge graphs for seismic data and metadata [Dataset]. http://doi.org/10.6078/D1P430
    Available download formats: zip
    Dataset updated
    Sep 19, 2023
    Dataset provided by
    University of California, San Diego
    Relational AI
    Authors
    William Davis; Cassandra Hunt
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The increasing scale and diversity of seismic data, and the growing role of big data in seismology, have raised interest in methods to make data exploration more accessible. This paper presents the use of knowledge graphs (KGs) for representing seismic data and metadata to improve data exploration and analysis, focusing on usability, flexibility, and extensibility. Using constraints derived from domain knowledge in seismology, we define semantic models of seismic station and event information used to construct the KGs. Our approach utilizes the capability of KGs to integrate data across many sources and diverse schema formats. We use schema-diverse, real-world seismic data to construct KGs with millions of nodes, and illustrate potential applications with three big-data examples. Our findings demonstrate the potential of KGs to enhance the efficiency and efficacy of seismological workflows in research and beyond, indicating a promising interdisciplinary future for this technology.

    Methods

    The data here consists of, and was collected from:

    1. Station metadata, in StationXML format, acquired from IRIS DMC using the fdsnws-station webservice (https://service.iris.edu/fdsnws/station/1/).
    2. Earthquake event data, in NDK format, acquired from the Global Centroid-Moment Tensor (GCMT) catalog webservice (https://www.globalcmt.org) [1,2].
    3. Earthquake event data, in CSV format, acquired from the USGS earthquake catalog webservice (https://doi.org/10.5066/F7MS3QZH) [3].

    The format of the data is described in the README. In addition, a complete description of the StationXML, NDK, and USGS file formats can be found at https://www.fdsn.org/xml/station/, https://www.ldeo.columbia.edu/~gcmt/projects/CMT/catalog/allorder.ndk_explained, and https://earthquake.usgs.gov/data/comcat/#event-terms, respectively. Also provided are conversions from NDK and StationXML file formats into JSON format.

    References:

    [1] Dziewonski, A. M., Chou, T. A., & Woodhouse, J. H. (1981). Determination of earthquake source parameters from waveform data for studies of global and regional seismicity. Journal of Geophysical Research: Solid Earth, 86(B4), 2825-2852.
    [2] Ekström, G., Nettles, M., & Dziewoński, A. M. (2012). The global CMT project 2004–2010: Centroid-moment tensors for 13,017 earthquakes. Physics of the Earth and Planetary Interiors, 200, 1-9.
    [3] U.S. Geological Survey, Earthquake Hazards Program, 2017, Advanced National Seismic System (ANSS) Comprehensive Catalog of Earthquake Events and Products: Various, https://doi.org/10.5066/F7MS3QZH.
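
    To give a feel for what a StationXML-to-JSON conversion involves, the sketch below flattens station entries into JSON-ready records using only the Python standard library. It is an illustration under stated assumptions, not the conversion shipped with this dataset: the function name, the chosen output fields, and the inline sample document (network "XX", station "DEMO") are hypothetical. The element names and namespace follow the FDSN StationXML schema.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by FDSN StationXML documents.
NS = {"fdsn": "http://www.fdsn.org/xml/station/1"}

# Hypothetical, heavily trimmed sample document (not real station metadata).
SAMPLE = """<FDSNStationXML xmlns="http://www.fdsn.org/xml/station/1" schemaVersion="1.1">
  <Source>example</Source>
  <Created>2023-01-01T00:00:00Z</Created>
  <Network code="XX">
    <Station code="DEMO">
      <Latitude>10.5</Latitude>
      <Longitude>-20.25</Longitude>
      <Elevation>100.0</Elevation>
    </Station>
  </Network>
</FDSNStationXML>"""


def stations_to_records(xml_text):
    """Flatten Network/Station elements into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    records = []
    for network in root.findall("fdsn:Network", NS):
        for station in network.findall("fdsn:Station", NS):
            records.append({
                "network": network.get("code"),
                "station": station.get("code"),
                "latitude": float(station.findtext("fdsn:Latitude", namespaces=NS)),
                "longitude": float(station.findtext("fdsn:Longitude", namespaces=NS)),
                "elevation": float(station.findtext("fdsn:Elevation", namespaces=NS)),
            })
    return records


records = stations_to_records(SAMPLE)
as_json = json.dumps(records, indent=2)
```

    Flat records like these are a common starting point for loading nodes into a graph; the schema actually used by the authors is described in the dataset README.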

  5. Land Use, Twin Cities Metropolitan Area, 1968

    • hub.arcgis.com
    Updated Aug 27, 2019
    kerni016_cicgddp (2019). Land Use, Twin Cities Metropolitan Area, 1968 [Dataset]. https://hub.arcgis.com/datasets/5ea6d14533e84d22a45154ddfc597f89
    Dataset updated
    Aug 27, 2019
    Dataset authored and provided by
    kerni016_cicgddp
    Description

    High-quality GIS land use maps for the Twin Cities Metropolitan Area for 1968 that were developed from paper maps (no GIS version existed previously). The GIS shapefiles were exported using ArcGIS Quick Import Tool from the Data Interoperability Toolbox. The coverage files were imported into a file geodatabase, then exported to a .shp file for long-term use without proprietary software. An example output of the final GIS file is included as a PDF; in addition, a scan of the original 1968 map (held in the UMN Borchert Map Library) is included as a PDF. Metadata was extracted as an XML file. Finally, all associated coverage files and original map scans were zipped into one file for download and reuse. Data was uploaded to ArcGIS Online 3/9/2020. Original dataset available from the Data Repository of the University of Minnesota: http://dx.doi.org/10.13020/D63W22

  6. L1958 poly

    • umn.hub.arcgis.com
    Updated Apr 28, 2003
    University of Minnesota (2003). L1958 poly [Dataset]. https://umn.hub.arcgis.com/maps/UMN::l1958-poly
    Dataset updated
    Apr 28, 2003
    Dataset authored and provided by
    University of Minnesota
    Description

    The GIS shapefiles were exported using ArcGIS Quick Import Tool from the Data Interoperability Toolbox. The coverage files were imported into a file geodatabase, then exported to a .shp file for long-term use without proprietary software. An example output of the final GIS file is included as a PDF; in addition, a scan of the original 1958 map (held in the UMN Borchert Map Library) is included as a PDF. Metadata was extracted as an XML file. Finally, all associated coverage files and original map scans were zipped into one file for download and reuse.

    Date completed: 4/28/2003
    Geographic coverage: Bounding box (W, S, E, N): -93.770810, 44.468717, -92.725647, 45.303848
    Persistent link to this item: https://dx.doi.org/10.13020/D6059J, https://hdl.handle.net/11299/160503

    Funding information:
    Sponsorship: MnDOT Report 2003-37
    Funding agency: Minnesota Department of Transportation
    Funding agency ID: Contract #: (c) 81655 (wo) 8
    Sponsorship grant: If They Come, Will You Build It? Urban Transportation Network Growth Models.

    Referenced by:
    Levinson, David, and Wei Chen (2007) "Area Based Models of New Highway Route Growth." ASCE Journal of Urban Planning and Development 133(4) 250-254. https://doi.org/10.1061/(ASCE)0733-9488(2007)133:4(250)
    Levinson, David and Wei Chen (2005) "Paving New Ground" in Access to Destinations (ed. David Levinson and Kevin Krizek) Elsevier Publishers.

  7. v_90can85_snd: 9-second gridded continental Australia composite ecological...

    • data.csiro.au
    • researchdata.edu.au
    Updated Jun 16, 2015
    Kristen Williams; Tom Harwood; Simon Ferrier; Suzanne Prober; Noboru Ota; Justin Perry (2015). v_90can85_snd: 9-second gridded continental Australia composite ecological change for Vascular Plants 1990:2050 CanESM2 RCP 8.5 (CMIP5) (GDM: VAS_v5_r11) [Dataset]. http://doi.org/10.4225/08/5489A3BAE3532
    Dataset updated
    Jun 16, 2015
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Kristen Williams; Tom Harwood; Simon Ferrier; Suzanne Prober; Noboru Ota; Justin Perry
    License

    https://research.csiro.au/dap/licences/csiro-data-licence/

    Time period covered
    Jan 1, 1975 - Jan 1, 2065
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    Description

    Composite ecological change is a function of three metrics: the potential degree of ecological change, disappearing ecological environments, and novel ecological environments. It shows where change might be greatest, and the different types of vulnerability, using 30-year climate averages between the present (1990: 1976-2005) and the projected future (2050: 2036-2065) under the CanESM2 global climate model (RCP 8.5), based on Generalised Dissimilarity Modelling (GDM) of compositional turnover for vascular plants (VAS_v5_r11).

    Wherever the Potential degree of ecological change is scored low, ecological environments can neither be novel nor disappearing and minimal change is expected. But when the Potential degree of ecological change is scored high, a variety of possible types of change can occur depending on whether scores for Novel and/or Disappearing ecological environments are also high.

    To create a composite view, we assigned each of the three component measures to a colour band in a composite-band raster: local similarity as shades of green (inverted, 1-0 rescaled 0-255); novel as shades of blue (0-1 rescaled 0-255); and disappearing as shades of red (0-1 rescaled 0-255). The three layers can then be mapped simultaneously (red: band 3; green: band 1; blue: band 2) each scaled 0-255 to show the varying degrees of similar, novel and disappearing ecological environments and their combinations.

    This metric was developed along with others for use in an assessment of the efficacy of the protected area system for biodiversity under climate change at continental and global scales, presented at the IUCN World Parks Congress 2014. It is described in the AdaptNRM Guide “Implications of Climate Change for Biodiversity: a community-level modelling approach”, available online at: www.adaptnrm.org.

    Data are provided as zipped ESRI tiff grids containing: raster image (.tif) with associated header (.tfw) and projection (*.xml) files. After extracting from the zip archive, these files can be imported into most GIS software packages. A readme file describes how to correctly reproduce the colour legend. In ArcGIS, the symbology statistics file can be used: "SND_display.stat.XML".

    Reproducing RGB composite colours for 3-band raster in ArcGIS:
    1. In file properties in ArcGIS, Symbology tab, load XML "SND_display.stat.XML"
    2. RED = BAND_3 (Disappearing)
    3. GREEN = BAND_1 (Similarity)
    4. BLUE = BAND_2 (Novel)
    5. Always use min-max legend
    6. Set each band in the custom range 0-255, mean = 126, std = 0

    Layers in this 9s series use a consistent naming convention: BIOLOGICAL GROUP _ FROM BASE TO SCENARIO _ ANALYSIS, e.g. A_90CAN85_SND or R_90MIR85_SND, where BIOLOGICAL GROUP is A: amphibians, M: mammals, R: reptiles or V: vascular plants; SCENARIO is CAN: CanESM2 or MIR: MIROC5; and the ANALYSIS code SND refers to similarity, novel, disappearing.
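
    The naming convention above lends itself to a small parser. The following is a sketch only: the regular expression and function name are hypothetical, and the two-digit codes are returned as raw strings. Reading "90" as the 1990 baseline and "85" as RCP 8.5 is an assumption drawn from this record's title, not something the convention text states.

```python
import re

# Code tables taken from the naming convention described above.
GROUPS = {"A": "amphibians", "M": "mammals", "R": "reptiles", "V": "vascular plants"}
MODELS = {"CAN": "CanESM2", "MIR": "MIROC5"}

# Assumed shape: GROUP _ <2-digit base><3-letter model><2-digit scenario> _ ANALYSIS
LAYER_PATTERN = re.compile(r"^([AMRV])_(\d{2})([A-Z]{3})(\d{2})_([A-Z]+)$")


def parse_layer_name(name):
    """Split a layer name such as 'A_90CAN85_SND' into its components."""
    match = LAYER_PATTERN.match(name.upper())
    if match is None:
        raise ValueError(f"unrecognised layer name: {name!r}")
    group, base_code, model, scenario_code, analysis = match.groups()
    if model not in MODELS:
        raise ValueError(f"unknown climate model code: {model!r}")
    return {
        "group": GROUPS[group],
        "base_code": base_code,          # assumed to mean 1990 when "90"
        "model": MODELS[model],
        "scenario_code": scenario_code,  # assumed to mean RCP 8.5 when "85"
        "analysis": analysis,
    }


parsed = parse_layer_name("v_90can85_snd")
```

    Names are upper-cased before matching because this record's own identifier is written in lower case (v_90can85_snd) while the examples in the convention are upper case.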

    Lineage: Ecological similarity ranges between 0 and 1: the closer to zero, the greater the potential for compositional change in biodiversity. Each of the three ecological similarity measures was rescaled between 0 and 255 as integers to match the RGB colour scale, but the Potential degree of ecological change measure was inverted first (1-0 rescaled 0-255).

    Using the Composite Bands tool in ArcGIS 10.2.2, a three-band raster was created with band 1 = similarity, S; band 2 = novel, N; and band 3 = disappearing, D.

    In ArcGIS mapping symbology, each of the three component measures is then assigned to a colour band:
    RED channel = BAND_3 (Disappearing)
    GREEN channel = BAND_1 (Similarity)
    BLUE channel = BAND_2 (Novel)

    The gamma stretch legend scaling is not used and the min-max legend stretch is applied with statistics defined from the same custom settings for each band: minimum = 0; maximum= 255, mean = 126, std = 0.

    These settings correctly reproduce the colours.
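
    The rescaling and band assignment described in this lineage can be sketched for a single grid cell as follows. This is an illustration, not the ArcGIS Composite Bands tool: the function names are hypothetical, and rounding to the nearest integer is an assumption, since the text only says the values were rescaled to integers.

```python
def rescale(value, invert=False):
    """Map a measure in [0, 1] to an integer in [0, 255].

    With invert=True the measure is flipped first (1-0 rescaled 0-255),
    as described for the similarity measure.
    Assumption: values are rounded to the nearest integer.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("measure must lie between 0 and 1")
    if invert:
        value = 1.0 - value
    return int(round(value * 255))


def composite_bands(similarity, novel, disappearing):
    """Return (band 1, band 2, band 3) = (S inverted, N, D) for one cell."""
    return (rescale(similarity, invert=True), rescale(novel), rescale(disappearing))


def to_rgb(bands):
    """Apply the channel mapping: RED = band 3, GREEN = band 1, BLUE = band 2."""
    band1, band2, band3 = bands
    return (band3, band1, band2)


# A cell with low similarity, fully novel, slightly disappearing environments.
rgb = to_rgb(composite_bands(similarity=0.0, novel=1.0, disappearing=0.2))
```

    Under these assumptions, a cell with similarity 1 and no novel or disappearing signal maps to black, while low similarity pushes the green channel towards 255.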

    The composite ecological change index derives from the following three measures that are elsewhere described:

    1. S, similarity: 9-second gridded continental Australia potential degree of ecological change for Vascular Plants 1990:2050 CanESM2 RCP 8.5 (CMIP5) (GDM: VAS_v5_r11)
    2. N, novel: 9-second gridded continental Australia novel ecological environments for Vascular Plants 1990:2050 CanESM2 RCP 8.5 (CMIP5) (GDM: VAS_v5_r11)
    3. D, disappearing: 9-second gridded continental Australia disappearing ecological environments for Vascular Plants 1990:2050 CanESM2 RCP 8.5 (CMIP5) (GDM: VAS_v5_r11)

    More detail of the calculations and methods used to derive the individual measures is given in the document “9sMethodsSummary.pdf” provided with the data download.

    Each of these three measures uses the GDM model that is elsewhere described: Generalised dissimilarity model of compositional turnover in vascular plant species for continental Australia at 9 second resolution using ANHAT data extracted 4 April 2013 (GDM: VAS_v5_r11)

    Climate data. Generalised dissimilarity models were built and projected using climate data that are elsewhere described:
    a) 9-second gridded climatology for continental Australia 1976-2005: Summary variables with elevation and radiative adjustment
    b) 9-second gridded climatology for continental Australia 2036-2065 CanESM2 RCP 8.5 (CMIP5): Summary variables with elevation and radiative adjustment

    A brief summary of the climate downscaling method is given in the document “9sMethodsSummary.pdf” provided with the data download.

    Further details about the CanESM2 global climate model: Chylek P, Li J, Dubey MK, Wang M and Lesins G (2011) ‘Observed and model simulated 20th century Arctic temperature variability: Canadian Earth System Model CanESM2’, Atmospheric Chemistry and Physics Discussions 11, 22893-22907, doi:10.5194/acpd-11-22893-2011

Data from: Datasets used to train the Generative Adversarial Networks used in ATLFast3

8 scholarly articles cite this dataset (View in Google Scholar)
