29 datasets found
  1. SDSS Galaxy Subset

    • zenodo.org
    application/gzip
    Updated Sep 5, 2022
    Cite
    Nuno Ramos Carvalho (2022). SDSS Galaxy Subset [Dataset]. http://doi.org/10.5281/zenodo.6696565
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Sep 5, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nuno Ramos Carvalho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Sloan Digital Sky Survey (SDSS) is a comprehensive survey of the northern sky. This dataset contains a subset of that survey: 60247 objects classified as galaxies. It includes a CSV file with a collection of information about each object, plus a set of per-object files, namely JPG images, FITS data, and spectra data. The dataset is used to train and explore the astromlp-models collection of deep learning models for galaxy characterisation.

    The dataset includes a CSV data file where each row is an object from the SDSS database, and with the following columns (note that some data may not be available for all objects):

    • objid: unique SDSS object identifier
    • mjd: MJD of observation
    • plate: plate identifier
    • tile: tile identifier
    • fiberid: fiber identifier
    • run: run number
    • rerun: rerun number
    • camcol: camera column
    • field: field number
    • ra: right ascension
    • dec: declination
    • class: spectroscopic class (only objects with GALAXY are included)
    • subclass: spectroscopic subclass
    • modelMag_u: better of DeV/Exp magnitude fit for band u
    • modelMag_g: better of DeV/Exp magnitude fit for band g
    • modelMag_r: better of DeV/Exp magnitude fit for band r
    • modelMag_i: better of DeV/Exp magnitude fit for band i
    • modelMag_z: better of DeV/Exp magnitude fit for band z
    • redshift: final redshift from SDSS data z
    • stellarmass: stellar mass extracted from the eBOSS Firefly catalog
    • w1mag: WISE W1 "standard" aperture magnitude
    • w2mag: WISE W2 "standard" aperture magnitude
    • w3mag: WISE W3 "standard" aperture magnitude
    • w4mag: WISE W4 "standard" aperture magnitude
    • gz2c_f: Galaxy Zoo 2 classification from Willett et al 2013
    • gz2c_s: simplified version of Galaxy Zoo 2 classification (labels set)

    Besides the CSV file, the dataset includes a set of directories. Each directory holds files named after the objid column of the CSV file and containing the corresponding data. The following directory tree is available:

    sdss-gs/
    ├── data.csv
    ├── fits
    ├── img
    ├── spectra
    └── ssel

    Each directory contains:

    • img: RGB images from the object in JPEG format, 150x150 pixels, generated using the SkyServer DR16 API
    • fits: FITS data subsets around the object across the u, g, r, i, z bands; cut is done using the ImageCutter library
    • spectra: full best fit spectra data from SDSS between 4000 and 9000 wavelengths
    • ssel: best fit spectra data from SDSS for specific selected intervals of wavelengths discussed by Sánchez Almeida 2010
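
    As a rough illustration of the layout above, the following Python sketch maps one objid to candidate file locations in each directory. The file extensions, the helper name object_paths, and the placeholder objid are assumptions made for illustration; the description only states that files are named after the objid column.

```python
from pathlib import Path

# Hypothetical extensions: the dataset description does not state them.
ASSUMED_EXTENSIONS = {
    "img": ".jpg",
    "fits": ".fits",
    "spectra": ".csv",
    "ssel": ".csv",
}


def object_paths(root, objid):
    """Map each data directory to the assumed file path for one object."""
    root = Path(root)
    return {
        name: root / name / f"{objid}{ext}"
        for name, ext in ASSUMED_EXTENSIONS.items()
    }


# "1234567890" is a placeholder, not a real SDSS object identifier.
paths = object_paths("sdss-gs", "1234567890")
for name, path in paths.items():
    print(name, path.as_posix())
```

    Checking path.exists() on each entry before loading would be a sensible next step, since some data may not be available for all objects.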

    Changelog

    • v0.0.3 - Increase number of objects to ~80k.
    • v0.0.2 - Increase number of objects to ~60k.
    • v0.0.1 - Initial import.
  2. Source Code - Characterizing Variability and Uncertainty for Parameter...

    • catalog.data.gov
    • s.cnmilf.com
    Updated May 1, 2025
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2025). Source Code - Characterizing Variability and Uncertainty for Parameter Subset Selection in PBPK Models [Dataset]. https://catalog.data.gov/dataset/source-code-characterizing-variability-and-uncertainty-for-parameter-subset-selection-in-p
    Explore at:
    Dataset updated
    May 1, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Source Code for the manuscript "Characterizing Variability and Uncertainty for Parameter Subset Selection in PBPK Models" -- This R code generates the results presented in this manuscript; the zip folder contains PBPK model files (for chloroform and DCM) and corresponding scripts to compile the models, generate human equivalent doses, and run sensitivity analysis.

  3. Source Code - Characterizing Variability and Uncertainty for Parameter...

    • gimi9.com
    Cite
    Source Code - Characterizing Variability and Uncertainty for Parameter Subset Selection in PBPK Models | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_source-code-characterizing-variability-and-uncertainty-for-parameter-subset-selection-in-p/
    Explore at:
    Description

    Source Code for the manuscript "Characterizing Variability and Uncertainty for Parameter Subset Selection in PBPK Models" -- This R code generates the results presented in this manuscript; the zip folder contains PBPK model files (for chloroform and DCM) and corresponding scripts to compile the models, generate human equivalent doses, and run sensitivity analysis.

  4. 2024-election-subreddit-threads-173k

    • huggingface.co
    Updated Nov 15, 2024
    Cite
    Binghamton University (2024). 2024-election-subreddit-threads-173k [Dataset]. https://huggingface.co/datasets/BinghamtonUniversity/2024-election-subreddit-threads-173k
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 15, 2024
    Dataset authored and provided by
    Binghamton University
    Description

    About

    This dataset contains threads from 23 political subreddits from July 2024 - November 2024 (about a week after the US election). Use this dataset as a baseline for subsets pertaining to Reddit's opinion on the 2024 election. We recommend using each thread's metadata as guidance. E.g.,

    • r/politics subset
    • controversial comments subset
    • highly upvoted posts subset
    • leftist/liberal threads subset

    etc.

      Subreddits
    

    These are the subreddits scraped. Each conversation's… See the full description on the dataset page: https://huggingface.co/datasets/BinghamtonUniversity/2024-election-subreddit-threads-173k.

  5. Data from: Defining Privileged Reagents Using Subsimilarity Comparison

    • figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Brett A. Tounge; Charles H. Reynolds (2023). Defining Privileged Reagents Using Subsimilarity Comparison [Dataset]. http://doi.org/10.1021/ci049854j.s001
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Brett A. Tounge; Charles H. Reynolds
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We have developed a new method for assigning a drug-like score to reagents. This algorithm uses topological torsion (TT) 2D descriptors to compute the subsimilarity of any given reagent to a substructural element of any compound in the CMC. The utility of this approach is demonstrated by scoring a test set of reagents derived from the “Comprehensive Survey of Combinatorial Library Synthesis: 2000” (J. Comb. Chem.). R-groups were extracted from the most-active compounds found in each of the reviewed libraries, and the distribution of the subsimilarity scores for these monomers was compared to the ACD. This comparison showed a dramatic shift in the distribution of the JCC R-group subset toward higher subsimilarity scores in comparison to the entire ACD database. The ACD was also used to examine the relationship between molecular weight and various subsimilarity scoring algorithms. This analysis was used to derive a subsimilarity score that is less biased by molecular weight.

  6. Grib and ASCII data, subset ERA-I for shallow water waves Ocean Science...

    • zenodo.org
    bin, txt
    Updated Jan 24, 2020
    Cite
    J.-R. Bidlot (2020). Grib and ASCII data, subset ERA-I for shallow water waves Ocean Science study [Dataset]. http://doi.org/10.5281/zenodo.831329
    Explore at:
    Available download formats: txt, bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    J.-R. Bidlot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Specific output from the ERA-I reanalysis (wave model component) containing integrated parameters; see https://doi.org/10.5194/os-13-1-2017

  7. MERRA-2 subset for evaluation of renewables with merra2ools R-package:...

    • datadryad.org
    zip
    Updated Mar 29, 2021
    Cite
    Oleg Lugovoy; Shuo Gao (2021). MERRA-2 subset for evaluation of renewables with merra2ools R-package: 1980-2020 hourly, 0.5° lat x 0.625° lon global grid [Dataset]. http://doi.org/10.5061/dryad.v41ns1rtt
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 29, 2021
    Dataset provided by
    Dryad
    Authors
    Oleg Lugovoy; Shuo Gao
    Time period covered
    2021
    Description

    The merra2ools dataset has been assembled through the following steps:

    The MERRA-2 collections tavg1_2d_flx_Nx (Surface Flux Diagnostics), tavg1_2d_rad_Nx (Radiation Diagnostics), and tavg1_2d_slv_Nx (Single-level atmospheric state variables) were downloaded from the NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) (https://disc.gsfc.nasa.gov/datasets?project=MERRA-2) using the GNU Wget network utility (https://disc.gsfc.nasa.gov/data-access). Each of the three collections consists of daily netCDF-4 files with 3-dimensional variables (lon x lat x hour).
    The following variables were obtained from the netCDF-4 files and merged into long-term time series:

    Northward (V) and Eastward (U) wind at 10 and 50 meters (V10M, V50M, U10M, U50M, respectively), and 10-meter air temperature (T10M) from the tavg1_2d_slv_Nx collection;
    Incident shortwave land (SWGDN) and Surface albedo (ALBEDO) fro...
    
  8. Data from: MonteCat: A Basin-Hopping-Inspired Catalyst Descriptor Search...

    • acs.figshare.com
    xlsx
    Updated Feb 22, 2024
    Cite
    Fernando Garcia-Escobar; Toshiaki Taniike; Keisuke Takahashi (2024). MonteCat: A Basin-Hopping-Inspired Catalyst Descriptor Search Algorithm for Machine Learning Models [Dataset]. http://doi.org/10.1021/acs.jcim.3c01952.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 22, 2024
    Dataset provided by
    ACS Publications
    Authors
    Fernando Garcia-Escobar; Toshiaki Taniike; Keisuke Takahashi
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Proposing relevant catalyst descriptors that can relate the information on a catalyst’s composition to its actual performance is an ongoing area in catalyst informatics, as it is a necessary step to improve our understanding of the target reactions. Herein, a small descriptor-engineered data set containing 3289 descriptor variables and the performance of 200 catalysts for the oxidative coupling of methane (OCM) is analyzed, and a descriptor search algorithm based on the workflow of the Basin-hopping optimization methodology is proposed to select the descriptors that better fit a predictive model. The algorithm, which can be considered wrapper in nature, consists of the successive generation of random-based modifications to the descriptor subset used in a regression model and adopting them depending on their effect on the model’s score. The results are presented after being tested on linear and Support Vector Regression models with average cross-validation r2 scores of 0.8268 and 0.6875, respectively.
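
    The wrapper idea described above (randomly modify the descriptor subset, keep the change if the score improves) can be sketched roughly as follows. This is not the MonteCat implementation: the function names, the single-descriptor flip move, the strict-improvement acceptance rule, and the toy score are all assumptions. In particular, toy_score stands in for the cross-validated model score that a real run would compute.

```python
import random


def subset_search(n_descriptors, score_fn, n_iter=2000, seed=0):
    """Random single-flip search over descriptor subsets (illustrative only)."""
    rng = random.Random(seed)
    # Start from a random subset of descriptor indices.
    current = {i for i in range(n_descriptors) if rng.random() < 0.5}
    current_score = score_fn(current)
    for _ in range(n_iter):
        # Random modification: add or remove one descriptor.
        candidate = set(current)
        candidate.symmetric_difference_update({rng.randrange(n_descriptors)})
        candidate_score = score_fn(candidate)
        # Adopt the modification only if the score improves.
        if candidate_score > current_score:
            current, current_score = candidate, candidate_score
    return current, current_score


# Hypothetical "informative" descriptors used by the toy score below.
INFORMATIVE = frozenset({2, 5, 11})


def toy_score(subset):
    """Toy stand-in for a model score: reward informative descriptors, penalise extras."""
    hits = len(subset & INFORMATIVE)
    extras = len(subset - INFORMATIVE)
    return hits - 0.5 * extras


best, best_score = subset_search(20, toy_score)
print(sorted(best), best_score)
```

    With a real regression model, score_fn would fit the model on the selected descriptor columns and return a cross-validation score; an acceptance rule that sometimes keeps worse subsets would be closer in spirit to basin hopping than the strict rule used here.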

  9. MedleyDB - Melody Subset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 21, 2020
    Cite
    Juan Pablo Bello (2020). MedleyDB - Melody Subset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2600316
    Explore at:
    Dataset updated
    Jan 21, 2020
    Dataset provided by
    Justin Salamon
    Juan Pablo Bello
    Rachel Bittner
    Chris Cannam
    Mike Tierney
    Matthias Mauch
    Description

    Subset of MedleyDB: Mix audio files and Melody Annotations for the 108 files in the MedleyDB multitrack dataset containing melody.

    There are 3 types of melody annotations released:

    melody1: "The f0 curve of the predominant melodic line drawn from a single source"

    melody2: "The f0 curve of the predominant melodic line drawn from multiple sources"

    melody3: "The f0 curves of all melodic lines drawn from multiple sources"

    For further details, refer to the MedleyDB website.

    Further Annotation and Metadata files are version controlled and are available in the MedleyDB github repository: Metadata can be found here, Annotations can be found here.

    For detailed information about the dataset, please visit MedleyDB's website.

    If you make use of MedleyDB for academic purposes, please cite the following publication:

    R. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam and J. P. Bello, "MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research", in 15th International Society for Music Information Retrieval Conference, Taipei, Taiwan, Oct. 2014.

  10. NEON Woody plant survey data: ACCE DTP analytical subset

    • annakrystalli.me
    csv
    Updated Mar 19, 2025
    Cite
    Anna Krystalli (2025). NEON Woody plant survey data: ACCE DTP analytical subset [Dataset]. https://annakrystalli.me/project/data/index.html
    Explore at:
    Available download formats: csv
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    R-RSE SMPC
    Authors
    Anna Krystalli
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    May 18, 2015 - Nov 16, 2018
    Variables measured
    uid, date, height, easting, plot_id, site_id, uid_map, uid_ppl, event_id, northing, and 22 more
    Dataset funded by
    National Science Foundation
    Description

    This data product, sourced from the NEON data portal for the purposes of the ACCE DTP tutorial, contains processed individual level data from measurements of woody individuals and shrub groups.

  11. CELEX Dutch lexical database - Frequency Subset

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Oct 5, 2005
    + more versions
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2005). CELEX Dutch lexical database - Frequency Subset [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-L0029_07/
    Explore at:
    Dataset updated
    Oct 5, 2005
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    The Dutch CELEX data is derived from R.H. Baayen, R. Piepenbrock & L. Gulikers, The CELEX Lexical Database (CD-ROM), Release 2, Dutch Version 3.1, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.

    Apart from orthographic features, the CELEX database comprises representations of the phonological, morphological, syntactic and frequency properties of lemmata. For the Dutch data, frequencies have been disambiguated on the basis of the 42.4m Dutch Instituut voor Nederlandse Lexicologie text corpora.

    To make for greater compatibility with other operating systems, the databases have not been tailored to fit any particular database management program. Instead, the information is presented in a series of plain ASCII files, which can be queried with tools such as AWK and ICON. Unique identity numbers allow the linking of information from different files.

    This database can be divided into different subsets:

    • orthography: with or without diacritics, with or without word division positions, alternative spellings, number of letters/syllables;
    • phonology: phonetic transcriptions with syllable boundaries or primary and secondary stress markers, consonant-vowel patterns, number of phonemes/syllables, alternative pronunciations, frequency per phonetic syllable within words;
    • morphology: division into stems and affixes, flat or hierarchical representations, stems and their inflections;
    • syntax: word class, subcategorisations per word class;
    • frequency of the entries: disambiguated for homographic lemmata.

  12. Data from: Effects of nutrient enrichment on freshwater macrophyte and...

    • zenodo.org
    Updated Dec 13, 2023
    + more versions
    Cite
    Floris K. Neijnens; Hadassa Moreira; Melinda M.J. De Jonge; Bart B.H.P. Linssen; Mark A.J. Huijbregts; Gertjan W. Geerling; Aafke M. Schipper (2023). Effects of nutrient enrichment on freshwater macrophyte and invertebrate abundance: A meta-analysis [Dataset]. http://doi.org/10.5281/zenodo.10372444
    Explore at:
    Dataset updated
    Dec 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Floris K. Neijnens; Hadassa Moreira; Melinda M.J. De Jonge; Bart B.H.P. Linssen; Mark A.J. Huijbregts; Gertjan W. Geerling; Aafke M. Schipper
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The zip-file contains the data and code accompanying the paper 'Effects of nutrient enrichment on freshwater macrophyte and invertebrate abundance: A meta-analysis'. Together, these files should allow for the replication of the results.

    The 'raw_data' folder contains the 'MA_database.csv' file, which contains the extracted data from all primary studies that are used in the analysis. Furthermore, this folder contains the file 'MA_database_description.txt', which gives a description of each data column in the database.

    The 'derived_data' folder contains the files that are produced by the R-scripts in this study and used for data analysis. The 'MA_database_processed.csv' and 'MA_database_processed.RData' files contain the converted raw database that is suitable for analysis. The 'DB_IA_subsets.RData' file contains the 'Individual Abundance' (IA) data subsets based on taxonomic group (invertebrates/macrophytes) and inclusion criteria. The 'DB_IA_VCV_matrices.RData' contains for all IA data subsets the variance-covariance (VCV) matrices. The 'DB_AM_subsets.RData' file contains the 'Total Abundance' (TA) and 'Mean Abundance' (MA) data subsets based on taxonomic group (invertebrates/macrophytes) and inclusion criteria.

    The 'output_data' folder contains maps with the output data for each data subset (i.e. for each metric, taxonomic group and set of inclusion criteria). For each data subset, the map contains random effects selection results ('Results1_REsel_

    The 'scripts' folder contains all R-scripts that we used for this study. The 'PrepareData.R' script takes the database as input and adjusts the file so that it can be used for data analysis. The 'PrepareDataIA.R' and 'PrepareDataAM.R' scripts make subsets of the data and prepare the data for the meta-regression analysis and mixed-effects regression analysis, respectively. The regression analyses are performed in the 'SelectModelsIA.R' and 'SelectModelsAM.R' scripts to calculate the regression model results for the IA metric and MA/TA metrics, respectively. These scripts require the 'RandomAndFixedEffects.R' script, containing the random and fixed effects parameter combinations, as well as the 'Functions.R' script. The 'CreateMap.R' script creates a global map with the location of all studies included in the analysis (figure 1 in the paper). The 'CreateForestPlots.R' script creates plots showing the IA data distribution for both taxonomic groups (figure 2 in the paper). The 'CreateHeatMaps.R' script creates heat maps for all metrics and taxonomic groups (figure 3 in the paper, figures S11.1 and S11.2 in the appendix). The 'CalculateStatistics.R' script calculates the descriptive statistics that are reported throughout the paper, and creates the figures that describe the dataset characteristics (figures S3.1 to S3.5 in the appendix). The 'CreateFunnelPlots.R' script creates the funnel plots for both taxonomic groups (figures S6.1 and S6.2 in the appendix) and performs Egger's tests. The 'CreateControlGraphs.R' script creates graphs showing the dependency of the nutrient response to control concentrations for all metrics and taxonomic groups (figures S10.1 and S10.2 in the appendix).

    The 'figures' folder contains all figures that are included in this study.

  13. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L. (2024)....

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L. (2024). Dataset: ImageNet Subsets. https://doi.org/10.57702/oetogsha [Dataset]. https://service.tib.eu/ldmservice/dataset/imagenet-subsets
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    ImageNet Subsets

  14. SDSS Galaxy Subset

    • data.niaid.nih.gov
    Updated Sep 6, 2022
    Cite
    Carvalho, Nuno Ramos (2022). SDSS Galaxy Subset [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6393487
    Explore at:
    Dataset updated
    Sep 6, 2022
    Dataset authored and provided by
    Carvalho, Nuno Ramos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Sloan Digital Sky Survey (SDSS) is a comprehensive survey of the northern sky. This dataset contains a subset of that survey: 100077 objects classified as galaxies. It includes a CSV file with a collection of information about each object, plus a set of per-object files, namely JPG images, FITS data, and spectra data. The dataset is used to train and explore the astromlp-models collection of deep learning models for galaxy characterisation.

    The dataset includes a CSV data file where each row is an object from the SDSS database, and with the following columns (note that some data may not be available for all objects):

    objid: unique SDSS object identifier

    mjd: MJD of observation

    plate: plate identifier

    tile: tile identifier

    fiberid: fiber identifier

    run: run number

    rerun: rerun number

    camcol: camera column

    field: field number

    ra: right ascension

    dec: declination

    class: spectroscopic class (only objects with GALAXY are included)

    subclass: spectroscopic subclass

    modelMag_u: better of DeV/Exp magnitude fit for band u

    modelMag_g: better of DeV/Exp magnitude fit for band g

    modelMag_r: better of DeV/Exp magnitude fit for band r

    modelMag_i: better of DeV/Exp magnitude fit for band i

    modelMag_z: better of DeV/Exp magnitude fit for band z

    redshift: final redshift from SDSS data z

    stellarmass: stellar mass extracted from the eBOSS Firefly catalog

    w1mag: WISE W1 "standard" aperture magnitude

    w2mag: WISE W2 "standard" aperture magnitude

    w3mag: WISE W3 "standard" aperture magnitude

    w4mag: WISE W4 "standard" aperture magnitude

    gz2c_f: Galaxy Zoo 2 classification from Willett et al 2013

    gz2c_s: simplified version of Galaxy Zoo 2 classification (labels set)
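
    Because some columns may be empty for some objects, a reader will often want to keep only the rows that have the values a given analysis needs. The sketch below shows one way to do that with the standard library. The sample rows are made up, only a few of the documented columns are shown, and the assumption that missing values appear as empty strings is not confirmed by the description.

```python
import csv
import io

# Made-up sample standing in for data.csv; values are not real SDSS data.
SAMPLE = """objid,ra,dec,redshift,stellarmass
1001,150.1,2.2,0.08,
1002,150.3,2.4,0.12,10.5
"""


def complete_rows(handle, required):
    """Yield rows whose required columns are all non-empty.

    Assumes missing values are encoded as empty strings.
    """
    for row in csv.DictReader(handle):
        if all((row.get(col) or "").strip() for col in required):
            yield row


rows = list(complete_rows(io.StringIO(SAMPLE), ["redshift", "stellarmass"]))
print([row["objid"] for row in rows])
```

    For the real file, the io.StringIO object would be replaced by an open handle on data.csv, and the encoding of missing values should be checked first.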

    Besides the CSV file, the dataset includes a set of directories. Each directory holds files named after the objid column of the CSV file and containing the corresponding data. The following directory tree is available:

    sdss-gs/
    ├── data.csv
    ├── fits
    ├── img
    ├── spectra
    └── ssel

    Each directory contains:

    img: RGB images from the object in JPEG format, 150x150 pixels, generated using the SkyServer DR16 API

    fits: FITS data subsets around the object across the u, g, r, i, z bands; cut is done using the ImageCutter library

    spectra: full best fit spectra data from SDSS between 4000 and 9000 wavelengths

    ssel: best fit spectra data from SDSS for specific selected intervals of wavelengths discussed by Sánchez Almeida 2010

    Changelog

    v0.0.4 - Increase number of objects to ~100k.

    v0.0.3 - Increase number of objects to ~80k.

    v0.0.2 - Increase number of objects to ~60k.

    v0.0.1 - Initial import.

  15. WABI Subset: Police

    • researchdata.edu.au
    Updated Jul 29, 2016
    + more versions
    Cite
    State Library of Western Australia (2016). WABI Subset: Police [Dataset]. https://researchdata.edu.au/wabi-subset-police/2994547
    Explore at:
    Dataset updated
    Jul 29, 2016
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    State Library of Western Australia
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This index was compiled by Miss Mollie Bentley from various records she has used relating to the police. These include: Almanac listings, Colonial Secretary's Office Records, Police Gazettes, various police department occurrence books and letter books, police journals, government gazettes, estimates, York police records etc.

    Entry is by name of policeman. Information given varies but is usually about appointments, promotions, retirements, transfers etc.

    The Western Australian Biographical Index (WABI) is a highly used resource at the State Library of Western Australia. A recent generous contribution by the Friends of Battye Library (FOBS) has enabled SLWA to have the original handwritten index cards scanned and later transcribed.

    The dataset contains: several csv files with data describing card number, card text and url link to image of the original handwritten card.

    The transcription was crowd-sourced and we are aware that there are some data quality issues including:

    • Some cards are missing
    • Transcripts are crowdsourced so may contain spelling errors and possibly missing information
    • Some cards are crossed out. Some of these are included in the collection and some are not
    • Some of the cards contain relevant information on the back (usually children of the person mentioned). This info should be on the next consecutive card
    • As the information is an index, collected in the 1970s from print material, it is incomplete. It is also unreferenced.

    It is still a very valuable dataset as it contains a wealth of information about early settlers in Western Australia. It is of particular interest to genealogists and historians.

  16. Data from: Subset of turbulent energy fluxes, meteorology and soil physics...

    • catalogue.ceh.ac.uk
    • hosted-metadata.bgs.ac.uk
    • +2more
    zip
    Updated Jan 24, 2020
    Cite
    R. Morrison; H.M. Cooper; A.M.J. Cumming; C. Evans; S. Oakley; N.P. McNamara; R. Pywell; P. Scarlett (2020). Subset of turbulent energy fluxes, meteorology and soil physics observations collected at eddy covariance sites in southeast England, June 2019 [Dataset]. http://doi.org/10.5285/0254620f-9cf1-4d5b-af3f-bd8a6af95e96
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    NERC EDS Environmental Information Data Centre
    Authors
    R. Morrison; H.M. Cooper; A.M.J. Cumming; C. Evans; S. Oakley; N.P. McNamara; R. Pywell; P. Scarlett
    Time period covered
    Jun 22, 2019 - Jul 6, 2019
    Dataset funded by
    Natural Environment Research Council (https://www.ukri.org/councils/nerc)
    Description

    This dataset contains time series observations of surface-atmosphere exchanges of sensible heat (H) and latent heat (LE) and momentum (τ) measured at UKCEH eddy covariance flux observation sites during summer 2019. The dataset includes ancillary weather and soil physics observations made at each site. Eddy covariance (EC) and micrometeorological observations were collected using open-path eddy covariance systems. Flux, meteorological and soil physics observations were collected and processed using harmonised protocols across all sites. This work was supported by the Natural Environment Research Council award number NE/R016429/1 as part of the UK-SCAPE programme delivering National Capability.

  17. Indonesian Family Life Study, merged subset

    • laurabotzet.github.io
    Updated 2016
    Cite
    RAND corporation (2016). Indonesian Family Life Study, merged subset [Dataset]. https://laurabotzet.github.io/birth_order_ifls/2_codebook.html
    Explore at:
    Dataset updated
    2016
    Authors
    RAND corporation
    Time period covered
    2014 - 2015
    Area covered
    13 Indonesian provinces. The sample is representative of about 83% of the Indonesian population and contains over 30,000 individuals living in 13 of the 27 provinces in the country. See URL for more.
    Variables measured
    a1, a2, c1, c3, e1, e3, n2, n3, o1, o2, and 138 more
    Description

    Data from the IFLS, merged across waves; most outcomes are taken from wave 5. Includes birth order, family structure, Big 5 personality, intelligence tests, and risk lotteries.

    Table of variables

    This table contains variable names, labels, and number of missing values. See the complete codebook for more.

    [truncated]

    Note

    This dataset was automatically described using the codebook R package (version 0.8.2).

  18. OpenML R Bot Benchmark Data (final subset)

    • figshare.com
    application/gzip
    Updated May 18, 2018
    Cite
    Daniel Kühn; Philipp Probst; Janek Thomas; Bernd Bischl (2018). OpenML R Bot Benchmark Data (final subset) [Dataset]. http://doi.org/10.6084/m9.figshare.5882230.v2
    Available download formats: application/gzip
    Dataset updated
    May 18, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Daniel Kühn; Philipp Probst; Janek Thomas; Bernd Bischl
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a clean subset of the data created by the OpenML R Bot, which executed benchmark experiments on the binary classification tasks of the OpenML100 benchmarking suite with six R algorithms: glmnet, rpart, kknn, svm, ranger and xgboost. The hyperparameters of these algorithms were drawn randomly. In total it contains more than 2.6 million benchmark experiments and can be used by other researchers. The subset was created by taking 500000 results of each learner (except for kknn, for which only 1140 results are available). The CSV file for each learner is a table with one row per benchmark experiment, containing: OpenML-Data ID, hyperparameter values, performance measures (AUC, accuracy, Brier score), runtime, scimark (runtime reference of the machine), and some meta features of the dataset. OpenMLRandomBotResults.RData (format for R) contains all data in separate tables for the results, the hyperparameters, the meta features, the runtime, the scimark results and reference results.

  19. Supporting Information S1 - Improving Power of Genome-Wide Association...

    • plos.figshare.com
    doc
    Updated Jun 2, 2023
    Cite
    Wan-Yu Lin; Wen-Chung Lee (2023). Supporting Information S1 - Improving Power of Genome-Wide Association Studies with Weighted False Discovery Rate Control and Prioritized Subset Analysis [Dataset]. http://doi.org/10.1371/journal.pone.0033716.s001
    Available download formats: doc
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Wan-Yu Lin; Wen-Chung Lee
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    FDR of the WGA, the PSA, and the WEI (r = 2, 5, 10) when the prioritized region sizes were 2 Mb and 20 Mb (with adjustment to the PSA), respectively; power comparison between the WGA, the PSA, and the WEI (r = 2, 5, 10) when 14 2-Mb, 14 20-Mb, 22 2-Mb, and 22 20-Mb regions were prioritized (with adjustment to the PSA), respectively. (DOC)
    
  20. The distribution of grades assigned by a subset of pathologists.

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Thomas R. Fanshawe; Andrew G. Lynch; Ian O. Ellis; Andrew R. Green; Rudolf Hanka (2023). The distribution of grades assigned by a subset of pathologists. [Dataset]. http://doi.org/10.1371/journal.pone.0002925.t003
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Thomas R. Fanshawe; Andrew G. Lynch; Ian O. Ellis; Andrew R. Green; Rudolf Hanka
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Grades (G1–G3) assigned by a selection of ten pathologists to 52 breast cancer tumour samples, with estimated agreement scores, and simulated results and parameter estimates from the Bayesian latent trait model.

Cite
Nuno Ramos Carvalho; Nuno Ramos Carvalho (2022). SDSS Galaxy Subset [Dataset]. http://doi.org/10.5281/zenodo.6696565
SDSS Galaxy Subset

Available download formats: application/gzip
Dataset updated
Sep 5, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nuno Ramos Carvalho; Nuno Ramos Carvalho
License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The Sloan Digital Sky Survey (SDSS) is a comprehensive survey of the northern sky. This dataset contains a subset of that survey: 60247 objects classified as galaxies. It includes a CSV file with a collection of information and a set of files for each object, namely JPG image files, FITS and spectra data. The dataset is used to train and explore the astromlp-models collection of deep learning models for galaxy characterisation.

The dataset includes a CSV data file in which each row is an object from the SDSS database, with the following columns (note that some data may not be available for all objects):

  • objid: unique SDSS object identifier
  • mjd: MJD of observation
  • plate: plate identifier
  • tile: tile identifier
  • fiberid: fiber identifier
  • run: run number
  • rerun: rerun number
  • camcol: camera column
  • field: field number
  • ra: right ascension
  • dec: declination
  • class: spectroscopic class (only objects classified as GALAXY are included)
  • subclass: spectroscopic subclass
  • modelMag_u: better of DeV/Exp magnitude fit for band u
  • modelMag_g: better of DeV/Exp magnitude fit for band g
  • modelMag_r: better of DeV/Exp magnitude fit for band r
  • modelMag_i: better of DeV/Exp magnitude fit for band i
  • modelMag_z: better of DeV/Exp magnitude fit for band z
  • redshift: final redshift from SDSS data z
  • stellarmass: stellar mass extracted from the eBOSS Firefly catalog
  • w1mag: WISE W1 "standard" aperture magnitude
  • w2mag: WISE W2 "standard" aperture magnitude
  • w3mag: WISE W3 "standard" aperture magnitude
  • w4mag: WISE W4 "standard" aperture magnitude
  • gz2c_f: Galaxy Zoo 2 classification from Willett et al 2013
  • gz2c_s: simplified version of Galaxy Zoo 2 classification (labels set)

Besides the CSV file, a set of directories is included in the dataset. In each directory you'll find a list of files named after the objid column from the CSV file, with the corresponding data. The following directory tree is available:

sdss-gs/
├── data.csv
├── fits
├── img
├── spectra
└── ssel

Each directory contains:

  • img: RGB images from the object in JPEG format, 150x150 pixels, generated using the SkyServer DR16 API
  • fits: FITS data subsets around the object across the u, g, r, i, z bands; cut is done using the ImageCutter library
  • spectra: full best fit spectra data from SDSS between 4000 and 9000 wavelengths
  • ssel: best fit spectra data from SDSS for specific selected intervals of wavelengths discussed by Sánchez Almeida 2010
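
Since the files in each directory are named after objid, the files belonging to one object can be collected with a short lookup. The sketch below is an assumption-laden illustration: the description does not state file extensions, so it matches any file whose name up to the first dot equals the objid, and the function name files_for_object is hypothetical.

```python
from pathlib import Path

# Data directories listed in the dataset's directory tree.
SUBDIRS = ("img", "fits", "spectra", "ssel")


def files_for_object(root, objid):
    """Map each data directory to the files whose base name equals objid."""
    found = {}
    for sub in SUBDIRS:
        directory = Path(root) / sub
        if not directory.is_dir():
            # Directory missing from this copy of the dataset.
            found[sub] = []
            continue
        found[sub] = sorted(
            p
            for p in directory.iterdir()
            # Compare the part before the first dot so that objid 123
            # does not also match a file named 1234.<ext>.
            if p.is_file() and p.name.split(".")[0] == str(objid)
        )
    return found
```

A call such as files_for_object("sdss-gs", some_objid) returns a dict keyed by directory name; an empty list means no file for that object was found there, which is consistent with the note that some data may not be available for all objects.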

Changelog

  • v0.0.3 - Increase number of objects to ~80k.
  • v0.0.2 - Increase number of objects to ~60k.
  • v0.0.1 - Initial import.