Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Sloan Digital Sky Survey (SDSS) is a comprehensive survey of the northern sky. This dataset contains a subset of that survey: 60247 objects classified as galaxies. It includes a CSV file with a collection of information and a set of files for each object, namely JPG image files, FITS and spectra data. This dataset is used to train and explore the astromlp-models collection of deep learning models for galaxy characterisation.
The dataset includes a CSV data file where each row is an object from the SDSS database, with the following columns (note that some data may not be available for all objects):
Besides the CSV file, the dataset includes a set of directories. Each directory holds files named after the objid column from the CSV file, with the corresponding data. The following directory tree is available:
sdss-gs/
├── data.csv
├── fits
├── img
├── spectra
└── ssel
Where, each directory contains:
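Since files are named after the objid column, the layout can be walked with a few lines of Python. This is a minimal sketch, not part of the dataset's own tooling: it assumes data.csv has a header row containing objid, and that each file name is the objid followed by an extension (the extensions are not stated here, so the match is on the part before the first dot). The helper names are hypothetical.

```python
import csv
from pathlib import Path

# Directory names taken from the tree above.
DATA_DIRS = ("fits", "img", "spectra", "ssel")


def read_objids(root):
    """Return the objid values listed in <root>/data.csv (header row assumed)."""
    with open(Path(root) / "data.csv", newline="") as handle:
        return [row["objid"] for row in csv.DictReader(handle)]


def files_for_object(root, objid):
    """Map each data directory to the file names belonging to one objid."""
    root = Path(root)
    found = {}
    for name in DATA_DIRS:
        directory = root / name
        entries = directory.iterdir() if directory.is_dir() else []
        # Assumption: a file belongs to objid if the text before its first
        # dot equals the objid, whatever the extension is.
        found[name] = sorted(
            p.name for p in entries if p.name.split(".")[0] == str(objid)
        )
    return found
```

Calling `files_for_object("sdss-gs", objid)` for each value returned by `read_objids("sdss-gs")` gives a per-object inventory and shows which objects lack some data.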
Changelog
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RMSE and R² for different data groupings. The first column contains the species composition (inter, intra, or species specific) and the second column the data composition for each statistic reported in the remaining columns. Columns 3-5 contain the RMSE (kg), columns 6-8 the R², and columns 9-11 the % error relative to the mean biomass in the dataset for BSD, D30, and DBH, respectively (see Methods). The subset of the data for trees that had all three measurements is denoted by the terms “Combined 3” and “Site 3” in the Data column. The lowest RMSE value in a row for each metric is in bold. Similarly, the highest R² for each row is in bold. Values that represent means are underlined. The mean for each column and each grouping is given in the final two rows. The final row contains means for those trees with all three measures. The row above it contains means for all trees.
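For readers unfamiliar with the three statistics, the sketch below shows how they are commonly computed from observed and predicted biomass values. It uses the standard definitions of RMSE and R². The percent error is assumed here to be RMSE divided by the mean observed biomass, times 100; the Methods define the exact form, so treat that part as an assumption. The function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations (kg here)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def percent_error(observed, predicted):
    """Assumed form: RMSE as a percentage of the mean observed value."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs
```

Expressing RMSE as a percentage of the mean makes the error comparable across groupings whose trees differ greatly in size.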
Sentences and citation contexts identified from the PubMed Central open access articles
----------------------------------------------------------------------
The dataset is delivered as 24 tab-delimited text files. The files contain 720,649,608 sentences, 75,848,689 of which are citation contexts. The dataset is based on a snapshot of articles in the XML version of the PubMed Central open access subset (i.e., the PMCOA subset). The PMCOA subset was collected in May 2019. The dataset is created as described in: Hsiao TK., & Torvik V. I. (manuscript) OpCitance: Citation contexts identified from the PubMed Central open access articles.
Files:
• A_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with A.
• B_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with B.
• C_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with C.
• D_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with D.
• E_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with E.
• F_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with F.
• G_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with G.
• H_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with H.
• I_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with I.
• J_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with J.
• K_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with K.
• L_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with L.
• M_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with M.
• N_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with N.
• O_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with O.
• P_p1_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 1).
• P_p2_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 2).
• Q_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with Q.
• R_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with R.
• S_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with S.
• T_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with T.
• UV_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with U or V.
• W_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with W.
• XYZ_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with X, Y or Z.
Each row in the file is a sentence/citation context and contains the following columns:
• pmcid: PMCID of the article.
• pmid: PMID of the article. If an article does not have a PMID, the value is NONE.
• location: The article component (abstract, main text, table, figure, etc.) to which the citation context/sentence belongs.
• IMRaD: The type of IMRaD section associated with the citation context/sentence. I, M, R, and D represent introduction/background, method, results, and conclusion/discussion, respectively; NoIMRaD indicates that the section type is not identifiable.
• sentence_id: The ID of the citation context/sentence in the article component.
• total_sentences: The number of sentences in the article component.
• intxt_id: The ID of the citation.
• intxt_pmid: PMID of the citation (as tagged in the XML file). If a citation does not have a PMID tagged in the XML file, the value is "-".
• intxt_pmid_source: The sources where the intxt_pmid can be identified. Xml indicates that the PMID was identified only from the XML file; xml,pmc indicates that the PMID was identified from the XML file and was also found in the citation data collected from the NCBI Entrez Programming Utilities. If a citation does not have an intxt_pmid, the value is "-".
• intxt_mark: The citation marker associated with the inline citation.
• best_id: The best source link ID (e.g., PMID) of the citation.
• best_source: The sources that confirm the best ID.
• best_id_diff: The comparison result between the best_id column and the intxt_pmid column.
• citation: A citation context. If no citation is found in a sentence, the value is the sentence.
• progression: Text progression of the citation context/sentence.
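As a rough illustration of how one of the *_journal_IntxtCit.tsv files might be read, the sketch below streams rows and turns the documented placeholder values into None. It rests on two assumptions that the description does not confirm: the files carry no header row, and the columns appear in the order listed above. If the files do have a header, drop the `fieldnames` argument. The function name is hypothetical.

```python
import csv

# Column names in the order they are documented above (order is an assumption).
COLUMNS = [
    "pmcid", "pmid", "location", "IMRaD", "sentence_id", "total_sentences",
    "intxt_id", "intxt_pmid", "intxt_pmid_source", "intxt_mark",
    "best_id", "best_source", "best_id_diff", "citation", "progression",
]


def read_contexts(handle):
    """Yield one dict per sentence/citation context from an open TSV file."""
    reader = csv.DictReader(
        handle,
        fieldnames=COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # sentences may contain bare quote characters
    )
    for row in reader:
        # Documented placeholders: NONE for a missing article PMID,
        # "-" for a citation without a PMID tagged in the XML.
        if row["pmid"] == "NONE":
            row["pmid"] = None
        for key in ("intxt_pmid", "intxt_pmid_source"):
            if row[key] == "-":
                row[key] = None
        yield row
```

Because the files together hold hundreds of millions of rows, a generator that processes one row at a time avoids loading a whole file into memory.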
Supplementary Files
• PMC-OA-patci.tsv.gz – This file contains the best source link IDs for the references (e.g., PMID). Patci [1] was used to identify the best source link IDs. The best source link IDs are mapped to the citation contexts and displayed in the *_journal_IntxtCit.tsv files as the best_id column.
Each row in the PMC-OA-patci.tsv.gz file is a citation (i.e., a reference extracted from the XML file) and contains the following columns:
• pmcid: PMCID of the citing article.
• pos: The citation's position in the reference list.
• fromPMID: PMID of the citing article.
• toPMID: Source link ID (e.g., PMID) of the citation. This ID is identified by Patci.
• SRC: The sources that confirm the toPMID.
• MatchDB: The origin bibliographic database of the toPMID.
• Probability: The match probability of the toPMID.
• toPMID2: PMID of the citation (as tagged in the XML file).
• SRC2: The sources that confirm the toPMID2.
• intxt_id: The ID of the citation.
• journal: The first letter of the journal title. This maps to the *_journal_IntxtCit.tsv files.
• same_ref_string: Whether the citation string appears in the reference list more than once.
• DIFF: The comparison result between the toPMID column and the toPMID2 column.
• bestID: The best source link ID (e.g., PMID) of the citation.
• bestSRC: The sources that confirm the best ID.
• Match: Matching result produced by Patci.
• Supplementary_File_1.zip – This file contains the code for generating the dataset.
[1] Agarwal, S., Lincoln, M., Cai, H., & Torvik, V. (2014). Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase 2014. http://hdl.handle.net/2142/54885
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A probability score is propagated through a network structure starting from an initial starting node (sn). In lines 4–13, probability is split preferentially between unvisited network neighbors of the starting node by edge weight and propagated recursively to secondary neighbors until the probability being diffused is less than a defined parameter, thresholdDiff (default set to 0.01). If the starting node, sn, has no unvisited neighbors, p1 is distributed uniformly amongst all unvisited nodes, regardless of proximity to sn (lines 15–16).
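The diffusion described above can be sketched in Python as follows. This is an illustrative reading of the caption, not the authors' listing, so the line numbers cited above do not apply to it. The network is assumed to be a dict of dicts of edge weights. The rule that each neighbor keeps half of what it inherits and passes the other half onward is an assumption, since the caption does not say how much is forwarded. The names `diffuse_p1`, `adj`, `scores` and `visited` are hypothetical.

```python
def diffuse_p1(p1, sn, adj, scores, visited, threshold_diff=0.01):
    """Spread probability p1 outward from starting node sn.

    adj     : {node: {neighbor: edge_weight}}
    scores  : {node: accumulated probability}, updated in place
    visited : set of nodes already visited (should include sn)
    """
    # Unvisited neighbors of the starting node, with their edge weights.
    nbrs = {n: w for n, w in adj[sn].items() if n not in visited and w > 0}
    if nbrs:
        total_weight = sum(nbrs.values())
        for nbr, weight in nbrs.items():
            # Split p1 among neighbors in proportion to edge weight.
            inherited = p1 * weight / total_weight
            scores[nbr] += inherited
            # Assumption: half of the inherited share is passed onward.
            onward = inherited / 2
            next_visited = visited | {nbr}
            has_next = any(
                m not in next_visited and w > 0 for m, w in adj[nbr].items()
            )
            # Recurse only while the diffused amount exceeds the threshold.
            if has_next and onward > threshold_diff:
                scores[nbr] -= onward
                diffuse_p1(onward, nbr, adj, scores, next_visited, threshold_diff)
    else:
        # No unvisited neighbors: share p1 evenly among all unvisited nodes.
        remaining = [n for n in adj if n not in visited]
        for n in remaining:
            scores[n] += p1 / len(remaining)
    return scores


# Small example network: A-B (weight 1), A-C (weight 3), B-D (weight 1).
adj = {
    "A": {"B": 1.0, "C": 3.0},
    "B": {"A": 1.0, "D": 1.0},
    "C": {"A": 3.0},
    "D": {"B": 1.0},
}
scores = diffuse_p1(1.0, "A", adj, {n: 0.0 for n in adj}, {"A"})
```

In this sketch the probability handed to a neighbor is either kept or forwarded in full, so the scores sum to p1 whenever at least one unvisited node exists.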
This dataset was obtained [here][1] and its description is reproduced below.
Galaxies are fundamental structures in the Universe. Our Sun lives in the Milky Way Galaxy, which we can see as a patchy band of light across the sky. The components of a typical galaxy are: a vast number of stars (total mass ~10^6-10^11 M⊙, where M⊙ is the unit of a solar mass), a complex interstellar medium of gas and dust from which stars form (typically 1-100% of the stellar component mass), a single supermassive black hole at the center (typically <1% of the stellar component mass), and a poorly understood component called Dark Matter with mass ~5-10 times all the other components combined.
Over the ~14 billion years since the Big Bang, the rate at which galaxies convert interstellar matter into stars has not been constant, and thus the brightness and color of galaxies change with cosmic time. This phenomenon has several names in the astronomical community: the history of star formation in the Universe, chemical evolution of galaxies, or simply galaxy evolution. A major effort over several decades has been made to quantify and understand galaxy evolution using telescopes at all wavelengths.
The traditional tool for such studies has been optical spectroscopy, which easily reveals signatures of star formation in nearby galaxies. However, to study star formation in the galaxies that emerged shortly after the Big Bang, we must examine extremely faint galaxies which are too faint for spectroscopy, even using the biggest available telescopes. A feasible alternative is to obtain images of faint galaxies at random locations in the sky in narrow spectral bands, and thereby construct crude spectra. First, statistical analysis of such multiband photometric datasets is used to classify galaxies, stars and quasars. Second, for the galaxies, multivariate regression is used to develop photometric estimates of redshift, which is a measure both of distance from us and age since the Big Bang. Third, one can examine galaxy colors as a function of redshift (after various corrections are made) to study the evolution of star formation. The present dataset is taken after these first two steps are complete.
[Wolf et al. (2004)][2] provide the first public catalog of a large dataset (63,501 objects) with brightness measurements in 17 bands in the visible range. (Note that the Sloan Digital Sky Survey provides a much larger dataset of 10^8 objects with measurements in 5 bands.) We provide here a subset of their catalog with 65 columns of information on 3,462 galaxies. These are objects in the Chandra Deep Field South field which Wolf and colleagues have classified as 'Galaxies'. The column headings are formally described in their Table 3, and the columns we provide are summarized here with brief commentary:
Col 1: Nr, object number
Col 2-3: Total R (red band) magnitude and its error. This was the band at which the basic catalog was constructed. Magnitudes are inverted logarithmic measures of brightness. A galaxy with R=21 is 100-times brighter than one with R=26. The error is the standard deviation derived from detailed knowledge of the measurement process. This dataset is an excellent example of astronomical datasets where each variable is accompanied by heteroscedastic measurement errors of known variances.
Col 4-5: ApDRmag is the difference between the total and aperture magnitude in the R band. This is a rough measure of the size of the galaxy in the image where ApDRmag=0 corresponds to a point source. Negative values are not physically meaningful. mu_max is the central surface brightness of the object in the R band. The difference between Rmag and mu_max should also be an indicator of galaxy size.
Col 6-9: Mcz and MCzml are two redshift estimates. Mcz is the preferred value. e.Mcz is its estimated error, and chi2red is the reduced chi-squared value of the least-squares fit of the 17-band magnitudes to the best-fit template galaxy spectrum. Galaxies with large e.Mcz or chi2red might be omitted as unreliable.
Col 10-29: These give the absolute magnitudes (i.e. intrinsic luminosities) of the galaxy in 10 bands, with their measurement errors. They are based on the measured magnitudes and the redshifts, and represent the intrinsic luminosities of the galaxies; a galaxy with M=-15 is 100-times less luminous than one with M=-20. These magnitudes are not all independent of each other, but they are important for representing intrinsic properties of the galaxies. Below is one of several redshift-stratified plots of the B-band absolute magnitude (abscissa) against the difference of magnitude (i.e. ratio of luminosities) between the 2800A ultraviolet and blue band, which is a sensitive indicator of star formation. A redshift-dependent bimodal distribution is see...
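The magnitude comparisons in the column notes (R=21 versus R=26, and M=-15 versus M=-20) follow from the standard definition of the magnitude scale, in which a difference of 5 magnitudes corresponds to a factor of 100 in brightness. A minimal sketch with a hypothetical helper name:

```python
def brightness_ratio(mag_a, mag_b):
    """Return how many times brighter object A is than object B.

    Magnitudes are inverted logarithmic measures, so a smaller (or more
    negative) magnitude means a brighter object: ratio = 10^(0.4 * (m_b - m_a)).
    """
    return 10 ** (0.4 * (mag_b - mag_a))


# Apparent R magnitudes from the text: R=21 compared with R=26.
apparent = brightness_ratio(21, 26)

# Absolute magnitudes from the text: M=-20 compared with M=-15.
absolute = brightness_ratio(-20, -15)
```

Both calls give a factor of about 100. The same relation explains why a difference of two magnitudes, such as the ultraviolet minus blue color mentioned above, is equivalent to a ratio of luminosities.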
File List
argos.csv (MD5: 231845256e0e3e2780b89b1979b11593)
dive.csv (MD5: 0637c54ff8df8166d7f07b80295eaa5a)
example.R (MD5: 5c05f59f227390015949a4789d2feef9)
dat4bugsCOV.R (MD5: e1648cc3144721f82d351922cd28f7e5)
ssmMeta.R (MD5: 63baa0688eb6061afb2f8d7a80a6d2f4)
DCRWSmeta_cov1oneway.txt (MD5: 4d1eadc5bc3c88f9d3ac1534c71ab5ba)
Description
The supplement contains sample data, functions and scripts to prepare the data and fit the covariate state-space model presented in this paper. Two raw data files contain samples of Argos location data (argos.csv) and individual dive data (dive.csv) for a subset of Southern elephant seals tagged with CTD-SRDLs at Davis Station, Antarctica in 2011 under the Australian Integrated Marine Observing System. Following the worked example given in the example.R file should enable readers to run the covariate state-space model and/or fit their own Argos and dive data using the R and WinBUGS codes provided (dat4bugsCOV.R, ssmMeta.R, and DCRWSmeta_cov1oneway.txt).
The argos.csv file is a comma-separated file containing the Southern elephant seal raw Argos tracking data. Column definitions:
"id" - a unique identifier for the seal from which the tracking data set came.
"time" - the GMT date-time of each observation, in the following format: "2001-11-13 07:59:59".
"lc" - the Argos location quality class of each observation. Values in ascending order of quality are: "Z", "B", "A", "0", "1", "2", "3".
"lon" - the observed longitude in decimal degrees.
"lat" - the observed latitude in decimal degrees.
The dive.csv file is a comma-separated file containing the Southern elephant seal individual dive data. Column definitions:
"id" - a unique identifier for the seal from which the dive dataset came (same as for the argos.csv file).
"time" - the GMT date-time of each observation, in the following format: "2001-11-13 07:59:59".
"MAX_DEP" - the dive covariate used here is the logarithm of the maximum depth in meters recorded for the dive. The reader may use any dive covariate, raw or pre-processed (with any name), here.
example.R contains R code for the worked Southern elephant seal example showing model implementation.
dat4bugsCOV.R is an R function which prepares and writes input data for the WinBUGS model.
ssmMeta.R is an R function which fits the covariate state-space model to Argos tracking data.
DCRWSmeta_cov1oneway.txt contains WinBUGS code for the covariate state-space model.
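For readers who want to inspect argos.csv outside R, the sketch below parses the documented columns in Python. It assumes the file has a header row with the column names given above. The optional quality filter is only an illustration of the "lc" ordering and is not part of the supplied R/WinBUGS workflow. The function name is hypothetical.

```python
import csv
from datetime import datetime, timezone

# Argos location classes in ascending order of quality, as documented above.
LC_ORDER = ["Z", "B", "A", "0", "1", "2", "3"]
LC_RANK = {lc: rank for rank, lc in enumerate(LC_ORDER)}


def read_argos(handle, min_lc="Z"):
    """Parse Argos rows, keeping those with quality class >= min_lc."""
    rows = []
    for row in csv.DictReader(handle):
        if LC_RANK[row["lc"]] < LC_RANK[min_lc]:
            continue  # illustrative filter only; the default keeps every class
        # Times are documented as GMT in the form "2001-11-13 07:59:59".
        stamp = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S")
        rows.append({
            "id": row["id"],
            "time": stamp.replace(tzinfo=timezone.utc),
            "lc": row["lc"],
            "lon": float(row["lon"]),
            "lat": float(row["lat"]),
        })
    return rows
```

The quality classes are stored as text, so ranking them through an explicit list avoids the wrong ordering that plain string comparison would give for the mix of letters and digits.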