100+ datasets found
  1. CSV file used in statistical analyses

    • data.csiro.au
    • researchdata.edu.au
    • +1 more
    Updated Oct 13, 2014
    Cite
    CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
    Dataset updated
    Oct 13, 2014
    Dataset authored and provided by
    CSIRO (http://www.csiro.au/)
    License

    CSIRO Data Licence: https://research.csiro.au/dap/licences/csiro-data-licence/

    Time period covered
    Mar 14, 2008 - Jun 9, 2009
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    Description

    A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.

  2. Raw Data - CSV Files

    • osf.io
    Updated Apr 27, 2020
    Cite
    Katelyn Conn (2020). Raw Data - CSV Files [Dataset]. https://osf.io/h5wbt
    Dataset updated
    Apr 27, 2020
    Dataset provided by
    Center for Open Science (https://cos.io/)
    Authors
    Katelyn Conn
    Description

    Raw Data in .csv format for use with the R data wrangling scripts.

  3. all csv files used for analysis of NCBI data

    • figshare.com
    txt
    Updated Oct 30, 2023
    Cite
    Cassandre Pyne (2023). all csv files used for analysis of NCBI data [Dataset]. http://doi.org/10.6084/m9.figshare.24461239.v1
    Available download formats: txt
    Dataset updated
    Oct 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Cassandre Pyne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All CSV files used for the analysis of NCBI data. All files with "WOAH" in the name contain the diseases and disease agents from WOAH's list (see manuscript for link). All breed files (with breed names in the file name) come from web scraping. MASTER_DATA_coordinates_FINAL_AUG_5 contains the cleaned mined data from NCBI.

  4. Bulk data files for all years – releases, disposals, transfers and facility...

    • open.canada.ca
    • ouvert.canada.ca
    csv, html
    Updated Nov 28, 2024
    Cite
    Environment and Climate Change Canada (2024). Bulk data files for all years – releases, disposals, transfers and facility locations [Dataset]. https://open.canada.ca/data/en/dataset/40e01423-7728-429c-ac9d-2954385ccdfb
    Available download formats: csv, html
    Dataset updated
    Nov 28, 2024
    Dataset provided by
    Environment and Climate Change Canada (https://www.canada.ca/en/environment-climate-change.html)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Time period covered
    Jan 1, 1993 - Dec 31, 2023
    Description

    The National Pollutant Release Inventory (NPRI) is Canada's public inventory of pollutant releases (to air, water and land), disposals and transfers for recycling. Each file contains data from 1993 to the latest reporting year. These CSV format datasets are in normalized or 'list' format and are optimized for pivot table analyses. Here is a description of each file:
    - The RELEASES file contains all substance release quantities.
    - The DISPOSALS file contains all on-site and off-site disposal quantities, including tailings and waste rock (TWR).
    - The TRANSFERS file contains all quantities transferred for recycling or treatment prior to disposal.
    - The COMMENTS file contains all the comments provided by facilities about substances included in their report.
    - The GEO LOCATIONS file contains complete geographic information for all facilities that have reported to the NPRI.
    Please consult the following resources to enhance your analysis:
    - Guide on using and interpreting NPRI data: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/using-interpreting-data.html
    - Access additional data from the NPRI, including datasets and mapping products: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/exploredata.html
    Supplemental Information: more NPRI datasets and mapping products are available here: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/access.html
    Supporting Projects: National Pollutant Release Inventory (NPRI)
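
    Since these files are normalized for pivot-table analyses, a minimal pandas sketch is shown below. The file name and the column names ("SubstanceName", "ReportingYear", "Quantity") are hypothetical placeholders; check the actual CSV headers before running.

    import pandas as pd

    # Load the RELEASES file (file name assumed).
    releases = pd.read_csv("NPRI_RELEASES.csv", encoding="latin-1")

    # One row per substance, one column per reporting year, summed quantities.
    pivot = releases.pivot_table(
        index="SubstanceName",    # hypothetical column name
        columns="ReportingYear",  # hypothetical column name
        values="Quantity",        # hypothetical column name
        aggfunc="sum",
    )
    print(pivot.head())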

  5. 1000 Empirical Time series

    • researchdata.edu.au
    • bridges.monash.edu
    • +1 more
    Updated May 5, 2022
    Cite
    Ben Fulcher (2022). 1000 Empirical Time series [Dataset]. http://doi.org/10.6084/m9.figshare.5436136.v10
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Ben Fulcher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A diverse selection of 1000 empirical time series, along with results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney.


    The results of the computation are in the hctsa file, HCTSA_Empirical1000.mat for use in Matlab using v1.06 of hctsa.

    The same data are also provided in .csv format:
    - hctsa_datamatrix.csv: results of feature computation (one row per time series, one column per feature)
    - hctsa_timeseries-info.csv: information about rows (time series)
    - hctsa_features.csv: information about columns (features)
    - hctsa_masterfeatures.csv: corresponding hctsa code used to compute each feature
    - hctsa_timeseries-data.csv: the data of individual time series (each line a time series, as described in hctsa_timeseries-info.csv)

    These .csv files were produced by running the following in hctsa:
    >> OutputToCSV(HCTSA_Empirical1000.mat,true,true);
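
    For working with these exports outside Matlab, a minimal Python sketch is shown below, assuming the file layout described above and that both metadata files contain a "Name" column (an assumption; check the headers).

    import pandas as pd

    # Feature matrix: one row per time series, one column per feature.
    data = pd.read_csv("hctsa_datamatrix.csv", header=None)

    ts_info = pd.read_csv("hctsa_timeseries-info.csv")  # info about rows (time series)
    features = pd.read_csv("hctsa_features.csv")        # info about columns (features)

    # Attach labels, assuming a "Name" column in both metadata files.
    data.index = ts_info["Name"]
    data.columns = features["Name"]
    print(data.shape)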

    The input file, INP_Empirical1000.mat, is for use with hctsa, and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as
    >> TS_Init('INP_Empirical1000.mat');

    Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be performed by the user using TS_PlotTimeSeries from the hctsa package.

    See links in references for more comprehensive documentation for performing methodological comparison using this dataset, and on how to download and use v1.06 of hctsa.

  6. Data articles in journals

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, txt
    Updated Sep 22, 2023
    Cite
    Carlota Balsa-Sanchez; Vanesa Loureiro (2023). Data articles in journals [Dataset]. http://doi.org/10.5281/zenodo.8367960
    Available download formats: bin, csv, txt
    Dataset updated
    Sep 22, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Carlota Balsa-Sanchez; Vanesa Loureiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Version: 5

    Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

    Date of data collection: 2023/09/05

    General description: the publication of datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
    File list:

    - data_articles_journal_list_v5.xlsx: full list of 140 academic journals in which data papers and/or software papers can be published
    - data_articles_journal_list_v5.csv: full list of 140 academic journals in which data papers and/or software papers can be published

    Relationship between files: both files contain the same information; two different formats are offered to improve reuse.

    Type of version of the dataset: final processed version

    Versions of the files: 5th version
    - Information updated: number of journals, URL, document types associated to a specific journal.

    Version: 4

    Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

    Date of data collection: 2022/12/15

    General description: the publication of datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
    File list:

    - data_articles_journal_list_v4.xlsx: full list of 140 academic journals in which data papers and/or software papers can be published
    - data_articles_journal_list_v4.csv: full list of 140 academic journals in which data papers and/or software papers can be published

    Relationship between files: both files contain the same information; two different formats are offered to improve reuse.

    Type of version of the dataset: final processed version

    Versions of the files: 4th version
    - Information updated: number of journals, URL, document types associated with a specific journal, publisher normalization and simplification of document types
    - Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR), Scopus and Web of Science (WOS) Journal Master List.

    Version: 3

    Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

    Date of data collection: 2022/10/28

    General description: the publication of datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
    File list:

    - data_articles_journal_list_v3.xlsx: full list of 124 academic journals in which data papers and/or software papers can be published
    - data_articles_journal_list_3.csv: full list of 124 academic journals in which data papers and/or software papers can be published

    Relationship between files: both files contain the same information; two different formats are offered to improve reuse.

    Type of version of the dataset: final processed version

    Versions of the files: 3rd version
    - Information updated: number of journals, URL, document types associated with a specific journal, publisher normalization and simplification of document types
    - Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR).

    Erratum - Data articles in journals Version 3:

    Botanical Studies -- ISSN 1999-3110 -- JCR (JIF) Q2
    Data -- ISSN 2306-5729 -- JCR (JIF) n/a
    Data in Brief -- ISSN 2352-3409 -- JCR (JIF) n/a

    Version: 2

    Author: Francisco Rubio, Universitat Politècnica de València.

    Date of data collection: 2020/06/23

    General description: the publication of datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
    File list:

    - data_articles_journal_list_v2.xlsx: full list of 56 academic journals in which data papers and/or software papers can be published
    - data_articles_journal_list_v2.csv: full list of 56 academic journals in which data papers and/or software papers can be published

    Relationship between files: both files contain the same information; two different formats are offered to improve reuse.

    Type of version of the dataset: final processed version

    Versions of the files: 2nd version
    - Information updated: number of journals, URL, document types associated with a specific journal, publisher normalization and simplification of document types
    - Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Scimago Journal and Country Rank (SJR)

    Total size: 32 KB

    Version 1: Description

    This dataset contains a list of journals that publish data articles, code, software articles and database articles.

    The search strategy in DOAJ and Ulrichsweb was to search for the word "data" in journal titles.
    Acknowledgements:
    Xaquín Lores Torres for his invaluable help in preparing this dataset.

  7. Central Bank of Brazil data of foreign capital transfers, 2000-2011

    • su.figshare.com
    txt
    Updated May 30, 2023
    Cite
    Alice Dauriach; Emma Sundström; Beatrice Crona; Victor Galaz (2023). Central Bank of Brazil data of foreign capital transfers, 2000-2011 [Dataset]. http://doi.org/10.17045/sthlmuni.5857716.v4
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Stockholm University
    Authors
    Alice Dauriach; Emma Sundström; Beatrice Crona; Victor Galaz
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    This data set is a subset of the "Records of foreign capital" ("Registros de capitais estrangeiros", RCE) published by the Central Bank of Brazil (CBB) on their website.

    The data set consists of three data files and three corresponding metadata files. All files are in openly accessible .csv or .txt formats. Data files contain transaction-specific data such as unique identifier, currency, cancelled status and amount. Metadata files outline the variables in the corresponding data file.

    - RCE_Unclean_full_dataset.csv: all transactions published to the Central Bank website from the four main categories outlined below
    - Metadata_Unclean_full_dataset.csv
    - RCE_Unclean_cancelled_dataset.csv: data extracted from RCE_Unclean_full_dataset.csv where transactions were registered then cancelled
    - Metadata_Unclean_cancelled_dataset.csv
    - RCE_Clean_selection_dataset.csv: transaction data extracted from RCE_Unclean_full_dataset.csv and RCE_Unclean_cancelled_dataset.csv for the nine companies and criteria identified below
    - Metadata_Clean_selection_dataset.csv

    The data cover the period between October 2000 and July 2011, the only time span for which the Central Bank of Brazil provides these data at this stage. The records were published monthly by the Central Bank of Brazil as required by Art. 66 in Decree nº 55.762 of 17 February 1965, modified by Decree nº 4.842 of 17 September 2003. The records were published on the bank's website starting October 2000, as per communique nº 011489 of 7 October 2003. This remained the case until August 2011, after which the amount of each transaction was no longer disclosed (and publication stopped altogether after October 2011). Disclosure was suspended in order to review the records' legal and technical aspects and to ensure their suitability to the rules governing the confidentiality of information (Law nº 12.527 of 18 November 2011 and Decree nº 7724 of May 2012) (pers. comm. Central Bank of Brazil, 2016; name of contact available upon request from the authors).

    The records track transfers of foreign capital made from abroad to companies domiciled in Brazil, with information on the foreign company (name and country) transferring the money and on the company receiving the capital (name and federative unit). For the purpose of this study, we consider the four categories of foreign capital transactions which are published with their amount and currency in the Central Bank's data, all part of the "Register of financial transactions" (abbreviated RDE-ROF): loans, leasing, financed import and cash in advance (see below for a detailed description). Additional categories exist, such as foreign direct investment (RDE-IED) and External Investment in Portfolio (RDE-Portfólio), for which no amount is published and which are therefore not included.

    We used the data posted online as PDFs on the bank's website and created a script to extract the data automatically from these four categories into the RCE_Unclean_full_dataset.csv file. This data set has not been double-checked manually and may contain errors. We used a similar script to extract rows from the "cancelled transactions" sections of the PDFs into the RCE_Unclean_cancelled_dataset.csv file, which is useful for identifying transactions that were registered with the Central Bank but later cancelled. This data set has not been double-checked manually and may contain errors.

    From these raw data sets, we conducted the following selections and calculations to create the RCE_Clean_selection_dataset.csv file, which has been double-checked manually to ensure that no errors were made in the extraction process.

    We selected all transactions whose recipient company name corresponds to one of the nine companies, or to one of their known subsidiaries in Brazil, according to the list of subsidiaries recorded in the Orbis database maintained by Bureau van Dijk. Transactions are included if the recipient company name matches one of the following:

    - the current or former name of one of the nine companies in our sample (former names are identified using Orbis, Bloomberg's company profiles or the company website);
    - the name of a known subsidiary of one of the nine companies, if and only if we find evidence (in Orbis, Bloomberg's company profiles or on the company website) that this subsidiary was owned at some point during the period 2000-2011 and that it operated in a sector related to the soy or beef industry (including fertilizers and trading activities).

    For each transaction, we extracted the name of the company sending capital and, when possible, attributed the transaction to the known ultimate owner. The names of the countries of origin sometimes come with typos or different denominations: we harmonized them.

    A manual check of all the selected data unveiled that a few transactions (n=14) appear twice in the database while bearing the same unique identification number. According to the Central Bank of Brazil (pers. comm., November 2016), this is due to errors in their routine of data extraction. We therefore deleted duplicates in our database, keeping only the latest occurrence of each unique transaction. Six (6) transactions recorded with an amount of zero were also deleted, as were two (2) transactions registered in August 2003 with incoherent currencies (Deutsche Mark and Dutch guilder, which were demonetised in early 2002).

    To verify that the import of data from PDF to the database did not introduce any systematic errors, for instance due to mistakes in coding, the data were checked in two ways. First, because the script identifies the end of a row in the PDF using the amount of the transaction, which can fail if the amount is not entered correctly, we went through the extracted raw data (2798 rows) and cleaned all rows whose end had not been correctly identified by the script. Next, we manually double-checked the 486 largest transactions, representing 90% of the total amount of capital inflows, as well as 140 randomly selected additional rows representing 5% of the total rows, compared the extracted data to the original PDFs, and found no mistakes.

    Transfers recorded in the database were made in different currencies, including US dollars, Euros, Japanese Yen, Brazilian Reais, and more. All amounts denominated in other currencies were converted to US dollars using the average monthly exchange rate published by the International Monetary Fund (International Financial Statistics: Exchange rates, national currency per US dollar, period average). Due to the limited time period, we have not corrected for inflation but aggregated nominal amounts in USD over the period 2000-2011.

    The categories loans, cash in advance (anticipated payment for exports), financed import, and leasing/rental are those used by the Central Bank of Brazil in their published data:

    - "Loans" ("emprestimos" in the original source): includes all loans, either contracted directly with creditors or indirectly through the issuance of securities, brokered by foreign agents.
    - "Anticipated payment for exports" ("pagamento/renovacao pagamento antecipado de exportacao" in the original source): defined as a type of loan used in trade finance.
    - "Financed import" ("importacao financiada" in the original source): comprises all import financing transactions, either direct (contracted by the importer with a foreign bank or with a foreign supplier) or indirect (contracted by Brazilian banks with foreign banks on behalf of Brazilian importers). They must be declared to the Central Bank if their term of payment exceeds 360 days.
    - "Leasing/rental" ("arrendamento mercantil, leasing e aluguel" in the original source): concerns all types of external leasing operations consented by a Brazilian entity to a foreign one. They must be declared if the term of payment exceeds 360 days.

    More information about the different categories can be found through the Central Bank online. (Research Data Support provided by Springer Nature)
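
    A minimal sketch of the currency-conversion step described above is shown below. The column names and the IMF rates file are hypothetical placeholders, since the rates must be obtained separately from the IMF's International Financial Statistics.

    import pandas as pd

    tx = pd.read_csv("RCE_Clean_selection_dataset.csv")

    # Hypothetical rates table: one row per (year, month, currency) holding the
    # IMF period-average rate in national currency per US dollar.
    rates = pd.read_csv("imf_monthly_rates.csv")

    # Column names below are assumptions; adjust to the actual headers.
    merged = tx.merge(rates, on=["year", "month", "currency"], how="left")
    merged["amount_usd"] = merged["amount"] / merged["rate_per_usd"]
    print(merged[["currency", "amount", "amount_usd"]].head())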

  8. Raw ERP data in csv format

    • osf.io
    Updated Feb 23, 2019
    Cite
    Daniel Baker (2019). Raw ERP data in csv format [Dataset]. https://osf.io/xu87h
    Dataset updated
    Feb 23, 2019
    Dataset provided by
    Center for Open Science (https://cos.io/)
    Authors
    Daniel Baker
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    No description was included in this dataset collected from the OSF.

  9. Sample Graph Datasets in CSV Format

    • zenodo.org
    csv
    Updated Dec 9, 2024
    Cite
    Edwin Carreño; Edwin Carreño (2024). Sample Graph Datasets in CSV Format [Dataset]. http://doi.org/10.5281/zenodo.14335015
    Available download formats: csv
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Edwin Carreño; Edwin Carreño
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: none of the datasets published here contain actual data; they are for testing purposes only.

    This data repository contains graph datasets, where each graph is represented by two CSV files: one for node information and another for edge details. To link the files to the same graph, their names include a common identifier based on the number of nodes. For example:

    • dataset_30_nodes_interactions.csv: contains 30 rows (nodes).
    • dataset_30_edges_interactions.csv: contains 47 rows (edges).
    • the common identifier dataset_30 refers to the same graph.

    CSV nodes

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    UniProt ID | string | protein identification
    label | string | protein label (type of node)
    properties | string | a dictionary containing properties related to the protein.

    CSV edges

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    Relationship ID | string | relationship identification
    Source ID | string | identification of the source protein in the relationship
    Target ID | string | identification of the target protein in the relationship
    label | string | relationship label (type of relationship)
    properties | string | a dictionary containing properties related to the relationship.

    Metadata

    Graph | Number of Nodes | Number of Edges | Sparse graph
    dataset_30* | 30 | 47 | Y
    dataset_60* | 60 | 181 | Y
    dataset_120* | 120 | 689 | Y
    dataset_240* | 240 | 2819 | Y
    dataset_300* | 300 | 4658 | Y
    dataset_600* | 600 | 18004 | Y
    dataset_1200* | 1200 | 71785 | Y
    dataset_2400* | 2400 | 288600 | Y
    dataset_3000* | 3000 | 449727 | Y
    dataset_6000* | 6000 | 1799413 | Y
    dataset_12000* | 12000 | 7199863 | Y
    dataset_24000* | 24000 | 28792361 | Y
    dataset_30000* | 30000 | 44991744 | Y

    This repository includes two (2) additional tiny graph datasets for experimenting before dealing with the larger datasets.

    CSV nodes (tiny graphs)

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    ID | string | node identification
    label | string | node label (type of node)
    properties | string | a dictionary containing properties related to the node.

    CSV edges (tiny graphs)

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    ID | string | relationship identification
    source | string | identification of the source node in the relationship
    target | string | identification of the target node in the relationship
    label | string | relationship label (type of relationship)
    properties | string | a dictionary containing properties related to the relationship.

    Metadata (tiny graphs)

    Graph | Number of Nodes | Number of Edges | Sparse graph
    dataset_dummy* | 3 | 6 | N
    dataset_dummy2* | 3 | 6 | N
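
    A minimal sketch of loading one of these graphs into networkx is shown below, using the documented column names; parsing the properties column with ast.literal_eval assumes it holds Python-style dictionary strings.

    import ast

    import networkx as nx
    import pandas as pd

    nodes = pd.read_csv("dataset_30_nodes_interactions.csv")
    edges = pd.read_csv("dataset_30_edges_interactions.csv")

    g = nx.DiGraph()
    for _, row in nodes.iterrows():
        # "properties" is assumed to be a dictionary literal stored as a string.
        g.add_node(row["UniProt ID"], label=row["label"],
                   **ast.literal_eval(row["properties"]))
    for _, row in edges.iterrows():
        g.add_edge(row["Source ID"], row["Target ID"],
                   label=row["label"], id=row["Relationship ID"])

    print(g.number_of_nodes(), g.number_of_edges())  # expect 30 and 47
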
  10. SDSS Galaxy Subset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 6, 2022
    Cite
    Carvalho, Nuno Ramos (2022). SDSS Galaxy Subset [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6393487
    Dataset updated
    Sep 6, 2022
    Dataset authored and provided by
    Carvalho, Nuno Ramos
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Sloan Digital Sky Survey (SDSS) is a comprehensive survey of the northern sky. This dataset contains a subset of that survey: 100077 objects classified as galaxies. It includes a CSV file with a collection of information and a set of files for each object, namely JPG image files, FITS files and spectra data. This dataset is used to train and explore the astromlp-models collection of deep learning models for galaxy characterisation.

    The dataset includes a CSV data file where each row is an object from the SDSS database, and with the following columns (note that some data may not be available for all objects):

    objid: unique SDSS object identifier

    mjd: MJD of observation

    plate: plate identifier

    tile: tile identifier

    fiberid: fiber identifier

    run: run number

    rerun: rerun number

    camcol: camera column

    field: field number

    ra: right ascension

    dec: declination

    class: spectroscopic class (only objects with GALAXY are included)

    subclass: spectroscopic subclass

    modelMag_u: better of DeV/Exp magnitude fit for band u

    modelMag_g: better of DeV/Exp magnitude fit for band g

    modelMag_r: better of DeV/Exp magnitude fit for band r

    modelMag_i: better of DeV/Exp magnitude fit for band i

    modelMag_z: better of DeV/Exp magnitude fit for band z

    redshift: final redshift from SDSS data z

    stellarmass: stellar mass extracted from the eBOSS Firefly catalog

    w1mag: WISE W1 "standard" aperture magnitude

    w2mag: WISE W2 "standard" aperture magnitude

    w3mag: WISE W3 "standard" aperture magnitude

    w4mag: WISE W4 "standard" aperture magnitude

    gz2c_f: Galaxy Zoo 2 classification from Willett et al 2013

    gz2c_s: simplified version of Galaxy Zoo 2 classification (labels set)

    Besides the CSV file, a set of directories is included in the dataset. In each directory you'll find a list of files named after the objid column from the CSV file, with the corresponding data. The following directory tree is available:

    sdss-gs/ ├── data.csv ├── fits ├── img ├── spectra └── ssel

    Each directory contains:

    img: RGB images from the object in JPEG format, 150x150 pixels, generated using the SkyServer DR16 API

    fits: FITS data subsets around the object across the u, g, r, i, z bands; cut is done using the ImageCutter library

    spectra: full best fit spectra data from SDSS between 4000 and 9000 wavelengths

    ssel: best fit spectra data from SDSS for specific selected intervals of wavelengths discussed by Sánchez Almeida 2010
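
    A minimal sketch of pairing rows in data.csv with the per-object files is shown below; the per-object file naming (<objid>.jpg, <objid>.fits) is an assumption based on the description above.

    import pandas as pd

    df = pd.read_csv("sdss-gs/data.csv")

    first = df.iloc[0]
    img_path = f"sdss-gs/img/{first['objid']}.jpg"     # assumed naming convention
    fits_path = f"sdss-gs/fits/{first['objid']}.fits"  # assumed naming convention
    print(first["objid"], first["redshift"], img_path, fits_path)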

    Changelog

    v0.0.4 - Increase number of objects to ~100k.

    v0.0.3 - Increase number of objects to ~80k.

    v0.0.2 - Increase number of objects to ~60k.

    v0.0.1 - Initial import.

  11. Gene expression csv files

    • figshare.com
    txt
    Updated Jun 12, 2023
    Cite
    Cristina Alvira (2023). Gene expression csv files [Dataset]. http://doi.org/10.6084/m9.figshare.21861975.v1
    Available download formats: txt
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Cristina Alvira
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    CSV files containing all detectable genes.

  12. Minimum Data Set Frequency

    • catalog.data.gov
    • healthdata.gov
    • +1 more
    Updated Feb 3, 2025
    Cite
    Centers for Medicare & Medicaid Services (2025). Minimum Data Set Frequency [Dataset]. https://catalog.data.gov/dataset/minimum-data-set-frequency
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    Centers for Medicare & Medicaid Services
    Description

    The Minimum Data Set (MDS) Frequency data summarizes health status indicators for active residents currently in nursing homes. The MDS is part of the Federally-mandated process for clinical assessment of all residents in Medicare and Medicaid certified nursing homes. This process provides a comprehensive assessment of each resident's functional capabilities and helps nursing home staff identify health problems. Care Area Assessments (CAAs) are part of this process, and provide the foundation upon which a resident's individual care plan is formulated. MDS assessments are completed for all residents in certified nursing homes, regardless of source of payment for the individual resident. MDS assessments are required for residents on admission to the nursing facility, periodically, and on discharge. All assessments are completed within specific guidelines and time frames. In most cases, participants in the assessment process are licensed health care professionals employed by the nursing home. MDS information is transmitted electronically by nursing homes to the national MDS database at CMS. When reviewing the MDS 3.0 Frequency files, some common software programs (e.g., Microsoft Excel) might inaccurately strip leading zeros from designated code values (e.g., "01" becomes "1") or misinterpret code ranges as dates (e.g., O0600 ranges such as 02-04 are misread as 04-Feb). As each piece of software is unique, if you encounter an issue when reading the CSV file of Frequency data, please open the file in a plain text editor such as Notepad or TextPad to review the underlying data before reaching out to CMS for assistance.
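
    One way to avoid the leading-zero and date-coercion problems described above is to force every column to be read as text; a minimal pandas sketch (the file name is hypothetical):

    import pandas as pd

    # dtype=str keeps code values such as "01" and ranges such as "02-04"
    # exactly as they appear in the file.
    freq = pd.read_csv("mds_frequency.csv", dtype=str)
    print(freq.head())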

  13. Data from: Datasets used to train the Generative Adversarial Networks used in ATLFast3

    • opendata.cern.ch
    Updated 2021
    Cite
    ATLAS collaboration (2021). Datasets used to train the Generative Adversarial Networks used in ATLFast3 [Dataset]. http://doi.org/10.7483/OPENDATA.ATLAS.UXKX.TXBN
    Dataset updated
    2021
    Dataset provided by
    CERN Open Data Portal
    Authors
    ATLAS collaboration
    Description

    Three datasets are available, each consisting of 15 csv files. Each file contains the voxelised shower information obtained from single particles produced at the front of the calorimeter in the |η| range (0.2-0.25), simulated in the ATLAS detector. Two datasets contain photon events with different statistics; the larger sample has about 10 times the number of events of the other. The third dataset contains pions. The pion dataset and the lower-statistics photon dataset were used to train the corresponding two GANs presented in the AtlFast3 paper SIMU-2018-04.

    The information in each file is a table; the rows correspond to the events and the columns to the voxels. The voxelisation procedure is described in the AtlFast3 paper linked above and in the dedicated PUB note ATL-SOFT-PUB-2020-006. In summary, the detailed energy deposits produced by ATLAS were converted from x,y,z coordinates to local cylindrical coordinates defined around the particle 3-momentum at the entrance of the calorimeter. The energy deposits in each layer were then grouped in voxels and for each voxel the energy was stored in the csv file. For each particle, there are 15 files corresponding to the 15 energy points used to train the GAN. The name of the csv file defines both the particle and the energy of the sample used to create the file.
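
    A minimal sketch of reading one of these files into an events-by-voxels matrix is shown below; the file name is hypothetical and the code assumes a plain, headerless CSV.

    import numpy as np

    # Rows are events, columns are voxel energies, as described above.
    showers = np.loadtxt("photons_sample.csv", delimiter=",")
    print(showers.shape)  # (n_events, n_voxels)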

    The size of the voxels is described in the binning.xml file. Software tools to read the XML file and manipulate the spatial information of voxels are provided in the FastCaloGAN repository.

    Updated on February 10th 2022. A new dataset photons_samples_highStat.tgz was added to this record and the binning.xml file was updated accordingly.

    Updated on April 18th 2023. A new dataset pions_samples_highStat.tgz was added to this record.

  14. PB detection database

    • data.lib.vt.edu
    txt
    Updated Sep 20, 2023
    Cite
    Vincent Adkins (2023). PB detection database [Dataset]. http://doi.org/10.7294/22148648.v1
    Available download formats: txt
    Dataset updated
    Sep 20, 2023
    Dataset provided by
    University Libraries, Virginia Tech
    Authors
    Vincent Adkins
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Comma separated value files containing data regarding plasma bubbles and a readme file for data descriptions.

    Each CSV file contains data along either the northern equatorial ionization anomaly, magnetic equator, or southern equatorial ionization anomaly.

    CSV file datasets are organized by row and column. The first row of each CSV contains headers to briefly describe the data stored in each column. Each row represents a specific plasma bubble detected. Each column provides information about the plasma bubble such as magnetic longitude, magnetic latitude, year, day of year, and so on. More details about the dataset are in the readme.txt file.

  15. RNA-Seq data files

    • figshare.com
    application/gzip
    Updated Mar 24, 2019
    Cite
    Jo Lynne Rokita (2019). RNA-Seq data files [Dataset]. http://doi.org/10.6084/m9.figshare.7751825.v5
    Available download formats: application/gzip
    Dataset updated
    Mar 24, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jo Lynne Rokita
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RNA expression matrices: FPKM from STAR and Cufflinks; TPM from STAR, RSEM, and Toil. Fusion results from deFuse, SOAPfuse, FusionCatcher, and STAR-Fusion.

  16. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Mar 20, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Available download formats: zip (133186454988 bytes)
    Dataset updated
    Mar 20, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
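
    A minimal sketch of mapping a KernelVersions id to its folder under this layout:

    def version_path(version_id: int) -> str:
        # Top-level folder groups ids by millions, sub-folder by thousands,
        # per the layout described above. The file extension varies by language.
        top = version_id // 1_000_000
        sub = (version_id // 1_000) % 1_000
        return f"{top}/{sub}/{version_id}"

    print(version_path(123_456_789))  # -> "123/456/123456789"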

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  17. Vermont Fish and Wildlife Department Volume 1 (2014 - 2022) | gimi9.com

    • gimi9.com
    Cite
    Vermont Fish and Wildlife Department Volume 1 (2014 - 2022) | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_vermont-fish-and-wildlife-department-volume-1-2014-2022
    Description

    This volume's release consists of 41933 media files captured by autonomous wildlife monitoring devices under the project, Vermont Fish and Wildlife Department. The attached files listed below include several CSV files that provide information about the data release. The file, "media.csv" provides the metadata about the media, such as filename and date/time of capture. The actual media files are housed within folders under the volume's "child items" as compressed files. A critical CSV file is "dictionary.csv", which describes each CSV file, including field names, data types, descriptions, and the relationship of each field to fields in other CSV files. Some of the media files may have been "tagged" or "annotated" by either humans or by machine learning models, identifying wildlife targets within the media. If so, this information is stored in "annotations.csv" and "modeloutputs.csv", respectively. To protect privacy, all personally identifiable information (PII) have been removed, locations have been "blurred" by bounding boxes, and media featuring sensitive taxa or humans have been omitted. To enhance data reuse, the sbRehydrate() function in the AMMonitor R package will download files and re-create the original AMMonitor project (database + media files). See source code at https://code.usgs.gov/vtcfwru/ammonitor.

  18. BILO daily rainfall grids in CSV text file format 1981 to 2013

    • demo.dev.magda.io
    • researchdata.edu.au
    • +2 more
    zip
    Updated Dec 4, 2022
    Cite
    Bioregional Assessment Program (2022). BILO daily rainfall grids in CSV text file format 1981 to 2013 [Dataset]. https://demo.dev.magda.io/dataset/ds-dga-d510313d-e3be-4a70-8791-c9925fc9e4e2
    Available download formats: zip
    Dataset updated
    Dec 4, 2022
    Dataset provided by
    Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from BILO Gridded Climate Data provided by CSIRO. You can find a link to the parent datasets in the Lineage field of this metadata statement; the History field describes how this dataset was derived.

    Time series of daily BILO precipitation from 19810101 through 20121231, for each mainland Australia 0.05 degree resolution grid cell to the east of the WA border. The filename represents the cell centre of the 0.05 degree resolution grid cell, with longitude (in decimal degrees) given before latitude (again in decimal degrees). Data are in comma-delimited (.csv) format with date and value for each day.

    Dataset History

    Data have been converted from raster-based to comma-delimited (.csv) format, using the cell centre to identify location and filename. One file is generated per grid cell. Source data: BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012 (7aaf0621-a0e5-4b01-9333-53ebcb1f1c14).

    Dataset Citation

    Bioregional Assessment Programme (2014) BILO daily rainfall grids in CSV text file format 1981 to 2013. Bioregional Assessment Derived Dataset. Viewed 09 October 2017, http://data.bioregionalassessments.gov.au/dataset/67749ef0-7223-437a-851a-573edde09567.

    Dataset Ancestors

    Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
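
    A minimal sketch of recovering the cell centre from a filename and reading the daily series is shown below; the exact separator between longitude and latitude in the filename is not specified above, so the "_" used here is an assumption, as is the example filename.

    import os

    import pandas as pd

    fname = "146.55_-34.80.csv"  # hypothetical example filename
    lon, lat = (float(v) for v in os.path.splitext(fname)[0].split("_"))

    cell = pd.read_csv(fname)  # one row per day: date and rainfall value
    print(lon, lat, len(cell))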

  19. Health Insurance Marketplace

    • kaggle.com
    zip
    Updated May 1, 2017
    Cite
    US Department of Health and Human Services (2017). Health Insurance Marketplace [Dataset]. https://www.kaggle.com/hhs/health-insurance-marketplace
    Available download formats: zip (868821924 bytes)
    Dataset updated
    May 1, 2017
    Dataset authored and provided by
    US Department of Health and Human Services
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Health Insurance Marketplace Public Use Files contain data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace.


    Exploration Ideas

    To help get you started, here are some data exploration ideas:

    • How do plan rates and benefits vary across states?
    • How do plan benefits relate to plan rates?
    • How do plan rates vary by age?
    • How do plans vary across insurance network providers?

    See this forum thread for more ideas, and post there if you want to add your own ideas or answer some of the open questions!

    Data Description

    This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). Please read the CMS Disclaimer-User Agreement before using this data.

    Here, we've processed the data to facilitate analytics. This processed version has three components:

    1. Original versions of the data

    The original versions of the 2014, 2015, 2016 data are available in the "raw" directory of the download and "../input/raw" on Kaggle Scripts. Search for "dictionaries" on this page to find the data dictionaries describing the individual raw files.

    2. Combined CSV files

    In the top level directory of the download ("../input" on Kaggle Scripts), there are six CSV files that contain the combined data across all years:

    • BenefitsCostSharing.csv
    • BusinessRules.csv
    • Network.csv
    • PlanAttributes.csv
    • Rate.csv
    • ServiceArea.csv

    Additionally, there are two CSV files that facilitate joining data across years:

    • Crosswalk2015.csv - joining 2014 and 2015 data
    • Crosswalk2016.csv - joining 2015 and 2016 data

    3. SQLite database

    The "database.sqlite" file contains tables corresponding to each of the processed CSV files.

    The code to create the processed version of this data is available on GitHub.
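
    A minimal sketch of querying the SQLite component instead of the CSVs, assuming table names match the CSV file names listed above (the column names are hypothetical; inspect the schema first):

    import sqlite3

    conn = sqlite3.connect("database.sqlite")
    cur = conn.execute(
        "SELECT StateCode, AVG(IndividualRate) FROM Rate GROUP BY StateCode"
    )  # column names assumed; check the table schema before relying on them
    for state, avg_rate in cur.fetchmany(5):
        print(state, avg_rate)
    conn.close()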

  20. Download CSV DB

    • maclookup.app
    json
    Updated Mar 17, 2025
    Cite
    (2025). Download CSV DB [Dataset]. https://maclookup.app/downloads/csv-database
    Available download formats: json
    Dataset updated
    Mar 17, 2025
    Description

    Free, daily updated MAC prefix and vendor CSV database. Download now for accurate device identification.
