100+ datasets found
  1. Genomics examples

    • redivis.com
    Updated Oct 20, 2025
    Cite
    Redivis Demo Organization (2025). Genomics examples [Dataset]. https://redivis.com/datasets/yz1s-d09009dbb
    Dataset updated
    Oct 20, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Jan 30, 2025
    Description

    This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_id.

  2. Training images

    • redivis.com
    Updated Oct 20, 2025
    Cite
    Redivis Demo Organization (2025). Training images [Dataset]. https://redivis.com/datasets/yz1s-d09009dbb
    Dataset updated
    Oct 20, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Aug 8, 2022
    Description

    This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_id.

  3. Sample data files for Python Course

    • figshare.com
    txt
    Updated Nov 4, 2022
    Cite
    Peter Verhaar (2022). Sample data files for Python Course [Dataset]. http://doi.org/10.6084/m9.figshare.21501549.v1
    Available download formats: txt
    Dataset updated
    Nov 4, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Peter Verhaar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample data set used in an introductory course on Programming in Python

  4. ML Basics Data Files

    • kaggle.com
    Updated Dec 7, 2020
    Cite
    Satish Gunjal (2020). ML Basics Data Files [Dataset]. https://www.kaggle.com/satishgunjal/ml-basics-data-files/code
    Dataset updated
    Dec 7, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Satish Gunjal
    Description

    Dataset

    This dataset was created by Satish Gunjal

    Released under Other (specified in description)

    Contents

  5. Representative sample of the data file required to input user-specific data...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Nov 2, 2017
    Cite
    Hughes, Laura D.; Hughes, Michael E.; Lewis, Scott A. (2017). Representative sample of the data file required to input user-specific data into ExpressionDB. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001798548
    Dataset updated
    Nov 2, 2017
    Authors
    Hughes, Laura D.; Hughes, Michael E.; Lewis, Scott A.
    Description

    This example includes two tissues with three replicates apiece, downloaded from GTEx. The complete .csv file is available here: https://github.com/5c077/ExpressionDB/tree/master/data.

  6. Labo data file showing examples of available lab test results

    • datarade.ai
    .csv, .xls, .txt
    Updated Nov 22, 2015
    Cite
    Medical Data Vision (2015). Labo data file showing examples of available lab test results [Dataset]. https://datarade.ai/data-products/labo-data-file-showing-examples-of-available-lab-test-results-medical-data-vision
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Nov 22, 2015
    Dataset authored and provided by
    Medical Data Vision
    Area covered
    Japan
    Description

    Lab test results are already provided by about 20% of the hospitals that supply us with their medical data.

    This dataset is a valuable resource for healthcare professionals, researchers, and organizations looking to analyze and understand the prevalence and distribution of various medical conditions in Japan. It can be used for epidemiological studies, healthcare planning, and medical research. The inclusion of ICD-10 codes allows for standardized analysis and comparison of diseases, and the patient count provides essential data for assessing the burden and impact of these conditions on the healthcare system and population.

  7. HTMLmetadata HTML formatted text files describing samples and spectra,...

    • catalog.data.gov
    • datasets.ai
    • +1 more
    Updated Oct 22, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). HTMLmetadata HTML formatted text files describing samples and spectra, including photos [Dataset]. https://catalog.data.gov/dataset/htmlmetadata-html-formatted-text-files-describing-samples-and-spectra-including-photos
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    HTMLmetadata: Text files in HTML format containing metadata about samples and spectra. Also included in the zip file are folders containing information linked to from the HTML files, including:

    • README: contains an HTML version of the USGS Data Series publication, linked to this data release, that describes this spectral library (Kokaly and others, 2017). The folder also contains an HTML version of the release notes.
    • photo_images: contains full resolution images of photos of samples and field sites.
    • photo_thumbs: contains low-resolution thumbnail versions of photos of samples and field sites.

    GENERAL LIBRARY DESCRIPTION

    This data release provides the U.S. Geological Survey (USGS) Spectral Library Version 7 and all related documents. The library contains spectra measured with laboratory, field, and airborne spectrometers. The instruments used cover wavelengths from the ultraviolet to the far infrared (0.2 to 200 microns). Laboratory samples of specific minerals, plants, chemical compounds, and man-made materials were measured. In many cases, samples were purified, so that unique spectral features of a material can be related to its chemical structure. These spectro-chemical links are important for interpreting remotely sensed data collected in the field or from an aircraft or spacecraft. This library also contains physically-constructed as well as mathematically-computed mixtures. Measurements of rocks, soils, and natural mixtures of minerals have also been made with laboratory and field spectrometers. Spectra of plant components and vegetation plots, comprising many plant types and species with varying backgrounds, are also in this library. Measurements by airborne spectrometers are included for forested vegetation plots, in which the trees are too tall for measurement by a field spectrometer.

    The related U.S. Geological Survey Data Series publication, "USGS Spectral Library Version 7", describes the instruments used, metadata descriptions of spectra and samples, and possible artifacts in the spectral measurements (Kokaly and others, 2017). Four different spectrometer types were used to measure spectra in the library: (1) Beckman™ 5270 covering the spectral range 0.2 to 3 µm, (2) standard, high resolution (hi-res), and high-resolution Next Generation (hi-resNG) models of ASD field portable spectrometers covering the range from 0.35 to 2.5 µm, (3) Nicolet™ Fourier Transform Infra-Red (FTIR) interferometer spectrometers covering the range from about 1.12 to 216 µm, and (4) the NASA Airborne Visible/Infra-Red Imaging Spectrometer (AVIRIS), covering the range 0.37 to 2.5 µm.

    Two fundamental spectrometer characteristics significant for interpreting and utilizing spectral measurements are sampling position (the wavelength position of each spectrometer channel) and bandpass (a parameter describing the wavelength interval over which each channel in a spectrometer is sensitive). Bandpass is typically reported as the Full Width at Half Maximum (FWHM) response at each channel (in wavelength units, for example nm or micron). The linked publication (Kokaly and others, 2017) includes a comparison plot of the various spectrometers used to measure the data in this release. Data for the sampling positions and the bandpass values (for each channel in the spectrometers) are included in this data release. These data are in the SPECPR files, as separate data records, and in the American Standard Code for Information Interchange (ASCII) text files, as separate files for wavelength and bandpass.

    Spectra are provided in files of ASCII text format (files with a .txt file extension). In the ASCII files, deleted channels (bad bands) are indicated by a value of -1.23e34. Metadata descriptions of samples, field areas, spectral measurements, and results from supporting material analyses, such as XRD, are provided in HyperText Markup Language (HTML) formatted ASCII text files (files with a .html file extension). In addition, Graphics Interchange Format (GIF) images of plots of spectra are provided. For each spectrum a plot with wavelength in microns on the x-axis is provided. For spectra measured on the Nicolet spectrometer, an additional GIF image with wavenumber on the x-axis is provided. Data are also provided in SPECtrum Processing Routines (SPECPR) format (Clark, 1993), which packages spectra and associated metadata descriptions into a single file (see the linked publication, Kokaly and others, 2017, for additional details on the SPECPR format and freely-available software that can be used to read files in SPECPR format).

    The data measured on the source spectrometers are denoted by the “splib07a” tag in filenames. In addition to providing the original measurements, the spectra have been convolved and resampled to different spectrometer and multispectral sensor characteristics. The following list specifies the identifying tag for the measured and convolved libraries and gives brief descriptions of the sensors.

    • splib07a: this is the name of the SPECPR file containing the spectra measured on the Beckman, ASD, Nicolet and AVIRIS spectrometers. The data are provided with their original sampling positions (wavelengths) and bandpass values. The prefix “splib07a_” is at the beginning of the ASCII and GIF files pertaining to the measured spectra.
    • splib07b: this is the name of the SPECPR file containing a modified version of the original measurements. The results from using spectral convolution to convert measurements to other spectrometer characteristics can be improved by oversampling (increasing sample density). Thus, splib07b is an oversampled version of the library, computed using simple cubic-spline interpolation to produce spectra with a fine sampling interval (therefore a higher number of channels) for Beckman and AVIRIS measurements. The spectra in this version of the library are the data used to create the convolved and resampled versions of the library. The prefix “splib07b_” is at the beginning of the ASCII and GIF files pertaining to the oversampled spectra.
    • s07_ASD: this is the name of the SPECPR file containing the spectral library measurements convolved to standard resolution ASD full range spectrometer characteristics. The standard reported wavelengths of the ASD spectrometers used by the USGS were used (2151 channels with wavelength positions starting at 350 nm and increasing in 1 nm increments). The bandpass values of each channel were determined by comparing measurements of reference materials made on ASD spectrometers to measurements made of the same materials on higher resolution spectrometers (the procedure is described in Kokaly, 2011, and discussed in Kokaly and Skidmore, 2015, and Kokaly and others, 2017). The prefix “s07ASD_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
    • s07_AV95: this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1995 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV95_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
    • s07_AV96: this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1996 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV96_” is at the beginning of the ASCII and GIF files.
    • s07_AV97: this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1997 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV97_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
    • s07_AV98: this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1998 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV98_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
    • s07_AV99: this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1999 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV99_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
    • s07_AV00: this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 2000 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV00_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
    • s07_AV01: this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 2001 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV01_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
    • s07_AV05: this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 2005 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV05_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
    • s07_AV06: this is the name of the SPECPR file containing the spectral library measurements convolved to
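
    The description states two concrete conventions: the standard ASD grid has 2151 channels starting at 350 nm in 1 nm steps, and deleted channels in the ASCII spectra carry the value -1.23e34. A minimal Python sketch of how a reader might apply both is given here. The one-value-per-line layout and the function names are assumptions made for illustration, not something the data release specifies.

```python
import math

DELETED_CHANNEL = -1.23e34  # marker for deleted channels, as stated in the description


def asd_wavelengths_nm():
    """Standard ASD grid from the description: 2151 channels, 350 nm start, 1 nm steps."""
    return [350 + i for i in range(2151)]


def parse_spectrum(lines):
    """Parse numeric lines into floats, mapping deleted channels to None.

    Assumption: one value per line, with any header lines already skipped.
    """
    values = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        value = float(text)
        if math.isclose(value, DELETED_CHANNEL, rel_tol=1e-6):
            values.append(None)  # bad band: keep the slot so indices still align
        else:
            values.append(value)
    return values
```

    Keeping a None placeholder rather than dropping the channel keeps each value aligned with its wavelength position.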

  8. Sample CSV files

    • kaggle.com
    zip
    Updated Mar 8, 2022
    Cite
    Naman Kumar (2022). Sample CSV files [Dataset]. https://www.kaggle.com/matcauthon49/sample-csv-files
    Available download formats: zip (88875843 bytes)
    Dataset updated
    Mar 8, 2022
    Authors
    Naman Kumar
    Description

    Dataset

    This dataset was created by Naman Kumar

    Contents

  9. Dataset #1: Cross-sectional survey data

    • figshare.com
    txt
    Updated Jul 19, 2023
    Cite
    Adam Baimel (2023). Dataset #1: Cross-sectional survey data [Dataset]. http://doi.org/10.6084/m9.figshare.23708730.v1
    Available download formats: txt
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Adam Baimel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    N.B. This is not real data. Only here for an example for project templates.

    Project Title: Add title here

    Project Team: Add contact information for research project team members

    Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.

    Relevant publications/outputs: When available, add links to the related publications/outputs from this data.

    Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (i.e., Open Science Framework, Github, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.

    Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?

    Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.

    Research Project Materials: What materials are necessary to fully reproduce the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.

    List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.

    Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, the verbatim question associated with each response, the response options, and details of any post-collection coding that has been done on the raw responses (and whether that is encoded in a separate column).

    Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14

  10. Test files

    • kaggle.com
    zip
    Updated Feb 2, 2021
    + more versions
    Cite
    Tr0uble (2021). Test files [Dataset]. https://www.kaggle.com/tr0uble/test-files
    Available download formats: zip (865449 bytes)
    Dataset updated
    Feb 2, 2021
    Authors
    Tr0uble
    Description

    Dataset

    This dataset was created by Tr0uble

    Contents

  11. Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  12. Data from: Raw data files

    • figshare.com
    bin
    Updated Mar 26, 2021
    Cite
    Ronen Schuster (2021). Raw data files [Dataset]. http://doi.org/10.6084/m9.figshare.14319758.v1
    Available download formats: bin
    Dataset updated
    Mar 26, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ronen Schuster
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data tables and the statistical analysis applied to the data. Files are labeled by figure number. Within each file, each table and its linked graph and analysis are annotated by figure number and panel letter. All files were generated in GraphPad Prism.

  13. CSV file used in statistical analyses

    • data.csiro.au
    • researchdata.edu.au
    • +1 more
    Updated Oct 13, 2014
    + more versions
    Cite
    CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
    Dataset updated
    Oct 13, 2014
    Dataset authored and provided by
    CSIRO (http://www.csiro.au/)
    License

    https://research.csiro.au/dap/licences/csiro-data-licence/

    Time period covered
    Mar 14, 2008 - Jun 9, 2009
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    Description

    A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.

  14. Vehicle licensing statistics data files

    • s3.amazonaws.com
    • gov.uk
    Updated May 24, 2022
    Cite
    Department for Transport (2022). Vehicle licensing statistics data files [Dataset]. https://s3.amazonaws.com/thegovernmentsays-files/content/181/1811927.html
    Dataset updated
    May 24, 2022
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Transport
    Description

    The following datafiles contain detailed information about vehicles in the UK, which would be too large to use as structured tables. They are provided as simple CSV text files that should be easier to use digitally.

    We welcome feedback on the structure of our new datafiles, their usability, and any suggestions for improvements; please contact vehicles statistics.

    How to use CSV files

    CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or digitally using software packages and languages (for example, R or Python).

    When using as a spreadsheet, there will be no formatting, but the file can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.

    Download data files

    Make and model by quarter

    df_VEH0120_GB: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 37.6 MB) https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077520/df_VEH0120_GB.csv

    Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)

    Schema: BodyType, Make, GenModel, Model, LicenceStatus, [number of vehicles; one column per quarter]

    df_VEH0120_UK: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 20.8 MB) https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077521/df_VEH0120_UK.csv

    Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)

    Schema: BodyType, Make, GenModel, Model, LicenceStatus, [number of vehicles; one column per quarter]

    df_VEH0160_GB: Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 17.1 MB) https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077522/df_VEH0160_GB.csv

    Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)

    Schema: BodyType, Make, GenModel, Model, [number of vehicles; one column per quarter]

    df_VEH0160_UK: Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 4.93 MB) https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077523/df_VEH0160_UK.csv

    Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)

    Schema: BodyType, Make, GenModel, Model, [number of vehicles; one column per quarter]
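
    The schemas above describe wide tables: a few identifier columns followed by one count column per quarter. For analysis in Python it is often easier to reshape such a table into one record per row and quarter. The sketch below does this with the standard csv module. The quarter labels ("2021Q3", "2021Q4") and the two sample rows are invented for illustration, since the listing does not show the real column headers, and the sketch assumes every count cell holds an integer.

```python
import csv
import io

# Identifier columns named in the schemas; LicenceStatus appears only in some files.
ID_COLUMNS = ("BodyType", "Make", "GenModel", "Model", "LicenceStatus")


def wide_to_long(handle):
    """Yield one dict per (row, quarter) pair from a wide vehicle-count CSV."""
    reader = csv.DictReader(handle)
    id_columns = [name for name in reader.fieldnames if name in ID_COLUMNS]
    quarter_columns = [name for name in reader.fieldnames if name not in ID_COLUMNS]
    for row in reader:
        for quarter in quarter_columns:
            record = {name: row[name] for name in id_columns}
            record["Quarter"] = quarter
            record["Vehicles"] = int(row[quarter])  # assumes no blank cells
            yield record


# Hypothetical sample; the headers after LicenceStatus and all values are made up.
SAMPLE = """BodyType,Make,GenModel,Model,LicenceStatus,2021Q3,2021Q4
Cars,EXAMPLE,EXAMPLE A,A 1.0,Licensed,10,12
Cars,EXAMPLE,EXAMPLE A,A 1.0,SORN,3,2
"""

records = list(wide_to_long(io.StringIO(SAMPLE)))
```

    For a downloaded file, the same function can be given a handle from open(path, newline=""). Because it yields rows one at a time, it does not need to hold a 37.6 MB file in memory.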

    Make and model by age

    df_VEH0124: Vehicles at the end of the quarter by licence status, body type, make, generic model, model, year of first use and year of manufacture: United Kingdom (CSV, 28.2 MB) https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077524/df_VEH0124.csv

    Scope: All licensed vehicles in the United Kingdom; 2021 Quarter 4 (end December) only

    Schema: BodyType, Make, GenModel, Model, YearFirstUsed, YearManufacture, Licensed (number of vehicles), SORN (number of vehicles)

    Make and model by engine size

    df_VEH0220: <a class="govu

  15. CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object)

    • zenodo.org
    • data.niaid.nih.gov
    • +3 more
    bin, zip
    Updated Jan 24, 2020
    Cite
    Farah Zaib Khan; Stian Soiland-Reyes (2020). CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object) [Dataset]. http://doi.org/10.17632/xnwncxpw42.1
    Available download formats: zip, bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Farah Zaib Khan; Stian Soiland-Reyes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This workflow adapts the approach and parameter settings of Trans-Omics for precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. There are in total five steps in the workflow starting from:

    1. Read alignment using STAR which produces aligned BAM files including the Genome BAM and Transcriptome BAM.
    2. The Genome BAM file is processed using Picard MarkDuplicates, producing an updated BAM file containing information on duplicate reads (such reads can indicate biased interpretation).
    3. SAMtools index is then employed to generate an index for the BAM file, in preparation for the next step.
    4. The indexed BAM file is processed further with RNA-SeQC which takes the BAM file, human genome reference sequence and Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.
    5. In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool, and additional RSEM reference sequences.
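
    The five steps above form a small dependency graph: RSEM needs only the STAR output, while the other steps run in sequence. As a rough illustration, the sketch below encodes those dependencies and derives a valid execution order with Python's standard graphlib module (Python 3.9 or later). The step labels are shorthand chosen here, and this is not how cwltool itself schedules the workflow.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it consumes, per the list above.
DEPENDENCIES = {
    "STAR alignment": set(),
    "Picard MarkDuplicates": {"STAR alignment"},
    "SAMtools index": {"Picard MarkDuplicates"},
    "RNA-SeQC": {"SAMtools index"},
    "RSEM": {"STAR alignment"},
}


def execution_order(dependencies):
    """Return one ordering in which every step follows all of its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
```

    Any ordering returned starts with the STAR alignment. RSEM may appear anywhere after it, which reflects the statement that isoform quantification runs in parallel with the rest of the pipeline.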

    For testing and analysis, the workflow authors provided example data created by down-sampling the read files of TOPMed public access data. Chromosome 12 was extracted from the Homo sapiens assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, the use of containerization for the underlying software, and detailed documentation are important factors in choosing this specific CWL workflow for CWLProv evaluation.

    This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl

    Steps to reproduce

    To build the research object again, use Python 3 on macOS. Built with:

    • Processor 2.8GHz Intel Core i7
    • Memory: 16GB
    • OS: macOS High Sierra, Version 10.13.3
    • Storage: 250GB
    1. Install cwltool

      pip3 install cwltool==1.0.20180912090223
    2. Install Git LFS
      Downloading the data with the git repository requires Git LFS to be installed:
      https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs

    3. Get the data and make the analysis environment ready:

      git clone https://github.com/FarahZKhan/cwl_workflows.git
      cd cwl_workflows/
      git checkout CWLProvTesting
      ./topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/download_examples.sh
    4. Run the following commands to create the CWLProv Research Object:

      cwltool --provenance rnaseqwf_0.5.0_mac --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json

      zip -r rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac
      sha256sum rnaseqwf_0.5.0_mac.zip > rnaseqwf_0.5.0_mac.zip.sha256

    The https://github.com/FarahZKhan/cwl_workflows repository is a frozen snapshot from https://github.com/heliumdatacommons/TOPMed_RNAseq_CWL commit 027e8af41b906173aafdb791351fb29efc044120
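
    Step 4 ends by writing a SHA-256 checksum file for the zipped research object. A reader who downloads the archive may want to check it against that file. The sketch below shows one way to do so in Python with hashlib. It assumes the checksum file follows the usual sha256sum layout (hex digest first, then the file name), and the function names are illustrative.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(path, checksum_path):
    """Compare a file's digest with the first field of a sha256sum-style file."""
    with open(checksum_path, "r", encoding="utf-8") as handle:
        expected = handle.read().split()[0].lower()
    return sha256_of_file(path) == expected
```

    Reading in chunks keeps memory use flat for large archives.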

  16. Basic Stand Alone Medicare Claims Public Use Files Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Basic Stand Alone Medicare Claims Public Use Files Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/basic-stand-alone-medicare-claims-public-use-files-data-package/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    This data package contains claims-based data about beneficiaries of Medicare program services, including Inpatient, Outpatient, Chronic Conditions, Skilled Nursing Facility, Home Health Agency, Hospice, Carrier, and Durable Medical Equipment (DME) data, as well as data related to Prescription Drug Events. Note that the values are estimated and counted using a random sample of fee-for-service Medicare claims.

  17. Enterprise Survey 2009-2019, Panel Data - Slovenia

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Aug 6, 2020
    + more versions
    Cite
    World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
    Dataset updated
    Aug 6, 2020
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    European Bank for Reconstruction and Development (http://ebrd.com/)
    European Investment Bank (http://eib.org/)
    Time period covered
    2008 - 2019
    Area covered
    Slovenia
    Description

    Abstract

    The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

    The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

    As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

    Geographic coverage

    National

    Analysis unit

    The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

    Universe

    As is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The samples for Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.

    Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

    For the Slovenia 2009 ES, industry stratification was designed as follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries, sample sizes were inflated by about 17% to account for potential non-response when requesting sensitive financial data, and for likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals), sample sizes were inflated by about 12% to account for undersampling of firms in service industries.

    For Slovenia 2013 ES, industry stratification was designed as follows: the universe was stratified into one manufacturing industry and two service industries (retail and other services).

    Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

    For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
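
The size strata described above can be expressed as a small lookup. This is a minimal Python sketch, not the survey's own tooling: the function name `size_stratum` is hypothetical, and returning `None` for establishments with fewer than 5 permanent full-time workers is an assumption, since the text lists no stratum for them.

```python
def size_stratum(permanent_full_time_workers):
    """Map a count of permanent full-time workers to an ES size stratum.

    Thresholds follow the text: small (5 to 19), medium (20 to 99),
    large (more than 99). Counts below 5 return None (assumption: the
    text defines no stratum for such establishments).
    """
    if permanent_full_time_workers < 5:
        return None
    if permanent_full_time_workers <= 19:
        return "small"
    if permanent_full_time_workers <= 99:
        return "medium"
    return "large"


# Example with made-up employee counts.
for workers in (3, 5, 19, 20, 99, 100):
    print(workers, size_stratum(workers))
```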

    For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

    For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

    Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    Questionnaires have common questions (core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

    Response rate

    Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

    Item non-response was addressed by two strategies: (a) For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to record the refusal to respond as (-8). (b) Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.
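
Because refusals are stored as the value -8 rather than left blank, an analysis has to treat that code as missing before computing statistics. The sketch below, in plain Python, shows one way to measure the refusal share for a single variable; the function name `item_nonresponse_share` and the sample values are hypothetical, and only the -8 code comes from the text.

```python
REFUSAL_CODE = -8  # refusal-to-respond code described in the text


def item_nonresponse_share(responses, refusal_code=REFUSAL_CODE):
    """Return the fraction of responses recorded as refusals.

    An empty list yields 0.0 (assumption made to avoid dividing by zero).
    """
    if not responses:
        return 0.0
    refused = sum(1 for value in responses if value == refusal_code)
    return refused / len(responses)


def drop_refusals(responses, refusal_code=REFUSAL_CODE):
    """Return the responses with refusal codes removed."""
    return [value for value in responses if value != refusal_code]
```

Dropping the refusals before averaging keeps the -8 placeholder from being read as a real numeric answer.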

    For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

    For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in Slovenia may be selection bias and not frame inaccuracy.

    For 2013, the ratio of realized interviews to contacted establishments was 25%. This ratio is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 44%.

    Finally, for 2019, the ratio of interviews to contacted establishments was 9.7%. This ratio is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
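
The 2009 figure is reported as contacts per interview, while the 2013 and 2019 figures are interviews per contact, so the three waves are easier to compare once they share a scale. The following Python sketch does that conversion with the figures quoted above; the function names are hypothetical. On this scale the 2009 value of 6.18 contacts per interview corresponds to roughly 16% of contacts yielding an interview.

```python
def interviews_per_contact(contacts_per_interview):
    """Convert contacts per realized interview into interviews per contact."""
    if contacts_per_interview <= 0:
        raise ValueError("contacts_per_interview must be positive")
    return 1.0 / contacts_per_interview


def contacts_per_interview(interviews_per_contact_share):
    """Convert interviews per contact (as a fraction) into contacts per interview."""
    if interviews_per_contact_share <= 0:
        raise ValueError("interviews_per_contact_share must be positive")
    return 1.0 / interviews_per_contact_share


# Figures as reported in the text, all expressed as interviews per contact.
reported = {
    2009: interviews_per_contact(6.18),  # reported as 6.18 contacts per interview
    2013: 0.25,                          # reported as 25%
    2019: 0.097,                         # reported as 9.7%
}

for year, share in sorted(reported.items()):
    print(f"{year}: {share:.1%} of contacts led to an interview, "
          f"about {contacts_per_interview(share):.1f} contacts per interview")
```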

  18. Example data file for TRUEMET Version 2.2

    • catalog.data.gov
    • datasets.ai
    Updated Oct 23, 2025
    U.S. Fish and Wildlife Service (2025). Example data file for TRUEMET Version 2.2 [Dataset]. https://catalog.data.gov/dataset/example-data-file-for-truemet-version-2-2
    Explore at:
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    U.S. Fish and Wildlife Service
    Description

    This file is an example data set from the Central Valley of California from a drought study corresponding to “recent non-drought conditions” (Scenario 1 in Petrie et al., in review). In 2014, following an 8-year period with 7 below-normal to critically-dry water years, the bioenergetic model TRUEMET was used to assess the impacts of drought on wintering waterfowl habitat and bioenergetics in the Central Valley of California. The goal of the study was to assess whether available foraging habitats could provide enough food to support waterfowl populations (ducks and geese) under a variety of climate and population level scenarios. This information could then be used by managers to adapt their waterfowl habitat management plans to drought conditions. The study area spanned the Central Valley and included the Sacramento Valley in the north, the San Joaquin Valley in the south, and Suisun Marsh and Sacramento-San Joaquin River Delta (Delta) east of San Francisco Bay. The data set consists of two foraging guilds (ducks and geese/swans) and five forage types: harvested corn, rice (flooded), rice (unflooded), wetland invertebrates and wetland moist soil seeds. For more background on the data set, see Petrie et al. in review.

  19. [Sample Dataset] April 2024 Public Data File from Crossref

    • academictorrents.com
    bittorrent
    Updated May 10, 2024
    None (2024). [Sample Dataset] April 2024 Public Data File from Crossref [Dataset]. https://academictorrents.com/details/d47fbe29e5ef93a6695421f79a6efa4b801acff1
    Explore at:
    Available download formats: bittorrent (19721846)
    Dataset updated
    May 10, 2024
    Authors
    None
    License

    https://academictorrents.com/nolicensespecified

    Description

    [Sample Dataset] April 2024 Public Data File from Crossref. This dataset includes 100 random JSON records from the Crossref metadata corpus.

  20. Dirty Data Sample

    • kaggle.com
    zip
    Updated Feb 22, 2022
    Shiva Vashishtha (2022). Dirty Data Sample [Dataset]. https://www.kaggle.com/datasets/shivavashishtha/dirty-data-sample
    Explore at:
    Available download formats: zip (52182 bytes)
    Dataset updated
    Feb 22, 2022
    Authors
    Shiva Vashishtha
    Description

    Dataset

    This dataset was created by Shiva Vashishtha

    Contents

Redivis Demo Organization (2025). Genomics examples [Dataset]. https://redivis.com/datasets/yz1s-d09009dbb

Genomics examples

Explore at:
176 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Oct 20, 2025
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
Jan 30, 2025
Description

This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_id.
