100+ datasets found

Genomics examples
redivis.com
Updated Oct 20, 2025
Cite
Redivis Demo Organization (2025). Genomics examples [Dataset]. https://redivis.com/datasets/yz1s-d09009dbb
Dataset updated
Oct 20, 2025
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
Jan 30, 2025
Description
This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_id.
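As a rough illustration of subsetting files by their metadata, the sketch below filters the rows of an index table in plain Python. It does not use any Redivis API. The column names `file_name` and `size`, and every row value, are hypothetical placeholders, so check the actual index table's columns before adapting it.

```python
# Hypothetical index-table rows; real column names and values may differ.
INDEX_ROWS = [
    {"file_id": "f001", "file_name": "sample_A.vcf", "size": 4200},
    {"file_id": "f002", "file_name": "sample_A.bam", "size": 91000},
    {"file_id": "f003", "file_name": "sample_B.vcf", "size": 3900},
]


def select_file_ids(rows, extension, max_size):
    """Return the file_id of each row whose file_name ends with `extension`
    and whose size is at most `max_size`."""
    return [
        row["file_id"]
        for row in rows
        if row["file_name"].endswith(extension) and row["size"] <= max_size
    ]


print(select_file_ids(INDEX_ROWS, ".vcf", 5000))  # ['f001', 'f003']
```

The selected `file_id` values are the ones you would then open individually for further analysis.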
Training images
redivis.com
Updated Oct 20, 2025
Cite
Redivis Demo Organization (2025). Training images [Dataset]. https://redivis.com/datasets/yz1s-d09009dbb
Dataset updated
Oct 20, 2025
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
Aug 8, 2022
Description
This is an auto-generated index table corresponding to a folder of files in this dataset with the same name. This table can be used to extract a subset of files based on their metadata, which can then be used for further analysis. You can view the contents of specific files by navigating to the "cells" tab and clicking on an individual file_id.
Sample data files for Python Course
figshare.com
txt
Updated Nov 4, 2022
Cite
Peter Verhaar (2022). Sample data files for Python Course [Dataset]. http://doi.org/10.6084/m9.figshare.21501549.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21501549.v1
Dataset updated
Nov 4, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Peter Verhaar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sample data set used in an introductory course on Programming in Python
ML Basics Data Files
kaggle.com
Updated Dec 7, 2020
Cite
Satish Gunjal (2020). ML Basics Data Files [Dataset]. https://www.kaggle.com/satishgunjal/ml-basics-data-files/code
Available in Croissant format (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Dec 7, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Satish Gunjal
Description
Dataset

This dataset was created by Satish Gunjal

Released under Other (specified in description)

Representative sample of the data file required to input user-specific data into ExpressionDB
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Nov 2, 2017
Cite
Hughes, Laura D.; Hughes, Michael E.; Lewis, Scott A. (2017). Representative sample of the data file required to input user-specific data into ExpressionDB. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001798548
Dataset updated
Nov 2, 2017
Authors
Hughes, Laura D.; Hughes, Michael E.; Lewis, Scott A.
Description
This example includes two tissues with three replicates apiece downloaded from GTEx. Complete.csv file here: https://github.com/5c077/ExpressionDB/tree/master/data.
Labo data file showing examples of available lab test results
datarade.ai
.csv, .xls, .txt
Updated Nov 22, 2015
Cite
Medical Data Vision (2015). Labo data file showing examples of available lab test results [Dataset]. https://datarade.ai/data-products/labo-data-file-showing-examples-of-available-lab-test-results-medical-data-vision
Available download formats: .csv, .xls, .txt
Dataset updated
Nov 22, 2015
Dataset authored and provided by
Medical Data Vision
Area covered
Japan
Description
Lab test results are already provided by about 20% of the hospitals that supply us with their medical data.

This dataset is a valuable resource for healthcare professionals, researchers, and organizations looking to analyze and understand the prevalence and distribution of various medical conditions in Japan. It can be used for epidemiological studies, healthcare planning, and medical research. The inclusion of ICD-10 codes allows for standardized analysis and comparison of diseases, and the patient count provides essential data for assessing the burden and impact of these conditions on the healthcare system and population.
HTMLmetadata HTML formatted text files describing samples and spectra, including photos
catalog.data.gov
datasets.ai
+1 more
Updated Oct 22, 2025
Cite
U.S. Geological Survey (2025). HTMLmetadata HTML formatted text files describing samples and spectra, including photos [Dataset]. https://catalog.data.gov/dataset/htmlmetadata-html-formatted-text-files-describing-samples-and-spectra-including-photos
Dataset updated
Oct 22, 2025
Dataset provided by
U.S. Geological Survey
Description
HTMLmetadata: text files in HTML format containing metadata about samples and spectra. Also included in the zip file are folders containing information linked to from the HTML files, including:
- README: contains an HTML version of the USGS Data Series publication, linked to this data release, that describes this spectral library (Kokaly and others, 2017). The folder also contains an HTML version of the release notes.
- photo_images: contains full-resolution images of photos of samples and field sites.
- photo_thumbs: contains low-resolution thumbnail versions of photos of samples and field sites.

GENERAL LIBRARY DESCRIPTION

This data release provides the U.S. Geological Survey (USGS) Spectral Library Version 7 and all related documents. The library contains spectra measured with laboratory, field, and airborne spectrometers. The instruments used cover wavelengths from the ultraviolet to the far infrared (0.2 to 200 microns). Laboratory samples of specific minerals, plants, chemical compounds, and man-made materials were measured. In many cases, samples were purified so that unique spectral features of a material can be related to its chemical structure. These spectro-chemical links are important for interpreting remotely sensed data collected in the field or from an aircraft or spacecraft. This library also contains physically constructed as well as mathematically computed mixtures. Measurements of rocks, soils, and natural mixtures of minerals have also been made with laboratory and field spectrometers. Spectra of plant components and vegetation plots, comprising many plant types and species with varying backgrounds, are also in this library. Measurements by airborne spectrometers are included for forested vegetation plots, in which the trees are too tall for measurement by a field spectrometer. The related U.S. Geological Survey Data Series publication, "USGS Spectral Library Version 7", describes the instruments used, metadata descriptions of spectra and samples, and possible artifacts in the spectral measurements (Kokaly and others, 2017).

Four different spectrometer types were used to measure spectra in the library:
(1) Beckman™ 5270, covering the spectral range 0.2 to 3 µm;
(2) standard, high-resolution (hi-res), and high-resolution Next Generation (hi-resNG) models of ASD field portable spectrometers, covering the range from 0.35 to 2.5 µm;
(3) Nicolet™ Fourier Transform Infra-Red (FTIR) interferometer spectrometers, covering the range from about 1.12 to 216 µm; and
(4) the NASA Airborne Visible/Infra-Red Imaging Spectrometer (AVIRIS), covering the range 0.37 to 2.5 µm.

Two fundamental spectrometer characteristics significant for interpreting and utilizing spectral measurements are sampling position (the wavelength position of each spectrometer channel) and bandpass (a parameter describing the wavelength interval over which each channel in a spectrometer is sensitive). Bandpass is typically reported as the Full Width at Half Maximum (FWHM) response at each channel (in wavelength units, for example nm or micron). The linked publication (Kokaly and others, 2017) includes a comparison plot of the various spectrometers used to measure the data in this release. Data for the sampling positions and the bandpass values (for each channel in the spectrometers) are included in this data release. These data are in the SPECPR files, as separate data records, and in the American Standard Code for Information Interchange (ASCII) text files, as separate files for wavelength and bandpass. Spectra are provided in files of ASCII text format (files with a .txt file extension). In the ASCII files, deleted channels (bad bands) are indicated by a value of -1.23e34.
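A minimal sketch of reading one of these ASCII spectra while respecting the bad-band marker, using only the Python standard library. It assumes each .txt file holds a single header line followed by one value per line (verify this against the files you download), and that wavelengths come from the separate wavelength file in the same layout. The second function resamples the values to a single channel using an assumed Gaussian response whose width is given as FWHM, where sigma = FWHM / (2 * sqrt(2 ln 2)). This is a generic convolution for illustration, not the USGS procedure described in Kokaly (2011).

```python
import math

BAD_BAND = -1.23e34  # marker for deleted channels, per the data release


def read_ascii_values(lines):
    """Parse the lines of an ASCII spectrum file.

    Assumes one header line followed by one numeric value per line.
    Bad bands become None so they cannot leak into later arithmetic.
    """
    values = []
    for line in lines[1:]:  # skip the assumed header line
        text = line.strip()
        if not text:
            continue
        value = float(text)
        values.append(None if math.isclose(value, BAD_BAND, rel_tol=1e-6) else value)
    return values


def convolve_channel(wavelengths, values, center, fwhm):
    """Weighted average of `values` under a Gaussian response centred on
    `center` with full width at half maximum `fwhm` (same units as the
    wavelengths). Bad bands (None) are skipped; returns None if no weight."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    weighted_sum = 0.0
    weight_total = 0.0
    for wl, v in zip(wavelengths, values):
        if v is None:
            continue
        w = math.exp(-0.5 * ((wl - center) / sigma) ** 2)
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else None


# Hypothetical file contents and wavelengths (microns) for illustration.
spectrum = read_ascii_values(["header", "0.10", "0.20", "-1.23e34", "0.40"])
wavelengths = read_ascii_values(["header", "0.50", "0.51", "0.52", "0.53"])
print(spectrum)  # [0.1, 0.2, None, 0.4]
print(convolve_channel(wavelengths, spectrum, center=0.515, fwhm=0.02))
```

Mapping the marker to `None` rather than leaving -1.23e34 in place avoids a single deleted channel dominating any weighted sum.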
Metadata descriptions of samples, field areas, spectral measurements, and results from supporting material analyses – such as XRD – are provided in HyperText Markup Language (HTML) formatted ASCII text files (files with a .html file extension). In addition, Graphics Interchange Format (GIF) images of plots of spectra are provided. For each spectrum a plot with wavelength in microns on the x-axis is provided. For spectra measured on the Nicolet spectrometer, an additional GIF image with wavenumber on the x-axis is provided. Data are also provided in SPECtrum Processing Routines (SPECPR) format (Clark, 1993), which packages spectra and associated metadata descriptions into a single file (see the linked publication, Kokaly and others, 2017, for additional details on the SPECPR format and freely available software that can be used to read files in SPECPR format). The data measured on the source spectrometers are denoted by the “splib07a” tag in filenames.

In addition to providing the original measurements, the spectra have been convolved and resampled to different spectrometer and multispectral sensor characteristics. The following list specifies the identifying tag for the measured and convolved libraries and gives brief descriptions of the sensors.

splib07a – this is the name of the SPECPR file containing the spectra measured on the Beckman, ASD, Nicolet and AVIRIS spectrometers. The data are provided with their original sampling positions (wavelengths) and bandpass values. The prefix “splib07a_” is at the beginning of the ASCII and GIF files pertaining to the measured spectra.

splib07b – this is the name of the SPECPR file containing a modified version of the original measurements. The results from using spectral convolution to convert measurements to other spectrometer characteristics can be improved by oversampling (increasing sample density). Thus, splib07b is an oversampled version of the library, computed using simple cubic-spline interpolation to produce spectra with a fine sampling interval (therefore a higher number of channels) for Beckman and AVIRIS measurements. The spectra in this version of the library are the data used to create the convolved and resampled versions of the library. The prefix “splib07b_” is at the beginning of the ASCII and GIF files pertaining to the oversampled spectra.

s07_ASD – this is the name of the SPECPR file containing the spectral library measurements convolved to standard-resolution ASD full-range spectrometer characteristics. The standard reported wavelengths of the ASD spectrometers used by the USGS were used (2151 channels with wavelength positions starting at 350 nm and increasing in 1 nm increments). The bandpass values of each channel were determined by comparing measurements of reference materials made on ASD spectrometers with measurements of the same materials made on higher-resolution spectrometers (the procedure is described in Kokaly, 2011, and discussed in Kokaly and Skidmore, 2015, and Kokaly and others, 2017). The prefix “s07ASD_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.

s07_AV95 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1995 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV95_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.

s07_AV96 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1996 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV96_” is at the beginning of the ASCII and GIF files.

s07_AV97 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1997 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV97_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.

s07_AV98 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1998 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV98_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.

s07_AV99 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1999 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV99_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.

s07_AV00 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 2000 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV00_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.

s07_AV01 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 2001 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV01_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.

s07_AV05 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 2005 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV05_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.

s07_AV06 – this is the name of the SPECPR file containing the spectral library measurements convolved to
Sample CSV files
kaggle.com
zip
Updated Mar 8, 2022
Cite
Naman Kumar (2022). Sample CSV files [Dataset]. https://www.kaggle.com/matcauthon49/sample-csv-files
Available download formats: zip (88,875,843 bytes)
Dataset updated
Mar 8, 2022
Authors
Naman Kumar
Description
Dataset

This dataset was created by Naman Kumar

Dataset #1: Cross-sectional survey data
figshare.com
txt
Updated Jul 19, 2023
Cite
Adam Baimel (2023). Dataset #1: Cross-sectional survey data [Dataset]. http://doi.org/10.6084/m9.figshare.23708730.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.23708730.v1
Dataset updated
Jul 19, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Adam Baimel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
N.B. This is not real data; it is provided only as an example for project templates.

Project Title: Add title here

Project Team: Add contact information for research project team members

Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.

Relevant publications/outputs: When available, add links to the related publications/outputs from this data.

Data availability statement: If your data is not linked on figshare directly, provide links here to where it is being hosted (e.g., Open Science Framework, GitHub). If your data is not going to be made publicly available, please provide details here about the conditions under which interested individuals could gain access to the data and how to go about doing so.

Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?

Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.

Research Project Materials: What materials are necessary to fully reproduce the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.

List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.

Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, verbatim question associated with each response, response options, details of any post-collection coding that has been done on the raw-response (and whether that's encoded in a separate column).

Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14
Test files
kaggle.com
zip
Updated Feb 2, 2021
Cite
Tr0uble (2021). Test files [Dataset]. https://www.kaggle.com/tr0uble/test-files
Available download formats: zip (865,449 bytes)
Dataset updated
Feb 2, 2021
Authors
Tr0uble
Description
Dataset

This dataset was created by Tr0uble

Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Available in Croissant format (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
Data from: Raw data files
figshare.com
bin
Updated Mar 26, 2021
Cite
Ronen Schuster (2021). Raw data files [Dataset]. http://doi.org/10.6084/m9.figshare.14319758.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.14319758.v1
Dataset updated
Mar 26, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ronen Schuster
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw data tables and the statistical analysis applied to the data. Files are labeled by figure number. Within each file, each table and its linked graph and analysis are annotated by figure number and panel letter. All files were generated in GraphPad Prism.
CSV file used in statistical analyses
data.csiro.au
researchdata.edu.au
+1 more
Updated Oct 13, 2014
Cite
CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
Unique identifier
https://doi.org/10.4225/08/543B4B4CA92E6
Dataset updated
Oct 13, 2014
Dataset authored and provided by
CSIRO (http://www.csiro.au/)
License
https://research.csiro.au/dap/licences/csiro-data-licence/
Time period covered
Mar 14, 2008 - Jun 9, 2009
Dataset funded by
CSIRO (http://www.csiro.au/)
Description
A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.
Vehicle licensing statistics data files
s3.amazonaws.com
gov.uk
Updated May 24, 2022
Cite
Department for Transport (2022). Vehicle licensing statistics data files [Dataset]. https://s3.amazonaws.com/thegovernmentsays-files/content/181/1811927.html
Dataset updated
May 24, 2022
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Department for Transport
Description
The following datafiles contain detailed information about vehicles in the UK, which would be too large to use as structured tables. They are provided as simple CSV text files that should be easier to use digitally.

Data tables containing aggregated information about vehicles in the UK are also available.

We welcome any feedback on the structure of our new datafiles, their usability, or any suggestions for improvements; please contact vehicles statistics.

How to use CSV files

CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or digitally using software packages and languages (for example, R or Python).

When used as a spreadsheet, the file has no formatting, but it can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.
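For files too large to open in a spreadsheet, a script can stream the CSV one row at a time instead of loading it whole. The sketch below uses Python's standard `csv` module. The column name passed in and the toy sample are hypothetical. Counting blank or non-numeric cells as "skipped" is an assumption about how missing or suppressed values might appear, and so is the UTF-8 encoding mentioned in the comment.

```python
import csv
import io


def sum_column(csv_file, column):
    """Stream rows from an open CSV file and total one numeric column.

    Returns (total, skipped), where skipped counts cells that were blank or
    not plain integers (the exact markers used in the real files are an
    assumption to check).
    """
    total = 0
    skipped = 0
    for row in csv.DictReader(csv_file):  # reads one row at a time
        raw = (row.get(column) or "").strip().replace(",", "")
        try:
            total += int(raw)
        except ValueError:
            skipped += 1
    return total, skipped


# Toy stand-in for a downloaded file; for a real download you might use
# open("df_VEH0124.csv", newline="", encoding="utf-8")  (encoding assumed).
sample = io.StringIO("Make,Licensed\nMAKE_A,10\nMAKE_B,\nMAKE_C,5\n")
print(sum_column(sample, "Licensed"))  # (15, 1)
```

Because `csv.DictReader` yields rows lazily, memory use stays roughly constant regardless of file size.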

Download data files

Make and model by quarter

df_VEH0120_GB: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 37.6 MB)
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077520/df_VEH0120_GB.csv

Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)

Schema: BodyType, Make, GenModel, Model, LicenceStatus, [number of vehicles; one column per quarter]

df_VEH0120_UK: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 20.8 MB)
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077521/df_VEH0120_UK.csv

Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)

Schema: BodyType, Make, GenModel, Model, LicenceStatus, [number of vehicles; one column per quarter]

df_VEH0160_GB: Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 17.1 MB)
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077522/df_VEH0160_GB.csv

Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)

Schema: BodyType, Make, GenModel, Model, [number of vehicles; one column per quarter]

df_VEH0160_UK: Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 4.93 MB)
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077523/df_VEH0160_UK.csv

Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)

Schema: BodyType, Make, GenModel, Model, [number of vehicles; one column per quarter]
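These quarterly files are wide: after the descriptive columns there is one column per quarter. A minimal sketch of reshaping such a file into long format (one row per vehicle group per quarter) with the standard `csv` module follows. The identifier columns match the published df_VEH0120 schema. The quarter column headers and the sample row are hypothetical. For df_VEH0160_GB and df_VEH0160_UK, drop `LicenceStatus` from the identifier list.

```python
import csv
import io

# Identifier columns from the published df_VEH0120 schema.
VEH0120_ID_COLUMNS = ["BodyType", "Make", "GenModel", "Model", "LicenceStatus"]


def wide_to_long(csv_file, id_columns):
    """Yield one record per (vehicle group, quarter) from a wide CSV.

    Every column not listed in `id_columns` is treated as a quarter column.
    Counts are kept as strings so unusual cell values are not lost.
    """
    reader = csv.DictReader(csv_file)
    quarter_columns = [c for c in reader.fieldnames if c not in id_columns]
    for row in reader:
        ids = {c: row[c] for c in id_columns}
        for quarter in quarter_columns:
            yield {**ids, "Quarter": quarter, "Vehicles": row[quarter]}


# Hypothetical quarter headers and values, for illustration only.
sample = io.StringIO(
    "BodyType,Make,GenModel,Model,LicenceStatus,2021 Q4,2021 Q3\n"
    "Cars,MAKE_A,MAKE_A MODEL,MODEL X,Licensed,100,98\n"
)
for record in wide_to_long(sample, VEH0120_ID_COLUMNS):
    print(record)
```

Long format is usually easier to filter by quarter or plot as a time series than one column per quarter.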

Make and model by age

df_VEH0124: Vehicles at the end of the quarter by licence status, body type, make, generic model, model, year of first use and year of manufacture: United Kingdom (CSV, 28.2 MB)
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077524/df_VEH0124.csv

Scope: All licensed vehicles in the United Kingdom; 2021 Quarter 4 (end December) only

Schema: BodyType, Make, GenModel, Model, YearFirstUsed, YearManufacture, Licensed (number of vehicles), SORN (number of vehicles)

Make and model by engine size

df_VEH0220:
CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object)
zenodo.org
data.niaid.nih.gov
+3 more
bin, zip
Updated Jan 24, 2020
Cite
Farah Zaib Khan; Stian Soiland-Reyes (2020). CWL run of RNA-seq Analysis Workflow (CWLProv 0.5.0 Research Object) [Dataset]. http://doi.org/10.17632/xnwncxpw42.1
Available download formats: zip, bin
Unique identifier
https://doi.org/10.17632/xnwncxpw42.1
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Farah Zaib Khan; Stian Soiland-Reyes
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This workflow adapts the approach and parameter settings of Trans-Omics for Precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. The workflow has five steps in total:

Read alignment using STAR which produces aligned BAM files including the Genome BAM and Transcriptome BAM.

The Genome BAM file is processed using Picard MarkDuplicates, producing an updated BAM file containing information on duplicate reads (such reads can indicate biased interpretation).

SAMtools index is then employed to generate an index for the BAM file, in preparation for the next step.

The indexed BAM file is processed further with RNA-SeQC which takes the BAM file, human genome reference sequence and Gene Transfer Format (GTF) file as inputs to generate transcriptome-level expression quantifications and standard quality control metrics.

In parallel with transcript quantification, isoform expression levels are quantified by RSEM. This step depends only on the output of the STAR tool, and additional RSEM reference sequences.
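The dependency structure of these five steps can be sketched as a small graph. Processing a topological sort in "waves" then shows which steps can run concurrently; here, RSEM runs alongside the MarkDuplicates branch. The sketch uses Python's standard `graphlib` module (Python 3.9+) and models only the step ordering described above. It is not CWL and not part of the workflow itself, and the step names are informal labels.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it needs,
# following the description of the five workflow steps.
PIPELINE = {
    "STAR": set(),
    "MarkDuplicates": {"STAR"},      # processes the Genome BAM
    "samtools index": {"MarkDuplicates"},
    "RNA-SeQC": {"samtools index"},  # also needs reference + GTF (not modelled)
    "RSEM": {"STAR"},                # depends only on STAR output
}


def execution_waves(graph):
    """Group steps into waves; steps in the same wave have no mutual dependency."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(PIPELINE), start=1):
    print(number, wave)
# 1 ['STAR']
# 2 ['MarkDuplicates', 'RSEM']
# 3 ['samtools index']
# 4 ['RNA-SeQC']
```

A workflow engine such as cwltool derives the same kind of ordering from the inputs and outputs declared in the CWL files.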

For testing and analysis, the workflow author provided example data created by down-sampling the read files of TOPMed public-access data. Chromosome 12 was extracted from the Homo sapiens Assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, the use of containerization for underlying software, and detailed documentation were important factors in choosing this specific CWL workflow for CWLProv evaluation.

This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl

Steps to reproduce

To build the research object again, use Python 3 on macOS. Built with:

Processor: 2.8 GHz Intel Core i7

Memory: 16GB

OS: macOS High Sierra, Version 10.13.3

Storage: 250GB

Install cwltool

pip3 install cwltool==1.0.20180912090223

Install Git LFS
Downloading the data with the Git repository requires Git LFS to be installed:
https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs

Get the data and make the analysis environment ready:

git clone https://github.com/FarahZKhan/cwl_workflows.git
cd cwl_workflows/
git checkout CWLProvTesting
./topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/download_examples.sh

Run the following commands to create the CWLProv Research Object:

cwltool --provenance rnaseqwf_0.6.0_linux --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json
zip -r rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac
sha256sum rnaseqwf_0.5.0_mac.zip > rnaseqwf_0.5.0_mac_mac.zip.sha256

The https://github.com/FarahZKhan/cwl_workflows repository is a frozen snapshot from https://github.com/heliumdatacommons/TOPMed_RNAseq_CWL commit 027e8af41b906173aafdb791351fb29efc044120
Basic Stand Alone Medicare Claims Public Use Files Data Package
johnsnowlabs.com
csv
Updated Jan 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Snow Labs (2021). Basic Stand Alone Medicare Claims Public Use Files Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/basic-stand-alone-medicare-claims-public-use-files-data-package/
Available download formats: csv
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Description
This data package contains claims-based data about beneficiaries of Medicare program services, including Inpatient, Outpatient, Chronic Conditions, Skilled Nursing Facility, Home Health Agency, Hospice, Carrier, and Durable Medical Equipment (DME) data, as well as data related to Prescription Drug Events. Note that the values are estimated and counted using a random sample of fee-for-service Medicare claims.
Enterprise Survey 2009-2019, Panel Data - Slovenia
microdata.worldbank.org
catalog.ihsn.org
Updated Aug 6, 2020
Cite
World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
Dataset updated
Aug 6, 2020
Dataset provided by
World Bank Group (http://www.worldbank.org/)
European Bank for Reconstruction and Development (http://ebrd.com/)
European Investment Bank (http://eib.org/)
Time period covered
2008 - 2019
Area covered
Slovenia
Description
Abstract

The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

Geographic coverage

National

Analysis unit

The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

Universe

As is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

Kind of data

Sample survey data [ssd]

Sampling procedure

The samples for Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.

Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information on the industries and regions chosen are included in the attached Excel file (Sampling Report.xls) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information on the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.
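Stratified random sampling draws a separate simple random sample inside each cell defined by the stratification variables (here industry, size, and region). The sketch below illustrates that idea only; the frame layout, the `stratified_sample` helper, and the per-stratum targets are hypothetical and are not taken from the Sampling Manual or Sampling Note.

```python
import random
from collections import defaultdict


def stratified_sample(frame, key, n_per_stratum, seed=0):
    """Draw a simple random sample without replacement within each stratum.

    frame: list of dicts, one per establishment (hypothetical layout)
    key: function mapping a record to its stratum, e.g. (industry, size, region)
    n_per_stratum: dict mapping stratum -> number of establishments to draw
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in frame:
        strata[key(record)].append(record)

    sample = []
    for stratum, units in strata.items():
        # Never request more units than the stratum actually contains.
        n = min(n_per_stratum.get(stratum, 0), len(units))
        sample.extend(rng.sample(units, n))
    return sample
```

Capping each draw at the stratum size mirrors a practical constraint of small frames; a real design would instead flag such strata for collapsing or replacement.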

For the Slovenia 2009 ES, industry stratification was designed as follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries, sample sizes were inflated by about 17% to account for potential non-response when requesting sensitive financial data and for likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals), sample sizes were inflated by about 12% to account for undersampling of firms in service industries.
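As a worked illustration of the inflation factors, applying them to the 90-interview target gives the approximate number of establishments to draw per industry. The text says "about" 17% and 12%, and rounding up to a whole establishment is our assumption, so these figures are indicative only.

```python
def inflated_sample_size(target_interviews: int, inflation_pct: int) -> int:
    """Return target * (1 + pct/100), rounded up, using exact integer arithmetic."""
    # -(-a // b) is ceiling division for positive integers.
    return -(-target_interviews * (100 + inflation_pct) // 100)


print(inflated_sample_size(90, 17))  # manufacturing industries: 106
print(inflated_sample_size(90, 12))  # other (residual) industries: 101
```

Integer ceiling division avoids floating-point results such as 105.29999… that a naive `90 * 1.17` could produce.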

For Slovenia 2013 ES, industry stratification was designed as follows: the universe was stratified into one manufacturing industry and two service industries (retail and other services).

Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information on the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing, combining all the relevant activities (ISIC Rev. 4.0 codes 10-33); Retail (ISIC 47); and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).
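The 2019 industry strata can be expressed as a lookup from two-digit ISIC Rev. 4 division to stratum. This is a minimal sketch assuming the frame stores the two-digit division as an integer; `industry_stratum` is a hypothetical helper, and divisions not listed in the text are treated as outside the sampled universe.

```python
from typing import Optional

MANUFACTURING = set(range(10, 34))  # ISIC Rev. 4 divisions 10-33
RETAIL = {47}
OTHER_SERVICES = {41, 42, 43, 45, 46, 49, 50, 51, 52, 53, 55, 56, 58, 61, 62, 79, 95}


def industry_stratum(isic_division: int) -> Optional[str]:
    """Map a two-digit ISIC Rev. 4 division to the 2019 Slovenia ES industry stratum."""
    if isic_division in MANUFACTURING:
        return "Manufacturing"
    if isic_division in RETAIL:
        return "Retail"
    if isic_division in OTHER_SERVICES:
        return "Other Services"
    return None  # not part of the stratified universe described above
```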

For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
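The size thresholds translate directly into a classification on permanent full-time employees. In this sketch, `size_stratum` is a hypothetical helper, and rejecting counts below 5 is our reading of the universe definition (the smallest stratum starts at 5 employees).

```python
def size_stratum(permanent_full_time: int) -> str:
    """Classify an establishment by permanent full-time employees."""
    if permanent_full_time < 5:
        # Assumption: fewer than 5 employees falls outside the ES universe.
        raise ValueError("fewer than 5 employees: outside the survey universe")
    if permanent_full_time <= 19:
        return "small"
    if permanent_full_time <= 99:
        return "medium"
    return "large"  # more than 99, i.e. 100 or more
```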

For Slovenia 2009 ES, regional stratification used 2 regions: Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments were re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

For Slovenia 2013 ES, regional stratification used 2 regions (city and the surrounding business area) throughout Slovenia.

Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

Mode of data collection

Computer Assisted Personal Interview [capi]

Research instrument

Questionnaires have common questions (core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module plus manufacturing-specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail-specific questions), and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

Response rate

Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

Item non-response was addressed by two strategies: (a) for sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to record a refusal to respond with the code -8; (b) establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.
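Because refusals are stored as the numeric code -8, averaging a raw column would treat refusals as real (negative) answers. The sketch below shows one way to mask that code before computing statistics; the helper names are hypothetical, and any other special codes in the data would need the same treatment after checking the codebook.

```python
REFUSED = -8  # refusal code described in the survey documentation


def clean_responses(values):
    """Replace the refusal code with None so it is not treated as a real answer."""
    return [None if v == REFUSED else v for v in values]


def mean_ignoring_missing(values):
    """Average only the answered items; return None if every item was refused."""
    answered = [v for v in clean_responses(values) if v is not None]
    return sum(answered) / len(answered) if answered else None


print(mean_ignoring_missing([3, -8, 5]))  # 4.0, not (3 - 8 + 5) / 3
```

With pandas, `DataFrame.replace(-8, numpy.nan)` achieves the same masking, after which the usual aggregation methods skip the missing values by default.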

For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
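The contact protocol (up to 4 attempts, then a replacement from the same stratum) can be sketched as a loop over a primary establishment and its replacements. Everything here beyond the 4-attempt limit is an assumption: `try_contact` is a hypothetical callback standing in for a field attempt, and giving each replacement its own 4 attempts is our guess, not something the text states.

```python
from typing import Callable, Optional, Sequence

MAX_ATTEMPTS = 4  # "Up to 4 attempts" per the documentation


def contact_with_replacement(
    primary: str,
    replacements: Sequence[str],
    try_contact: Callable[[str, int], bool],
) -> Optional[str]:
    """Return the establishment finally interviewed, or None if all fail.

    replacements: establishments with similar strata characteristics,
    in the order they would be suggested (hypothetical ordering).
    """
    for establishment in [primary, *replacements]:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            # Each attempt would be made at a different time or day of the week.
            if try_contact(establishment, attempt):
                return establishment
    return None
```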

For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates for Slovenia may be selection bias and not frame inaccuracy.

For 2013, the ratio of realized interviews to contacted establishments was 25%. This ratio is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 44%.

Finally, for 2019, the ratio of realized interviews to contacted establishments was 9.7%. This ratio is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
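The 2009 figure is reported as contacts per interview, while the 2013 and 2019 figures are reported as interviews per contact, so they are reciprocals of each other. The sketch below puts all three on a common footing, assuming each uses the same denominator (all contacted establishments), which the text implies but does not state explicitly. Under that assumption, 6.18 contacts per interview corresponds to roughly 16.2% interviews per contact, 25% to 4.0 contacts per interview, and 9.7% to roughly 10.3.

```python
def invert_ratio(ratio: float) -> float:
    """Convert contacts-per-interview to interviews-per-contact, or vice versa."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


reported = {
    2009: ("contacts per interview", 6.18),
    2013: ("interviews per contact", 0.25),
    2019: ("interviews per contact", 0.097),
}

for year, (kind, value) in reported.items():
    if kind == "contacts per interview":
        cpi, ipc = value, invert_ratio(value)
    else:
        cpi, ipc = invert_ratio(value), value
    print(f"{year}: {cpi:.2f} contacts per interview, {ipc:.1%} interviews per contact")
```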
Example data file for TRUEMET Version 2.2
catalog.data.gov
datasets.ai
Updated Oct 23, 2025
Cite
U.S. Fish and Wildlife Service (2025). Example data file for TRUEMET Version 2.2 [Dataset]. https://catalog.data.gov/dataset/example-data-file-for-truemet-version-2-2
Dataset updated
Oct 23, 2025
Dataset provided by
U.S. Fish and Wildlife Service
Description
This file is an example data set from the Central Valley of California from a drought study corresponding to “recent non-drought conditions” (Scenario 1 in Petrie et al., in review). In 2014, following an 8-year period with 7 below-normal to critically-dry water years, the bioenergetic model TRUEMET was used to assess the impacts of drought on wintering waterfowl habitat and bioenergetics in the Central Valley of California. The goal of the study was to assess whether available foraging habitats could provide enough food to support waterfowl populations (ducks and geese) under a variety of climate and population-level scenarios. This information could then be used by managers to adapt their waterfowl habitat management plans to drought conditions. The study area spanned the Central Valley and included the Sacramento Valley in the north, the San Joaquin Valley in the south, and Suisun Marsh and the Sacramento-San Joaquin River Delta (Delta) east of San Francisco Bay. The data set consists of two foraging guilds (ducks and geese/swans) and five forage types: harvested corn, rice (flooded), rice (unflooded), wetland invertebrates and wetland moist soil seeds. For more background on the data set, see Petrie et al. in review.
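The question the study asks, whether foraging habitat can supply enough food energy for a waterfowl population, can be illustrated with a highly simplified supply-versus-demand balance for one guild. This is not TRUEMET's algorithm: the published model is considerably more detailed (for example, it tracks supply and demand over the course of the season), and every field name, unit, and number below is a hypothetical placeholder rather than a value from this data file.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ForageType:
    name: str
    area_ha: float             # foraging habitat area, hectares (hypothetical)
    food_density_kg_ha: float  # available food per hectare, kg (hypothetical)
    energy_kj_per_kg: float    # usable energy per kg of food (hypothetical)


def energy_supply_kj(forages: Iterable[ForageType]) -> float:
    """Total food energy available across all forage types."""
    return sum(f.area_ha * f.food_density_kg_ha * f.energy_kj_per_kg for f in forages)


def energy_demand_kj(birds: float, daily_need_kj: float, days: int) -> float:
    """Total energy the population needs over the period."""
    return birds * daily_need_kj * days


def energy_balance_kj(forages, birds, daily_need_kj, days) -> float:
    """Positive means a surplus; negative means a food energy deficit."""
    return energy_supply_kj(forages) - energy_demand_kj(birds, daily_need_kj, days)


# Illustrative numbers only, not taken from the TRUEMET example file.
rice = ForageType("rice (flooded)", area_ha=10.0, food_density_kg_ha=100.0, energy_kj_per_kg=1000.0)
print(energy_balance_kj([rice], birds=100, daily_need_kj=1000.0, days=5))  # 500000.0
```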
[Sample Dataset] April 2024 Public Data File from Crossref
academictorrents.com
bittorrent
Updated May 10, 2024
Cite
None (2024). [Sample Dataset] April 2024 Public Data File from Crossref [Dataset]. https://academictorrents.com/details/d47fbe29e5ef93a6695421f79a6efa4b801acff1
Available download formats: bittorrent (19721846)
Dataset updated
May 10, 2024
Authors
None
License
No license specified (https://academictorrents.com/nolicensespecified)
Description
[Sample Dataset] April 2024 Public Data File from Crossref. This dataset includes 100 random JSON records from the Crossref metadata corpus.
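A quick way to inspect such a sample is to pull each record's DOI and title. In Crossref metadata, works carry a `DOI` string and a `title` array of strings; the top-level `{"items": [...]}` wrapper assumed below matches our understanding of the public data file layout but is not confirmed by this listing, so the sketch also accepts a bare JSON list. The DOIs in the usage example are made up.

```python
import json


def summarize_records(raw_json: str):
    """Return (DOI, first title) pairs from a Crossref-style JSON document."""
    data = json.loads(raw_json)
    # Assumption: records sit under "items"; fall back to a bare list of records.
    items = data["items"] if isinstance(data, dict) else data
    summary = []
    for item in items:
        titles = item.get("title") or [""]  # "title" is a list; may be absent
        summary.append((item.get("DOI"), titles[0]))
    return summary


sample = '{"items": [{"DOI": "10.1000/example", "title": ["A made-up title"]}]}'
print(summarize_records(sample))
```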
Dirty Data Sample
kaggle.com
zip
Updated Feb 22, 2022
Cite
Shiva Vashishtha (2022). Dirty Data Sample [Dataset]. https://www.kaggle.com/datasets/shivavashishtha/dirty-data-sample
Available download formats: zip (52182 bytes)
Dataset updated
Feb 22, 2022
Authors
Shiva Vashishtha
Description
Dataset

This dataset was created by Shiva Vashishtha



