Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mock community raw data (fastq.gz files) for the long-read ONT 16S with ONT primers dataset
Overview

The primary purpose of the WFIP2 Model Development Team is to improve existing numerical weather prediction models in a manner that leads to improved wind forecasts in regions of complex terrain. Improvements in the models will come through a better understanding of the physics of the wind flow in and around the wind plant across a range of temporal and spatial scales, which will be gained through WFIP2's observational field study and analysis.

Data Details

Initial conditions, lateral-boundary conditions, WRF namelists, and output graphics were archived from three real-time modeling frameworks:

1) RAP-ESRL: the experimental RAP (run hourly)
2) HRRR-ESRL: the experimental HRRR (run hourly)
3) HRRR-WFIP2: the experimental, WFIP2-provisional version of the HRRR, run twice daily at 0600 and 1800 UTC. The real-time HRRR-WFIP2 also ran with a concurrent 750-m nest (i.e., the HRRR-WFIP2 nest) that was initialized 1 h into the HRRR forecast (i.e., 0700 and 1900 UTC).

Each of these frameworks should be considered experimental, subject to intermittent production outages (sometimes persistent), data-assimilation outages, and changes to data-assimilation procedures and physical parameterizations.

The archive of real-time data from these modeling frameworks consists of the following two zip-file aggregations:

1) Files containing initial conditions, lateral-boundary conditions, and WRF namelists. For RAP-ESRL and HRRR-ESRL runs, three files are compressed in a single zip file:
i) wrfinput_d01: initial conditions (netCDF)
ii) wrfbdy_d01: lateral-boundary conditions (netCDF)
iii) namelist.input: the WRF-ARW namelist (plain text)
The HRRR-WFIP2 archive also includes these files, with the addition of "wrfinput_d02", the nested-domain initial conditions (netCDF). Note that while the archived HRRR-WFIP2 namelist specifies a 15-h forecast, lateral-boundary conditions for most runs are available for a 24-h forecast.

2) Files containing output graphics (png). Given the large number of graphics files produced, a detailed description of the zip-file contents is not given here.
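As a usage illustration (not part of the archive), here is a minimal Python sketch for inspecting the extracted initial- and boundary-condition files, assuming the netCDF4 package is installed and the file names listed above; variable availability (e.g., "T") depends on the WRF configuration:

# Minimal sketch: inspect the archived WRF files after unzipping one
# aggregation. Variable names depend on the WRF configuration.
from netCDF4 import Dataset

with Dataset("wrfinput_d01") as ic:      # initial conditions (netCDF)
    print(list(ic.variables))            # fields available in the file
    print(ic.variables["T"].shape)       # e.g., perturbation potential temperature

with Dataset("wrfbdy_d01") as bc:        # lateral-boundary conditions (netCDF)
    print(len(bc.dimensions["Time"]))    # number of boundary times archived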
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This deposit contains data associated with a feasibility study evaluating the use of individualized report cards to improve trial transparency at the Charité - Universitätsmedizin Berlin. It primarily includes large raw data files and other files compiled by, or used in, the project code repository: https://github.com/quest-bih/tv-ct-transparency/. These data are deposited for documentation and computational reproducibility; they do not reflect the most current/accurate data available from each source.
The deposit contains:
Survey data (survey-data.csv): Participant responses from an anonymous survey conducted to assess the usefulness of the report cards and infosheet. The survey was administered in LimeSurvey and hosted on a server at the QUEST Center for Responsible Research at the Berlin Institute of Health at Charité – Universitätsmedizin Berlin. Any information that could potentially identify participants, such as IP addresses and free-text fields (e.g., corrections, comments), was removed. This file serves as input for the analysis of the survey data.
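As an illustration (not part of the deposit), a minimal Python sketch for loading the survey responses, assuming pandas is installed; column names are inspected rather than hard-coded, since the variable layout is documented in the project repository:

# Minimal sketch: load the anonymized survey responses for analysis.
import pandas as pd

survey = pd.read_csv("survey-data.csv")
print(survey.shape)            # responses x questions
print(list(survey.columns))    # question identifiers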
Summary
One ultimate goal of visual neuroscience is to understand how the brain processes visual stimuli encountered in the natural environment. Achieving this goal requires recordings of brain responses to massive amounts of naturalistic stimuli. Although the scientific community has invested substantial effort in collecting large-scale functional magnetic resonance imaging (fMRI) data under naturalistic stimuli, more naturalistic fMRI datasets are still urgently needed. We present here the Natural Object Dataset (NOD), a large-scale fMRI dataset containing responses to 57,120 naturalistic images from 30 participants. NOD strives for a balance between sampling variation across individuals and sampling variation across stimuli. This enables NOD to be used not only to determine whether an observation generalizes across many individuals, but also to test whether a response pattern generalizes to a variety of naturalistic stimuli. We anticipate that NOD, together with existing naturalistic neuroimaging datasets, will serve as a new impetus for our understanding of the visual processing of naturalistic stimuli.
Data record
The data were organized according to the Brain-Imaging-Data-Structure (BIDS) Specification version 1.7.0 and can be accessed from the OpenNeuro public repository (accession number: ds004496). In short, raw data of each subject were stored in “sub-
Stimulus images The stimulus images for different fMRI experiments are deposited in separate folders: “stimuli/imagenet”, “stimuli/coco”, “stimuli/prf”, and “stimuli/floc”. Each experiment folder contains corresponding stimulus images, and the auxiliary files can be found within the “info” subfolder.
Raw MRI data Each participant folder consists of several session folders: anat, coco, imagenet, prf, floc. Each session folder in turn includes “anat”, “func”, or “fmap” folders for corresponding modality data. The scan information for each session is provided in a TSV file.
Preprocessed volume data from fMRIprep The preprocessed volume-based fMRI data are in subject's native space, saved as “sub-
Preprocessed surface-based data from ciftify The preprocessed surface-based data are in standard fsLR space, saved as “sub-
Brain activation data from surface-based GLM analyses The brain activation data are derived from GLM analyses on the standard fsLR space, saved as “sub-
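As a usage illustration (not part of the data record), a minimal Python sketch for walking the BIDS tree after downloading ds004496 from OpenNeuro; the local path is a placeholder, and session folder names are assumed to carry the BIDS "ses-" prefix:

# Minimal sketch: enumerate subjects and sessions in the BIDS layout.
from pathlib import Path

bids_root = Path("ds004496")                  # assumed download location
for sub in sorted(bids_root.glob("sub-*")):
    sessions = sorted(p.name for p in sub.glob("ses-*"))
    print(sub.name, sessions)                 # e.g., ses-imagenet, ses-coco, ...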
Vegetation Condition Benchmarks describe the reference state to which sites are compared to score their site-scale biodiversity values or set goals for management or restoration. This file contains some of the raw data used to create the most current vegetation condition benchmarks. Refer to the 'Dataset relationship' section, below, to access all the raw data files used in creating the Vegetation Condition Benchmarks V1.2.
The ‘Vegetation Condition Benchmarks Stems raw data V1.2’ file contains aggregated stems data from 2302 plots that were used to create the 'number of large trees' benchmarks. Refer to the info worksheet for further details of column headings.
For further details see Capararo S, Watson CJ, Somerville M, Travers SK, McNellie MJ, Dorrough J and Oliver I (2019) Function Attribute Benchmarks for the Biodiversity Assessment Method: Data audit, compilation and analysis. Department of Planning, Industry and Environment.
https://doi.org/10.4121/resource:terms_of_use
This data set is part of the MSc thesis 'High-Resolution Atmospheric Modelling and the Effects on the Prediction of Wave Characteristics'. It provides the files used to run the WRF and SWAN simulations for the most important runs. Raw data files were not included due to their large size.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The evolution of a software system can be studied in terms of how various properties, as reflected by software metrics, change over time. Current models of software evolution have allowed for inferences to be drawn about certain attributes of the software system, for instance, regarding the architecture, complexity, and its impact on the development effort. However, an inherent limitation of these models is that they do not provide any direct insight into where growth takes place. In particular, we cannot assess the impact of evolution on the underlying distribution of size and complexity among the various classes. Such an analysis is needed in order to answer questions such as 'do developers tend to evenly distribute complexity as systems get bigger?', and 'do large and complex classes get bigger over time?'. These are questions of more than passing interest since, by understanding what typical and successful software evolution looks like, we can identify anomalous situations and take action earlier than might otherwise be possible. Information gained from an analysis of the distribution of growth will also show if there are consistent boundaries within which a software design structure exists.

The specific research questions that we address in Chapter 5 (Growth Dynamics) of the thesis this data accompanies are: What is the nature of the distribution of software size and complexity measures? How do the profile and shape of this distribution change as software systems evolve? Is the rate and nature of change erratic? Do large and complex classes become bigger and more complex as software systems evolve?

In our study of metric distributions, we focused on 10 different measures that span a range of size and complexity measures. In order to assess assigned responsibilities we use the two metrics Load Instruction Count and Store Instruction Count. Both metrics provide a measure of the frequency of state changes in data containers within a system. Number of Branches, on the other hand, records all branch instructions and is used to measure structural complexity at the class level. This measure is equivalent to Weighted Method Count (WMC) as proposed by Chidamber and Kemerer (1994) if a weight of 1 is applied to all methods and the complexity measure used is cyclomatic complexity. We use the measures of Fan-Out Count and Type Construction Count to obtain insight into the dynamics of the software systems. The former offers a means to document the degree of delegation, whereas the latter can be used to count the frequency of object instantiations. The remaining metrics provide structural size and complexity measures. In-Degree Count and Out-Degree Count reveal the coupling of classes within a system. These measures are extracted from the type dependency graph that we construct for each analyzed system. The vertices in this graph are classes, whereas the edges are directed links between classes. We associate popularity (i.e., the number of incoming links) with In-Degree Count and usage or delegation (i.e., the number of outgoing links) with Out-Degree Count. Number of Methods, Public Method Count, and Number of Attributes define typical object-oriented size measures and provide insights into the extent of data and functionality encapsulation.

The raw metric data (4 .txt files and 1 .log file in a .zip file, ~0.5 MB in total) is provided in comma-separated values (CSV) format, with the first line containing the header.
A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
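As an illustration (not part of the archive), a minimal Python sketch for summarizing how one size measure is distributed across classes, assuming pandas and NumPy are installed; the file and column names are placeholders, so check the CSV header first:

# Minimal sketch: load one raw metric file and compute the Gini
# coefficient of a size measure as a summary of distributional evenness.
import numpy as np
import pandas as pd

metrics = pd.read_csv("metrics.txt")                    # first line is the header
x = np.sort(metrics["number_of_methods"].to_numpy())    # placeholder column name

n = len(x)
gini = (2 * np.arange(1, n + 1) - n - 1).dot(x) / (n * x.sum())
print(f"Gini of Number of Methods: {gini:.3f}")         # 0 = even, 1 = concentrated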
The State of Alaska Division of Geological & Geophysical Surveys (DGGS) produced airborne lidar-derived elevation data for the Pilgrim Hot Springs area, western Alaska. Both aerial lidar and ground control data were collected by DGGS. These data were produced in support of active fault detection and geothermal hydrology research in the area. This data collection is being released as a Raw Data File with an open end-user license. All files can be downloaded free of charge from the Alaska Division of Geological & Geophysical Surveys website (http://doi.org/10.14509/30659).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
The Alaska Division of Geological & Geophysical Surveys (DGGS) used aerial lidar to produce a classified point cloud and high-resolution digital terrain model (DTM), digital surface model (DSM), and intensity model of the Barry Arm landslide, northwest Prince William Sound, Alaska, during near snow-free ground conditions on June 26, 2020. The survey's goal is to provide high quality and high resolution (0.10 m) elevation data to assess potential landslide movement. Aerial lidar and ground control data were collected on June 26, 2020, and subsequently processed in Terrasolid and ArcGIS. This data collection is released as a Raw Data File with an open end-user license. All files can be downloaded free of charge from the Alaska Division of Geological & Geophysical Surveys website (http://doi.org/10.14509/30593).
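As a usage illustration (not part of the data collection), a minimal Python sketch for reading the DTM with rasterio, assuming the product is delivered as a GeoTIFF; the file name is a placeholder:

# Minimal sketch: read the 0.10 m DTM and report basic properties.
import rasterio

with rasterio.open("barry_arm_dtm.tif") as dtm:    # placeholder file name
    elev = dtm.read(1, masked=True)                # first band, nodata masked
    print(dtm.res, dtm.crs)                        # resolution and projection
    print(float(elev.min()), float(elev.max()))    # elevation range (m)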
No description is available. Visit https://dataone.org/datasets/6add447b9cbe6fdfec7bd30cba174581 for complete metadata about this dataset.
The Alaska Division of Geological & Geophysical Surveys (DGGS) used aerial lidar to produce a digital terrain model (DTM), surface model (DSM), and intensity model for the area surrounding the community of Kotlik, Alaska. Detailed bare earth elevation data for the Kotlik area support and inform potential infrastructure development and provide critical information required to assess geomorphic activity. Airborne data were collected on August 17, 2019, and subsequently processed in Terrasolid and ArcGIS. Ground control was collected between August 20-22, 2019, by the Alaska Division of Mining, Land, and Water. This data collection is released as a Raw Data File with an open end-user license. All files can be downloaded free of charge from the Alaska Division of Geological & Geophysical Surveys website (http://doi.org/10.14509/30561).
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN.
The Department of Statistics (DOS) carried out four rounds of the 2007 Employment and Unemployment Survey (EUS) during February, May, August, and November 2007. The survey rounds covered a total sample of about fifty-three thousand households nationwide. The sampled households were selected using a stratified multi-stage cluster sampling design. It is noteworthy that the sample represents the national level (Kingdom), the governorates, the three regions (Central, North, and South), and urban/rural areas.
The importance of this survey lies in providing a comprehensive database on employment and unemployment that serves decision makers and researchers, as well as other parties concerned with policies related to the organization of the Jordanian labor market.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts were exerted to acquire, clean, harmonize, preserve, and disseminate microdata from existing labor force surveys in several Arab countries.
The sample is representative at the national level (Kingdom), the governorates, the three regions (Central, North, and South), and urban/rural areas.
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The sample of this survey is based on the frame provided by the 2004 Population and Housing Census. The Kingdom was divided into strata, where each city with a population of 100,000 persons or more was considered a large city; there are 6 such cities. Each governorate (excluding the 6 large cities) was divided into rural and urban areas. The remaining urban areas in each governorate were considered an independent stratum, and the same was applied to the rural areas. The total number of strata was 30.
In view of the significant variation in socio-economic characteristics in large cities in particular, and in urban areas in general, each large-city and urban stratum was divided into four sub-strata according to the socio-economic characteristics provided by the Population and Housing Census, with the purpose of providing homogeneous strata.
The frame excludes collective dwellings. However, it is worth noting that the collective households identified in the harmonized data, through a variable indicating household type, are those reported without heads in the raw data, and in which the relationship of all household members to the head was reported as "other".
The sample is also not representative of the non-Jordanian population.
The sample of this survey was designed using the two-stage stratified cluster sampling method, based on the data of the 2004 Population and Housing Census for carrying out household surveys. The sample is representative at the Kingdom, rural/urban, and governorate levels. The total sample size for each round was 1,336 Primary Sampling Units (PSUs, or clusters). These units were distributed over the urban and rural regions of the governorates, in addition to the large cities in each governorate, according to the weight of persons and households and the variance within each stratum. Slight modifications were made to the number of these units so that it would be a multiple of 8; the total number of clusters for the four rounds was 5,344.
The main sample consists of 40 replicates, each comprising 167 PSUs. For each round, eight replicates of the main sample were used. The PSUs were ordered within each stratum according to geographic and then socio-economic characteristics in order to ensure a good spread of the sample. The sample was then selected in two stages. In the first stage, the PSUs were selected using probability-proportionate-to-size (PPS) systematic selection, with the number of households in each PSU serving as its weight or size. In the second stage, the blocks of the PSUs (clusters) selected in the first stage were updated, and a constant number of households (10) was selected from each PSU using the systematic random sampling method.
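As an illustration of the first-stage procedure (not part of the survey documentation), a minimal Python sketch of probability-proportionate-to-size systematic selection, with household counts as the size measure; the data are synthetic:

# Minimal sketch: PPS systematic selection of PSUs.
import random

def pps_systematic(psu_sizes, n_select):
    """Select n_select PSU indices with probability proportional to size."""
    total = sum(psu_sizes)
    interval = total / n_select
    start = random.uniform(0, interval)
    points = [start + i * interval for i in range(n_select)]
    chosen, cum = [], 0.0
    it = iter(enumerate(psu_sizes))
    idx, size = next(it)
    for p in points:                    # points are in increasing order
        while cum + size < p:           # advance to the PSU containing p
            cum += size
            idx, size = next(it)
        chosen.append(idx)
    return chosen

sizes = [random.randint(80, 400) for _ in range(1000)]  # synthetic household counts
print(pps_systematic(sizes, 167))                       # one replicate of 167 PSUs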
It is noteworthy that the sample of the present survey does not represent the non-Jordanian population, because it is based on households living in conventional dwellings; in other words, it does not cover collective households living in collective dwellings. Therefore, the non-Jordanian households covered in the present survey are either private households or collective households living in conventional dwellings.
Face-to-face [f2f]
The tabulation plan for the survey results was guided by previous Employment and Unemployment Surveys, which had already been prepared and tested. The final survey report was then prepared to include all detailed tabulations as well as the survey methodology.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de448416
Abstract (en): The Home Mortgage Disclosure Act (HMDA): Loan Application Register (LAR) and Transmittal Sheet (TS) Raw Data, 2007 contains information collected in calendar year 2006. The HMDA, enacted by Congress in 1975, requires most mortgage lenders located in metropolitan areas to report data about their housing-related lending activity. The HMDA data were collected from 8,886 lending institutions and cover approximately 34.1 million home purchase and home improvement loans and refinancings, including loan originations, loan purchases, and applications that were denied, incomplete, or withdrawn. The Private Mortgage Insurance Companies (PMIC) data refer to applications for mortgage insurance to insure home purchase mortgages and mortgages to refinance existing obligations.

Part 1, HMDA Transmittal Sheet (TS), and Part 4, PMIC Transmittal Sheet (TS), include information submitted by reporting institutions with the Loan Application Register (LAR), such as the reporting institution's name, address, and Tax ID. Part 2, HMDA Reporter Panel, and Part 5, PMIC Reporter Panel, contain information on all institutions that reported data in activity year 2006. Part 3, HMDA MSA Offices, and Part 6, PMIC MSA Offices, contain information on all metropolitan statistical areas in the data. Parts 7 through 789 contain HMDA and PMIC Loan Application Register (LAR) files at the national level, at the agency level, and by MSA/MD.

With some exceptions, for each transaction the institution reported data about: the loan (or application), such as the type and amount of the loan made (or applied for) and, in limited circumstances, its price; the disposition of the application, such as whether it was denied or resulted in an origination; the property to which the loan relates, such as its type (single-family versus multi-family) and location (including the census tract); the sale of the loan, if it was sold; and the applicant's and co-applicant's ethnicity, race, sex, and income. The data are not weighted and do not contain any weight variables.

ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats, as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: created variable labels and/or value labels; created an online analysis version with question text.

The collection covers home purchase and home improvement loans and refinancings (or applications) made or insured by financial institutions in the United States that were required to report HMDA data in 2007. Smallest geographic unit: city. HMDA data were collected from 8,886 depository and nondepository institutions that were required to report HMDA data if they met the law's criteria for coverage. Generally, whether a lender is covered by HMDA depended on the lender's asset size, its location, and whether it is in the business of residential mortgage lending. PMIC data were collected from eight mortgage insurance companies that insured home purchase mortgages and mortgages to refinance existing obligations. For more information about how respondents reported, please refer to A Guide to HMDA Reporting.

2016-12-12: The study title and collection dates have been revised to reflect the 2006 activity year, with data reported in 2007. Filesets 1 through 6 and the multi-part setup files will also be replaced to correct the study year. Variable descriptions for parts 1 through 6 have been incorporated into the ICPSR codebooks; "Frequencies" documents included in previous releases have been retired with this update. SDA was removed from this study, as the original SDA pages were processed without using hermes and the SDA title could not be updated to reflect the correct reporting year. For datasets 7 through 789, ICPSR is releasing the original deposited data files in the condition they were received, along with SPSS, Stata, and SAS setup files.

The data file for Part 7, HMDA Loan Application Register (LAR): National File, contains over 34 million records. Due to its large size, users are encouraged to open this dataset in SAS. All census tract and county definitions and population counts were based on the 2000 Census of Population and Housing. Value labels for the variable STATE_...
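As a usage illustration (not part of the collection notes), a minimal Python sketch for processing the national LAR file in chunks instead of loading all 34+ million records at once; the file and column names are placeholders, so consult the ICPSR codebook for the actual layout:

# Minimal sketch: tally application dispositions chunk by chunk.
import pandas as pd

totals = {}
for chunk in pd.read_csv("hmda_2006_lar_national.csv",
                         chunksize=1_000_000, dtype=str):
    for action, count in chunk["action_type"].value_counts().items():
        totals[action] = totals.get(action, 0) + count
print(totals)    # originated, denied, withdrawn, ...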
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains data and models used in the following paper.
Swanson, K., Walther, P., Leitz, J., Mukherjee, S., Wu, J. C., Shivnaraine, R. V., & Zou, J. ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries. In review.
The data and models are meant to be used with the ADMET-AI code, which runs the ADMET-AI web server at admet.ai.greenstonebio.com.
The data.zip file has the following structure.
data
drugbank: Contains files with drugs from the DrugBank that have received regulatory approval. drugbank_approved.csv contains the full set of approved drugs along with ADMET-AI predictions, while the other files contain subsets of these molecules used for testing the speed of ADMET prediction tools.
tdc_admet_all: Contains the data (.csv files) and RDKit features (.npz files) for all 41 single-task ADMET datasets from the Therapeutics Data Commons (TDC).
tdc_admet_multitask: Contains the data (.csv files) and RDKit features (.npz files) for the two multi-task datasets (one regression and one classification) constructed by combining the tdc_admet_all datasets.
tdc_admet_all.csv: A CSV file containing all 41 ADMET datasets from tdc_admet_all. This can be used to easily look up all ADMET properties for a given molecule in the TDC (see the sketch after this list).
tdc_admet_group: Contains the data (.csv files) and RDKit features (.npz files) for the 22 TDC ADMET Benchmark Group datasets with five splits per dataset.
tdc_admet_group_raw: Contains the raw data (.csv files) used to construct the five splits per dataset in tdc_admet_group.
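As a usage illustration (not part of the repository contents), a minimal Python sketch for the lookup described above, assuming pandas is installed and that the combined CSV has a SMILES column (the column name here is a guess; check the file header):

# Minimal sketch: look up all TDC ADMET properties for one molecule.
import pandas as pd

admet = pd.read_csv("data/tdc_admet_all.csv")
caffeine = "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"       # example SMILES
row = admet[admet["smiles"] == caffeine]        # assumed column name
print(row.dropna(axis=1).T if not row.empty else "not in the TDC sets")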
The models.zip file has the following structure. Note that the ADMET-AI website and Python package use the multi-task Chemprop-RDKit models below.
models
tdc_admet_all: Contains Chemprop and Chemprop-RDKit models trained on all 41 single-task TDC ADMET datasets.
tdc_admet_all_multitask: Contains Chemprop and Chemprop-RDKit models trained on the two multi-task TDC ADMET datasets (one regression and one classification).
tdc_admet_group: Contains Chemprop and Chemprop-RDKit models trained on the 22 TDC ADMET Benchmark Group datasets.
These two shapefiles represent New Mexico NHD High Resolution stream segments and waterbodies, merged and clipped to the state boundary. Raw NHD High Resolution data, including additional layer files, are available from: https://viewer.nationalmap.gov/basic/
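As an illustration of the merge-and-clip step (not part of the dataset), a minimal Python sketch using geopandas; the input file names are placeholders for the downloaded NHD layers and a New Mexico boundary polygon:

# Minimal sketch: merge NHD flowline layers and clip to the state boundary.
import geopandas as gpd
import pandas as pd

parts = [gpd.read_file(f) for f in ["nhd_flowline_a.shp", "nhd_flowline_b.shp"]]
merged = gpd.GeoDataFrame(pd.concat(parts, ignore_index=True), crs=parts[0].crs)
boundary = gpd.read_file("new_mexico_boundary.shp").to_crs(merged.crs)
gpd.clip(merged, boundary).to_file("nm_nhd_streams.shp")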
Original data files from LAS measurements.
Glider deployed as part of a larger program called Ecology and Oceanography of Harmful Algal Blooms in Florida (EcoHAB:Florida) to survey the physical oceanography, biological oceanography and circulation patterns for shelf scale modeling for predicting the occurrence and transport of Karenia brevis red tides. The glider was deployed to survey an area of the West Florida Shelf, in the Gulf of Mexico, and measure light attenuation, light absorption, colored dissolved organic matter (CDOM), temperature and salinity. This dataset includes raw measurements of these properties. This dataset was produced from the high resolution data files retrieved from the glider after the glider was recovered.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains two subfolders, which archive the data necessary for reconstructing the holographic images of stained tissue from mouse tails and unstained tissue from mouse brains, respectively. Both subfolders have the same file components, including the MATLAB data and the raw data collected from the data acquisition card that are necessary for holographic imaging reconstruction.
Here, 'Supplementary Data 1' is prepared for the holographic reconstruction of stained tissue from mouse tails, while 'Supplementary Data 2' is provided for unstained tissue from mouse brains.
*) biological_sample.mat: The raw data from imaging a slice of rat tail, converted from .tdms to .mat format.
*) background_curvature.mat: The raw data used to correct for phase contamination from system aberrations, converted from .tdms to .mat format.
*) biological_sample_rawdata.tdms: The raw data from imaging a slice of rat tail, collected through the data acquisition card (DAC) in TDMS format.
*) background_curvature_rawdata.tdms: The raw data used to correct for phase contamination from system aberrations, collected through the DAC in TDMS format.
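As an illustration of the stated .tdms-to-.mat conversion (not the authors' actual script), a minimal Python sketch using the nptdms and scipy packages; group and channel names are read from the file rather than assumed:

# Minimal sketch: convert a TDMS acquisition file to a .mat file.
from nptdms import TdmsFile
from scipy.io import savemat

tdms = TdmsFile.read("biological_sample_rawdata.tdms")
arrays = {}
for group in tdms.groups():
    for channel in group.channels():
        key = f"{group.name}_{channel.name}".replace(" ", "_")
        arrays[key] = channel[:]      # channel data as a NumPy array
savemat("biological_sample.mat", arrays)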