https://creativecommons.org/publicdomain/zero/1.0/
Dataset provided by: Björn Holzhauer
Dataset description: Meta-analyses of clinical trials often treat the number of patients experiencing a medical event as binomially distributed when individual patient data for fitting standard time-to-event models are unavailable. Assuming identical drop-out time distributions across arms, random censorship and low proportions of patients with an event, a binomial approach results in a valid test of the null hypothesis of no treatment effect with minimal loss in efficiency compared to time-to-event methods. To deal with differences in follow-up (at the cost of assuming specific distributions for event and drop-out times), we propose a hierarchical multivariate meta-analysis model using the aggregate data likelihood based on the number of cases, fatal cases and discontinuations in each group, as well as the planned trial duration and group sizes. Such a model also enables exchangeability assumptions about parameters of survival distributions, for which they are more appropriate than for the expected proportion of patients with an event across trials of substantially different length. Borrowing information from other trials within a meta-analysis or from historical data is particularly useful for rare events data. Prior information or exchangeability assumptions also avoid the parameter identifiability problems that arise when using more flexible event and drop-out time distributions than the exponential one. We discuss the derivation of robust historical priors and illustrate the discussed methods using an example. We also compare the proposed approach against other aggregate data meta-analysis methods in a simulation study.
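To make the role of follow-up concrete, here is a minimal sketch under simplifying assumptions that are not taken from the dataset description: independent exponential event and drop-out times, a single trial arm, and no separate treatment of fatal cases. The function names and the example counts are hypothetical. It shows how the counts of patients with an event, discontinuations, and completers could enter an aggregate-data likelihood, and why the expected event proportion depends on trial length.

```python
import math


def outcome_probabilities(event_rate, dropout_rate, duration):
    """Return P(event observed), P(dropped out first), P(completed event-free).

    Assumes independent exponential event and drop-out times with the given
    hazards (their sum must be positive) and a fixed planned duration.
    """
    total = event_rate + dropout_rate
    p_complete = math.exp(-total * duration)
    p_any = 1.0 - p_complete
    p_event = event_rate / total * p_any
    p_dropout = dropout_rate / total * p_any
    return p_event, p_dropout, p_complete


def log_likelihood(counts, event_rate, dropout_rate, duration):
    """Multinomial log-likelihood (constant term omitted) for one arm.

    counts = (patients with event, discontinuations, completers).
    """
    probs = outcome_probabilities(event_rate, dropout_rate, duration)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


# Hypothetical arm: 1000 patients followed for a planned 1 year.
example_counts = (12, 80, 908)
example_ll = log_likelihood(example_counts, 0.012, 0.08, 1.0)
```

With the same hazards, a longer planned duration gives a larger expected proportion of patients with an event, which is why exchangeability is more plausible for the hazards than for the proportions.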
Abstract
The dataset provided here contains the results of independent data aggregation, quality control, and visualization for the University of Arizona (UofA) COVID-19 testing programs during the 2019 novel Coronavirus pandemic. The dataset is provided as machine-readable tables in comma-separated value (.csv) and Microsoft Excel (.xlsx) formats.
Additional Information
As part of the UofA response to the 2019-20 Coronavirus pandemic, testing was conducted on students, staff, and faculty prior to the start of the academic year and throughout the school year. Testing was done at the UofA Campus Health Center and through their instance program called "Test All Test Smart" (TATS). These tests identify active cases of SARS-CoV-2 infection using the reverse transcription polymerase chain reaction (RT-PCR) test and the Antigen test. Because the Antigen test provided a more rapid diagnosis, it was used heavily in the three weeks prior to the start of the Fall semester and throughout the academic year.
As these tests were occurring, results were posted on the COVID-19 websites. First, beginning in early March, the Campus Health Alerts website reported the total number of positive cases. Later, numbers were provided for the total number of tests (March 12 and thereafter). According to the website, these numbers were updated daily for positive cases and weekly for total tests. These numbers were reported until early September, when they were included in the reporting for the TATS program.
For the TATS program, numbers were provided through the UofA COVID-19 Update website. Initially, on August 21, the numbers provided were the total number of tests and positive cases (July 31 and thereafter). Later (August 25), additional information was provided, separating PCR and Antigen testing, and daily numbers were also included. On September 3, this website began providing both the Campus Health and TATS data.
Here, PCR and Antigen were combined and referred to as "Total", and daily and cumulative numbers were provided.
No official data dashboard was available until September 16, and aside from the information provided on these websites, the full dataset was not made publicly available. As such, the authors of this dataset independently aggregated data from multiple sources. These data were made publicly available through a Google Sheet, with graphical illustrations provided through the spreadsheet and on social media. The goal of providing the data and illustrations publicly was to provide factual information and to understand the infection rate of SARS-CoV-2 in the UofA community.
Because of differences in reported data between Campus Health and the TATS program, the dataset provides Campus Health numbers on September 3 and thereafter. TATS numbers are provided beginning on August 14, 2020.
Description of Dataset Content
The following terms are used in describing the dataset.
1. "Report Date" is the date and time at which the website was updated to reflect the new numbers.
2. "Test Date" is the date of testing/sample collection.
3. "Total" is the combination of Campus Health and TATS numbers.
4. "Daily" is the new data associated with the Test Date.
5. "To Date (07/31--)" provides the cumulative numbers from 07/31 and thereafter.
6. "Sources" provides the source of information. The number prior to the colon refers to the number of sources. Here, "UACU" refers to the UA COVID-19 Update page, and "UARB" refers to the UA Weekly Re-Entry Briefing. "SS" and "WBM" refer to screenshot (manually acquired) and "Wayback Machine" (see Reference section for links), with initials provided to indicate which author recorded the values. These screenshots are available in the records.zip file.
Where available, the dataset distinguishes between testing programs and testing methods.
Where data are not available, calculations are made to fill in missing data (e.g., extrapolating backwards on the total number of tests based on daily numbers that are deemed reliable). Where errors are found (by comparing to previous numbers), those are reported on the above Google Sheet with specifics noted.
For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu
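The backward extrapolation of cumulative test totals mentioned above can be sketched as follows. This is only an illustration of the arithmetic, assuming that a reliable cumulative total is known for the last date and that daily counts are trusted; the function name and the numbers are hypothetical and do not come from the dataset.

```python
def backfill_cumulative(daily_counts, last_cumulative):
    """Reconstruct cumulative totals for each date from daily counts.

    daily_counts[i] is the number of new tests on date i (oldest first);
    last_cumulative is the known cumulative total on the final date.
    """
    cumulative = [0] * len(daily_counts)
    cumulative[-1] = last_cumulative
    # Walk backwards: the total on the previous date is the later total
    # minus the tests that were added on the later date.
    for i in range(len(daily_counts) - 2, -1, -1):
        cumulative[i] = cumulative[i + 1] - daily_counts[i + 1]
    return cumulative


# Hypothetical example: three dates with 5, 3 and 4 new tests,
# and a known cumulative total of 20 on the last date.
example = backfill_cumulative([5, 3, 4], 20)
```

A negative value in the result would indicate that the daily counts and the known total are inconsistent, which is the kind of error the description says is checked against previous numbers.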
A practical aggregation method for heterogeneous log-linear functions is presented. Inequality measures are employed in the construction of a simple but exact aggregate representation of an economy. Three macroeconomic applications are discussed: the aggregation of the Lucas supply function, the time-inconsistent behaviour of an egalitarian social planner facing heterogeneous discount rates, and the case of a simple heterogeneous growth model. In the latter application, aggregate CPS data is used to show that the slowdown that followed the first oil shock is worse than usually thought, and that the new economy growth resurgence is not as strong as it appears.
"The reaction of one man could be forecast by no known mathematics; the reaction of a billion is something else again." (Foundation and Empire, Isaac Asimov, 1952)
As per our latest research, the global map data aggregation platform market size reached USD 4.92 billion in 2024, demonstrating robust growth dynamics. The market is projected to expand at a CAGR of 13.8% over the forecast period, resulting in a forecasted value of USD 15.13 billion by 2033. This remarkable growth is driven by the increasing integration of geospatial intelligence across industries, the proliferation of IoT devices, and the rising demand for real-time, accurate mapping solutions. The market's evolution is underpinned by rapid technological advancements, particularly in cloud computing and artificial intelligence, which are revolutionizing how map data is aggregated, processed, and utilized for diverse applications.
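The compound-growth arithmetic behind such a forecast can be checked with a short sketch. The report does not state which base year or how many compounding periods it uses, so the period count passed in below is an assumption, and the function names are illustrative.

```python
def project_value(start_value, annual_rate, periods):
    """Compound start_value at annual_rate for the given number of periods."""
    return start_value * (1.0 + annual_rate) ** periods


def implied_cagr(start_value, end_value, periods):
    """Annual growth rate that takes start_value to end_value in periods."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Figures quoted in the text; the nine periods (2024 to 2033) are assumed.
projected_2033 = project_value(4.92, 0.138, 9)
rate_from_endpoints = implied_cagr(4.92, 15.13, 9)
```

Comparing `projected_2033` with the quoted 2033 value, or `rate_from_endpoints` with the quoted CAGR, shows how sensitive the headline numbers are to the assumed base year and period count.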
The primary growth factor for the map data aggregation platform market is the surging demand for precise geospatial data to power navigation systems, location-based services, and urban infrastructure planning. As smart city initiatives gain momentum worldwide, governments and municipal authorities are increasingly relying on map data aggregation platforms to optimize traffic management, resource allocation, and public safety. The integration of advanced sensors, IoT devices, and real-time data feeds into these platforms enables dynamic mapping and analytics, which are essential for supporting autonomous vehicles, drone delivery systems, and next-generation mobility solutions. Furthermore, the expansion of e-commerce and on-demand services is fueling the need for accurate, up-to-date mapping data to enhance last-mile delivery efficiency and customer experience.
Another significant driver is the widespread adoption of cloud-based map data aggregation solutions, which offer scalability, flexibility, and cost efficiency. Enterprises across transportation, logistics, and real estate sectors are leveraging these platforms to streamline operations, improve asset tracking, and gain actionable insights from spatial data. The integration of artificial intelligence and machine learning algorithms into map data aggregation platforms is enabling automated data cleansing, anomaly detection, and predictive analytics, further enhancing the value proposition for end users. Additionally, the growing emphasis on environmental sustainability and disaster management is prompting governments and NGOs to utilize map data aggregation platforms for monitoring land use, tracking deforestation, and coordinating emergency response efforts.
The map data aggregation platform market is also witnessing growth due to the increasing need for interoperability and data standardization across diverse mapping applications. As organizations seek to consolidate disparate geospatial datasets and facilitate seamless data exchange between systems, the role of aggregation platforms becomes critical. These platforms are evolving to support open standards, APIs, and cross-platform compatibility, enabling integration with GIS tools, enterprise resource planning (ERP) systems, and customer relationship management (CRM) solutions. This trend is particularly evident in sectors such as utilities and retail, where organizations require comprehensive spatial intelligence to optimize asset management, site selection, and market analysis.
Regionally, North America continues to dominate the map data aggregation platform market, owing to the presence of major technology providers, robust digital infrastructure, and early adoption of advanced mapping technologies. However, the Asia Pacific region is emerging as the fastest-growing market, driven by rapid urbanization, government investments in smart city projects, and the proliferation of mobile and connected devices. Europe also holds a significant share, supported by stringent regulatory frameworks for data privacy and the growing adoption of location-based services in transportation and logistics. The Middle East & Africa and Latin America are gradually catching up, fueled by infrastructure development and increasing digital transformation initiatives.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A previous version of this archive provided geometry data and tables of opacity calculations that could be used to calculate the radiative pressure and absorption on fractal dust grains under Asymptotic Giant Branch (AGB) conditions (with a peak stellar wavelength of ~1 micron) for aggregates containing up to 256 primary particles. Because the focus of that work was on radiative pressure from a stellar spectrum peaking at approximately 1 micron, those data only covered the wavelength range from 0.3 to 30 microns. In this updated archive the wavelength range of the data has been expanded to allow calculation of the emission of the grains at longer wavelengths. Data are calculated for three common dust materials: forsterite (Mg2SiO4), olivine (Mg_(2x)Fe_(2(1-x))SiO4 with x=0.5), and 'astronomical silicate' (B.T. Draine and H.M. Lee, Optical Properties of Interstellar Graphite and Silicate Grains, Astrophysical Journal, 1984). In this updated version the range of aggregate sizes (number of primary particles in the aggregate) for some of these materials has also been increased from a maximum of 256 to 1024 constituent particles.
Example fractal aggregates were generated using the Diffusion Limited Aggregation (DLA) code as described in Wozniak M., Onofri F.R.A., Barbosa S., Yon J., Mroczka J., Comparison of methods to derive morphological parameters of multi-fractal samples of particle aggregates from TEM images, Journal of Aerosol Science 47: 12–26 (2012) and Onofri F.R.A., M. Wozniak, S. Barbosa, On the Optical Characterization of Nanoparticle and their Aggregates in Plasma Systems, Contributions to Plasma Physics 51(2-3):228-236 (2011). Aggregates were generated with a constant prefactor, kf=1.3, and two fractal dimensions (Df), representing open, porous (Df=1.8) aggregates and more compact (Df=2.8) aggregates.
The geometry files were produced with the DLA software. An example run of this software, for aggregates with 256 primary particles and a fractal dimension of 2.8, is shown in the file 'dla_example.png'.
For the fractal dimension 1.8 data, the maximum number of primary particles in the aggregate, N, was increased from the previous 256 to 1024 for all three dust materials investigated. In addition, the maximum N for the MgFeSiO4 data with a fractal dimension of 2.8 was increased from 256 to 1024. As in the previous archive, 12 instances of each aggregate size were generated with primary particles having a radius of 0.5. These geometry data are given in:
aggregates_kf1.3_df1.8.zip --> Geometry for a prefactor of 1.3 and fractal dimension 1.8
aggregates_kf1.3_df2.8.zip --> Geometry for a prefactor of 1.3 and fractal dimension 2.8
An example file name for an aggregate is 'N_00000032_Agg_00000008.dat' where the first number is the number of primary particles in the aggregate (N=32) and the second number is the instance number (e.g. 8 of 12). The radius of each primary particle in an aggregate is 0.5. The geometry files have 4 columns: the x, y and z coordinates of each primary particle followed by the primary particle radius. In each zip file there is also a pdf document that describes the geometry data and shows an image of each geometry file.
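The naming scheme and the four-column layout described above can be read with a short sketch like the following. It assumes the columns are whitespace-separated and that any line starting with '#' is a comment, neither of which is stated in the description; the function names are illustrative.

```python
import re

# Matches names such as 'N_00000032_Agg_00000008.dat'.
GEOMETRY_NAME = re.compile(r"N_(\d+)_Agg_(\d+)\.dat$")


def parse_geometry_name(file_name):
    """Return (number of primary particles, instance number) from a file name."""
    match = GEOMETRY_NAME.search(file_name)
    if match is None:
        raise ValueError(f"unexpected geometry file name: {file_name}")
    return int(match.group(1)), int(match.group(2))


def parse_geometry_text(text):
    """Return a list of (x, y, z, radius) tuples, one per primary particle."""
    particles = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything that is not four columns.
        if len(fields) != 4 or fields[0].startswith("#"):
            continue
        particles.append(tuple(float(value) for value in fields))
    return particles
```

For a file on disk, the text can be obtained with `open(path).read()` before calling `parse_geometry_text`.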
These geometry data were then used to calculate the opacity of the aggregates using the Multiple Sphere T-Matrix code (MSTM v 3.0) developed by Daniel Mackowski (D.W. Mackowski, M.I. Mishchenko, A multiple sphere T-matrix Fortran code for use on parallel computer clusters, Journal of Quantitative Spectroscopy and Radiative Transfer, Volume 112, Issue 13, 2011). Data were generated using the first 10 instances of each aggregate size, and the geometry data were scaled to calculate the opacity data for primary particle radii ranging from 0.001 to 1.0 microns. As noted above, the earlier version of this archive focused on radiative pressure on these aggregates and only covered the spectrum of a typical AGB star (0.3 to 30 microns wavelength). In this updated version the wavelength range has been extended to the long-wavelength limits of the optical data. By default, MSTM calculations are made along the z-axis of the geometry data. Additional calculations were made along the x and y axes for each aggregate. The final data set is therefore the average of 30 values (10 instances, each in the x, y and z directions).
The opacity data files are given in:
astronomical_silicate_df1.8.zip --> astronomical silicate aggregates with fractal dimension 1.8
astronomical_silicate_df2.8.zip --> astronomical silicate aggregates with fractal dimension 2.8
forsterite_df1.8.zip --> forsterite aggregates with fractal dimension 1.8
forsterite_df2.8.zip --> forsterite aggregates with fractal dimension 2.8
olivine_df1.8.zip --> olivine aggregates with fractal dimension 1.8
olivine_df2.8.zip --> olivine aggregates with fractal dimension 2.8
In the previous version of this archive, only the table files with the averages of the 10 instances were provided. In this updated version each of the individual opacity files used to create these tables is also provided. These opacity files are numbered similarly to the geometry files. For example, the opacity calculations for N=32, instance=5, angle=3 are given in the file 'opacity_results_N000032_I05_A03_file.dat'. Each file begins with a short header describing the data. For example, the astronomical silicate header for this N=32, instance=5, angle=3 file is:
#############################################################################################
# Number of primary particles in aggregate: 32
# Geometry Instance Number: 5
# Geometry File Name: N_00000032_Agg_00000005.dat
# Rotation Angles: 90.000 90.000 0.000
# Number of radius values: 30
# Minimum and maximum radius values in microns: 1.00000e-003 1.00000e+000
# Number of wavelength values: 92
# Minimum and maximum wavelength values in microns: 3.00000e-001 1.00000e+004
#############################################################################################
After the header, the columns list the line number, the primary particle radius (microns), the wavelength (microns), the extinction efficiency factor, the absorption efficiency factor, the scattering efficiency factor, the asymmetry factor and the radiation pressure efficiency factor. These efficiency factors are based on the effective radius of the aggregate described later in this document.
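A minimal sketch for reading one of these individual opacity files is given below. It assumes whitespace-separated data columns in the order listed above and 'key: value' header lines prefixed with '#'; the function and column names are illustrative and are not part of the archive.

```python
import re

# Matches names such as 'opacity_results_N000032_I05_A03_file.dat'.
OPACITY_NAME = re.compile(r"opacity_results_N(\d+)_I(\d+)_A(\d+)_file\.dat$")

# Assumed column order, following the description of the data columns.
DATA_COLUMNS = ("line", "radius_um", "wavelength_um",
                "q_ext", "q_abs", "q_sca", "g", "q_pr")


def parse_opacity_name(file_name):
    """Return (N, instance, angle index) encoded in an opacity file name."""
    match = OPACITY_NAME.search(file_name)
    if match is None:
        raise ValueError(f"unexpected opacity file name: {file_name}")
    return tuple(int(group) for group in match.groups())


def parse_opacity_file(text):
    """Split file contents into a header dict and a list of data rows."""
    header, rows = {}, []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Header lines look like '# key: value'; rule lines have no colon.
            key, sep, value = stripped.lstrip("#").partition(":")
            if sep:
                header[key.strip()] = value.strip()
            continue
        values = (float(field) for field in stripped.split())
        rows.append(dict(zip(DATA_COLUMNS, values)))
    return header, rows
```

Header values are kept as strings so that entries such as the rotation angles or the minimum and maximum ranges can be split by the caller as needed.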
Within each of these zipped folders is a file that contains the averages of these individual opacity files. For example 'astronomical_silicate_df1.8.dat' is the averaged data for the astronomical silicate aggregates with a fractal dimension 1.8. As in the previous archive, the first lines of these table files are a header starting with the '#' character describing the table and the source of the optical data used.
After the header, the first line of data in the table has nine values giving the range and number of samples for the aggregate size (N), the primary particle radius (microns) and the wavelength (microns). These are:
Minimum aggregate size
Maximum aggregate size
Number of Aggregate samples
Primary Particle Minimum Radius (microns)
Primary Particle Maximum Radius (microns)
Number of Primary Particle radii samples
Wavelength minimum (microns)
Wavelength maximum (microns)
Number of Wavelength samples
Subsequent lines contain 13 columns. These columns give the efficiency factors and asymmetry factor for aggregates. These efficiency factors are based on the effective radius of the aggregate given by:
a_eff = a_primary*N^(1/3)
where a_primary is the primary particle radius and N is the number of primary particles in the aggregate.
For example, the absorption opacity of an aggregate would then be pi*a_eff^2 * Q_abs.
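As a worked illustration of these two formulas, the sketch below computes the effective radius and the quantity pi*a_eff^2 * Q_abs. The function names and the input values are hypothetical; the result is in square microns when the primary particle radius is given in microns.

```python
import math


def effective_radius(a_primary, n_primary):
    """a_eff = a_primary * N^(1/3), in the same units as a_primary."""
    return a_primary * n_primary ** (1.0 / 3.0)


def absorption_opacity(a_primary, n_primary, q_abs):
    """pi * a_eff^2 * Q_abs for an aggregate of n_primary particles."""
    a_eff = effective_radius(a_primary, n_primary)
    return math.pi * a_eff ** 2 * q_abs


# Hypothetical inputs: 0.1 micron primary particles, N = 8, Q_abs = 0.5.
example_a_eff = effective_radius(0.1, 8)
example_opacity = absorption_opacity(0.1, 8, 0.5)
```

The same pattern applies to the extinction, scattering and radiation pressure efficiency factors by substituting the corresponding Q value.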
The values in each column are:
Column 1: Primary particle radius in microns
Column 2: Wavelength in microns
Column 3: Number of primary particles in aggregate
Column 4: Mean Q_ext, mean extinction efficiency factor
Column 5: Standard Deviation of Mean Q_ext
Column 6: Mean Q_abs, mean absorption efficiency factor
Column 7: Standard Deviation of Mean Q_abs
Column 8: Mean Q_sca, mean scattering efficiency factor
Column 9: Standard Deviation of mean Q_sca
Column 10: Mean g_cos, mean asymmetry factor
Column 11: Standard Deviation of mean asymmetry factor
Column 12: Mean Q_pr, mean radiation pressure efficiency factor
Column 13: Standard Deviation of mean Q_pr
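The table layout described above (a '#' header, one line with nine range values, then 13-column rows) can be read with a sketch along these lines. It assumes whitespace-separated values and takes column 13 to be the standard deviation of Q_pr, following the pattern of the preceding columns; all names are illustrative.

```python
RANGE_KEYS = ("n_min", "n_max", "n_samples",
              "radius_min_um", "radius_max_um", "radius_samples",
              "wavelength_min_um", "wavelength_max_um", "wavelength_samples")

TABLE_COLUMNS = ("radius_um", "wavelength_um", "n_primary",
                 "q_ext", "q_ext_sd", "q_abs", "q_abs_sd",
                 "q_sca", "q_sca_sd", "g", "g_sd", "q_pr", "q_pr_sd")


def parse_opacity_table(text):
    """Return (ranges, rows) from the contents of an averaged table file."""
    data_lines = [line for line in text.splitlines()
                  if line.strip() and not line.lstrip().startswith("#")]
    # First non-header line: the nine range and sample-count values.
    first = [float(value) for value in data_lines[0].split()]
    if len(first) != len(RANGE_KEYS):
        raise ValueError("expected nine values on the first data line")
    ranges = dict(zip(RANGE_KEYS, first))
    rows = []
    for line in data_lines[1:]:
        values = [float(value) for value in line.split()]
        if len(values) != len(TABLE_COLUMNS):
            raise ValueError("expected 13 columns per data row")
        rows.append(dict(zip(TABLE_COLUMNS, values)))
    return ranges, rows
```

Rows can then be filtered by `n_primary`, `radius_um` and `wavelength_um` to look up the mean efficiency factors for a given aggregate.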
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Concrete is one of the most used building materials worldwide. With up to 80% of the volume, a large constituent of concrete consists of fine and coarse aggregate particles (normally with sizes of 0.1 mm to 32 mm) which are dispersed in a cement paste matrix. The size distribution of the aggregates (i.e. the grading curve) substantially affects the properties and quality characteristics of concrete, such as its workability in the fresh state and its mechanical properties in the hardened state. In practice, the size distribution of small samples of the aggregate is usually determined by manual mechanical sieving and is considered representative of a large amount of aggregate. However, the size distribution of the actual aggregate used for individual production batches of concrete varies, especially when, for example, recycled material is used as aggregate. As a consequence, the unknown variations of the particle size distribution have a negative effect on the robustness and the quality of the final concrete produced from the raw material.
Towards the goal of deriving precise knowledge about the actual particle size distribution of the aggregate, thus eliminating the unknown variations in the material’s properties, we propose a data set for the image-based prediction of the size distribution of concrete aggregates. Incorporating such an approach into the production chain of concrete makes it possible to react to detected variations in the size distribution of the aggregate in real time by adapting the composition, i.e. the mixture design of the concrete, so that the desired concrete properties are reached.
[Figure: Classical vs. image-based granulometry. https://data.uni-hannover.de/dataset/f00bdcc4-8b27-4dc4-b48d-a84d75694e18/resource/042abf8d-e87a-4940-8195-2459627f57b6/download/overview.png]
In the classification data, nine different grading curves are distinguished, following the normative regulations of DIN 1045. The nine grading curves differ in their maximum particle size (8, 16, or 32 mm) and in the distribution of the particle size fractions, allowing the curves to be categorised as coarse-grained (A), medium-grained (B) and fine-grained (C), respectively. A quantitative description of the grain size distribution of the nine curves is shown in the following figure, where the left side shows a histogram of the particle size fractions 0-2, 2-8, 8-16, and 16-32 mm and the right side shows the cumulative histograms of the grading curves (the vertical axes represent the mass-percentages of the material).
For each of the grading curves, two samples (S1 and S2) of aggregate particles were created. Each sample has a total mass of 5 kg of aggregate material and is carefully designed according to the grain size distribution shown in the figure: the raw material is first sieved to separate the different grain size fractions, and the samples are then composed according to the dedicated mass-percentages of the size distributions.
[Figure: Particle size distribution of the classification data. https://data.uni-hannover.de/dataset/f00bdcc4-8b27-4dc4-b48d-a84d75694e18/resource/17eb2a46-eb23-4ec2-9311-0f339e0330b4/download/statistics_classification-data.png]
For data acquisition, a static setup was used in which the samples are placed in a measurement vessel equipped with a set of calibrated reference markers whose object coordinates are known and which are assembled so that they form a common plane with the surface of the aggregate sample. We acquired the data by taking images of the aggregate samples (and the reference markers) in the measurement vessel, perturbing the arrangement of the sample within the vessel between images in order to obtain variations in the sample’s visual appearance. This acquisition strategy allows multiple different images to be recorded for each grading curve by reusing the same sample, reducing the labour-intensive work of material sieving and sample generation. In this way, we acquired a data set of 900 images in total, consisting of 50 images of each of the two samples (S1 and S2) created for each of the nine grading curve definitions (50 x 2 x 9 = 900). For each image, we automatically detect the reference markers, obtaining the image coordinates of each marker in addition to its known object coordinates. We use these correspondences to compute the homography which describes the perspective transformation of the reference markers' plane in object space (which corresponds to the surface plane of the aggregate sample) to the image plane. Using the computed homography, we transform the image to obtain a perspectively rectified representation of the aggregate sample with a known ground sampling distance (GSD) of 8 px/mm that is consistent across the entire image. The following figure shows example images from our data set for each of the distinguished grading curve classes.
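The rectification step can be illustrated with a minimal sketch that estimates a homography from exactly four marker correspondences and maps image coordinates onto a rectified grid at 8 px/mm. This is not the authors' implementation: the marker layout, the detected image coordinates, and the function names are hypothetical, and a real pipeline would typically use more markers with a least-squares fit.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate marker configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_four_points(src, dst):
    """Return the 3x3 homography (h33 fixed to 1) mapping src to dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence in the eight unknowns.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear_system(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, x, y):
    """Map a point through the homography, dividing by the projective scale."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


# Hypothetical markers: object coordinates in mm and detected image pixels.
PX_PER_MM = 8
markers_mm = [(0, 0), (100, 0), (100, 100), (0, 100)]
markers_image_px = [(120, 80), (910, 130), (870, 900), (90, 850)]
markers_rectified_px = [(PX_PER_MM * x, PX_PER_MM * y) for x, y in markers_mm]
image_to_rectified = homography_from_four_points(markers_image_px,
                                                 markers_rectified_px)
```

Resampling every pixel of the input image through `image_to_rectified` (or its inverse) then yields an image in which one millimetre on the sample surface spans eight pixels everywhere.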
[Figure: Example images of the classification data. https://data.uni-hannover.de/dataset/f00bdcc4-8b27-4dc4-b48d-a84d75694e18/resource/59925f1d-3eef-4b50-986a-e8d2b0e14beb/download/examples_classification_data.png]
If you make use of the proposed data, please cite the publication listed below.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of an Excel datasheet containing aggregate data from thin section analyses of ice samples from the Fimbule Ice Shelf, carried out during the 2021/2022 Antarctic expedition. The research forms part of the Master's research project "Development of a Multi-tier System for the Analysis of Ice Crystallography of Antarctic Shelf Ice", conducted by Steven McEwen. Each entry in the datasheet corresponds to a specific thin section or ice grain and includes the following parameters: Grid Number, A1 Axis Reading, A4 Reading, Corrected A4 values, number of readings, Mean C-axis Orientation, Grain Size, Date, Sample number, x-coordinate, y-coordinate, Degree of Orientation, and Spherical Aperture. Together, these parameters describe the crystallography of the Fimbule Ice Shelf's ice samples. Data were collected and analyzed during the 2021/2022 Antarctic summer expedition, with additional analysis performed in the Polar Engineering Research Group's laboratory.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
This repository contains the data related to the paper "Granulometry transformer: image-based granulometry of concrete aggregate for an automated concrete production control", in which a deep learning based method is proposed for the image-based determination of concrete aggregate grading curves (cf. video).
More specifically, the data set consists of images showing concrete aggregate particles and reference data of the particle size distribution (grading curves) associated with each image. The data are divided into the CoarseAggregateData and the FineAggregateData.
The coarse data consists of aggregate samples with different particle sizes ranging from 0.1 mm to 32 mm. The grading curves are designed by linear interpolation between a very fine and a very coarse distribution for three variants with maximum grain sizes of 8 mm, 16 mm, and 32 mm, respectively. For each variant, we designed eleven grading curves, resulting in a total of 33, which are shown in the figure below. For each sample, we acquired 50 images with a GSD of 0.125 mm, resulting in a data set of 1650 images in total. Example images for a subset of the grading curves of this data set are shown in the following figure.
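The construction of the grading curves by linear interpolation can be sketched as follows. The two base distributions below are hypothetical mass-percentages per size fraction, not values from the data set, and evenly spaced interpolation weights are an assumption.

```python
def interpolate_grading_curves(fine, coarse, n_curves=11):
    """Linearly interpolate between two distributions, endpoints included."""
    curves = []
    for k in range(n_curves):
        t = k / (n_curves - 1)  # 0.0 gives the fine curve, 1.0 the coarse one
        curves.append([(1.0 - t) * f + t * c for f, c in zip(fine, coarse)])
    return curves


# Hypothetical base distributions (mass-% per size fraction, summing to 100).
very_fine = [60.0, 40.0, 0.0]
very_coarse = [20.0, 30.0, 50.0]
curves = interpolate_grading_curves(very_fine, very_coarse)

# Bookkeeping from the description: 3 variants x 11 curves, 50 images each.
n_curves_total = 3 * len(curves)
n_images_total = n_curves_total * 50
```

Because both base distributions sum to 100, every interpolated curve does as well, so each remains a valid mass-percentage distribution.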
[Figure: Example images and grading curves of the coarse data set. https://data.uni-hannover.de/dataset/ecb0bf04-84c8-45b1-8a43-044f3f80d92c/resource/8cb30616-5b24-4028-9c1d-ea250ac8ac84/download/examplecoarse.png]
Similar to the previous data set, the fine data set contains grading curves for the fine fraction of concrete aggregate of 0 to 2 mm with a GSD of 28.5 µm. We defined two base distributions of different shapes for the upper and lower bound, respectively, resulting in two interpolated grading curve sets (Set A and Set B). In total, 1700 images of 34 different particle size distributions were acquired. Example images of the data set and the corresponding grading curves are shown in the figure below.
[Figure: Example images and grading curves of the fine data set. https://data.uni-hannover.de/dataset/ecb0bf04-84c8-45b1-8a43-044f3f80d92c/resource/c56f4298-9663-457f-aaa7-0ba113fec4c9/download/examplefine.png]
If you make use of the proposed data, please cite.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and inferred networks accompanying the manuscript “Aggregation of recount3 RNA-seq data improves the inference of consensus and context-specific gene co-expression networks”
Authors: Prashanthi Ravichandran, Princy Parsana, Rebecca Keener, Kaspar Hansen, Alexis Battle
Affiliations: Johns Hopkins University School of Medicine, Johns Hopkins University Department of Computer Science, Johns Hopkins University Bloomberg School of Public Health
Description:
This folder includes data produced in the analysis contained in the manuscript and inferred consensus and context-specific networks from graphical lasso and WGCNA with varying numbers of edges. Contents include:
all_metadata.rds: File including meta-data columns of study accession ID, sample ID, assigned tissue category, cancer status and disease status obtained through manual curation for the 95,484 RNA-seq samples used in the study.
all_counts.rds: log2-transformed, RPKM-normalized read counts for 5999 genes and 95,484 RNA-seq samples, which were used for dimensionality reduction and data exploration.
precision_matrices.zip: Zipped folder including networks inferred by graphical lasso for different experiments presented in the paper using weighted covariance aggregation following PC correction.
The networks can be found as follows. First, select the folder corresponding to the network of interest, for example Blood. This folder contains two or more subfolders, one per data aggregation level; select the one for the appropriate level (for blood-specific networks, either all samples or GTEx). That subfolder holds the precision matrices inferred across a range of penalization parameters. To view the precision matrix inferred for a particular value of the penalization parameter X, select the file labeled lambda_X.rds.
For select networks, we have included the computed centrality measures which can be accessed at centrality_X.rds for a particular value of the penalization parameter X.
We have also included .rds files that list the hub genes from the consensus networks inferred from non-cancerous samples at “normal_hubs.rds”, and the consensus networks inferred from cancerous samples at “cancer_hubs.rds”
The file “context_specific_selected_networks.csv” includes the networks that were selected for downstream biological interpretation based on the scale-free criterion which is also summarized in the Supplementary Tables.
WGCNA.zip: A zipped folder containing gene modules inferred from WGCNA for sequentially aggregated GTEx, SRA, and blood studies. Select the data aggregated and the number of studies based on the folder names. For example, blood networks inferred from 20 studies can be accessed at blood/consensus/net_20. The individual networks correspond to distinct cut heights, and include information on the cut height used, the genes over which the network was inferred, merged module labels, and merged module colors.
The Measurable AI Amazon Consumer Transaction Dataset is a leading source of email receipts and consumer transaction data, offering data collected directly from users via Proprietary Consumer Apps, with millions of opt-in users.
We source our email receipt consumer data panel via two consumer apps which garner the express consent of our end-users (GDPR compliant). We then aggregate and anonymize all the transactional data to produce raw and aggregate datasets for our clients.
Use Cases
Our clients leverage our datasets to produce actionable consumer insights such as:
- Market share analysis
- User behavioral traits (e.g. retention rates)
- Average order values
- Promotional strategies used by the key players
Several of our clients also use our datasets for forecasting and understanding industry trends better.
Coverage
- Asia (Japan)
- EMEA (Spain, United Arab Emirates)
Granular Data
Itemized, high-definition data per transaction level with metrics such as:
- Order value
- Items ordered
- No. of orders per user
- Delivery fee
- Service fee
- Promotions used
- Geolocation data and more
Aggregate Data
- Weekly/monthly order volume
- Revenue delivered in aggregate form, with historical data dating back to 2018
All the transactional e-receipts are sent from app to users’ registered accounts.
Most of our clients are fast-growing Tech Companies, Financial Institutions, Buyside Firms, Market Research Agencies, Consultancies and Academia.
Our dataset is GDPR compliant, contains no PII and is aggregated and anonymized with user consent. Contact business@measurable.ai for a data dictionary and to find out our volume in each country.
As per our latest research, the global Pet Wearable Data Aggregation Platforms market is valued at USD 2.37 billion in 2024, reflecting the rapid adoption of connected pet devices worldwide. The market is expanding at a compound annual growth rate (CAGR) of 16.5% from 2025 to 2033. At this CAGR, the market is forecast to reach USD 9.44 billion by 2033. This growth is primarily driven by rising pet ownership, increasing awareness about pet health, and technological advancements in IoT-enabled pet wearables that facilitate real-time data aggregation and analysis.
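The forecast arithmetic can be reproduced with a short sketch. It assumes that 2024 is the base year, so the rate compounds over nine annual periods through 2033; the report does not state its convention, and the function name is our own.

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward at a constant annual growth rate."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted in the text. Treating 2024 as the base year gives nine
# compounding periods through 2033 (an assumption, not stated in the report).
projected_2033 = project_market_size(2.37, 0.165, 2033 - 2024)
print(f"Projected 2033 market size: USD {projected_2033:.2f} billion")
```

Under these assumptions the result is roughly USD 9.4 billion, which is in line with the quoted USD 9.44 billion once rounding of the published inputs is allowed for.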
One of the most significant growth factors for the Pet Wearable Data Aggregation Platforms market is the increasing humanization of pets, particularly in developed economies. Pet owners are now more inclined to invest in advanced technology that ensures the well-being, safety, and health monitoring of their animals. The proliferation of smart devices tailored for pets, such as collars, harnesses, and vests embedded with sensors, allows for continuous health and activity tracking. These devices collect large volumes of data, which can be aggregated and analyzed on centralized platforms to provide actionable insights for both pet owners and veterinarians. The trend is further supported by growing disposable incomes and the willingness of consumers to spend more on premium pet care solutions, fueling the demand for sophisticated wearable solutions.
Another key driver is the integration of artificial intelligence and machine learning algorithms into pet wearable data aggregation platforms. These advancements enable predictive analytics, early diagnosis of health issues, and personalized recommendations for pet care. The ability to remotely monitor pets' vital signs, activity levels, and behavioral patterns has proven invaluable during the post-pandemic era, where remote veterinary consultations and telemedicine services have gained traction. Additionally, the increasing prevalence of chronic diseases among pets, such as obesity and diabetes, is prompting pet owners to seek continuous health monitoring solutions, further propelling the adoption of pet wearable data aggregation platforms.
The growing ecosystem of partnerships between device manufacturers, software developers, and veterinary service providers is also contributing to market expansion. Collaborative efforts are leading to the creation of interoperable platforms that can integrate data from multiple devices, streamline information flow, and enhance the overall user experience. Moreover, data security and privacy have become paramount, prompting vendors to invest in secure cloud-based solutions that comply with global data protection regulations. These factors collectively are fostering an environment conducive to rapid market growth, as stakeholders across the value chain recognize the transformative potential of pet wearable data aggregation platforms.
Regionally, North America dominates the Pet Wearable Data Aggregation Platforms market, accounting for the largest share due to high pet ownership rates, advanced veterinary infrastructure, and the presence of leading market players. Europe follows closely, driven by similar trends and increasing awareness about animal welfare. The Asia Pacific region is emerging as a lucrative market, supported by rising disposable incomes, urbanization, and a burgeoning pet population. Latin America and the Middle East & Africa are also witnessing gradual adoption, albeit at a slower pace, as awareness and infrastructure continue to develop. Each region presents unique growth opportunities and challenges, shaping the global landscape of the pet wearable data aggregation platforms market.
The Pet Wearable Data Aggregation Platforms market by component is segmented into hardware, software, and services. Hardware forms the backbone of this ecosystem, comprising smart collars, harnesse
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The file set is a freely downloadable aggregation of information about Australian schools. The individual files represent a series of tables which, when considered together, form a relational database. The records cover the years 2008-2014 and include information on approximately 9500 primary and secondary school main-campuses and around 500 subcampuses. The records all relate to school-level data; no data about individuals is included. All the information has previously been published and is publicly available but it has not previously been released as a documented, useful aggregation. The information includes: (a) the names of schools (b) staffing levels, including full-time and part-time teaching and non-teaching staff (c) student enrolments, including the number of boys and girls (d) school financial information, including Commonwealth government, state government, and private funding (e) test data, potentially for school years 3, 5, 7 and 9, relating to an Australian national testing programme known by the trademark 'NAPLAN'
Documentation of this Edition 2016.1 is incomplete but the organization of the data should be readily understandable to most people. If you are a researcher, the simplest way to study the data is to make use of the SQLite3 database called 'school-data-2016-1.db'. If you are unsure how to use an SQLite database, ask a guru.
The database was constructed directly from the other included files by running the following command at a command-line prompt:
sqlite3 school-data-2016-1.db < school-data-2016-1.sql
Note that a few inconsequential errors will be reported if you run this command yourself. The reason for the errors is that the SQLite database is created by importing a series of '.csv' files. Each of the .csv files contains a header line with the names of the variables relevant to each column. This header is useful for many statistical packages but it is not what SQLite expects, so it complains about the header. Despite the complaint, the database will be created correctly.
Briefly, the data are organized as follows.
(1) The .csv files ('comma separated values') do not actually use a comma as the field delimiter. Instead, the vertical bar character '|' (ASCII octal 174, decimal 124, hex 7C) is used. If you read the .csv files using Microsoft Excel, Open Office, or Libre Office, you will need to set the field-separator to be '|'. Check your software documentation to understand how to do this.
(2) Each school-related record is indexed by an identifier called 'ageid'. The ageid uniquely identifies each school and consequently serves as the appropriate variable for JOIN-ing records in different data files. For example, the first school-related record after the header line in file 'students-headed-bar.csv' shows the ageid of the school as 40000. The relevant school name can be found by looking in the file 'ageidtoname-headed-bar.csv' to discover that the ageid of 40000 corresponds to a school called 'Corpus Christi Catholic School'.
(3) In addition to the variable 'ageid', each record is also identified by one or two 'year' variables. The most important purpose of a year identifier is to indicate the year that is relevant to the record. For example, if one turns again to file 'students-headed-bar.csv', one sees that the first seven school-related records after the header line all relate to the school Corpus Christi Catholic School with ageid of 40000. The variable that identifies the important differences between these seven records is the variable 'studentsyear', which shows the year to which the student data refer. One can see, for example, that in 2008 there were a total of 410 students enrolled, of whom 185 were girls and 225 were boys (look at the variable names in the header line).
(4) The variables relating to years are given different names in each of the different files ('studentsyear' in the file 'students-headed-bar.csv', 'financesummaryyear' in the file 'financesummary-headed-bar.csv').
Despite the different names, the year variables provide the second-level means for joining information across files. For example, if you wanted to relate the enrolments at a school in each year to its financial state, you might wish to JOIN records using 'ageid' in the two files and, secondarily, matching 'studentsyear' with 'financesummaryyear'.
(5) The manipulation of the data is most readily done using the SQL language with the SQLite database but it can also be done in a variety of statistical packages.
(6) It is our intention for Edition 2016-2 to create large 'flat' files suitable for use by non-researchers who want to view the data with spreadsheet software. The disadvantage of such 'flat' files is that they contain vast amounts of redundant information and might not display the data in the form that the user most wants.
(7) Geocoding of the schools is not available in this edition.
(8) Some files, such as 'sector-headed-bar.csv', are not used in the creation of the database but are provided as a convenience for researchers who might wish to recode some of the data to remove redundancy.
(9) A detailed example of a suitable SQLite query can be found in the file 'school-data-sqlite-example.sql'. The same query, used in the context of analyses done with the excellent, freely available R statistical package (http://www.r-project.org), can be seen in the file 'school-data-with-sqlite.R'.
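The points above about the '|' delimiter and the two-level join can be illustrated with a minimal Python sketch. It does not read the real files: the two inline strings stand in for 'students-headed-bar.csv' and 'financesummary-headed-bar.csv'. The table names and every column name other than 'ageid', 'studentsyear' and 'financesummaryyear' are assumptions, as is the funding figure; check the actual header lines and the shipped 'school-data-sqlite-example.sql' before relying on any of them.

```python
import csv
import io
import sqlite3

# Stand-ins for two of the pipe-delimited files. Only ageid 40000 and the 2008
# enrolment counts (410 total, 185 girls, 225 boys) come from the text; the
# other column names and the funding value are made-up placeholders.
STUDENTS = "ageid|studentsyear|total|girls|boys\n40000|2008|410|185|225\n"
FINANCE = "ageid|financesummaryyear|totalfunding\n40000|2008|1000000\n"


def read_bar_file(text):
    """Parse a '|'-delimited file with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students "
    "(ageid INTEGER, studentsyear INTEGER, total INTEGER, girls INTEGER, boys INTEGER)"
)
conn.execute(
    "CREATE TABLE financesummary "
    "(ageid INTEGER, financesummaryyear INTEGER, totalfunding INTEGER)"
)
# DictReader consumes the header line, so SQLite never sees it as a data row.
conn.executemany(
    "INSERT INTO students VALUES (:ageid, :studentsyear, :total, :girls, :boys)",
    read_bar_file(STUDENTS),
)
conn.executemany(
    "INSERT INTO financesummary VALUES (:ageid, :financesummaryyear, :totalfunding)",
    read_bar_file(FINANCE),
)

# Join first on the school identifier, then on the differently named year columns.
rows = conn.execute(
    """
    SELECT s.ageid, s.studentsyear, s.total, f.totalfunding
    FROM students AS s
    JOIN financesummary AS f
      ON f.ageid = s.ageid
     AND f.financesummaryyear = s.studentsyear
    """
).fetchall()
print(rows)
```

Matching on both 'ageid' and the year column keeps a school's enrolment for one year from being paired with its finances for every other year.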
Monthly report including total dispatched trips, total dispatched shared trips, and unique dispatched vehicles aggregated by FHV (For-Hire Vehicle) base. These have been tabulated from raw trip record submissions made by bases to the NYC Taxi and Limousine Commission (TLC). This dataset is typically updated monthly on a two-month lag, as bases have until the conclusion of the following month to submit a month of trip records to the TLC. For example, a base has until Feb 28 to submit complete trip records for January. Therefore, the January base aggregates will appear in March at the earliest. The TLC may elect to defer updates to the FHV Base Aggregate Report if a large number of bases have failed to submit trip records by the due date. Note: The TLC publishes base trip record data as submitted by the bases, and we cannot guarantee or confirm their accuracy or completeness. Therefore, this may not represent the total number of trips dispatched by all TLC-licensed bases. The TLC performs routine reviews of the records and takes enforcement actions when necessary to ensure, to the extent possible, complete and accurate information.
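The two-month lag described above can be worked through with a small date calculation. This is only a sketch of the stated rule (deadline at the end of the following month, publication no earlier than the month after that); the function names are our own and the TLC may defer updates, so the result is a lower bound.

```python
import calendar
from datetime import date


def submission_deadline(year, month):
    """Last day of the month following the trip month."""
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    last_day = calendar.monthrange(next_year, next_month)[1]
    return date(next_year, next_month, last_day)


def earliest_publication_month(year, month):
    """First month after the submission deadline, as a (year, month) pair."""
    deadline = submission_deadline(year, month)
    if deadline.month == 12:
        return (deadline.year + 1, 1)
    return (deadline.year, deadline.month + 1)


# January 2023 trip records: due by the end of February, published in March at the earliest.
print(submission_deadline(2023, 1))
print(earliest_publication_month(2023, 1))
```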
According to our latest research, the global risk data aggregation and reporting for banks market size reached USD 7.9 billion in 2024, driven by increasing regulatory requirements and the growing complexity of banking operations. The market is expected to expand at a CAGR of 14.2% from 2025 to 2033, reaching a projected value of USD 22.3 billion by 2033. This growth is primarily fueled by the ongoing digital transformation initiatives within the banking sector, as well as the heightened focus on risk management and compliance. Banks globally are investing in advanced data aggregation and reporting solutions to meet evolving regulatory mandates and enhance operational efficiency.
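As a quick consistency check, the growth rate implied by the two quoted endpoints can be computed directly. The sketch below assumes 2024 is the base year, giving nine compounding periods through 2033; that convention and the function name are our assumptions.

```python
def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that takes start_value to end_value in the given years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Endpoints quoted in the text, in USD billions.
rate = implied_cagr(7.9, 22.3, 2033 - 2024)
print(f"Implied CAGR: {rate:.1%}")
```

Under this convention the endpoints imply a rate of about 12%, somewhat below the stated 14.2%. The report may use a different base year or unrounded figures, so the gap is worth noting when reusing these numbers rather than treating either value as definitive.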
One of the principal growth factors for the risk data aggregation and reporting for banks market is the tightening regulatory landscape. Financial authorities such as the Basel Committee on Banking Supervision (BCBS) have established stringent guidelines, notably BCBS 239, which require banks to improve their risk data aggregation capabilities and reporting practices. This has led to a surge in demand for robust solutions that can ensure data accuracy, consistency, and timeliness. Banks are compelled to invest in advanced software and services that facilitate real-time data integration, risk assessment, and regulatory reporting. The growing volume and complexity of banking transactions further underscore the need for comprehensive risk data aggregation and reporting frameworks, as traditional manual processes are no longer sufficient to meet regulatory expectations.
Another significant driver is the rapid digitalization of the banking sector. As banks embrace digital transformation, they are generating massive amounts of data from various sources, including online transactions, customer interactions, and third-party integrations. Efficient risk data aggregation and reporting solutions enable banks to harness this data, providing actionable insights for risk management and strategic decision-making. The adoption of technologies such as artificial intelligence, machine learning, and big data analytics is enhancing the capabilities of these solutions, allowing banks to identify emerging risks, optimize capital allocation, and improve overall governance. This digital shift is not just a response to regulatory pressure but also a strategic move to gain competitive advantage in a fast-evolving financial landscape.
Furthermore, the increasing focus on operational resilience and business continuity is propelling the adoption of risk data aggregation and reporting solutions. Banks are recognizing the need to quickly aggregate and analyze data from multiple sources to detect vulnerabilities, prevent fraud, and ensure compliance with internal and external policies. The COVID-19 pandemic has further highlighted the importance of real-time risk management and agile reporting, as financial institutions faced unprecedented disruptions and market volatility. As a result, investments in risk data infrastructure are becoming a top priority for banks of all sizes, paving the way for sustained market growth over the forecast period.
From a regional perspective, North America currently dominates the risk data aggregation and reporting for banks market, followed closely by Europe and Asia Pacific. The United States, in particular, has a mature banking sector with stringent regulatory requirements, driving early adoption of advanced risk data solutions. Meanwhile, the Asia Pacific region is witnessing the fastest growth, fueled by rapid digitalization, expanding banking networks, and increasing regulatory oversight in emerging economies such as China and India. Europe remains a key market due to the implementation of comprehensive financial regulations and the presence of major global banks. Latin America and the Middle East & Africa are also showing steady progress, albeit at a slower pace, as banks in these regions gradually upgrade their risk management capabilities.
The UK censuses took place on 21st April 1991. They were run by the Census Office for Northern Ireland, the General Register Office for Scotland, and the Office of Population Censuses and Surveys for England and Wales. The UK comprises the countries of England, Wales, Scotland and Northern Ireland.
Statistics from the UK censuses help paint a picture of the nation and how we live. They provide a detailed snapshot of the population and its characteristics, and underpin funding allocation to provide public services.
The aggregate data produced as outputs from censuses in Northern Ireland provide information on a wide range of demographic and socio-economic characteristics. They are predominantly a collection of aggregated or summary counts of the numbers of people or households resident in specific geographical areas possessing particular characteristics.
Objective(s): Momentum for open access to research is growing. Funding agencies and publishers are increasingly requiring researchers to make their data and research outputs open and publicly available. However, clinical researchers struggle to find real-world examples of Open Data sharing. The aim of this 1 hr virtual workshop is to provide real-world examples of Open Data sharing for both qualitative and quantitative data. Specifically, participants will learn:
1. Primary challenges and successes when sharing quantitative and qualitative clinical research data.
2. Platforms available for open data sharing.
3. Ways to troubleshoot data sharing and publish from open data.
Workshop Agenda:
1. “Data sharing during the COVID-19 pandemic” - Speaker: Srinivas Murthy, Clinical Associate Professor, Department of Pediatrics, Faculty of Medicine, University of British Columbia. Investigator, BC Children's Hospital
2. “Our experience with Open Data for the 'Integrating a neonatal healthcare package for Malawi' project.” - Speaker: Maggie Woo Kinshella, Global Health Research Coordinator, Department of Obstetrics and Gynaecology, BC Children’s and Women’s Hospital and University of British Columbia
This workshop draws on work supported by the Digital Research Alliance of Canada.
Data Description: Presentation slides, Workshop Video, and Workshop Communication.
Srinivas Murthy: "Data sharing during the COVID-19 pandemic" presentation and accompanying PowerPoint slides.
Maggie Woo Kinshella: "Our experience with Open Data for the 'Integrating a neonatal healthcare package for Malawi' project" presentation and accompanying PowerPoint slides.
This workshop was developed as part of Dr. Ansermino's Data Champions Pilot Project supported by the Digital Research Alliance of Canada.
NOTE for restricted files: If you are not yet a CoLab member, please complete our membership application survey to gain access to restricted files within 2 business days.
Some files may remain restricted to CoLab members. These files are deemed more sensitive by the file owner and are meant to be shared on a case-by-case basis. Please contact the CoLab coordinator on this page under "collaborate with the pediatric sepsis colab."
This dataset includes soil wet aggregate stability measurements from the Upper Mississippi River Basin LTAR site in Ames, Iowa. Samples were collected in 2021 from this long-term tillage and cover crop trial in a corn-based agroecosystem. We measured wet aggregate stability using digital photography to quantify disintegration (slaking) of submerged aggregates over time, similar to the technique described by Fajardo et al. (2016) and Rieke et al. (2021). However, we adapted the technique to larger sample numbers by using a multi-well tray to submerge 20-36 aggregates simultaneously. We used this approach to measure the slaking index of 160 soil samples (2120 aggregates). This dataset includes the slaking index calculated for each aggregate and summarized by sample. There were usually 10-12 aggregates measured per sample. We focused primarily on methodological issues, assessing the statistical power of the slaking index, the replication needed, sensitivity to cultural practices, and sensitivity to sample collection date. We found that small numbers of highly unstable aggregates lead to skewed distributions for slaking index. We concluded that at least 20 aggregates per sample were preferred to provide confidence in measurement precision. However, the experiment had high statistical power with only 10-12 replicates per sample. Slaking index was not sensitive to the initial size of dry aggregates (3 to 10 mm diameter); therefore, pre-sieving soils was not necessary. The field trial showed greater aggregate stability under no-till than under chisel plow practice, and changing stability over a growing season. These results will be useful to researchers and agricultural practitioners who want a simple, fast, low-cost method for measuring wet aggregate stability on many samples.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
This repository contains the data related to the paper "Granulometry transformer: image-based granulometry of concrete aggregate for an automated concrete production control", in which a deep learning based method is proposed for the image-based determination of concrete aggregate grading curves (cf. video). More specifically, the data set consists of images showing concrete aggregate particles and reference data of the particle size distribution (grading curves) associated with each image. The data are divided into CoarseAggregateData and FineAggregateData.
Coarse Aggregate Data
The coarse data consists of aggregate samples with particle sizes ranging from 0.1 mm to 32 mm. The grading curves are designed by linear interpolation between a very fine and a very coarse distribution for three variants with maximum grain sizes of 8 mm, 16 mm, and 32 mm, respectively. For each variant, we designed eleven grading curves, resulting in a total of 33, which are shown in the figure below. For each sample, we acquired 50 images with a GSD of 0.125 mm, resulting in a data set of 1650 images in total. Example images for a subset of the grading curves of this data set are shown in the following figure.
Fine Aggregate Data
Similar to the previous data set, the fine data set contains grading curves for the fine fraction of concrete aggregate of 0 to 2 mm with a GSD of 28.5 $\mu$m. We defined two base distributions of different shapes for the upper and lower bound, respectively, resulting in two interpolated grading curve sets (Set A and Set B). In total, 1700 images of 34 different particle size distributions were acquired. Example images of the data set and the corresponding grading curves are shown in the figure below.
Related publications:
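The design of the coarse grading curves can be sketched as follows. The description only says that eleven curves per variant are obtained by linear interpolation between a very fine and a very coarse distribution, so the evenly spaced weights, the sieve points and the two base curves below are illustrative assumptions, not the values used by the authors.

```python
def interpolate_grading_curves(fine, coarse, n_curves=11):
    """Return n_curves cumulative curves blended linearly from fine to coarse."""
    curves = []
    for k in range(n_curves):
        w = k / (n_curves - 1)  # 0.0 gives the fine curve, 1.0 the coarse curve
        curves.append([(1 - w) * f + w * c for f, c in zip(fine, coarse)])
    return curves


# Hypothetical cumulative passing percentages at five sieve sizes (made-up values).
fine = [30.0, 50.0, 70.0, 90.0, 100.0]
coarse = [5.0, 15.0, 35.0, 65.0, 100.0]
curves = interpolate_grading_curves(fine, coarse)

# Counts stated in the description: three variants, 50 images per sample.
n_variants, images_per_sample = 3, 50
print(len(curves) * n_variants)                      # number of grading curves
print(len(curves) * n_variants * images_per_sample)  # number of images
```

With eleven curves for each of the three variants this reproduces the 33 grading curves and 1650 images quoted for the coarse data set.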
This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.
This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.
The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.
Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.
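The general idea of fitting a model, generating several candidate datasets and keeping the best one can be illustrated with a deliberately simple sketch. This is not the method used for this dataset: it models a single made-up categorical column by its observed frequencies, and it scores candidates on a utility measure only, whereas the actual evaluation also weighed privacy. All names and values are hypothetical.

```python
import random
from collections import Counter


def fit_marginal(records):
    """Estimate category proportions from a list of categorical values."""
    counts = Counter(records)
    total = len(records)
    return {value: n / total for value, n in counts.items()}


def synthesize(model, n, rng):
    """Draw n synthetic values from the fitted proportions."""
    values = list(model)
    weights = [model[v] for v in values]
    return rng.choices(values, weights=weights, k=n)


def max_proportion_gap(original, synthetic):
    """Utility score: largest absolute difference in category proportions."""
    p, q = fit_marginal(original), fit_marginal(synthetic)
    return max(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


# Made-up stand-in for a confidential column of service types.
confidential = ["service_a"] * 60 + ["service_b"] * 30 + ["service_c"] * 10
model = fit_marginal(confidential)

# Generate several candidates with fixed seeds and keep the one closest to the original.
candidates = [synthesize(model, len(confidential), random.Random(seed)) for seed in range(5)]
best = min(candidates, key=lambda c: max_proportion_gap(confidential, c))
print(max_proportion_gap(confidential, best))
```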
For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.
This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, this synthetic version of this data could be analyzed to better understand the usage of human services by people in Allegheny County, including the interplay in the usage of multiple services and demographic information about clients.
Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.
Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).
1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Modality-agnostic files were copied over and the CHANGES file was updated. Data was aggregated using:
python phenotype.py aggregate subject -i segregated_subject -o aggregated_subject
phenotype.py came from the GitHub repository: https://github.com/ericearl/bids-phenotype
A comprehensive clinical, MRI, and MEG collection characterizing healthy research volunteers collected at the National Institute of Mental Health (NIMH) Intramural Research Program (IRP) in Bethesda, Maryland using medical and mental health assessments, diagnostic and dimensional measures of mental health, cognitive and neuropsychological functioning, structural and functional magnetic resonance imaging (MRI), along with diffusion tensor imaging (DTI), and a comprehensive magnetoencephalography battery (MEG).
In addition, blood samples are currently banked for future genetic analysis. All data collected in this protocol are broadly shared in the OpenNeuro repository, in the Brain Imaging Data Structure (BIDS) format. In addition, task paradigms and basic pre-processing scripts are shared on GitHub. This dataset is unique in its depth of characterization of a healthy population in terms of brain health and will contribute to a wide array of secondary investigations of non-clinical and clinical research questions.
This dataset is licensed under the Creative Commons Zero (CC0) v1.0 License.
Inclusion criteria for the study require that participants are adults at or over 18 years of age in good health with the ability to read, speak, understand, and provide consent in English. All participants provided electronic informed consent for online screening and written informed consent for all other procedures. Exclusion criteria include:
Study participants are recruited through direct mailings, bulletin boards and listservs, outreach exhibits, print advertisements, and electronic media.
All potential volunteers first visit the study website (https://nimhresearchvolunteer.ctss.nih.gov), check a box indicating consent, and complete preliminary self-report screening questionnaires. The study website is HIPAA compliant and therefore does not collect PII; instead, participants are instructed to contact the study team to provide their identity and contact information. The questionnaires include demographics, clinical history including medications, disability status (WHODAS 2.0), mental health symptoms (modified DSM-5 Self-Rated Level 1 Cross-Cutting Symptom Measure), substance use survey (DSM-5 Level 2), alcohol use (AUDIT), handedness (Edinburgh Handedness Inventory), and perceived health ratings. At the conclusion of the questionnaires, participants are again prompted to send an email to the study team. Survey results, supplemented by NIH medical records review (if present), are reviewed by the study team, who determine if the participant is likely eligible for the protocol. These participants are then scheduled for an in-person assessment. Follow-up phone screenings were also used to determine if participants were eligible for in-person screening.
At this visit, participants undergo a comprehensive clinical evaluation to determine final eligibility to be included as a healthy research volunteer. The mental health evaluation consists of a psychiatric diagnostic interview (Structured Clinical Interview for DSM-5 Disorders, SCID-5), along with self-report surveys of mood (Beck Depression Inventory-II, BDI-II) and anxiety (Beck Anxiety Inventory, BAI) symptoms. An intelligence quotient (IQ) estimate is obtained with the Kaufman Brief Intelligence Test, Second Edition (KBIT-2). The KBIT-2 is a brief (20-30 minute) assessment of intellectual functioning administered by a trained examiner. There are three subtests: verbal knowledge, riddles, and matrices.
Medical evaluation includes medical history elicitation and systematic review of systems. Biological and physiological measures include vital signs (blood pressure, pulse), as well as weight, height, and BMI. Blood and urine samples are taken and a complete blood count, acute care panel, hepatic panel, thyroid stimulating hormone, viral markers (HCV, HBV, HIV), C-reactive protein, creatine kinase, urine drug screen and urine pregnancy tests are performed. In addition, blood samples that can be used for future genomic analysis, development of lymphoblastic cell lines or other biomarker measures are collected and banked with the NIMH Repository and Genomics Resource (Infinity BiologiX). The Family Interview for Genetic Studies (FIGS) was later added to the assessment in order to provide better pedigree information; the Adverse Childhood Events (ACEs) survey was also added to better characterize potential risk factors for psychopathology. The entirety of the in-person assessment not only collects information relevant for eligibility determination, but it also provides a comprehensive set of standardized clinical measures of volunteer health that can be used for secondary research.
Participants are given the option to consent for a magnetic resonance imaging (MRI) scan, which can serve as a baseline clinical scan to determine normative brain structure, and also as a research scan with the addition of functional sequences (resting state and diffusion tensor imaging). The MR protocol used was initially based on the ADNI-3 basic protocol, but was later modified to include portions of the ABCD protocol in the following manner:
At the time of the MRI scan, volunteers are administered a subset of tasks from the NIH Toolbox Cognition Battery. The four tasks include:
An optional MEG study was added to the protocol approximately one year after the study was initiated, so there are fewer MEG recordings than MRI scans in the dataset. MEG studies are performed on a 275-channel CTF MEG system (CTF MEG, Coquitlam, BC, Canada). The position of the head was localized at the beginning and end of each recording using three fiducial coils. These coils were placed 1.5 cm above the nasion, and at each ear, 1.5 cm from the tragus on a line between the tragus and the outer canthus of the eye. For 48 participants (as of 2/1/2022), photographs were taken of the three coils and used to mark the points on the T1-weighted structural MRI scan for co-registration. For the remainder of the participants (n=16 as of 2/1/2022), a Brainsight neuronavigation system (Rogue Research, Montréal, Québec, Canada) was used to co-register the MRI and fiducial localizer coils in real time prior to MEG data acquisition.
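Because the head is localized at both the beginning and the end of each recording, the two sets of coil positions can be compared to estimate how far the head moved during the run. The following is a minimal sketch of that comparison, not the study team's procedure: the coil names, the coordinate values, and the `coil_displacement` helper are all hypothetical and are not taken from the dataset.

```python
import math

# Hypothetical fiducial coil positions in cm as (x, y, z) tuples.
# These values are illustrative only and do not come from the dataset.
start_positions = {
    "nasion": (10.0, 0.0, 0.0),
    "left_ear": (0.0, 7.0, 0.0),
    "right_ear": (0.0, -7.0, 0.0),
}
end_positions = {
    "nasion": (10.3, 0.4, 0.0),
    "left_ear": (0.0, 7.0, 0.0),
    "right_ear": (0.0, -7.0, 0.2),
}


def coil_displacement(start, end):
    """Return the Euclidean distance (cm) each coil moved between two localizations."""
    return {name: math.dist(start[name], end[name]) for name in start}


displacement = coil_displacement(start_positions, end_positions)
max_movement = max(displacement.values())
```

A recording could then be flagged for review when `max_movement` exceeds a threshold chosen by the analyst; the source does not specify any such threshold.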
Online and in-person behavioral and clinical measures, along with the corresponding phenotype file name, sorted first by measurement location and then by file name.
| Location | Measure | File Name |
|---|---|---|
| Online | Alcohol Use Disorders Identification Test (AUDIT) | audit |
| Online | Demographics | demographics |
| Online | DSM-5 Level 2 Substance Use - Adult | drug_use |
| Online | Edinburgh Handedness Inventory (EHI) | ehi |
| Online | Health History Form | health_history_questions |
| Online | Perceived Health Rating - self | health_rating |
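The measure-to-file mapping listed in the table can be kept as a small lookup when working with the phenotype files programmatically. This is a minimal sketch under stated assumptions: the measure names and file names come from the table, but the `phenotype` directory, the `.tsv` extension, and the `phenotype_path` helper are hypothetical and should be adjusted to the actual layout of the downloaded dataset.

```python
from pathlib import Path

# Measure name -> phenotype file name, copied from the table of online measures.
PHENOTYPE_FILES = {
    "Alcohol Use Disorders Identification Test (AUDIT)": "audit",
    "Demographics": "demographics",
    "DSM-5 Level 2 Substance Use - Adult": "drug_use",
    "Edinburgh Handedness Inventory (EHI)": "ehi",
    "Health History Form": "health_history_questions",
    "Perceived Health Rating - self": "health_rating",
}


def phenotype_path(measure, root="phenotype", ext=".tsv"):
    """Build the path to a measure's file.

    The default directory and extension are assumptions, not facts from the source.
    """
    return Path(root) / (PHENOTYPE_FILES[measure] + ext)


demographics_file = phenotype_path("Demographics")
```

Looking up an unknown measure raises a `KeyError`, which surfaces typos in measure names early instead of producing a path to a file that does not exist.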