100+ datasets found

Data from: Meta-analysis of aggregate data on medical events
kaggle.com
zip
Updated Nov 18, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
mahdieh hajian (2024). Meta-analysis of aggregate data on medical events [Dataset]. https://www.kaggle.com/datasets/mahdiehhajian/meta-analysis-of-aggregate-data-on-medical-events/code
Explore at:
zip(1957 bytes)Available download formats
Dataset updated
Nov 18, 2024
Authors
mahdieh hajian
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset provided by = Björn Holzhauer

Dataset Description==Meta-analyses of clinical trials often treat the number of patients experiencing a medical event as binomially distributed when individual patient data for fitting standard time-to-event models are unavailable. Assuming identical drop-out time distributions across arms, random censorship and low proportions of patients with an event, a binomial approach results in a valid test of the null hypothesis of no treatment effect with minimal loss in efficiency compared to time-to-event methods. To deal with differences in follow-up - at the cost of assuming specific distributions for event and drop-out times - we propose a hierarchical multivariate meta-analysis model using the aggregate data likelihood based on the number of cases, fatal cases and discontinuations in each group, as well as the planned trial duration and groups sizes. Such a model also enables exchangeability assumptions about parameters of survival distributions, for which they are more appropriate than for the expected proportion of patients with an event across trials of substantially different length. Borrowing information from other trials within a meta-analysis or from historical data is particularly useful for rare events data. Prior information or exchangeability assumptions also avoid the parameter identifiability problems that arise when using more flexible event and drop-out time distributions than the exponential one. We discuss the derivation of robust historical priors and illustrate the discussed methods using an example. We also compare the proposed approach against other aggregate data meta-analysis methods in a simulation study.
Geometry and Opacity Data for Fractal Aggregates
zenodo.org
png, zip
Updated Jul 25, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Frank Ferguson; Frank Ferguson; John Paquette; Joseph Nuth; John Paquette; Joseph Nuth (2025). Geometry and Opacity Data for Fractal Aggregates [Dataset]. http://doi.org/10.5281/zenodo.16095931
Explore at:
zip, pngAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.16095931
Dataset updated
Jul 25, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Frank Ferguson; Frank Ferguson; John Paquette; Joseph Nuth; John Paquette; Joseph Nuth
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In a previous version of this archive, geometry data and tables of opacity calculations were given that could be used to calculate the radiative pressure and absorption on fractal dust grains under Asymptotic Giant Branch (AGB) conditions (with a peak stellar wavelength of ~ 1 micron) for aggregates containing up to 256 primary particles. Because the focus of that work was on radiative pressure from a stellar spectrum peaking at approximately 1 micron, these data only covered the wavelength range from 0.3 to 30 microns. In this updated archive the wavelength range of the data has been expanded to allow calculation of the emission of the grains at longer wavelengths. Data are calculated for three common dust materials: forsterite, (Mg2SiO4), olivine, (Mg_(2x)Fe_(2(1-x))SiO4) with x=0.5, and 'astronomical silicate' (B.T. Draine and H.M. Lee, Optical Properties of Interstellar Graphite and Silicate Grains, Astrophysical Journal, 1984). In this updated version the range of aggregate sizes (number of primary particles in the aggregate) of some of these materials has also been increased from a maximum of 256 to 1024 constituent particles.

Example fractal aggregates were generated using the Diffusion Limited Aggregation (DLA) code as described in Wozniak M., Onofri F.R.A., Barbosa S., Yon J., Mroczka J., Comparison of methods to derive morphological parameters of multi-fractal samples of particle aggregates from TEM images, Journal of Aerosol Science 47: 12–26 (2012) and Onofri F.R.A., M. Wozniak, S. Barbosa, On the Optical Characterization of Nanoparticle and their Aggregates in Plasma Systems, Contributions to Plasma Physics 51(2-3):228-236 (2011). Aggregates were generated with a constant prefactor, kf=1.3, and two fractal dimensions (Df), representing open, porous (Df=1.8) aggregates and more compact (Df=2.8) aggregates.

The geometry files were produced with the DLA software. An example run using this software is shown for aggregates with 256 primary particles and a fractal dimension of 2.8 in the file 'dla_example.png'

For the fractal dimension=1.8 data, the number of primary particles in the aggregate, N, was increased up to 1024 from the previous maximum of 256 for all three dust materials investigated. In addition, the data for MgFeSiO4 with a fractal dimension of 2.8 was increased from 256 to 1024. As in the previous archive, 12 instances of each aggregate size were generated with primary particles having a radius of 0.5. These geometry data are given in:
aggregates_kf1.3_df1.8.zip --> Geometry for a prefactor of 1.3 and fractal dimension 1.8
aggregates_kf1.3_df2.8.zip --> Geometry for a prefactor of 1.3 and fractal dimension 2.8

An example file name for an aggregate is 'N_00000032_Agg_00000008.dat' where the first number is the number of primary particles in the aggregate (N=32) and the second number is the instance number (e.g. 8 of 12). The radius of each primary particle in an aggregate is 0.5. The geometry files have 4 columns: the x, y and z coordinates of each primary particle followed by the primary particle radius. In each zip file there is also a pdf document that describes the geometry data and shows an image of each geometry file.

These geometry data were then used to calculate the opacity of the aggregates using the Multiple Sphere T-Matrix code (MSTM v 3.0) developed by Daniel Mackowski (D.W. Mackowski, M.I. Mishchenko, A multiple sphere T-matrix Fortran code for use on parallel computer clusters, Journal of Quantitative Spectroscopy and Radiative Transfer, Volume 112, Issue 13, 2011). Data were generated using the first 10 instances of each aggregate size, and the geometry data were appropriately scaled to calculate the opacity data for primary particle radii ranging from 0.001 - 1.0 microns. As noted earlier, an earlier version of this archive was focused on radiative pressure on these aggregates and only covered the spectrum of a typical AGB star (0.3 to 30 microns wavelength). In this updated version this wavelength range has been increased to the longer wavelength limits of the optical data. By default, MSTM calculations are made along the z-axis of the geometry data. Additional calculations were made along the x and y axes for each aggregate. Therefore the final data set is the average of 30 values (10 instances each in the x,y,z directions).

The opacity data files are given in:

astronomical_silicate_df1.8.zip --> astronomical silicate aggregates with fractal dimension 1.8
astronomical_silicate_df2.8.zip --> astronomical silicate aggregates with fractal dimension 2.8
forsterite_df1.8.zip --> forsterite aggregates with fractal dimension 1.8
forsterite_df2.8.zip --> forsterite aggregates with fractal dimension 2.8
olivine_df1.8.zip --> olivine aggregates with fractal dimension 1.8
olivine_df2.8.zip --> olivine aggregates with fractal dimension 2.8

In the previous version of this archive, only the table files with the averages of the 10 instances were provided. In this updated version each of the individual opacity files used to create these tables is now also provided. These opacity files are numbered similar to the geometry files. For example, the opacity calculations for N=32, instance=5, angle=3 is given by
'opacity_results_N000032_I05_A03_file.dat.' Each file begins with a short header describing the data. For example, the astronomical silicate header for this N=32, instance=5, angle=3 file is:

#############################################################################################
# Number of primary particles in aggregate: 32
# Geometry Instance Number: 5
# Geometry File Name: N_00000032_Agg_00000005.dat
# Rotation Angles: 90.000 90.000 0.000
# Number of radius values: 30
# Minimum and maximum radius values in microns: 1.00000e-003 1.00000e+000
# Number of wavelength values: 92
# Minimum and maximum wavelength values in microns: 3.00000e-001 1.00000e+004
#############################################################################################

Afterwards the columns list the line number, the primary particle radius (microns), the wavelength (microns), the extinction efficiency factor, the absorption efficiency factor, the scattering absorption efficiency factor, the asymmetry factor and the radiation pressure efficiency factor. These efficiency factors are based on the effective radius of the aggregate described later in this document.

Within each of these zipped folders is a file that contains the averages of these individual opacity files. For example 'astronomical_silicate_df1.8.dat' is the averaged data for the astronomical silicate aggregates with a fractal dimension 1.8. As in the previous archive, the first lines of these table files are a header starting with the '#' character describing the table and the source of the optical data used.

After the header, the first line of data in the table has the following nine values giving the range for the data table and number of samples in N, (aggregate size), primary particle radius (microns) and wavelength (microns). These are:
Minimum aggregate size
Maximum aggregate size
Number of Aggregate samples
Primary Particle Minimum Radius (microns)
Primary Particle Maximum Radius (microns)
Number of Primary Particle radii samples
Wavelength minimum (microns)
Wavelength maximum (microns)
Number of Wavelength samples

Subsequent lines contain 13 columns. These columns give the efficiency factors and asymmetry factor for aggregates. These efficiency factors are based on the effective radius of the aggregate given by:
a_eff = a_primary*N^(1/3)
where a_primary is the primary particle radius and N is the number of primary particles in the aggregate.

For example, the absorption opacity of an aggregate would then be = pi*a_eff^2 * Q_abs.
The values in each column are:
Column 1: Primary particle radius in microns
Column 2: Wavelength in microns
Column 3: Number of primary particles in aggregate
Column 4: Mean Q_ext, mean extinction efficiency factor
Column 5: Standard Deviation of Mean Q_ext
Column 6: Mean Q_abs, mean absorption efficiency factor
Column 7: Standard Deviation of Mean Q_abs
Column 8: Mean Q_sca, mean scattering efficiency factor
Column 9: Standard Deviation of mean Q_sca
Column 10: Mean g_cos, mean asymmetry factor
Column 11: Standard Deviation of mean asymmetry factor
Column 12: Mean Q_pr, mean radiation pressure efficiency factor
Column 13: Standard Deviation of mean
d
Amazon Email Receipt Data | Consumer Transaction Data | Asia, EMEA, LATAM,...
datarade.ai
.json, .xml, .csv
Updated Oct 12, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Measurable AI (2023). Amazon Email Receipt Data | Consumer Transaction Data | Asia, EMEA, LATAM, MENA, India | Granular & Aggregate Data available [Dataset]. https://datarade.ai/data-products/amazon-email-receipt-data-consumer-transaction-data-asia-measurable-ai
Explore at:
.json, .xml, .csvAvailable download formats
Dataset updated
Oct 12, 2023
Dataset authored and provided by
Measurable AI
Area covered
Asia, Latin America, Malaysia, Brazil, Mexico, Colombia, Japan, Chile, Thailand, United States of America, Pakistan, Argentina
Description
The Measurable AI Amazon Consumer Transaction Dataset is a leading source of email receipts and consumer transaction data, offering data collected directly from users via Proprietary Consumer Apps, with millions of opt-in users.

We source our email receipt consumer data panel via two consumer apps which garner the express consent of our end-users (GDPR compliant). We then aggregate and anonymize all the transactional data to produce raw and aggregate datasets for our clients.

Use Cases Our clients leverage our datasets to produce actionable consumer insights such as: - Market share analysis - User behavioral traits (e.g. retention rates) - Average order values - Promotional strategies used by the key players. Several of our clients also use our datasets for forecasting and understanding industry trends better.

Coverage - Asia (Japan) - EMEA (Spain, United Arab Emirates)

Granular Data Itemized, high-definition data per transaction level with metrics such as - Order value - Items ordered - No. of orders per user - Delivery fee - Service fee - Promotions used - Geolocation data and more

Aggregate Data - Weekly/ monthly order volume - Revenue delivered in aggregate form, with historical data dating back to 2018. All the transactional e-receipts are sent from app to users’ registered accounts.

Most of our clients are fast-growing Tech Companies, Financial Institutions, Buyside Firms, Market Research Agencies, Consultancies and Academia.

Our dataset is GDPR compliant, contains no PII information and is aggregated & anonymized with user consent. Contact business@measurable.ai for a data dictionary and to find out our volume in each country.
F
Deep Granulometry
data.uni-hannover.de
png, zip
Updated Dec 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Institut für Baustoffe (2024). Deep Granulometry [Dataset]. https://data.uni-hannover.de/dataset/deep-granulometry
Explore at:
png, zipAvailable download formats
Dataset updated
Dec 12, 2024
Dataset authored and provided by
Institut für Baustoffe
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
This repository contains the data related to the paper ** "Granulometry transformer: image-based granulometry of concrete aggregate for an automated concrete production control" ** where a deep learning based method is proposed for the image based determination of concrete aggregate grading curves (cf. video).

More specifically, the data set consists of images showing concrete aggregate particles and reference data of the particle size distribution (grading curves) associated to each image. It is distinguished between the CoarseAggregateData and the FineAggregateData.

Coarse Aggregate Data

The coarse data consists of aggregate samples with different particles sizes ranging from 0.1 mm to 32 mm. The grading curves are designed by linearly interpolation between a very fine and a very coarse distribution for three variants with maximum grain sizes of 8 mm, 16 mm, and 32 mm, respectively. For each variant, we designed eleven grading curves, resulting in a total number 33, which are shown in the figure below. For each sample, we acquired 50 images with a GSD of 0.125 mm, resulting in a data set of 1650 images in total. Example images for a subset of the grading curves of this data set are shown in the following figure.

https://data.uni-hannover.de/dataset/ecb0bf04-84c8-45b1-8a43-044f3f80d92c/resource/8cb30616-5b24-4028-9c1d-ea250ac8ac84/download/examplecoarse.png" alt="Example images and grading curves of the coarse data set" title=" ">

Fine Aggregate Data

Similar to the previous data set, the fine data set contains grading curves for the fine fraction of concrete aggregate of 0 to 2 mm with a GSD of 28.5 $\mu$m. We defined two base distributions of different shapes for the upper and lower bound, respectively, resulting in two interpolated grading curve sets (Set A and Set B). In total, 1700 images of 34 different particle size distributions were acquired. Example images of the data set and the corresponding grading curves are shown in the figure below. https://data.uni-hannover.de/dataset/ecb0bf04-84c8-45b1-8a43-044f3f80d92c/resource/c56f4298-9663-457f-aaa7-0ba113fec4c9/download/examplefine.png" alt="Example images and grading curves of the finedata set" title=" ">

Related publications:

If you make use of the proposed data, please cite.

Coenen, M., Beyer, D., and Haist, M., 2023: Granulometry Transformer: Image-based Granulometry of Concrete Aggregate for an automated Concrete Production Control. In: Proceedings of the European Conference on Computing in Construction (EC3), doi: 10.35490/EC3.2023.223.
u
Aggregate Thin Section Analysis Data from the Fimbule Ice Shelf Expedition
zivahub.uct.ac.za
xlsx
Updated Jun 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Steven Mcewen; Sebastian Skatulla; Keith MacHutchon (2023). Aggregate Thin Section Analysis Data from the Fimbule Ice Shelf Expedition [Dataset]. http://doi.org/10.25375/uct.22340143.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.25375/uct.22340143.v1
Dataset updated
Jun 8, 2023
Dataset provided by
University of Cape Town
Authors
Steven Mcewen; Sebastian Skatulla; Keith MacHutchon
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset encompasses an Excel datasheet containing aggregate data from thin section analyses of ice samples from the Fimbule Ice Shelf, carried out during the 2021/2022 Antarctic expedition. The research forms a crucial part of the Master's research project: "Development of a Multi-tier System for the Analysis of Ice Crystallography of Antarctic Shelf Ice", conducted by Steven McEwen. Each entry in the datasheet corresponds to a specific thin section or ice grain and includes the following parameters: Grid Number, A1 Axis Reading, A4 Reading, Corrected A4 values, number of readings, Mean C-axis Orientation, Grain Size, Date, Sample number, x-coordinate, y-coordinate, Degree of Orientation, and Spherical Aperture. These data points collectively facilitate a comprehensive understanding of the crystallography of the Fimbule Ice Shelf's ice samples. Data was collected and analyzed during the 2021/2022 Antarctic summer expedition, with additional analysis being performed in the Polar engineering Research Group's laboratory.
d
FHV Base Aggregate Report
catalog.data.gov
data.cityofnewyork.us
+2more
Updated Nov 29, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.cityofnewyork.us (2025). FHV Base Aggregate Report [Dataset]. https://catalog.data.gov/dataset/fhv-base-aggregate-report
Explore at:
Dataset updated
Nov 29, 2025
Dataset provided by
data.cityofnewyork.us
Description
Monthly report including total dispatched trips, total dispatched shared trips, and unique dispatched vehicles aggregated by FHV (For-Hire Vehicle) base. These have been tabulated from raw trip record submissions made by bases to the NYC Taxi and Limousine Commission (TLC). This dataset is typically updated monthly on a two-month lag, as bases have until the conclusion of the following month to submit a month of trip records to the TLC. In example, a base has until Feb 28 to submit complete trip records for January. Therefore, the January base aggregates will appear in March at the earliest. The TLC may elect to defer updates to the FHV Base Aggregate Report if a large number of bases have failed to submit trip records by the due date. Note: The TLC publishes base trip record data as submitted by the bases, and we cannot guarantee or confirm their accuracy or completeness. Therefore, this may not represent the total amount of trips dispatched by all TLC-licensed bases. The TLC performs routine reviews of the records and takes enforcement actions when necessary to ensure, to the extent possible, complete and accurate information.
F
Visual Granulometry: Image-based Granulometry of Concrete Aggregate
data.uni-hannover.de
png, zip
Updated Dec 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Institut für Baustoffe (2024). Visual Granulometry: Image-based Granulometry of Concrete Aggregate [Dataset]. https://data.uni-hannover.de/dataset/visual-granulometry
Explore at:
png, zipAvailable download formats
Dataset updated
Dec 12, 2024
Dataset authored and provided by
Institut für Baustoffe
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Introduction

Concrete is one if the most used building materials worldwide. With up to 80% of volume, a large constituent of concrete consists of fine and coarse aggregate particles (normally, sizes of 0.1mm to 32 mm) which are dispersed in a cement paste matrix. The size distribution of the aggregates (i.e. the grading curve) substantially affects the properties and quality characteristics of concrete, such as e.g. its workability at the fresh state and the mechanical properties at the hardened state. In practice, usually the size distribution of small samples of the aggregate is determined by manual mechanical sieving and is considered as representative for a large amount of aggregate. However, the size distribution of the actual aggregate used for individual production batches of concrete varies, especially when e.g. recycled material is used as aggregate. As a consequence, the unknown variations of the particle size distribution have a negative effect on the robustness and the quality of the final concrete produced from the raw material.

Towards the goal of deriving precise knowledge about the actual particle size distribution of the aggregate, thus eliminating the unknown variations in the material’s properties, we propose a data set for the image based prediction of the size distribution of concrete aggregates. Incorporating such an approach into the production chain of concrete enables to react on detected variations in the size distribution of the aggregate in real-time by adapting the composition, i.e. the mixture design of the concrete accordingly, so that the desired concrete properties are reached.

https://data.uni-hannover.de/dataset/f00bdcc4-8b27-4dc4-b48d-a84d75694e18/resource/042abf8d-e87a-4940-8195-2459627f57b6/download/overview.png" alt="Classicial vs. image based granulometry" title=" ">

Classification data

In the classification data, nine different grading curves are distinguished. In this context, the normative regulations of DIN 1045 are considered. The nine grading curves differ in their maximum particle size (8, 16, or 32 mm) and in the distribution of the particle size fractions allowing a categorisation of the curves to coarse-grained (A), medium-grained (B) and fine-grained (C) curves, respectively. A quantitative description of the grain size distribution of the nine curves distinguished is shown in the following figure, where the left side shows a histogram of the particle size fractions 0-2, 2-8, 8-16, and 16-32 mm and the right side shows the cumulative histograms of the grading curves (the vertical axes represent the mass-percentages of the material).

For each of the grading curves, two samples (S1 and S2) of aggregate particles were created. Each sample consists of a total mass of 5 kg of aggregate material and is carefully designed according to the grain size distribution shwon in the figure by sieving the raw material in order to separate the different grain size fractions first, and subsequently, by composing the samples according to the dedicated mass-percentages of the size distributions.

https://data.uni-hannover.de/dataset/f00bdcc4-8b27-4dc4-b48d-a84d75694e18/resource/17eb2a46-eb23-4ec2-9311-0f339e0330b4/download/statistics_classification-data.png" alt="Particle size distribution of the classification data">

For data acquisition, a static setup was used for which the samples are placed in a measurement vessel equipped with a set of calibrated reference markers whose object coordinates are known and which are assembled in a way that they form a common plane with the surface of the aggregate sample. We acquired the data by taking images of the aggregate samples (and the reference markers) which are filled in the the measurement vessel and whose constellation within the vessel is perturbed between the acquisition of each image in order to obtain variations in the sample’s visual appearance. This acquisition strategy allows to record multiple different images for the individual grading curves by reusing the same sample, consequently reducing the labour-intensive part of material sieving and sample generation. In this way, we acquired a data set of 900 images in total, consisting of 50 images of each of the two samples (S1 and S2) which were created for each of the nine grading curve definitions, respectively (50 x 2 x 9 = 900). For each image, we automatically detect the reference markers, thus receiving the image coordinates of each marker in addition to its known object coordinates. We make use of these correspondences for the computation of the homography which describes the perspective transformation of the reference marker’s plane in object space (which corresponds to the surface plane of the aggregate sample) to the image plane. Using the computed homography, we transform the image in order to obtain an perspectively rectified representation of the aggregate sample with a known, and especially a for the entire image consistent, ground sampling distance (GSD) of 8 px/mm. In the following figure, example images of our data set showing aggregate samples of each of the distinguished grading curve classes are depicted.

https://data.uni-hannover.de/dataset/f00bdcc4-8b27-4dc4-b48d-a84d75694e18/resource/59925f1d-3eef-4b50-986a-e8d2b0e14beb/download/examples_classification_data.png" alt="Example images of the classification data">

Related publications:

If you make use of the proposed data, please cite the publication listed below.

Coenen, M., Beyer, D., Heipke, C. and Haist, M., 2022: Learning to Sieve: Prediction of Grading Curves from Images of Concrete Aggregate. In: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-2-2022, pp. 227-235, Link.
Z
Data from: Open-data release of aggregated Australian school-level...
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Monteiro Lobato, (2020). Open-data release of aggregated Australian school-level information. Edition 2016.1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_46086
Explore at:
Dataset updated
Jan 24, 2020
Authors
Monteiro Lobato,
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Australia
Description
The file set is a freely downloadable aggregation of information about Australian schools. The individual files represent a series of tables which, when considered together, form a relational database. The records cover the years 2008-2014 and include information on approximately 9500 primary and secondary school main-campuses and around 500 subcampuses. The records all relate to school-level data; no data about individuals is included. All the information has previously been published and is publicly available but it has not previously been released as a documented, useful aggregation. The information includes: (a) the names of schools (b) staffing levels, including full-time and part-time teaching and non-teaching staff (c) student enrolments, including the number of boys and girls (d) school financial information, including Commonwealth government, state government, and private funding (e) test data, potentially for school years 3, 5, 7 and 9, relating to an Australian national testing programme know by the trademark 'NAPLAN'

Documentation of this Edition 2016.1 is incomplete but the organization of the data should be readily understandable to most people. If you are a researcher, the simplest way to study the data is to make use of the SQLite3 database called 'school-data-2016-1.db'. If you are unsure how to use an SQLite database, ask a guru.

The database was constructed directly from the other included files by running the following command at a command-line prompt: sqlite3 school-data-2016-1.db < school-data-2016-1.sql Note that a few, non-consequential, errors will be reported if you run this command yourself. The reason for the errors is that the SQLite database is created by importing a series of '.csv' files. Each of the .csv files contains a header line with the names of the variable relevant to each column. The information is useful for many statistical packages but it is not what SQLite expects, so it complains about the header. Despite the complaint, the database will be created correctly.

Briefly, the data are organized as follows. (a) The .csv files ('comma separated values') do not actually use a comma as the field delimiter. Instead, the vertical bar character '|' (ASCII Octal 174 Decimal 124 Hex 7C) is used. If you read the .csv files using Microsoft Excel, Open Office, or Libre Office, you will need to set the field-separator to be '|'. Check your software documentation to understand how to do this. (b) Each school-related record is indexed by an identifer called 'ageid'. The ageid uniquely identifies each school and consequently serves as the appropriate variable for JOIN-ing records in different data files. For example, the first school-related record after the header line in file 'students-headed-bar.csv' shows the ageid of the school as 40000. The relevant school name can be found by looking in the file 'ageidtoname-headed-bar.csv' to discover that the the ageid of 40000 corresponds to a school called 'Corpus Christi Catholic School'. (3) In addition to the variable 'ageid' each record is also identified by one or two 'year' variables. The most important purpose of a year identifier will be to indicate the year that is relevant to the record. For example, if one turn again to file 'students-headed-bar.csv', one sees that the first seven school-related records after the header line all relate to the school Corpus Christi Catholic School with ageid of 40000. The variable that identifies the important differences between these seven records is the variable 'studentyear'. 'studentyear' shows the year to which the student data refer. One can see, for example, that in 2008, there were a total of 410 students enrolled, of whom 185 were girls and 225 were boys (look at the variable names in the header line). (4) The variables relating to years are given different names in each of the different files ('studentsyear' in the file 'students-headed-bar.csv', 'financesummaryyear' in the file 'financesummary-headed-bar.csv'). Despite the different names, the year variables provide the second-level means for joining information acrosss files. For example, if you wanted to relate the enrolments at a school in each year to its financial state, you might wish to JOIN records using 'ageid' in the two files and, secondarily, matching 'studentsyear' with 'financialsummaryyear'. (5) The manipulation of the data is most readily done using the SQL language with the SQLite database but it can also be done in a variety of statistical packages. (6) It is our intention for Edition 2016-2 to create large 'flat' files suitable for use by non-researchers who want to view the data with spreadsheet software. The disadvantage of such 'flat' files is that they contain vast amounts of redundant information and might not display the data in the form that the user most wants it. (7) Geocoding of the schools is not available in this edition. (8) Some files, such as 'sector-headed-bar.csv' are not used in the creation of the database but are provided as a convenience for researchers who might wish to recode some of the data to remove redundancy. (9) A detailed example of a suitable SQLite query can be found in the file 'school-data-sqlite-example.sql'. The same query, used in the context of analyses done with the excellent, freely available R statistical package (http://www.r-project.org) can be seen in the file 'school-data-with-sqlite.R'.
f
Independent Data Aggregation, Quality Control and Visualization of...
datasetcatalog.nlm.nih.gov
Updated Oct 21, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ly, Chun; Knott, Cheryl; McCleary, Jill; Castiello-Gutiérrez, Santiago (2020). Independent Data Aggregation, Quality Control and Visualization of University of Arizona COVID-19 Re-Entry Testing Data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000484783
Explore at:
Dataset updated
Oct 21, 2020
Authors
Ly, Chun; Knott, Cheryl; McCleary, Jill; Castiello-Gutiérrez, Santiago
Description
AbstractThe dataset provided here contains the efforts of independent data aggregation, quality control, and visualization of the University of Arizona (UofA) COVID-19 testing programs for the 2019 novel Coronavirus pandemic. The dataset is provided in the form of machine-readable tables in comma-separated value (.csv) and Microsoft Excel (.xlsx) formats.Additional InformationAs part of the UofA response to the 2019-20 Coronavirus pandemic, testing was conducted on students, staff, and faculty prior to start of the academic year and throughout the school year. These testings were done at the UofA Campus Health Center and through their instance program called "Test All Test Smart" (TATS). These tests identify active cases of SARS-nCoV-2 infections using the reverse transcription polymerase chain reaction (RT-PCR) test and the Antigen test. Because the Antigen test provided more rapid diagnosis, it was greatly used three weeks prior to the start of the Fall semester and throughout the academic year.As these tests were occurring, results were provided on the COVID-19 websites. First, beginning in early March, the Campus Health Alerts website reported the total number of positive cases. Later, numbers were provided for the total number of tests (March 12 and thereafter). According to the website, these numbers were updated daily for positive cases and weekly for total tests. These numbers were reported until early September where they were then included in the reporting for the TATS program.For the TATS program, numbers were provided through the UofA COVID-19 Update website. Initially on August 21, the numbers provided were the total number (July 31 and thereafter) of tests and positive cases. Later (August 25), additional information was provided where both PCR and Antigen testings were available. Here, the daily numbers were also included. On September 3, this website then provided both the Campus Health and TATS data. Here, PCR and Antigen were combined and referred to as "Total", and daily and cumulative numbers were provided.At this time, no official data dashboard was available until September 16, and aside from the information provided on these websites, the full dataset was not made publicly available. As such, the authors of this dataset independently aggregated data from multiple sources. These data were made publicly available through a Google Sheet with graphical illustration provided through the spreadsheet and on social media. The goal of providing the data and illustrations publicly was to provide factual information and to understand the infection rate of SARS-nCoV-2 in the UofA community.Because of differences in reported data between Campus Health and the TATS program, the dataset provides Campus Health numbers on September 3 and thereafter. TATS numbers are provided beginning on August 14, 2020.Description of Dataset ContentThe following terms are used in describing the dataset.1. "Report Date" is the date and time in which the website was updated to reflect the new numbers2. "Test Date" is to the date of testing/sample collection3. "Total" is the combination of Campus Health and TATS numbers4. "Daily" is to the new data associated with the Test Date5. "To Date (07/31--)" provides the cumulative numbers from 07/31 and thereafter6. "Sources" provides the source of information. The number prior to the colon refers to the number of sources. Here, "UACU" refers to the UA COVID-19 Update page, and "UARB" refers to the UA Weekly Re-Entry Briefing. "SS" and "WBM" refers to screenshot (manually acquired) and "Wayback Machine" (see Reference section for links) with initials provided to indicate which author recorded the values. These screenshots are available in the records.zip file.The dataset is distinguished where available by the testing program and the methods of testing. Where data are not available, calculations are made to fill in missing data (e.g., extrapolating backwards on the total number of tests based on daily numbers that are deemed reliable). Where errors are found (by comparing to previous numbers), those are reported on the above Google Sheet with specifics noted.For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu
US Births by County and State
kaggle.com
zip
Updated Jan 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). US Births by County and State [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-births-by-county-and-state
Explore at:
zip(3159011 bytes)Available download formats
Dataset updated
Jan 22, 2023
Authors
The Devastator
Area covered
United States
Description
US Births by County and State

1985-2015 Aggregated Data

By data.world's Admin [source]

About this dataset

This dataset contains an aggregation of birth data from the United Statesbetween 1985 and 2015. It consists of information on mothers' locations by state (including District of Columbia) and county, as well as information such as the month they gave birth, and aggregates giving the sum of births during that month. This data has been provided by both the National Bureau for Economic Research and National Center for Health Statistics, whose shared mission is to understand how life works in order to aid individuals in making decisions about their health and wellbeing. This dataset provides valuable insight into population trends across time and location - for example, which states have higher or lower birthrates than others? Which counties experience dramatic fluctuations over time? Given its scope, this dataset could be used in a number of contexts--from epidemiology research to population forecasting. Be sure to check out our other datasets related to births while you're here!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset could be used to examine local trends in birth rates over time or analyze births at different geographical locations. In order to maximize your use of this dataset, it is important that you understand what information the various columns contain.

The main columns are: State (including District of Columbia), County (coded using the FIPS county code number), Month (numbering from 1 for January through 12 for December), Year (4-digit year) countyBirths (calculated sum of births that occurred to mothers living in a county for a given month) and stateBirths (calculated sum of births that occurred to mothers living in a state for a given month). These fields should provide enough information for you analyze trends across geographic locations both at monthly and yearly levels. You could also consider combining variables such as Year with State or Year with Month or any other grouping combinations depending on your analysis goal.

In addition, while all data were downloaded on April 5th 2017, it is worth noting that all sources used followed privacy guidelines as laid out by NCHC so individual births occurring after 2005 are not included due to geolocation concerns.
We hope you find this dataset useful and can benefit from its content! With proper understanding of what each field contains, we are confident you will gain valuable insights on birth rates across counties within the United States during this period

Research Ideas

Establishing county-level trends in birth rates for the US over time.

Analyzing the relationship between month of birth and health outcomes for US babies after they are born (e.g., infant mortality, neurological development, etc.).

Comparing state/county-level differences in average numbers of twins born each year

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: allBirthData.csv | Column name | Description | |:-----------------|:-----------------------------------------------------------------------------------------------------------------| | State | The numerical order of the state where the mother lives. (Integer) | | Month | The month in which the birth took place. (Integer) | | Year | The year of the birth. (Integer) | | countyBirths | The calculated sum of births that occurred to mothers living in that county for that particular month. (Integer) | | stateBirths | The aggregate number at the level of entire states for any given month-year combination. (Integer) | | County | The county where the mother lives, coded using FIPS County Code. (Integer) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit data.world's Admin.
2
1991 Census: Aggregate Data; Northern Ireland
datacatalogue.ukdataservice.ac.uk
Updated Feb 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
UK Data Service (2025). 1991 Census: Aggregate Data; Northern Ireland [Dataset]. http://doi.org/10.5257/census/aggregate-1991-1
Explore at:
Unique identifier
https://doi.org/10.5257/census/aggregate-1991-1
Dataset updated
Feb 28, 2025
Dataset provided by
UK Data Servicehttps://ukdataservice.ac.uk/
Area covered
Ireland, Northern Ireland, Scotland, Wales, England
Description
The UK censuses took place on 21st April 1991. They were run by the Census Office for Northern Ireland, General Register Office for Scotland, and the Office of Population and Surveys for both England and Wales. The UK comprises the countries of England, Wales, Scotland and Northern Ireland.
Statistics from the UK censuses help paint a picture of the nation and how we live. They provide a detailed snapshot of the population and its characteristics, and underpin funding allocation to provide public services.
The aggregate data produced as outputs from censuses in Northern Ireland provide information on a wide range of demographic and socio-economic characteristics. They are predominantly a collection of aggregated or summary counts of the numbers of people or households resident in specific geographical areas possessing particular characteristics.

Small Area Statistics for Northern Ireland have been made available as a result of a collaborative project between the Census Office for Northern Ireland, ESRC and Queen's University, Belfast. The Small Area Statistics (SAS) for Northern Ireland are the principal digital output from the 1991 Northern Ireland Census. They are broadly similar in structure to the 1991 Great Britain Census. They constitute the most detailed resource of 100 per cent sample socio-demographic data in Northern Ireland, with all variables available at a high level of spatial resolution (Enumeration District).

Data can be accessed through CKAN (to bulk download data).
d
Netflix & Streaming Peers Email Receipt Data | Consumer Transaction Data |...
datarade.ai
.json, .xml, .csv
Updated Nov 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Measurable AI (2023). Netflix & Streaming Peers Email Receipt Data | Consumer Transaction Data | Asia, EMEA, LATAM, MENA, India | Granular & Aggregate Data available [Dataset]. https://datarade.ai/data-products/netflix-streaming-peers-email-receipt-data-consumer-trans-measurable-ai
Explore at:
.json, .xml, .csvAvailable download formats
Dataset updated
Nov 8, 2023
Dataset authored and provided by
Measurable AI
Area covered
Brazil, Colombia, Mexico, Japan, Argentina, United States of America, Chile, India, Latin America
Description
The Measurable AI Netflix and Other Streaming Services Email Receipt Datasets details data from subscription and cancellation email such as premium members, family plans, most popular shows, cancellation emails etc.

We source our email receipt consumer data panel via two consumer apps which garner the express consent of our end-users (GDPR compliant). We then aggregate and anonymize all the transactional data to produce raw and aggregate datasets for our clients.

Use Cases Our clients leverage our datasets to produce actionable consumer insights such as: - Market share analysis - User behavioral traits (e.g. retention rates) - Average order values - Promotional strategies used by the key players. Several of our clients also use our datasets for forecasting and understanding industry trends better.

Coverage - Asia (Japan) - EMEA (Spain, United Arab Emirates)

Granular Data Itemized, high-definition data per transaction level with metrics such as - Order value - Items ordered - No. of orders per user - Delivery fee - Service fee - Promotions used - Geolocation data and more

Aggregate Data - Weekly/ monthly order volume - Revenue delivered in aggregate form, with historical data dating back to 2018. All the transactional e-receipts are sent from the Careem Now food delivery app to users’ registered accounts.

Most of our clients are fast-growing Tech Companies, Financial Institutions, Buyside Firms, Market Research Agencies, Consultancies and Academia.

Our dataset is GDPR compliant, contains no PII information and is aggregated & anonymized with user consent. Contact business@measurable.ai for a data dictionary and to find out our volume in each country.
C
Synthetic Integrated Services Data
data.wprdc.org
csv, html, pdf, zip
Updated Jun 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Allegheny County (2024). Synthetic Integrated Services Data [Dataset]. https://data.wprdc.org/dataset/synthetic-integrated-services-data
Explore at:
html, csv(1375554033), pdf, zip(39231637)Available download formats
Dataset updated
Jun 25, 2024
Dataset authored and provided by
Allegheny County
Description
Motivation

This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.

This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.

Collection

The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.

Preprocessing

Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.

For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.

Recommended Uses

This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, this synthetic version of this data could be analyzed to better understand the usage of human services by people in Allegheny County, including the interplay in the usage of multiple services and demographic information about clients.

Known Limitations/Biases

Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.

Feedback

Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).

Further Documentation and Resources

1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.
d
Data from: Distribution and quality of potential sources of aggregate...
datadiscoverystudio.org
e00
Updated May 21, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2018). Distribution and quality of potential sources of aggregate infrastructure resources project area, Colorado-Wyoming. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/f1b19f7dda174817a78ba6d91009eb06/html
Explore at:
e00Available download formats
Dataset updated
May 21, 2018
Description
description: Crushed stone and sand and gravel are the main types of natural aggregate used in the United States. Aggregate is used in nearly all residential, commercial, and industrial building construction, and in most public works projects including roads and highways, bridges, railroad and light rail beds, airports, water and sewer systems, and tunnels. Much of the infrastructure built during the 1950s and 1960s has deteriorated to a point that requires extensive repair or replacement. All this construction requires enormous amounts of aggregate. In Colorado, for example, nearly 45 million tons of aggregate, or about 12 tons per person, were produced during 1994.; abstract: Crushed stone and sand and gravel are the main types of natural aggregate used in the United States. Aggregate is used in nearly all residential, commercial, and industrial building construction, and in most public works projects including roads and highways, bridges, railroad and light rail beds, airports, water and sewer systems, and tunnels. Much of the infrastructure built during the 1950s and 1960s has deteriorated to a point that requires extensive repair or replacement. All this construction requires enormous amounts of aggregate. In Colorado, for example, nearly 45 million tons of aggregate, or about 12 tons per person, were produced during 1994.
E
Soil aggregate stability data from arable and grassland in Countryside...
catalogue.ceh.ac.uk
hosted-metadata.bgs.ac.uk
+1more
zip
Updated Mar 4, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
A.M. Keith; M.R. Cave; B.A. Dodd; S.M. Smart; G. Turner; A.M. Tye; C.M. Wood (2020). Soil aggregate stability data from arable and grassland in Countryside Survey, Great Britain 2007 [Dataset]. http://doi.org/10.5285/be3793b6-90fb-4e4c-9515-220cc33223b9
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5285/be3793b6-90fb-4e4c-9515-220cc33223b9
Dataset updated
Mar 4, 2020
Dataset provided by
NERC EDS Environmental Information Data Centre
Authors
A.M. Keith; M.R. Cave; B.A. Dodd; S.M. Smart; G. Turner; A.M. Tye; C.M. Wood
License
https://eidc.ac.uk/licences/ogl/plainhttps://eidc.ac.uk/licences/ogl/plain
Time period covered
May 1, 2007 - Oct 31, 2007
Area covered

Dataset funded by
Natural Environment Research Councilhttps://www.ukri.org/councils/nerc
Description
This dataset consists of Particle Size Distribution (PSD) measurements made on 419 archived topsoil samples and derived aggregate stability metrics from arable and grassland habitats across Great Britain in 2007. Laser granulometry was used to measure PSD of 1–2 mm aggregates before and after sonication and the difference in their Mean Weight Diameter (MWD) used to indicate aggregate stability. The samples were collected as part of the Countryside Survey monitoring programme, a unique study or ‘audit’ of the natural resources of the UK’s countryside. The analyses were conducted as part of study aiming to quantify how soil quality indicators change across a gradient of agricultural land management and to identify conditions that determine the ability of different soils to resist and recover from perturbations.
F
Concrete Aggregate PSD Imaging Dataset
data.uni-hannover.de
jpeg, png
Updated Nov 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Institut für Baustoffe (2025). Concrete Aggregate PSD Imaging Dataset [Dataset]. https://data.uni-hannover.de/dataset/concrete-aggregate-psd-imaging-dataset
Explore at:
png, jpegAvailable download formats
Dataset updated
Nov 14, 2025
Dataset authored and provided by
Institut für Baustoffe
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Summary

This dataset provides image data of concrete aggregate for the task of estimating particle size distributions (PSD) using computer vision. The images were captured using a controlled camera setup installed above a conveyor belt in a concrete mixing research facility. Each image has a corresponding text file containing the ground-truth PSD obtained through mechanical sieving.

Data Acquisition

Data was recorded at a medium-scale concrete mixing plant equipped with:

Two Allied Vision Alvium 1800 C-508 cameras

25 mm focal length → fine aggregate (0–2 mm)

12 mm focal length → coarse aggregate (>2 mm)

Global shutter, 1 ms exposure time

LED panel illumination for motion-blur-free imaging

A sensor mount above the conveyor belt transporting the aggregate

This setup enabled consistent imaging conditions with sufficient resolution for particle analysis.

https://data.uni-hannover.de/dataset/6f844e22-12ed-48a7-9ccb-b502f8121650/resource/adcf7049-3ad3-4e9c-bd8b-1be00d867f46/download/sensorsetup.jpg" alt="Sensor setup used for data acquisition" title=" ">

Datasets

Two datasets were created to cover common aggregate size ranges used in concrete production:

𝑀ᶠⁱⁿᵉ — Fine Material (< 2 mm)

16 material samples

Natural river sand

PSDs synthetically varied by mixing pre-fractionated material

𝑀ᶜᵒᵃʳˢᵉ — Coarse Material (2–16 mm)

26 material samples

16 natural river gravel

10 recycled concrete aggregate (RCA)

Each material sample weighs 150 kg, and its PSD was systematically varied to cover a broad range of grading curves.

Note:
This repository contains only a subset of the data set that was used in the paper. In order to receive the full data set, please reach out to the authors. The dataset represents controlled variability. While this is ideal for benchmarking and model development, real industrial plants may exhibit additional stochastic variability.

Reference PSD Measurement

A 10 kg subsample from each material batch was mechanically sieved to obtain the reference PSD.

Each PSD is represented using six particle size intervals (B = 6):

Fine dataset:
0.063, 0.125, 0.25, 0.5, 1.0, 2.0 mm

Coarse dataset:
0, 2, 4, 8, 11.2, 16 mm

Each .txt reference file contains six percentile values that sum to 1.0.

https://data.uni-hannover.de/dataset/6f844e22-12ed-48a7-9ccb-b502f8121650/resource/2d5cae38-8ad7-4f82-9202-6ac807051c17/download/overview.png" alt="Example images and grading curves of the data sets" title=" ">

Use Cases

This dataset is intended for:

PSD estimation using deep learning or classical CV

Regression and distribution prediction tasks

Material characterization and granulometry research

Benchmarking computer vision methods on granular material datasets

Citation

If you use this dataset in academic or industrial research, please cite the corresponding paper:

Coenen, M., Beyer, D., Mohammadi, S., Meyer, M., Heipke, C., and Haist, M. (2026): Towards an Automated Concrete Production Control via Computer Vision-based Characterisation of Concrete Aggregate.
d
Data from: Data and code from: A high throughput approach for measuring soil...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Sep 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Data and code from: A high throughput approach for measuring soil slaking index [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-a-high-throughput-approach-for-measuring-soil-slaking-index
Explore at:
Dataset updated
Sep 2, 2025
Dataset provided by
Agricultural Research Service
Description
This dataset includes soil wet aggregate stability measurements from the Upper Mississippi River Basin LTAR site in Ames, Iowa. Samples were collected in 2021 from this long-term tillage and cover crop trial in a corn-based agroecosystem. We measured wet aggregate stability using digital photography to quantify disintegration (slaking) of submerged aggregates over time, similar to the technique described by Fajardo et al. (2016) and Rieke et al. (2021). However, we adapted the technique to larger sample numbers by using a multi-well tray to submerge 20-36 aggregates simultaneously. We used this approach to measure slaking index of 160 soil samples (2120 aggregates). This dataset includes slaking index calculated for each aggregates, and also summarized by samples. There were usually 10-12 aggregates measured per sample. We focused primarily on methodological issues, assessing the statistical power of slaking index, needed replication, sensitivity to cultural practices, and sensitivity to sample collection date. We found that small numbers of highly unstable aggregates lead to skewed distributions for slaking index. We concluded at least 20 aggregates per sample were preferred to provide confidence in measurement precision. However, the experiment had high statistical power with only 10-12 replicates per sample. Slaking index was not sensitive to the initial size of dry aggregates (3 to 10 mm diameter); therefore, pre-sieving soils was not necessary. The field trial showed greater aggregate stability under no-till than chisel plow practice, and changing stability over a growing season. These results will be useful to researchers and agricultural practitioners who want a simple, fast, low-cost method for measuring wet aggregate stability on many samples.
S
Death Dataset Aggregated at the State/County Level 1990 - Current Annual...
splitgraph.com
data.pa.gov
Updated Feb 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pennsylvania Department of Health (2024). Death Dataset Aggregated at the State/County Level 1990 - Current Annual Health [Dataset]. https://www.splitgraph.com/pa-gov/death-dataset-aggregated-at-the-statecounty-level-smxk-2cca/
Explore at:
application/openapi+json, json, application/vnd.splitgraph.imageAvailable download formats
Dataset updated
Feb 6, 2024
Dataset authored and provided by
Pennsylvania Department of Health
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
This dataset contains aggregate death data at the state and county level for Pennsylvania residents. The data are displayed by year, race/ethnicity, gender, age group and cause of death.

Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

See the Splitgraph documentation for more information.
d
Hydroinformatics Instruction Module Example Code: Databases and SQL in...
search.dataone.org
beta.hydroshare.org
+1more
Updated Dec 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Amber Spackman Jones; Jeffery S. Horsburgh; Camilo J. Bastidas Pacheco (2023). Hydroinformatics Instruction Module Example Code: Databases and SQL in Python [Dataset]. https://search.dataone.org/view/sha256%3A2f7a187ad86e4d584cd35755a67398ffa67d6ebfc81dc1ec01539b85ccd827dc
Explore at:
Dataset updated
Dec 30, 2023
Dataset provided by
Hydroshare
Authors
Amber Spackman Jones; Jeffery S. Horsburgh; Camilo J. Bastidas Pacheco
Description
This resource contains Jupyter Notebooks with examples that illustrate how to work with SQLite databases in Python including database creation and viewing and querying with SQL. The resource is part of set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about..

This resources consists of 3 example notebooks and a SQLite database.

Notebooks: 1. Example 1: Querying databases using SQL in Python 2. Example 2: Python functions to query SQLite databases 3. Example 3: SQL join, aggregate, and subquery functions

Data files: These examples use a SQLite database that uses the Observations Data Model structure and is pre-populated with Logan River temperature data.
Z
CKW Smart Meter Data
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Sep 22, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Barahona Garzon, Braulio (2024). CKW Smart Meter Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13304498
Explore at:
Dataset updated
Sep 22, 2024
Dataset provided by
Lucerne University of Applied Sciences and Arts
Authors
Barahona Garzon, Braulio
Description
Overview

The CKW Group is a distribution system operator that supplies more than 200,000 end customers in Central Switzerland. Since October 2022, CKW publishes anonymised and aggregated data from smart meters that measure electricity consumption in canton Lucerne. This unique dataset is accessible in the ckw.ch/opendata platform.

Data set A - anonimised smart meter data

Data set B - aggregated smart meter data

Contents of this data set

This data set contains a small sample of the CKW data set A sorted per smart meter ID, stored as parquet files named with the id field of the corresponding smart meter anonymised data. Example: 027ceb7b8fd77a4b11b3b497e9f0b174.parquet

The orginal CKW data is available for download at https://open.data.axpo.com/%24web/index.html#dataset-a as a (gzip-compressed) csv files, which are are split into one file per calendar month. The columns in the files csv are:

id: the anonymized counter ID (text)

timestamp: the UTC time at the beginning of a 15-minute time window to which the consumption refers (ISO-8601 timestamp)

value_kwh: the consumption in kWh in the time window under consideration (float)

In this archive, data from:

| Dateigrösse | Export Datum | Zeitraum | Dateiname || ----------- | ------------ | -------- | --------- || 4.2GiB | 2024-04-20 | 202402 | ckw_opendata_smartmeter_dataset_a_202402.csv.gz || 4.5GiB | 2024-03-21 | 202401 | ckw_opendata_smartmeter_dataset_a_202401.csv.gz || 4.5GiB | 2024-02-20 | 202312 | ckw_opendata_smartmeter_dataset_a_202312.csv.gz || 4.4GiB | 2024-01-20 | 202311 | ckw_opendata_smartmeter_dataset_a_202311.csv.gz || 4.5GiB | 2023-12-20 | 202310 | ckw_opendata_smartmeter_dataset_a_202310.csv.gz || 4.4GiB | 2023-11-20 | 202309 | ckw_opendata_smartmeter_dataset_a_202309.csv.gz || 4.5GiB | 2023-10-20 | 202308 | ckw_opendata_smartmeter_dataset_a_202308.csv.gz || 4.6GiB | 2023-09-20 | 202307 | ckw_opendata_smartmeter_dataset_a_202307.csv.gz || 4.4GiB | 2023-08-20 | 202306 | ckw_opendata_smartmeter_dataset_a_202306.csv.gz || 4.6GiB | 2023-07-20 | 202305 | ckw_opendata_smartmeter_dataset_a_202305.csv.gz || 3.3GiB | 2023-06-20 | 202304 | ckw_opendata_smartmeter_dataset_a_202304.csv.gz || 4.6GiB | 2023-05-24 | 202303 | ckw_opendata_smartmeter_dataset_a_202303.csv.gz || 4.2GiB | 2023-04-20 | 202302 | ckw_opendata_smartmeter_dataset_a_202302.csv.gz || 4.7GiB | 2023-03-20 | 202301 | ckw_opendata_smartmeter_dataset_a_202301.csv.gz || 4.6GiB | 2023-03-15 | 202212 | ckw_opendata_smartmeter_dataset_a_202212.csv.gz || 4.3GiB | 2023-03-15 | 202211 | ckw_opendata_smartmeter_dataset_a_202211.csv.gz || 4.4GiB | 2023-03-15 | 202210 | ckw_opendata_smartmeter_dataset_a_202210.csv.gz || 4.3GiB | 2023-03-15 | 202209 | ckw_opendata_smartmeter_dataset_a_202209.csv.gz || 4.4GiB | 2023-03-15 | 202208 | ckw_opendata_smartmeter_dataset_a_202208.csv.gz || 4.4GiB | 2023-03-15 | 202207 | ckw_opendata_smartmeter_dataset_a_202207.csv.gz || 4.2GiB | 2023-03-15 | 202206 | ckw_opendata_smartmeter_dataset_a_202206.csv.gz || 4.3GiB | 2023-03-15 | 202205 | ckw_opendata_smartmeter_dataset_a_202205.csv.gz || 4.2GiB | 2023-03-15 | 202204 | ckw_opendata_smartmeter_dataset_a_202204.csv.gz || 4.1GiB | 2023-03-15 | 202203 | ckw_opendata_smartmeter_dataset_a_202203.csv.gz || 3.5GiB | 2023-03-15 | 202202 | ckw_opendata_smartmeter_dataset_a_202202.csv.gz || 3.7GiB | 2023-03-15 | 202201 | ckw_opendata_smartmeter_dataset_a_202201.csv.gz || 3.5GiB | 2023-03-15 | 202112 | ckw_opendata_smartmeter_dataset_a_202112.csv.gz || 3.1GiB | 2023-03-15 | 202111 | ckw_opendata_smartmeter_dataset_a_202111.csv.gz || 3.0GiB | 2023-03-15 | 202110 | ckw_opendata_smartmeter_dataset_a_202110.csv.gz || 2.7GiB | 2023-03-15 | 202109 | ckw_opendata_smartmeter_dataset_a_202109.csv.gz || 2.6GiB | 2023-03-15 | 202108 | ckw_opendata_smartmeter_dataset_a_202108.csv.gz || 2.4GiB | 2023-03-15 | 202107 | ckw_opendata_smartmeter_dataset_a_202107.csv.gz || 2.1GiB | 2023-03-15 | 202106 | ckw_opendata_smartmeter_dataset_a_202106.csv.gz || 2.0GiB | 2023-03-15 | 202105 | ckw_opendata_smartmeter_dataset_a_202105.csv.gz || 1.7GiB | 2023-03-15 | 202104 | ckw_opendata_smartmeter_dataset_a_202104.csv.gz || 1.6GiB | 2023-03-15 | 202103 | ckw_opendata_smartmeter_dataset_a_202103.csv.gz || 1.3GiB | 2023-03-15 | 202102 | ckw_opendata_smartmeter_dataset_a_202102.csv.gz || 1.3GiB | 2023-03-15 | 202101 | ckw_opendata_smartmeter_dataset_a_202101.csv.gz |

was processed into partitioned parquet files, and then organised by id into parquet files with data from single smart meters.

A small sample of all the smart meters data above, are archived in the cloud public cloud space of AISOP project https://os.zhdk.cloud.switch.ch/swift/v1/aisop_public/ckw/ts/batch_0424/batch_0424.zip and also here is this public record. For access to the complete data contact the authors of this archive.

It consists of the following parquet files:

| Size | Date | Name |

|------|------|------|

| 1.0M | Mar 4 12:18 | 027ceb7b8fd77a4b11b3b497e9f0b174.parquet |

| 979K | Mar 4 12:18 | 03a4af696ff6a5c049736e9614f18b1b.parquet |

| 1.0M | Mar 4 12:18 | 03654abddf9a1b26f5fbbeea362a96ed.parquet |

| 1.0M | Mar 4 12:18 | 03acebcc4e7d39b6df5c72e01a3c35a6.parquet |

| 1.0M | Mar 4 12:18 | 039e60e1d03c2afd071085bdbd84bb69.parquet |

| 931K | Mar 4 12:18 | 036877a1563f01e6e830298c193071a6.parquet |

| 1.0M | Mar 4 12:18 | 02e45872f30f5a6a33972e8c3ba9c2e5.parquet |

| 662K | Mar 4 12:18 | 03a25f298431549a6bc0b1a58eca1f34.parquet |

| 635K | Mar 4 12:18 | 029a46275625a3cefc1f56b985067d15.parquet |

| 1.0M | Mar 4 12:18 | 0301309d6d1e06c60b4899061deb7abd.parquet |

| 1.0M | Mar 4 12:18 | 0291e323d7b1eb76bf680f6e800c2594.parquet |

| 1.0M | Mar 4 12:18 | 0298e58930c24010bbe2777c01b7644a.parquet |

| 1.0M | Mar 4 12:18 | 0362c5f3685febf367ebea62fbc88590.parquet |

| 1.0M | Mar 4 12:18 | 0390835d05372cb66f6cd4ca662399e8.parquet |

| 1.0M | Mar 4 12:18 | 02f670f059e1f834dfb8ba809c13a210.parquet |

| 987K | Mar 4 12:18 | 02af749aaf8feb59df7e78d5e5d550e0.parquet |

| 996K | Mar 4 12:18 | 0311d3c1d08ee0af3edda4dc260421d1.parquet |

| 1.0M | Mar 4 12:18 | 030a707019326e90b0ee3f35bde666e0.parquet |

| 955K | Mar 4 12:18 | 033441231b277b283191e0e1194d81e2.parquet |

| 995K | Mar 4 12:18 | 0317b0417d1ec91b5c243be854da8a86.parquet |

| 1.0M | Mar 4 12:18 | 02ef4e49b6fb50f62a043fb79118d980.parquet |

| 1.0M | Mar 4 12:18 | 0340ad82e9946be45b5401fc6a215bf3.parquet |

| 974K | Mar 4 12:18 | 03764b3b9a65886c3aacdbc85d952b19.parquet |

| 1.0M | Mar 4 12:18 | 039723cb9e421c5cbe5cff66d06cb4b6.parquet |

| 1.0M | Mar 4 12:18 | 0282f16ed6ef0035dc2313b853ff3f68.parquet |

| 1.0M | Mar 4 12:18 | 032495d70369c6e64ab0c4086583bee2.parquet |

| 900K | Mar 4 12:18 | 02c56641571fc9bc37448ce707c80d3d.parquet |

| 1.0M | Mar 4 12:18 | 027b7b950689c337d311094755697a8f.parquet |

| 1.0M | Mar 4 12:18 | 02af272adccf45b6cdd4a7050c979f9f.parquet |

| 927K | Mar 4 12:18 | 02fc9a3b2b0871d3b6a1e4f8fe415186.parquet |

| 1.0M | Mar 4 12:18 | 03872674e2a78371ce4dfa5921561a8c.parquet |

| 881K | Mar 4 12:18 | 0344a09d90dbfa77481c5140bb376992.parquet |

| 1.0M | Mar 4 12:18 | 0351503e2b529f53bdae15c7fbd56fc0.parquet |

| 1.0M | Mar 4 12:18 | 033fe9c3a9ca39001af68366da98257c.parquet |

| 1.0M | Mar 4 12:18 | 02e70a1c64bd2da7eb0d62be870ae0d6.parquet |

| 1.0M | Mar 4 12:18 | 0296385692c9de5d2320326eaa000453.parquet |

| 962K | Mar 4 12:18 | 035254738f1cc8a31075d9fbe3ec2132.parquet |

| 991K | Mar 4 12:18 | 02e78f0d6a8fb96050053e188bf0f07c.parquet |

| 1.0M | Mar 4 12:18 | 039e4f37ed301110f506f551482d0337.parquet |

| 961K | Mar 4 12:18 | 039e2581430703b39c359dc62924a4eb.parquet |

| 999K | Mar 4 12:18 | 02c6f7e4b559a25d05b595cbb5626270.parquet |

| 1.0M | Mar 4 12:18 | 02dd91468360700a5b9514b109afb504.parquet |

| 938K | Mar 4 12:18 | 02e99c6bb9d3ca833adec796a232bac0.parquet |

| 589K | Mar 4 12:18 | 03aef63e26a0bdbce4a45d7cf6f0c6f8.parquet |

| 1.0M | Mar 4 12:18 | 02d1ca48a66a57b8625754d6a31f53c7.parquet |

| 1.0M | Mar 4 12:18 | 03af9ebf0457e1d451b83fa123f20a12.parquet |

| 1.0M | Mar 4 12:18 | 0289efb0e712486f00f52078d6c64a5b.parquet |

| 1.0M | Mar 4 12:18 | 03466ed913455c281ffeeaa80abdfff6.parquet |

| 1.0M | Mar 4 12:18 | 032d6f4b34da58dba02afdf5dab3e016.parquet |

| 1.0M | Mar 4 12:18 | 03406854f35a4181f4b0778bb5fc010c.parquet |

| 1.0M | Mar 4 12:18 | 0345fc286238bcea5b2b9849738c53a2.parquet |

| 1.0M | Mar 4 12:18 | 029ff5169155b57140821a920ad67c7e.parquet |

| 985K | Mar 4 12:18 | 02e4c9f3518f079ec4e5133acccb2635.parquet |

| 1.0M | Mar 4 12:18 | 03917c4f2aef487dc20238777ac5fdae.parquet |

| 969K | Mar 4 12:18 | 03aae0ab38cebcb160e389b2138f50da.parquet |

| 914K | Mar 4 12:18 | 02bf87b07b64fb5be54f9385880b9dc1.parquet |

| 1.0M | Mar 4 12:18 | 02776685a085c4b785a3885ef81d427a.parquet |

| 947K | Mar 4 12:18 | 02f5a82af5a5ffac2fe7551bf4a0a1aa.parquet |

| 992K | Mar 4 12:18 | 039670174dbc12e1ae217764c96bbeb3.parquet |

| 1.0M | Mar 4 12:18 | 037700bf3e272245329d9385bb458bac.parquet |

| 602K | Mar 4 12:18 | 0388916cdb86b12507548b1366554e16.parquet |

| 939K | Mar 4 12:18 | 02ccbadea8d2d897e0d4af9fb3ed9a8e.parquet |

| 1.0M | Mar 4 12:18 | 02dc3f4fb7aec02ba689ad437d8bc459.parquet |

| 1.0M | Mar 4 12:18 | 02cf12e01cd20d38f51b4223e53d3355.parquet |

| 993K | Mar 4 12:18 | 0371f79d154c00f9e3e39c27bab2b426.parquet |

where each file contains data from a single smart meter.

Acknowledgement

The AISOP project (https://aisopproject.com/) received funding in the framework of the Joint Programming Platform Smart Energy Systems from European Union's Horizon 2020 research and innovation programme under grant agreement No 883973. ERA-Net Smart Energy Systems joint call on digital transformation for green energy transition.

Facebook

Twitter

Click to copy link

Link copied

Cite

mahdieh hajian (2024). Meta-analysis of aggregate data on medical events [Dataset]. https://www.kaggle.com/datasets/mahdiehhajian/meta-analysis-of-aggregate-data-on-medical-events/code

Data from: Meta-analysis of aggregate data on medical events

Meta-analysis of aggregate data on medical events

Explore at:

zip(1957 bytes)Available download formats

Dataset updated

Nov 18, 2024

Authors

mahdieh hajian

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Dataset provided by = Björn Holzhauer

Dataset Description==Meta-analyses of clinical trials often treat the number of patients experiencing a medical event as binomially distributed when individual patient data for fitting standard time-to-event models are unavailable. Assuming identical drop-out time distributions across arms, random censorship and low proportions of patients with an event, a binomial approach results in a valid test of the null hypothesis of no treatment effect with minimal loss in efficiency compared to time-to-event methods. To deal with differences in follow-up - at the cost of assuming specific distributions for event and drop-out times - we propose a hierarchical multivariate meta-analysis model using the aggregate data likelihood based on the number of cases, fatal cases and discontinuations in each group, as well as the planned trial duration and groups sizes. Such a model also enables exchangeability assumptions about parameters of survival distributions, for which they are more appropriate than for the expected proportion of patients with an event across trials of substantially different length. Borrowing information from other trials within a meta-analysis or from historical data is particularly useful for rare events data. Prior information or exchangeability assumptions also avoid the parameter identifiability problems that arise when using more flexible event and drop-out time distributions than the exponential one. We discuss the derivation of robust historical priors and illustrate the discussed methods using an example. We also compare the proposed approach against other aggregate data meta-analysis methods in a simulation study.

Clear search

Close search

Google apps

Main menu

Data from: Meta-analysis of aggregate data on medical events

Geometry and Opacity Data for Fractal Aggregates

Amazon Email Receipt Data | Consumer Transaction Data | Asia, EMEA, LATAM,...

Deep Granulometry

Coarse Aggregate Data

Fine Aggregate Data

Related publications:

Aggregate Thin Section Analysis Data from the Fimbule Ice Shelf Expedition

FHV Base Aggregate Report

Visual Granulometry: Image-based Granulometry of Concrete Aggregate

Introduction

Classification data

Related publications:

Data from: Open-data release of aggregated Australian school-level...

Independent Data Aggregation, Quality Control and Visualization of...

US Births by County and State

US Births by County and State

1985-2015 Aggregated Data

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Acknowledgements

1991 Census: Aggregate Data; Northern Ireland

Netflix & Streaming Peers Email Receipt Data | Consumer Transaction Data |...

Synthetic Integrated Services Data

Motivation

Collection

Preprocessing

Recommended Uses

Known Limitations/Biases

Feedback

Further Documentation and Resources

Data from: Distribution and quality of potential sources of aggregate...

Soil aggregate stability data from arable and grassland in Countryside...

Concrete Aggregate PSD Imaging Dataset

Dataset Summary

Data Acquisition

Datasets

𝑀ᶠⁱⁿᵉ — Fine Material (< 2 mm)

𝑀ᶜᵒᵃʳˢᵉ — Coarse Material (2–16 mm)

Reference PSD Measurement

Use Cases

Citation

Data from: Data and code from: A high throughput approach for measuring soil...

Death Dataset Aggregated at the State/County Level 1990 - Current Annual...

Hydroinformatics Instruction Module Example Code: Databases and SQL in...

CKW Smart Meter Data

Data from: Meta-analysis of aggregate data on medical eventsSee More Versions

Meta-analysis of aggregate data on medical events

Data from: Meta-analysis of aggregate data on medical events