Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the simulation data of the combinatorial metamaterial as used for the paper 'Machine Learning of Combinatorial Rules in Mechanical Metamaterials', as published in XXX.
In this paper, the data is used to classify each \(k \times k\) unit cell design into one of two classes (C or I) based on the scaling (linear or constant) of the number of zero modes \(M_k(n)\) for metamaterials consisting of an \(n\times n\) tiling of the corresponding unit cell. Additionally, a random walk through the design space starting from class C unit cells was performed to characterize the boundary between class C and I in design space. A more detailed description of the contents of the dataset follows below.
Modescaling_raw_data.zip
This file contains uniformly sampled unit cell designs and \(M_k(n)\) for \(1\leq n\leq 4\), which were used to classify the unit cell designs for the data set. There is a small subset of designs for \(k=\{3, 4, 5\}\) that do not neatly fall into the class C and I classification, and instead require additional simulation for \(4 \leq n \leq 6\) before either saturating to a constant number of zero modes (class I) or increasing linearly (class C). This file contains the simulation data for unit cells of size \(3 \leq k \leq 8\). The data is organized as follows.
Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4.npy", and contain a [Nsim, 1+k*k+4] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Note: the unit cell design uses the numbers \(\{0, 1, 2, 3\}\) to refer to each building block orientation. The building block orientations can be characterized through the orientation of the missing diagonal bar (see Fig. 2 in the paper), which can be Left Up (LU), Left Down (LD), Right Up (RU), or Right Down (RD). The numbers correspond to the building block orientation \(\{0, 1, 2, 3\} = \{\mathrm{LU, RU, RD, LD}\}\).
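A minimal loading sketch in Python (the file index i = 1 and k = 3 are hypothetical; the column split assumes col 0 is the label number, as in the results files below, cols 1 to k*k the design, and the final four columns \(M_k(n)\) for \(n = 1\) to \(4\)):
import numpy as np

k = 3
# Hypothetical instance of the "data_new_rrQR_i_n_M_kxk_fixn4.npy" naming pattern.
data = np.load("data_new_rrQR_1_n_M_3x3_fixn4.npy")   # shape: [Nsim, 1 + k*k + 4]

labels = data[:, 0]                             # assumed: label number
designs = data[:, 1:1 + k*k].reshape(-1, k, k)  # assumed: orientations in {0, 1, 2, 3}
modes = data[:, 1 + k*k:]                       # assumed: M_k(n) for n = 1..4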
Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 6\) for unit cells that cannot be classified as class C or I for \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4_classX_extend.npy", and contain a [Nsim, 1+k*k+6] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Simulation data for \(6 \leq k \leq 8\) unit cells is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. Note that the number of modes is now calculated for \(n_x \times n_y\) metamaterials, where we calculate \((n_x, n_y) = \{(1,1), (2, 2), (3, 2), (4,2), (2, 3), (2, 4)\}\) rather than \(n_x=n_y=n\) to save computation time. These files are named "data_new_rrQR_i_n_Mx_My_n4_kxk(_extended).npy", and contain a [Nsim, 1+k*k+8] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Modescaling_classification_results.zip
This file contains the classification, slope, and offset of the scaling of the number of zero modes \(M_k(n)\) for the unit cells in Modescaling_raw_data.zip. The data is organized as follows.
The results for \(3 \leq k \leq 5\) based on the \(1 \leq n \leq 4\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4.txt". The data can be loaded using ',' as the delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:
col 0: label number (to cross-reference with the raw data)
col 1: the class, where 0 corresponds to class I, 1 to class C, and 2 to class X (neither class I nor C for \(1 \leq n \leq 4\))
col 2: slope from \(n \geq 2\) onward (undefined for class X)
col 3: the offset, defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)
col 4: \(M_k(1)\)
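The results files load directly with NumPy; a sketch (hypothetical file index i = 1 and k = 3, columns as listed above):
import numpy as np

res = np.loadtxt("results_analysis_new_rrQR_1_Scen_slope_offset_M1k_3x3_fixn4.txt", delimiter=",")
label, cls, slope, offset, Mk1 = res.T   # columns 0..4 as described above
class_C = res[cls == 1]                  # select the class C unit cells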
The results for \(3 \leq k \leq 5\) based on the extended \(1 \leq n \leq 6\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4_classC_extend.txt". The data can be loaded using ',' as the delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:
col 0: label number (to cross-reference with the raw data)
col 1: the class, where 0 corresponds to class I, 1 to class C, and 2 to class X (neither class I nor C for \(1 \leq n \leq 6\))
col 2: slope from \(n \geq 2\) onward (undefined for class X)
col 3: the offset, defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)
col 4: \(M_k(1)\)
The results for \(6 \leq k \leq 8\) based on the \(1 \leq n \leq 4\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scenx_Sceny_slopex_slopey_offsetx_offsety_M1k_kxk(_extended).txt". The data can be loaded using ',' as the delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:
col 0: label number (to cross-reference with the raw data)
col 1: class_x based on \(M_k(n_x, 2)\), where 0 corresponds to class I, 1 to class C, and 2 to class X (neither class I nor C for \(1 \leq n_x \leq 4\))
col 2: class_y based on \(M_k(2, n_y)\), where 0 corresponds to class I, 1 to class C, and 2 to class X (neither class I nor C for \(1 \leq n_y \leq 4\))
col 3: slope_x from \(n_x \geq 2\) onward (undefined for class X)
col 4: slope_y from \(n_y \geq 2\) onward (undefined for class X)
col 5: offset_x, defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_x}\)
col 6: offset_y, defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_y}\)
col 7: \(M_k(1, 1)\)
Random Walks Data
This file contains the random walks for \(3 \leq k \leq 8\) unit cells. Each random walk starts from a class C unit cell design; at each step \(s\), a randomly picked building block is changed to a new random orientation, for a total of \(s = k^2\) steps. The data is organized as follows.
The configurations for each step are stored in the files named "configlist_test_i.npy", where i is a number and corresponds to a different starting unit cell. The stored array has the shape [k*k+1, 2*k+2, 2*k+2]. The first dimension denotes the step \(s\), where \(s=0\) is the initial configuration. The second and third dimensions denote the unit cell configuration in the pixel representation (see paper), padded with a single-pixel-wide layer using periodic boundary conditions.
The class for each configuration is stored in "lmlist_test_i.npy", where i corresponds to the same number as for the configurations in the "configlist_test_i.npy" file. The stored array has shape [k*k+1], giving the class of the configuration at each step \(s\).
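A loading sketch under the shape conventions above (the walk index i = 0 is hypothetical, and the per-step class array shape is an assumption):
import numpy as np

k = 3  # hypothetical unit cell size
configs = np.load("configlist_test_0.npy")   # shape: [k*k+1, 2*k+2, 2*k+2]
classes = np.load("lmlist_test_0.npy")       # assumed: one class label per step

interior = configs[0, 1:-1, 1:-1]            # step s = 0 with the periodic padding stripped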
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Modes Of Transport is a dataset for object detection tasks - it contains Cars and Bikes annotations for 401 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Data sets consist of multiple multivariate time series. Each data set is further divided into training and test subsets. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with a different degree of initial wear and manufacturing variation which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data is contaminated with sensor noise.

The engine is operating normally at the start of each time series and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate. A vector of true Remaining Useful Life (RUL) values for the test data is also provided.

The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle; each column is a different variable. The columns correspond to:
1) unit number
2) time, in cycles
3) operational setting 1
4) operational setting 2
5) operational setting 3
6) sensor measurement 1
7) sensor measurement 2
...
26) sensor measurement 21

Data Set: FD001. Train trajectories: 100; Test trajectories: 100; Conditions: ONE (Sea Level); Fault Modes: ONE (HPC Degradation)
Data Set: FD002. Train trajectories: 260; Test trajectories: 259; Conditions: SIX; Fault Modes: ONE (HPC Degradation)
Data Set: FD003. Train trajectories: 100; Test trajectories: 100; Conditions: ONE (Sea Level); Fault Modes: TWO (HPC Degradation, Fan Degradation)
Data Set: FD004. Train trajectories: 248; Test trajectories: 249; Conditions: SIX; Fault Modes: TWO (HPC Degradation, Fan Degradation)

Reference: A. Saxena, K. Goebel, D. Simon, and N. Eklund, "Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation", in the Proceedings of the 1st International Conference on Prognostics and Health Management (PHM08), Denver CO, Oct 2008.
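As a minimal loading sketch in Python (assuming the conventional file name train_FD001.txt inside the archive and the 26-column layout above; pandas is one convenient reader for space-delimited data):
import pandas as pd

# 26 columns as described above: unit, cycle, 3 operational settings, 21 sensors.
cols = (["unit", "cycle", "setting1", "setting2", "setting3"]
        + [f"sensor{i}" for i in range(1, 22)])
train = pd.read_csv("train_FD001.txt", sep=r"\s+", header=None, names=cols)

# Training-set RUL target: cycles remaining until each unit's last recorded cycle.
rul = train.groupby("unit")["cycle"].transform("max") - train["cycle"]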
This dataset contains concurrent airborne DopplerScatt radar retrievals of surface vector winds and ocean currents from the Sub-Mesoscale Ocean Dynamics Experiment (S-MODE) during a pilot campaign conducted approximately 300 km offshore of San Francisco over two weeks in October 2021. S-MODE aims to understand how ocean dynamics acting on short spatial scales influence the vertical exchange of physical and biological variables in the ocean. DopplerScatt is a Ka-band (35.75 GHz) scatterometer with a swath width of 24 km that records Doppler measurements of the relative velocity between the platform and the surface. It is mounted on a B200 aircraft which flies daily surveys of the field domain during deployments, and data is used to give larger scale context, and also to compare with in-situ measurements of velocities and divergence. Level 2 data includes estimates of surface winds and currents. The V1 data have been cross-calibrated against SIO-DopVis leading to the 'dopvis_2021' current geophysical model function. It is expected that additional DopVis data will lead to a reprocessing of this data set and it should be regarded as provisional, to be refined after future S-MODE deployments. Data are available in netCDF format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides the forcings and boundary conditions used for ModE-Sim Set 1420-2. The output for the individual ensemble members and ensemble statistics can be found in the other datasets within this dataset group. Example run scripts of the simulations can be found in the second additional info file at the experiment level. Information on the experiment design and the variables included in this dataset can be found in the experiment summary and the additional information provided with it. For a detailed description of ModE-Sim please refer to the documentation paper (reference provided in the summary at the experiment level).
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
Datasets with dissolved gas concentrations in power transformer oil for remaining useful life (RUL), fault detection and diagnosis (FDD) problems.
Power transformers (PTs) are an important component of a nuclear power plant (NPP). They convert alternating voltage and are instrumental in the power supply of both external NPP energy consumers and NPPs themselves. Currently, many PTs have exceeded their planned service life, which had been extended beyond the designated 25 years. Due to the extension, monitoring the PT technical condition becomes an urgent matter.
An important method for monitoring and diagnosing PTs is Chromatographic Analysis of Dissolved Gas (CADG). It is based on the principle of forced extraction and analysis of dissolved gases from PT oil. Almost all types of equipment defects are accompanied by the formation of gases that dissolve in oil; certain types of defects generate certain gases in different quantities. The concentrations also differ at various stages of defect development, which makes it possible to estimate the RUL of the PT. At present, NPP control and diagnostic systems for PT equipment use predefined control limits for the concentration of dissolved gases in oil. The main disadvantages of this approach are the lack of automatic control and insufficient quality of diagnostics, especially for PTs with extended service life. To combat these shortcomings in diagnostic systems for the analysis of data obtained using CADG, machine learning (ML) methods can be used, as they are used in diagnostics of many NPP components.
The datasets are available as .csv files containing 420 records of gas concentration, presented as a time dependence. The gases are H2, CO, C2H4, and C2H2. The period between time points is 12 hours. There are 3000 datasets split into train (2100 datasets) and test (900 datasets) sets.
For the RUL problem, annotations are available (in separate files): each .csv file is annotated with a value, in time points, equal to the time remaining until the equipment fails, counted from the end of the record.
For FDD problems, there are labels (in separate files) with four PT operating modes (classes):
1. Normal mode (2436 datasets);
2. Partial discharge: local dielectric breakdown in gas-filled cavities (127 datasets);
3. Low energy discharge: sparking or arc discharges in poor contact connections of structural elements with different or floating potential; discharges between PT core structural elements, high voltage winding taps and the tank, high voltage winding and grounding; discharges in oil during contact switching (162 datasets);
4. Low-temperature overheating: oil flow disruption in windings cooling channels, magnetic system causing low efficiency of the cooling system for temperatures < 300 °C (275 datasets).
Data in this repository is an extension (test set added) of data from here and here.
In our case, the fault detection problem transforms into a classification problem, since the data is related to one of four labeled classes (including one normal and three anomalous), so the model's output needs to be a class number. The problem can be stated as binary classification (healthy/anomalous) for fault detection or multi-class classification (one of 4 states) for fault diagnosis.
To ensure high-quality maintenance and repair, it is vital to be aware of potential malfunctions and to predict the RUL of transformer equipment. Therefore, it is necessary to create a mathematical model that determines the RUL from the final 420 points.
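A loading sketch for the gas records (the folder layout and file naming are assumptions; adjust to the actual archive):
import pandas as pd
from pathlib import Path

def load_records(folder):
    # Each CSV is assumed to hold 420 rows (12-hour steps), one column per gas.
    return [pd.read_csv(f) for f in sorted(Path(folder).glob("*.csv"))]

train = load_records("train")  # expected: 2100 records
test = load_records("test")    # expected: 900 records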
This dataset contains concurrent airborne DopplerScatt radar retrievals of surface vector winds and ocean currents from the Sub-Mesoscale Ocean Dynamics Experiment (S-MODE). S-MODE aims to understand how ocean dynamics acting on short spatial scales influence the vertical exchange of physical and biological variables in the ocean. Data were collected approximately 300 km offshore of San Francisco during a pilot campaign in October 2021, and two intensive operating periods (IOPs) in Fall 2022 and Spring 2023. DopplerScatt is a Ka-band (35.75 GHz) scatterometer with a swath width of 24 km that records Doppler measurements of the relative velocity between the platform and the surface. It is mounted on a B200 aircraft which flies daily surveys of the field domain during deployments, and data is used to give larger scale context, and also to compare with in-situ measurements of velocities and divergence. Level 2 data includes estimates of surface winds and currents. The V2 data have been cross-calibrated against ADCPs, surface drifters, and the SIO-DopVis instrument collected during the Pilot and IOP1 campaigns. Additional DopVis data collected during IOP1 and IOP2, in addition to IOP2 ADCP and surface drifter data, will lead to a reprocessing of this dataset, and it should be regarded as provisional. Data are available in netCDF format.
The dataset consists of source code and LLVM IR pairs generated from accepted and de-duped programming contest solutions. The dataset is divided into language configs and mode splits. The language can be one of C, C++, D, Fortran, Go, Haskell, Nim, Objective-C, Python, Rust and Swift, indicating the source files' languages. The mode split indicates the compilation mode, which can be either Size_Optimized or Perf_Optimized.
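If the dataset is hosted on the Hugging Face Hub, loading one language config and mode split might look like the sketch below (the repository id is a placeholder; config and split names follow the description above):
from datasets import load_dataset

# "example/code-llvm-ir" is a hypothetical id; substitute the real repository.
ds = load_dataset("example/code-llvm-ir", name="C++", split="Perf_Optimized")
print(ds[0])  # expected: a source code / LLVM IR pair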
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This training dataset was calculated using the mechanistic modeling approach. See the "Benchmark Synthetic Training Data for Artificial Intelligence-based Li-ion Diagnosis and Prognosis" publication for more details. More details will be added when published. The prognosis dataset was harder to define as there are no limits on how the three degradation modes can evolve. For this proof-of-concept work, we considered eight parameters to scan. For each degradation mode, degradation was chosen to follow equation (1).
\(\%\mathrm{degradation} = a \times \mathrm{cycle} + (e^{b \times \mathrm{cycle}} - 1)\)  (1)
Considering the three degradation modes, this accounts for six parameters to scan. In addition, two other parameters were added: a delay for the exponential factor for LLI, and a parameter for the reversibility of lithium plating. The delay was introduced to reflect degradation paths where plating cannot be explained by an increase of LAMs or resistance [55]. The chosen parameters and their values are summarized in Table S1 and their evolution is represented in Figure S1. Figure S1(a,b) presents the evolution of parameters p1 to p7. At worst, the cells endured 100% of one of the degradation modes in around 1,500 cycles. Minimal LLI was chosen to be 20% after 3,000 cycles. This is to guarantee at least 20% capacity loss for all the simulations. For the LAMs, conditions were less restrictive, and, after 3,000 cycles, the lowest degradation is 3%. The reversibility factor p8 was calculated with equation (2) when \(\mathrm{LAM}_{NE} > \mathrm{PT}\).
\(\%\mathrm{LLI} = \%\mathrm{LLI} + p_8 \, (\mathrm{LAM}_{PE} - \mathrm{PT})\)  (2)
where \(\mathrm{PT}\) was calculated with equation (3) from [60].
\(\mathrm{PT} = 100 - \frac{100 - \mathrm{LAM}_{PE}}{100 \times \mathrm{LR}_{ini} - \mathrm{LAM}_{PE}} \times (100 - \mathrm{OFS}_{ini} - \mathrm{LLI})\)  (3)
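Equations (1) to (3) translate directly into code; a sketch in Python (variable names simply mirror the symbols in the equations):
import numpy as np

def degradation_pct(cycle, a, b):
    # Equation (1): %degradation = a * cycle + (exp(b * cycle) - 1)
    return a * cycle + (np.exp(b * cycle) - 1.0)

def plating_threshold(LAM_PE, LR_ini, OFS_ini, LLI):
    # Equation (3): threshold PT, from [60]
    return 100 - ((100 - LAM_PE) / (100 * LR_ini - LAM_PE)) * (100 - OFS_ini - LLI)

def lli_update(LLI, p8, LAM_PE, PT):
    # Equation (2): add the reversibility-weighted excess over PT to %LLI
    return LLI + p8 * (LAM_PE - PT)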
Varying all those parameters accounted for more than 130,000 individual duty cycles, with one voltage curve for every 100 cycles. Six MATLAB .mat files are included. The GIC-LFP_duty_other.mat file contains 12 variables:
Qnorm: normalized capacity scale for all voltage curves
p1 to p8: values used to generate the duty cycles
Key: index of which values were used for each degradation path, 1 - p1, … 8 - p8
QL: capacity loss, one line per path, one column per 100 cycles
The file GIC-LFP_duty_LLI-LAMsvalues.mat contains the values for LLI, LAM_PE and LAM_NE for all cycles (1 line per 100 cycles) and duty cycles (columns).
Files GIC-LFP_duty_1 to _4 contain the voltage data split into 1 GB chunks (40,000 simulations). Each cell corresponds to one line in the Key variable. Inside each cell, one column per 100 cycles.
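In Python, the .mat files can be read with SciPy; a sketch (variable names are those listed above, assumed to match the files exactly):
from scipy.io import loadmat

other = loadmat("GIC-LFP_duty_other.mat")
Qnorm = other["Qnorm"]  # normalized capacity scale for all voltage curves
QL = other["QL"]        # capacity loss: one line per path, one column per 100 cycles
key = other["Key"]      # index of which p1..p8 values each path used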
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is part of the following publication at the TransAI 2023 conference: R. Wallsberger, R. Knauer, S. Matzka; "Explainable Artificial Intelligence in Mechanical Engineering: A Synthetic Dataset for Comprehensive Failure Mode Analysis" DOI: http://dx.doi.org/10.1109/TransAI60598.2023.00032
This is the original XAI Drilling dataset, optimized for XAI purposes; it can be used to evaluate explanations produced by such algorithms. The dataset comprises 20,000 data points, i.e., drilling operations, stored as rows, with 10 features, one binary main failure label, and 4 binary subgroup failure modes stored in columns. The main failure rate is about 5.0 % for the whole dataset. The features that constitute this dataset are as follows:
Process time t (s): This feature captures the full duration of each drilling operation, providing insights into efficiency and potential bottlenecks.
Main failure: This binary feature indicates if any significant failure on the drill bit occurred during the drilling process. A value of 1 flags a drilling process that encountered issues, which in this case is true when any of the subgroup failure modes are 1, while 0 indicates a successful drilling operation without any major failures.
Subgroup failures:
- Build-up edge failure (215x): Represented as a binary feature, a build-up edge failure indicates the occurrence of material accumulation on the cutting edge of the drill bit due to a combination of low cutting speeds and insufficient cooling. A value of 1 signifies the presence of this failure mode, while 0 denotes its absence.
- Compression chips failure (344x): This binary feature captures the formation of compressed chips during drilling, resulting from the factors high feed rate, inadequate cooling and using an incompatible drill bit. A value of 1 indicates the occurrence of at least two of the three factors above, while 0 suggests a smooth drilling operation without compression chips.
- Flank wear failure (278x): A binary feature representing the wear of the drill bit's flank due to a combination of high feed rates and low cutting speeds. A value of 1 indicates significant flank wear, affecting the drilling operation's accuracy and efficiency, while 0 denotes a wear-free operation.
- Wrong drill bit failure (300x): As a binary feature, it indicates the use of an inappropriate drill bit for the material being drilled. A value of 1 signifies a mismatch, leading to potential drilling issues, while 0 indicates the correct drill bit usage.
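As a consistency check, the main failure flag should equal the logical OR of the four subgroup failure modes; a sketch with hypothetical file and column names:
import pandas as pd

df = pd.read_csv("xai_drilling.csv")  # hypothetical file name
subgroups = ["build_up_edge", "compression_chips", "flank_wear", "wrong_drill_bit"]  # hypothetical names

assert (df["main_failure"] == df[subgroups].any(axis=1).astype(int)).all()
print(df["main_failure"].mean())  # expected: about 0.05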
This dataset (in .csv format), accompanying codebook and replication code serve as supplement to a study titled "Does the mode of administration impact on quality of data? Comparing a traditional survey versus an online survey via a Voting Advice Application", submitted for publication to the journal "Survey Research Methods". The study involved comparisons of responses to two near-identical questionnaires administered via a traditional survey and through a Voting Advice Application (VAA), both designed for and administered during the pre-electoral period of the Cypriot Presidential Elections of 2013.

The offline dataset consisted of questionnaires collected from 818 individuals whose participation was elicited through door-to-door stratified random sampling, with replacement of individuals who could not be contacted. The strata were designed to take into account the regional population density, gender, age and whether the area was urban or rural. Offline participants completed a pen-and-paper questionnaire version of the VAA in a self-completing capacity, although the person administering the questionnaire remained present throughout.

The online dataset involved responses from 10,241 VAA users who completed the Choose4Cyprus VAA. Voting Advice Applications are online platforms that provide voting recommendations to users based on their closeness to political parties after they declare their agreement or disagreement with a number of policy statements. VAA users freely visited the VAA website and completed the relevant questionnaire in a self-completing capacity.

The two modes of administration (online and offline) involved respondents completing a series of supplementary questions (demographics, ideological affinity and political orientation [e.g. vote in the previous election]) prior to the main questionnaire, consisting of 35 and 30 policy-related Likert-type items for the offline and online mode respectively. The dataset includes all 30 policy items that were common between the two modes, although only the first 19 (q1:q19) appeared in the same order and in the same position in the two questionnaires; as such, all analyses reported in the article were conducted using these 19 items only. The phrasing of the questions was identical for the two modes and is described per variable in the attached codebook.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Outdoor_mode_dataset is a dataset for instance segmentation tasks - it contains Pothole annotations for 813 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a consolidated view of Official Utilisation figures across all transport modes (train, metro, bus, ferry and light rail). Opal daily tap-on/tap-off data is aggregated to a total monthly figure representing the estimated number of trips across all transport modes. Starting July 1, 2024, the methodology for calculating trip numbers for individual lines and operators will change to more accurately reflect the services our passengers use within the transport network. This new approach will apply to trains, metros, light rail, and ferries, and will soon be extended to buses. Aggregations between line, agency, and mode levels will no longer be valid, as a passenger may use multiple lines on a single trip. Trip numbers at the line, operator, or mode level should be used as reported, without further combinations. The dataset includes reports based on both the new and old methodologies, with a transition to the new method taking place over the coming months. As a result of this change, caution should be exercised when analysing longer trends that utilise both datasets. More information on NRT ROAM can be accessed here
Below are frequency comparisons of different models with experiment. Note: modeshapes aren't very descriptive for higher modes; there is coupling between them, so this is just an approximate naming scheme. See the modeshape plots for more details. PDF files are provided with figures of the modeshapes for the selected models: FEM TET10 model (Nov 2011) (CASE 10), Hex8 modeshapes (CASE 4), TET10 no modelcart (CASE 5), HIRENASD TET model with modelcart - new OML, and HIRENASD HEX8 wing-only model. Frequency comparisons cover modes 1 through 12 for each model.
Maximilian B. Kiss, Sophia B. Coban, K. Joost Batenburg, Tristan van Leeuwen, and Felix Lucka "2DeteCT - A large 2D expandable, trainable, experimental Computed Tomography dataset for machine learning", Sci Data 10, 576 (2023) or arXiv:2306.05907 (2023)
Abstract: "Recent research in computational imaging largely focuses on developing machine learning (ML) techniques for image reconstruction, which requires large-scale training datasets consisting of measurement data and ground-truth images. However, suitable experimental datasets for X-ray Computed Tomography (CT) are scarce, and methods are often developed and evaluated only on simulated data. We fill this gap by providing the community with a versatile, open 2D fan-beam CT dataset suitable for developing ML techniques for a range of image reconstruction tasks. To acquire it, we designed a sophisticated, semi-automatic scan procedure that utilizes a highly-flexible laboratory X-ray CT setup. A diverse mix of samples with high natural variability in shape and density was scanned slice-by-slice (5000 slices in total) with high angular and spatial resolution and three different beam characteristics: A high-fidelity, a low-dose and a beam-hardening-inflicted mode. In addition, 750 out-of-distribution slices were scanned with sample and beam variations to accommodate robustness and segmentation tasks. We provide raw projection data, reference reconstructions and segmentations based on an open-source data processing pipeline."
The data collection has been acquired using a highly flexible, programmable and custom-built X-ray CT scanner, the FleX-ray scanner, developed by TESCAN-XRE NV, located in the FleX-ray Lab at the Centrum Wiskunde & Informatica (CWI) in Amsterdam, Netherlands. It consists of a cone-beam microfocus X-ray point source (limited to 90 kV and 90 W) that projects polychromatic X-rays onto a 14-bit CMOS (complementary metal-oxide semiconductor) flat panel detector with CsI(Tl) scintillator (Dexella 1512NDT) and 1536-by-1944 pixels, each. To create a 2D dataset, a fan-beam geometry was mimicked by only reading out the central row of the detector. Between source and detector there is a rotation stage, upon which samples can be mounted. The machine components (i.e., the source, the detector panel, and the rotation stage) are mounted on translation belts that allow the moving of the components independently from one another.
Please refer to the paper for all further technical details.
The complete data collection can be found via the following links: 1-1,000, 1,001-2,000, 2,001-3,000, 3,001-4,000, 4,001-5,000, 5,521-6,370.
Each slice folder 'slice00001 - slice05000' and 'slice05521 - slice06370' contains three folders, one per mode: 'mode1', 'mode2', 'mode3'. In each of these folders there are the sinogram, the dark-field, and the two flat-fields (for the raw data archives), or just the reconstructions (and, for mode2, the additional reference segmentation).
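Once the sinogram, dark-field, and flat-fields of a mode folder are loaded (see the GitHub scripts below for the actual file names and readers), the standard pre-processing is flat-/dark-field correction followed by the negative log transform; a sketch:
import numpy as np

def flat_field_correct(sino, dark, flat1, flat2):
    # Average the two flat-fields, normalize, and apply the Beer-Lambert log transform.
    flat = 0.5 * (flat1 + flat2)
    eps = 1e-12
    transmission = (sino - dark) / np.maximum(flat - dark, eps)
    return -np.log(np.clip(transmission, eps, None))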
The corresponding reference reconstructions and segmentations can be found via the following links: 1-1,000, 1,001-2,000, 2,001-3,000, 3,001-4,000, 4,001-5,000, 5,521-6,370.
The corresponding Python scripts for loading, pre-processing, reconstructing and segmenting the projection data in the way described in the paper can be found on GitHub. A machine-readable file with the used scanning parameters and instrument data for each acquisition mode, as well as a script for loading it, can be found on the GitHub repository as well.
Note: It is advisable to use the graphical user interface when decompressing the .zip archives. If you experience a zipbomb error when unzipping a file on a Linux system, rerun the command with the UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE environment variable set, e.g., by adding "export UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE" to your .bashrc.
For more information or guidance in using the data collection, please get in touch with:
Maximilian.Kiss [at] cwi.nl
Felix.Lucka [at] cwi.nl
Some data name
Just testing... this dataset is temporary and will be removed soon.
Usage
from datasets import load_dataset
repo_id = "malteos/tmp4c-2"  # repository id from the dataset page linked below
ds = load_dataset(repo_id, name="luo_latn", split="train", streaming=True)
print(next(iter(ds)))
License
Internal testing only
Statistics
config      num_bytes   num_examples
luo_latn    33655       5
gla_latn    231302      26
vie_latn    11881006    997
bos_latn    …
See the full description on the dataset page: https://huggingface.co/datasets/malteos/tmp4c-2.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book series. It has 1 row and is filtered where the book is "English metre : major modes and critical questions". It features 2 columns including publication dates.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides counts of tap ons and tap offs made on the Opal ticketing system during two non-consecutive weeks in 2016. The Opal tap on and tap off dataset contains six CSV files covering two weeks (14 days) of Opal data across the four public transport modes.
Privacy is the utmost priority for all Transport for NSW Open Data and there is no information that can identify any individual in the Open Opal Tap On and Tap Off data. This means that any data that is, or can be, linked to an individual’s Opal card has been removed.
This dataset is subject to specific terms and conditions
There are three CSV files per week, and these provide a privacy-protected count of taps against:
Time – binned to 15 minutes by tap (tap on or tap off), by date and by mode
Location – by tap (tap on or tap off), by date and by mode
Time with location – binned to 15 minutes, by tap (tap on or tap off), by date and by mode
The tap on and tap off counts are not linked and individual trips cannot be derived using the data.
The two weeks of Opal data are:
Monday 21 November 2016 – Sunday 27 November 2016
Monday 26 December 2016 – Sunday 1 January 2017
Release 1 files are also linked below.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 2 rows and is filtered where the book is "Allegory : the theory of a symbolic mode". It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
This dataset summarises various climate variables for all 15 subregions, based on Bureau of Meteorology Australian Water Availability Project (BAWAP) climate grids, including:
Time series mean annual BAWAP rainfall from 1900 - 2012.
Long term average BAWAP rainfall and Penman Potential Evapotranspiration (PET) from Jan 1981 - Dec 2012 for each month
Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P (precipitation); (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (Vapour Pressure Deficit); (vii) Rn (net radiation); and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.
Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009).
As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
There are 4 csv files here:
BAWAP_P_annual_BA_SYB_GLO.csv
Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.
Source data: annual BILO rainfall
P_PET_monthly_BA_SYB_GLO.csv
Desc: Long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month
Climatology_Trend_BA_SYB_GLO.csv
Desc: Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend
Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv
Desc: Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
Dataset was created from various BAWAP source data, including Monthly BAWAP rainfall, Tmax, Tmin, VPD, etc, and other source data including monthly Penman PET, Correlation coefficient data. Data were extracted from national datasets for the GLO subregion.
Bioregional Assessment Programme (2014) GLO climate data stats summary. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/afed85e0-7819-493d-a847-ec00a318e657.
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Bioregional Assessment areas v03
Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Derived From Bioregional Assessment areas v01
Derived From Bioregional Assessment areas v02
Derived From GEODATA TOPO 250K Series 3
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)