Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Major differences from v1, for level 2 catch:
- Catches and numbers raised to nominal are only raised to exactly matching strata or, where none exist, to strata corresponding to UNK/NEI or 99.9. (new feature in v4)
- When nominal strata lack specific dimensions (e.g., fishing_mode always UNK) but georeferenced strata include them, the nominal data are "upgraded" to match, preventing loss of detail. Currently this adjustment aligns nominal values to georeferenced totals; future versions may apply proportional scaling. This does not create a direct raising but rather allows more precise reallocation. (new feature in v4)
- IATTC purse seine catch-and-effort data are available in 3 separate files according to species group: tunas, billfishes, and sharks. This is because PS data are collected from 2 sources: observers and fishing vessel logbooks. Observer records are used when available, and logbooks are used for unobserved trips. Both sources collect tuna data, but only observers collect shark and billfish data. As an example, a stratum may have observer effort, whose sets are counted for tuna, shark, and billfish; the same stratum may also have logbook data for unobserved sets, which add to the tuna catch and number of sets. The total number of sets is therefore higher for tuna than for sharks or billfishes. Effort in the billfish and shark datasets may hence represent only a proportion of the total effort allocated in some strata, since it is the observed effort, i.e. effort for which there was an observer onboard; as a result, catch in the billfish and shark datasets may represent only a proportion of the total catch allocated in some strata. Shark and billfish catches were therefore raised to the fishing effort reported in the tuna dataset. (new feature in v4; previously done in FIRMS Level 0)
- Data with a resolution of 10°x10° are removed; disaggregating them is being considered for future versions.
- Catches in tons, raised to match nominal values, now consider the geographic area of the nominal data for improved accuracy. (as v3)
- Catches in "number of fish" are converted to weight based on nominal data. The conversion factors used in the previous version are no longer used, as they did not adequately represent the diversity of captures. (as v3)
- Numbers of fish without corresponding nominal data are no longer removed, as they were before, which creates a large difference for this measurement_unit between the two datasets. (as v3)
- Strata for which catches in tons are raised to match nominal data have had their numbers removed. (as v3)
- Raising only applies to complete years, to avoid overrepresenting specific months, particularly in the early years of georeferenced reporting. (as v3)
- Strata where georeferenced data exceed nominal data have not been adjusted downward, as it is unclear whether these discrepancies arise from missing nominal data or from different aggregation methods in the two datasets. (as v3)
- The data are not aggregated to 5-degree squares and thus remain spatially unharmonized. Aggregation can be performed using CWP codes for geographic identifiers; for example, an R function is available (a Python sketch of the same conversion follows below): source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/sardara_functions/transform_cwp_code_from_1deg_to_5deg.R") (as v3)
This results in a raising of the data compared to v3 for IOTC, ICCAT, IATTC and WCPFC. However, as the raising is more specific for CCSBT, it is 22% lower than in the previous version.
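For illustration, here is a minimal Python sketch of the 1-degree to 5-degree conversion mentioned above. It assumes the standard CWP grid code layout (one digit for square size, one for quadrant, two digits of latitude and three of longitude for the corner nearest 0°/0°, with size code 5 denoting 1° squares and 6 denoting 5° squares); the linked R function remains the reference implementation.

```python
def cwp_1deg_to_5deg(code: str) -> str:
    """Return the 5-degree CWP square containing a 1-degree CWP square.

    Assumes the standard CWP layout: size digit (5 = 1 deg, 6 = 5 deg),
    quadrant digit, 2-digit latitude, 3-digit longitude of the corner
    closest to the intersection of the equator and prime meridian.
    """
    code = str(code)
    size, quadrant = code[0], code[1]
    if size != "5":
        raise ValueError(f"expected a 1-degree code (size 5), got {code}")
    lat, lon = int(code[2:4]), int(code[4:7])
    # Floor the corner coordinates to the enclosing 5-degree square.
    lat5, lon5 = lat - lat % 5, lon - lon % 5
    return f"6{quadrant}{lat5:02d}{lon5:03d}"

# Example: cwp_1deg_to_5deg("5230012") -> "6230010"
```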
The Level 0 dataset has been modified, creating differences in this new version, notably:
- The species retained are different; only 32 major species are kept.
- Mappings have been somewhat modified, based on new standards implemented by FIRMS.
- New rules have been applied for overlapping areas.
- Data are only displayed in 1-degree and 5-degree square areas.
- The data are enriched with "Species group" and "Gear labels" using the fdiwg standards.
These main differences are recapped in Differences_v2018_v2024.zip.
Recommendations: To avoid converting data from numbers using nominal strata, we recommend the use of conversion factors, which could be provided by the tRFMOs. In some strata, nominal data appear higher than georeferenced data, as observed during level 2 processing. These discrepancies may result from errors or from differences in aggregation methods. Further analysis will examine these differences in detail to refine treatments accordingly. A summary of differences by tRFMO, based on the number of strata, is included in the appendix.
For level 0 effort: In some datasets, namely those from ICCAT and the purse seine (PS) data from WCPFC, the same effort data has been reported multiple times using different units; these records have been kept as is, since no official mapping allows conversion between the units. As a result, users should be reminded that some ICCAT and WCPFC effort data are deliberately duplicated: in the case of ICCAT data, lines with identical strata but different effort units are duplicates reporting the same fishing activity with different measurement units.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Major differences from previous work, for level 2 catch:
- Catches in tons, raised to match nominal values, now consider the geographic area of the nominal data for improved accuracy.
- Catches in "number of fish" are converted to weight based on nominal data. The conversion factors used in the previous version are no longer used, as they did not adequately represent the diversity of captures.
- Numbers of fish without corresponding nominal data are no longer removed, as they were before, which creates a large difference for this measurement_unit between the two datasets.
- Nominal data from WCPFC include fishing fleet information, and georeferenced data have been raised based on this instead of solely on the triplet year/gear/species, to avoid random reallocations.
- Strata for which catches in tons are raised to match nominal data have had their numbers removed.
- Raising only applies to complete years, to avoid overrepresenting specific months, particularly in the early years of georeferenced reporting.
- Strata where georeferenced data exceed nominal data have not been adjusted downward, as it is unclear whether these discrepancies arise from missing nominal data or from different aggregation methods in the two datasets.
- The data are not aggregated to 5-degree squares and thus remain spatially unharmonized. Aggregation can be performed using CWP codes for geographic identifiers; for example, an R function is available: source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/sardara_functions/transform_cwp_code_from_1deg_to_5deg.R")
The Level 0 dataset has been modified, creating differences in this new version, notably:
- The species retained are different; only 32 major species are kept.
- Mappings have been somewhat modified, based on new standards implemented by FIRMS.
- New rules have been applied for overlapping areas.
- Data are only displayed in 1-degree and 5-degree square areas.
- The data are enriched with "Species group" and "Gear labels" using the fdiwg standards.
These main differences are recapped in Differences_v2018_v2024.zip.
Recommendations: To avoid converting data from numbers using nominal strata, we recommend the use of conversion factors, which could be provided by the tRFMOs. In some strata, nominal data appear higher than georeferenced data, as observed during level 2 processing. These discrepancies may result from errors or from differences in aggregation methods. Further analysis will examine these differences in detail to refine treatments accordingly. A summary of differences by tRFMO, based on the number of strata, is included in the appendix.
Some nominal data have no equivalent in the georeferenced data and therefore cannot be disaggregated. What could be done is to check, for each nominal datum without an equivalent, whether georeferenced data exist within different buffers, and to average the distribution of this footprint; the nominal data would then be disaggregated based on the georeferenced data. This would lead to the creation of data (approximately 3%) and would necessitate reducing or removing all georeferenced data without a nominal equivalent or with a lesser equivalent. Tests are currently being conducted with and without this step. It would help improve the footprint of captured biomass but could lead to unexpected discrepancies with current datasets.
For level 0 effort: In some datasets, namely those from ICCAT and the purse seine (PS) data from WCPFC, the same effort data has been reported multiple times using different units; these records have been kept as is, since no official mapping allows conversion between the units. As a result, users should be reminded that some ICCAT and WCPFC effort data are deliberately duplicated:
- In the case of ICCAT data, lines with identical strata but different effort units are duplicates reporting the same fishing activity with different measurement units. It is indeed not possible to infer strict equivalence between units, as some contain information about others (e.g., Hours.FAD and Hours.FSC may inform Hours.STD).
- In the case of WCPFC data, effort records were also kept in all originally reported units. Here, duplicates do not necessarily share the same "fishing_mode", as SETS for purse seiners are reported with an explicit association to fishing_mode, while DAYS are not. This distinction allows SETS records to be separated by fishing mode, whereas DAYS records remain aggregated.
Some limited harmonization, particularly between units such as NET-days and Nets, has not been implemented in the current version of the dataset, but may be considered in future releases if a consistent relationship can be established.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Key information about Philippines Nominal GDP
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
### On-/Off-Axis Data Release
#### (Version 1.0.1, dated 2024/08/12)
This tar archive contains the data release for ‘First measurement of muon neutrino charged-current interactions on hydrocarbon without pions in the final state using multiple detectors with correlated energy spectra at T2K’. It contains the cross-section data points and supporting information in ROOT and text format, which are detailed below:
+ `onoffaxis_xsec_data.root`
This ROOT file contains the extracted cross section and the nominal MC prediction as TH1D histograms for both the flattened 1D array of bins and in the angle binning for the analysis. The ROOT file also contains both the covariance and inverted covariance matrix for the result stored as TH2D histograms. The angle bin numbering and the corresponding bin edges are detailed at the end of the README.
+ `flux_analysis.root`
This ROOT file contains the nominal and post-fit flux histograms for ND280 and INGRID. Two different binnings are included: a fine binned histogram (220 bins) and a coarse binned histogram (20 bins). The coarse binned histogram corresponds to the flux parameters detailed in the paper (and bin edges listed in the appendix).
+ `xsec_data_mc.csv`
The extracted cross-section data points and the nominal MC prediction for each bin is stored as a comma-separated value (CSV) file with header row.
+ `cov_matrix.csv` and `inv_matrix.csv`
The covariance matrix and the inverted covariance matrix are both stored as CSV files with each row stored as a single line and columns separated by commas (there is no header row). Matrix element (0,0) corresponds to the first number in the file.
+ `nd280_analysis_binning.csv` and `ingrid_analysis_binning.csv`
The analysis bin edges are included as CSV files. The columns are labeled with a header row and denote the linear bin index and the lower and upper bin edge for the angle and momentum bins. The units are in cos(angle) for the angle bins and in MeV/c for the momentum bins.
+ `calc_chisq.cxx`
This is an example ROOT script to calculate the chi-square between the data and the nominal MC prediction using the ROOT file in the data release. To run, open ROOT and load the script (`.L calc_chisq.cxx`) and execute the function `calc_chisq("/path/to/file.root")`.
+ `calc_chisq.py`
This is an example Python script to calculate the chi-square between the data and the nominal MC prediction using the text/CSV files in the data release. The code requires NumPy as an external dependency, but otherwise uses built-in modules. To run, execute using a Python3 interpreter and give the file paths to the data/MC text file and the inverse covariance text file as the first and second arguments respectively -- e.g. `python3 calc_chisq.py /path/to/xsec_data_mc.csv /path/to/inv_matrix.csv`. A NumPy sketch of the core calculation appears after the bin listings below.
+ ND280 angle bin numbering
- 0: `-1.0 < cos(#theta) < 0.20`
- 1: `0.20 < cos(#theta) < 0.60`
- 2: `0.60 < cos(#theta) < 0.70`
- 3: `0.70 < cos(#theta) < 0.80`
- 4: `0.80 < cos(#theta) < 0.85`
- 5: `0.85 < cos(#theta) < 0.90`
- 6: `0.90 < cos(#theta) < 0.94`
- 7: `0.94 < cos(#theta) < 0.98`
- 8: `0.98 < cos(#theta) < 1.00`
+ INGRID angle bin numbering
- 0: `0.50 < cos(#theta) < 0.82`
- 1: `0.82 < cos(#theta) < 0.94`
- 2: `0.94 < cos(#theta) < 1.00`
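For reference, the core of the chi-square calculation is small. The following is a minimal NumPy sketch, not the released `calc_chisq.py`; the column names `data` and `mc` are assumptions, so check the header row of `xsec_data_mc.csv`:

```python
import sys
import numpy as np

# chi2 = (d - m)^T V^{-1} (d - m), with d/m from the data/MC CSV and
# V^{-1} from the inverse covariance CSV described above.
data_mc = np.genfromtxt(sys.argv[1], delimiter=",", names=True)
inv_cov = np.loadtxt(sys.argv[2], delimiter=",")

diff = data_mc["data"] - data_mc["mc"]  # column names are an assumption
chi2 = diff @ inv_cov @ diff
print(f"chi2 = {chi2:.3f}")
```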
### Changelog
#### v1.0.1
Fix transcription error in INGRID momentum binning. The lowest momentum bin edge is at 350 MeV/c, not 300 MeV/c.
MC simulation QCD jet nominal samples from the ATLAS experiment
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for itemsets they are most likely to purchase. The given dataset contains a retailer's transaction data, covering all transactions that happened over a period of time. The retailer will use the results to grow its business: by suggesting itemsets to customers, it can increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem with Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most useful when you want to discover associations between different objects in a set, i.e., to find frequent patterns in a transaction database. It can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
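The same arithmetic in a few lines of Python, for checking:

```python
n = 100
n_mouse, n_mat, n_both = 10, 9, 8      # customers buying each item / both

support = n_both / n                   # P(mouse & mat)      = 0.08
confidence = n_both / n_mouse          # support / P(mouse)  = 0.80
lift = confidence / (n_mat / n)        # confidence / P(mat) ~ 8.9

print(support, confidence, round(lift, 1))
```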
Number of Attributes: 7
First, we need to load the required libraries; each is described briefly below.
Next, we load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
Next, we clean the data frame by removing missing values.
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice are in ...
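The walkthrough breaks off here; the original analysis continues in R with the arules package. As a rough Python equivalent of the same pipeline, here is a sketch using pandas and mlxtend; the package choice and the column names `BillNo` and `Itemname` are assumptions about the spreadsheet, not the author's code:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One row per line item; column names are assumed, not verified.
df = pd.read_excel("Assignment-1_Data.xlsx").dropna(subset=["BillNo", "Itemname"])

# Transaction form: one row per invoice, one boolean column per item.
baskets = df.groupby(["BillNo", "Itemname"]).size().unstack(fill_value=0) > 0

frequent = apriori(baskets, min_support=0.01, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]].head())
```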
The Fundamental Data Record (FDR) for Atmospheric Composition UVN Level 1b v.1.0 dataset is a cross-instrument Level-1 product [ATMOS_L1B] generated in 2023 and resulting from the ESA FDR4ATMOS project (https://atmos.eoc.dlr.de/FDR4ATMOS/). The FDR contains selected Earth Observation Level 1b parameters (irradiance/reflectance) from the nadir-looking measurements of the ERS-2 GOME and Envisat SCIAMACHY missions for the period ranging from 1995 to 2012. The data record offers harmonised cross-calibrated spectra, essential for subsequent trace gas retrieval. The focus lies on spectral windows in the Ultraviolet-Visible-Near-Infrared regions for the retrieval of critical atmospheric constituents, such as ozone (O3), sulphur dioxide (SO2) and nitrogen dioxide (NO2) column densities, alongside cloud parameters in the NIR spectrum. For many aspects, the FDR product has improved compared to the existing individual mission datasets:
• GOME solar irradiances are harmonised using a validated SCIAMACHY solar reference spectrum, solving the problem of the fast-changing etalon present in the original GOME Level 1b data;
• Reflectances for both GOME and SCIAMACHY are provided in the FDR product. GOME reflectances are harmonised to degradation-corrected SCIAMACHY values, using collocated data from the CEOS PIC sites;
• SCIAMACHY data are scaled to the lowest integration time within the spectral band using high-frequency PMD measurements from the same wavelength range. This simplifies the use of the SCIAMACHY spectra, which were split into a complex cluster structure (each cluster with its own integration time) in the original Level 1b data;
• The harmonisation process applied mitigates the viewing angle dependency observed in the UV spectral region for GOME data;
• Uncertainties are provided.
Each FDR product covers three FDRs (irradiance/reflectance for UV-VIS-NIR) for a single day within the same product including information from the individual ERS-2 GOME and Envisat SCIAMACHY orbits therein.
FDR has been generated in two formats: Level 1A and Level 1B, targeting expert users and nominal applications respectively. The Level 1A [ATMOS_L1A] data include additional parameters such as harmonisation factors, PMD, and polarisation data extracted from the original mission Level 1 products. The ATMOS_L1A dataset is not part of the nominal dissemination to users. In case of specific requirements, please contact EOHelp (http://esatellus.service-now.com/csp?id=esa_simple_request&sys_id=f27b38f9dbdffe40e3cedb11ce961958).
The FDR4ATMOS products should be regarded as experimental due to the innovative approach and the current use of a limited-sized test dataset to investigate the impact of harmonization on the Level 2 target species, specifically SO2, O3 and NO2. Presently, this analysis is being carried out within follow-on activities.
One of the main aspects of the project was the characterization of Level 1 uncertainties for both instruments, based on metrological best practices. The following documents are provided:
The FDR V1 is currently being extended to include the MetOp GOME-2 series.
All the new products are conveniently formatted in NetCDF. Free standard tools, such as Panoply (https://www.giss.nasa.gov/tools/panoply/), can be used to read NetCDF data.
Panoply is sourced and updated by external entities. For further details, please consult our Terms and Conditions page (https://earth.esa.int/eogateway/terms-and-conditions).
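Besides Panoply, the products can be inspected programmatically. A minimal Python sketch with xarray follows; the file and variable names are illustrative, not taken from the product specification:

```python
import xarray as xr

# Any FDR4ATMOS NetCDF product should open the same way.
ds = xr.open_dataset("FDR4ATMOS_L1B_example.nc")  # illustrative file name
print(ds)                      # lists dimensions, coordinates and variables
spectrum = ds["irradiance"]    # variable name is an assumption
```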
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General information: This dataset is meant to serve as a benchmark problem for fault detection and isolation in dynamic systems. It contains preprocessed sensor data from the adaptive high-rise demonstrator building D1244, built in the scope of the CRC1244. Parts of the measurements have been artificially corrupted and labeled accordingly. Please note that although the measurements are stored in Matlab's .mat format (version 7.0), they can easily be processed using free software such as the SciPy library in Python.
Structure of the dataset:
- train contains training data (only nominal)
- validation contains validation data (nominal and faulty). Faulty samples were obtained by manipulating a single signal in a random nominal sample from the validation data.
- test contains test data (nominal and faulty). Faulty samples were obtained by manipulating a single signal in a random nominal sample from the test data.
- meta contains textual labels for all signals as well as additional information on the considered fault classes
File contents: each file contains the following data from 1200 timesteps (60 seconds sampled at 20 Hz):
- t: time in seconds
- u: actuator forces (obtained from pressure measurements) in newtons
- y: relative elongations and bending curvatures of structural elements obtained from strain gauge measurements, and actuator displacements measured by position encoders
- label: categorical label of the present fault class, where 0 denotes the nominal class and faults in the different signals are encoded according to their index in the list of fault types in meta/labels.mat
Faulty samples additionally include the corresponding nominal values for reference:
- u_true: actuator forces without faults
- y_true: measured outputs without faults
Textual labels for all in- and output signals as well as all faults are given in the struct labels. Each sample's textual fault label is additionally contained in its filename (between the first and second underscore).
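As the description notes, the .mat files can be read directly with SciPy. A minimal sketch follows; the file path is illustrative, while the field names are those listed above:

```python
from scipy.io import loadmat

# Path and file name are illustrative; the fault label sits between
# the first and second underscore of each file name.
sample = loadmat("validation/sample_nominal_0001.mat")

t = sample["t"].squeeze()               # time in seconds (1200 steps at 20 Hz)
u = sample["u"]                         # actuator forces in newtons
y = sample["y"]                         # strains, curvatures, displacements
label = int(sample["label"].squeeze())  # 0 = nominal, >0 = fault class index
```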
https://fred.stlouisfed.org/legal/#copyright-public-domain
View economic output, reported as the nominal value of all new goods and services produced by labor and property located in the U.S.
Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine
Authors: Mark Schwabacher, NASA Ames Research Center; Robert Aguilar, Pratt & Whitney Rocketdyne; Fernando Figueroa, NASA Stennis Space Center
Abstract: The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise and a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
Introduction: The J-2X rocket engine will be tested on Test Stand A-1 at NASA Stennis Space Center (SSC) in Mississippi. A team including people from SSC, NASA Ames Research Center (ARC), and Pratt & Whitney Rocketdyne (PWR) is developing a prototype end-to-end integrated systems health management (ISHM) system that will be used to monitor the test stand and the engine while the engine is on the test stand [1]. The prototype will use several different methods for detecting and diagnosing faults in the test stand and the engine, including rule-based, model-based, and data-driven approaches. SSC is currently using the G2 tool (http://www.gensym.com) to develop rule-based and model-based fault detection and diagnosis capabilities for the A-1 test stand. This paper describes preliminary results in applying the data-driven approach to detecting and diagnosing faults in the J-2X engine. The conventional approach to detecting and diagnosing faults in complex engineered systems such as rocket engines and test stands is to use large numbers of human experts. Test controllers watch the data in near-real time during each engine test. Engineers study the data after each test. These experts are aided by limit checks that signal when a particular variable goes outside of a predetermined range. The conventional approach is very labor intensive. Also, humans may not be able to recognize faults that involve the relationships among large numbers of variables.
Further, some potential faults could happen too quickly for humans to detect them and react before they become catastrophic. Automated fault detection and diagnosis is therefore needed. One approach to automation is to encode human knowledge into rules or models. Another approach is to use data-driven methods to automatically learn models from historical data or simulated data. Our prototype will combine the data-driven approach with the model-based and rule-based approaches.
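To make the data-driven approach concrete, here is a toy sketch of training and inspecting such a classifier. It uses scikit-learn's CART implementation as a stand-in for C4.5, and synthetic features in place of the DRTM simulation data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: 4 sensor-like features, binary leak label.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))          # e.g. pressures, temperatures
y_train = (X_train[:, 0] > 1.0).astype(int)  # 0 = nominal, 1 = leak

# Fit a shallow tree so the learned rules stay human-readable.
clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(export_text(clf, feature_names=["p1", "p2", "t1", "t2"]))
```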
A set of MATLAB functions (HSI_PSFS, SC_RS_Analysis_NAD.m, SC_RS_Analysis_sim.m) were developed to assess the spatial coverage of pushbroom hyperspectral imaging (HSI) data. HSI_PSFs derives the net point spread function of HSI data based on nominal data acquisition and sensor parameters (sensor speed, sensor heading, sensor altitude, number of cross track pixels, sensor field of view, integration time, frame time and pixel summing level). SC_RS_Analysis_sim calculates a theoretical spatial coverage map for HSI data based on nominal data acquisition and sensor parameters. The spatial coverage map is the sum of the point spread functions of all the pixels collected within an HSI dataset. Practically, the spatial coverage map quantifies how HSI data spatially samples spectral information across an imaged scene. A secondary theoretical spatial coverage map is also calculated for spatially resampled (nearest neighbour approach) HSI data. The function also calculates theoretical resampling errors such as pixel duplication (%), pixel loss (%) and pixel shifting (m). SC_RS_Analysis_NAD calculates an empirical spatial coverage map for collected HSI data (before and after spatial resampling) based on its nominal data acquisition and sensor parameters. The function also calculates empirical resampling errors. The current implementation of SC_RS_Analysis_NAD only works for ITRES (Calgary, Alberta, Canada) data products as it uses auxiliary information generated during the ITRES data processing workflow. This auxiliary information includes a ground look-up table that specifies the location (easting and northing) of each pixel of the HSI data in its raw sensor geometry. This auxiliary information also includes the pixel-to-pixel mapping between the HSI data in its raw sensor geometry and the spatially resampled HSI data. SC_RS_Analysis_NAD can readily be modified to work with HSI data collected by sensors from other manufacturers so long as the required auxiliary information can be extracted during data processing.
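As a rough illustration of the spatial coverage idea (not a port of the MATLAB functions): the coverage map is the sum of every pixel's PSF over a ground grid. A toy NumPy version with an isotropic Gaussian PSF follows; the real net PSF is anisotropic and derived from the acquisition and sensor parameters listed above:

```python
import numpy as np

def coverage_map(centers, sigma, extent, cell=0.1):
    """Sum an isotropic Gaussian PSF for every pixel centre onto a grid.

    Toy stand-in for SC_RS_Analysis_sim: every pixel gets the same
    circular Gaussian of width `sigma` (ground units).
    """
    xmin, xmax, ymin, ymax = extent
    gx, gy = np.meshgrid(np.arange(xmin, xmax, cell),
                         np.arange(ymin, ymax, cell))
    cov = np.zeros_like(gx)
    for cx, cy in centers:
        cov += np.exp(-((gx - cx) ** 2 + (gy - cy) ** 2) / (2 * sigma ** 2))
    return cov

# Example: two overlapping pixel PSFs on a small ground patch.
cov = coverage_map([(1.0, 1.0), (1.5, 1.0)], sigma=0.5, extent=(0, 3, 0, 2))
```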
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains summary statistics for eQTL (Expression Quantitative Trait Loci) analyses of 120 human fetal brains from the second trimester of gestation (12 to 19 post-conception weeks). Expression matrices, covariates, and summary statistics are provided for all tested eQTLs and for the top eQTL for each gene. The data are contained within a single .zip archive file. Individual data files are in openly accessible .txt text file format, containing p- or q-values by SNP, and .bed Browser Extensible Data format files, containing annotation track data such as chromosomal coordinates. Data files of multiple GB in size are stored in individual .gz gzip-compressed files. The related study investigates genetic influences on gene expression in the human fetal brain and their relationship with a variety of postnatal brain-related traits, including susceptibility to neuropsychiatric disorders. This dataset represents the first eQTL dataset derived exclusively from the human fetal brain, and is based on initial deep RNA sequencing and genotyping. The detailed breakdown of the files in this dataset is provided below and in README.md.
Gene Level Analyses:
- expression_gene.bed.gz
· normalised, variance-stabilising transformed count data (29,875 genes)
· columns: chr, gene_start, gene_end, gene_id, samples...
- all_eqtls_gene.txt.gz
· nominal p-values for all SNPs within 1 MB of each gene
· columns: gene_id, variant_id, tss_distance, ma_samples, ma_count, maf, pval_nominal, slope, slope_se
- top_eqtls_gene.txt.gz
· q-values for the most significant eQTL for each gene (includes nominal p-value thresholds that can be used to filter significant SNPs)
· columns: chr, snp_start, snp_end, gene_id, num_var, beta_shape1, beta_shape2, true_df, pval_true_df, variant_id, tss_distance, minor_allele_samples, minor_allele_count, maf, ref_factor, pval_nominal, slope, slope_se, pval_perm, pval_beta, qval, pval_nominal_threshold
Transcript Level Analyses:
- expression_transcript.bed.gz
· normalised, variance-stabilising transformed count data (144,448 transcripts)
· columns: chr, transcript_start, transcript_end, transcript_id, samples...
- all_eqtls_transcript.txt.gz
· nominal p-values for all SNPs within 1 MB of each transcript
· columns: transcript_id, variant_id, tss_distance, ma_samples, ma_count, maf, pval_nominal, slope, slope_se
- top_eqtls_transcript.txt.gz
· q-values for the most significant eQTL for each transcript (includes nominal p-value thresholds that can be used to filter significant SNPs)
· columns: chr, snp_start, snp_end, transcript_id, num_var, beta_shape1, beta_shape2, true_df, pval_true_df, variant_id, tss_distance, minor_allele_samples, minor_allele_count, maf, ref_factor, pval_nominal, slope, slope_se, pval_perm, pval_beta, qval, pval_nominal_threshold
Covariates (used for both gene-level and transcript-level analyses):
- covariates.txt
· columns: Sample, Sex, PCW, RIN, ReadLength, PC1, PC2, PC3, PEER1, PEER2, PEER3, PEER4, PEER5, PEER6, PEER7, PEER8, PEER9, PEER10
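As a usage sketch, the per-gene thresholds in top_eqtls_gene.txt.gz can be combined with the full results to extract significant SNPs. This assumes tab-separated files with header rows matching the column lists above:

```python
import pandas as pd

# pandas reads .gz files transparently; separator is an assumption.
top = pd.read_csv("top_eqtls_gene.txt.gz", sep="\t")
all_eqtls = pd.read_csv("all_eqtls_gene.txt.gz", sep="\t")

# Keep, for each gene, the SNPs whose nominal p-value passes that
# gene's permutation-derived threshold.
merged = all_eqtls.merge(top[["gene_id", "pval_nominal_threshold"]], on="gene_id")
significant = merged[merged["pval_nominal"] < merged["pval_nominal_threshold"]]
print(len(significant))
```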
General information: This dataset is meant to serve as a benchmark problem for fault detection and isolation in dynamical systems. It contains pre-processed sensor data from the adaptive high-rise demonstrator building D1244, built in the scope of the CRC1244. Parts of the measurements have been artificially corrupted and labeled accordingly. Please note that although the measurements are stored in Matlab's .mat format (version 7.0), they can easily be processed using free software such as the SciPy library in Python.
Structure of the dataset:
- train contains the training data (only nominal)
- test_easy contains test data (nominal and faulty with high fault amplitude). Faulty samples were obtained by manipulating a single signal in a random nominal sample from the test data.
- test_hard contains test data (nominal and faulty with low fault amplitude)
- meta contains textual labels for all signals and fault types
File contents: each file contains the following data from 16384 timesteps:
- t: time in seconds
- u: demanded actuator forces in newtons
- y: measured outputs (relative elongations measured by strain gauges, and actuator displacements in meters measured by position encoders)
- label: categorical label of the present fault class, where 0 denotes the nominal class and faults in the different signals are encoded according to their index in the list of fault types in meta/labels.txt
Faulty samples additionally include the corresponding nominal values for reference:
- u_true: delivered actuator forces
- y_true: measured outputs without faults
A sample's textual fault label is also contained in its filename (between the first and second underscore).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and probability for an incomplete 2×2 table.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: SurveyUSA weights are based on data from Simons & Chabris (2011), re-normed to 2010 Census data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Extreme events are defined as events that largely deviate from the nominal state of the system as observed in a time series. Due to the rarity and uncertainty of their occurrence, predicting extreme events has been challenging. In real life, some variables (passive variables) often encode significant information about the occurrence of extreme events manifested in another variable (active variable). For example, observables such as temperature, pressure, etc., act as passive variables in case of extreme precipitation events. These passive variables do not show any large excursion from the nominal condition yet carry the fingerprint of the extreme events. In this study, we propose a reservoir computation-based framework that can predict the preceding structure or pattern in the time evolution of the active variable that leads to an extreme event using information from the passive variable. An appropriate threshold height of events is a prerequisite for detecting extreme events and improving the skill of their prediction. We demonstrate that the magnitude of extreme events and the appearance of a coherent pattern before the arrival of the extreme event in a time series affect the prediction skill. Quantitatively, we confirm this using a metric describing the mean phase difference between the input time signals, which decreases when the magnitude of the extreme event is relatively higher, thereby increasing the predictability skill.
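For orientation only, here is a generic echo state network sketch in NumPy, the usual starting point for reservoir computing; it is not the authors' framework, and the passive/active series below are synthetic stand-ins:

```python
import numpy as np

# Generic echo state network (ESN): drive a fixed random reservoir with a
# "passive" input series u and train a ridge readout to predict the
# "active" target series y.
rng = np.random.default_rng(1)
n_res, T = 300, 2000

u = rng.normal(size=T)   # passive variable (input), synthetic
y = np.roll(u, -5)       # active variable (toy target: time-shifted input)

W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1

X = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in * u[t] + W @ x)  # reservoir state update
    X[t] = x

ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
y_pred = X @ W_out  # readout prediction of the active variable
```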
The dataset contains both the robot's high-level tool center position (TCP) health data and controller-level component information (i.e., joint positions, velocities, currents, and temperatures). The datasets can be used by users (e.g., software developers, data scientists) who work on robot health management (including accuracy) but have limited or no access to robots that can capture real data. The datasets can support:
- the development of robot health monitoring algorithms and tools
- research on technologies and tools to support robot monitoring, diagnostics, prognostics, and health management (collectively called PHM)
- validation and verification of industrial PHM implementations, for example, verifying a robot's TCP accuracy after the work cell has been reconfigured, or whenever a manufacturer wants to determine if the robot arm has experienced a degradation.
For data collection, a trajectory is programmed for the Universal Robot (UR5), approaching and stopping at randomly selected locations in its workspace. The robot moves along this preprogrammed trajectory under different conditions of temperature, payload, and speed. The TCP positions (x, y, z) of the robot are measured by a 7-D measurement system developed at NIST. Differences are calculated between the positions measured by the 7-D measurement system and the nominal positions computed from the nominal robot kinematic parameters, and the results are recorded within the dataset. Controller-level sensing data are also collected from each joint (direct output from the controller of the UR5) to understand the influence of temperature, payload, and speed on position degradation. Controller-level data can be used for root cause analysis of robot performance degradation, by providing joint positions, velocities, currents, accelerations, torques, and temperatures. For example, the cold-start temperatures of the six joints were approximately 25 degrees Celsius; after two hours of operation, the joint temperatures increased to approximately 35 degrees Celsius. Control variables are listed in the header file in the dataset (UR5TestResult_header.xlsx). If you'd like to comment on this data and/or offer recommendations on future datasets, please email guixiu.qiao@nist.gov.
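The reported difference calculation reduces to a per-pose position error; a tiny NumPy sketch with made-up numbers:

```python
import numpy as np

# Hypothetical rows of measured vs nominal TCP positions (x, y, z),
# illustrating the 7-D measurement comparison described above.
measured = np.array([[400.12, -210.05, 350.30],
                     [400.08, -210.00, 350.27]])
nominal = np.array([[400.00, -210.00, 350.25],
                    [400.00, -210.00, 350.25]])

error_vec = measured - nominal                  # per-axis deviation
error_norm = np.linalg.norm(error_vec, axis=1)  # Euclidean TCP error per pose
print(error_norm)
```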
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This deposit contains various datasets describing tuna fisheries activities (currently catches and efforts) at different levels of processing, on 1° or 5° spatial grids with a monthly temporal resolution. Lower levels of processing have been officially endorsed by FIRMS and are also published on Zenodo: see the FIRMS Global Tuna Atlas datasets. Currently, FIRMS datasets only deal with catches and Level 0 data (a global dataset that remains as close as possible to the datasets published on the tuna RFMOs' websites), including a lower spatio-temporal resolution dataset which gives the best estimates of total catches (nominal catches, per year and per ocean).
Data structure
All Global Tuna Atlas datasets comply with a common data format in line with the CWP Reference Harmonization standard (https://www.fao.org/3/cc6734en/cc6734en.pdf), which is described in a json file (https://github.com/fdiwg/fdi-formats/blob/main/cwp_rh_generic_gta_taskI.json).
Global Catch dataset (IRD level 2)
IRD Level 2 denotes the series of processing steps applied by the French National Research Institute for Sustainable Development (IRD) to generate this dataset from the primary RFMO catch-and-effort data. Although some steps mirror those used in the FIRMS Level 0 product (DOI: https://doi.org/10.5281/zenodo.5745958), the entire workflow was rerun to integrate early adjustments to IATTC shark and billfish data prior to final aggregation.
This dataset compiles monthly global catch data for tuna, tuna-like species and sharks from 1950 through 2023. Catches are stratified according to the latest CWP standards update:
- month
- species
- gear_type (reporting fishing_gear)
- fishing_fleet (reporting country)
- fishing_mode (type of school used)
- geographic_identifier (1° or 5° grid cell)
- measurement_unit i.e. unit of catch (weight or number)
- measurement (catch)
- measurement_type (landings or retained catches)
- measurement_processing_level (original samples or processed data)
- a `label` column has been added for each field (e.g. `fishing_mode`, `species`, `gear_type`, etc.) to provide clear descriptive metadata
Warning: This dataset is designed to enhance the understanding of fish counts at level 0, and the amount of georeferenced data. It is not suitable for accurately georeferencing data by country or fishing fleet and should not be used for studies on fishing zone legality or quota management. While it offers a georeferenced footprint of captures to reflect reported biomass more closely, significant uncertainty remains regarding the precise locations of the catches.
Global level 2 processing includes the conversion and raising of georeferenced catch data to match nominal dataset values.
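As a schematic illustration of such a raising step (not the production workflow), the sketch below computes per-stratum scale factors from nominal totals and applies them to the georeferenced rows. The `measurement` column follows the data structure above, while the grouping keys, a derived `year` column, and the omission of the UNK/NEI fallback and complete-year rules are simplifying assumptions:

```python
import pandas as pd

def raise_to_nominal(geo: pd.DataFrame, nominal: pd.DataFrame,
                     keys=("year", "fishing_fleet", "gear_type", "species")):
    """Scale georeferenced catches so stratum totals match nominal totals.

    Simplified sketch: assumes both frames carry a `measurement` column
    and that `year` has been derived from the monthly georeferenced data.
    """
    keys = list(keys)
    geo_tot = geo.groupby(keys, as_index=False)["measurement"].sum()
    merged = geo_tot.merge(nominal, on=keys, suffixes=("_geo", "_nom"))
    # Strata where georeferenced totals already exceed nominal are left
    # untouched (no downward adjustment), as described in the notes above.
    merged["factor"] = (merged["measurement_nom"]
                        / merged["measurement_geo"]).clip(lower=1.0)
    out = geo.merge(merged[keys + ["factor"]], on=keys, how="left")
    out["measurement"] = out["measurement"] * out["factor"].fillna(1.0)
    return out.drop(columns="factor")
```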
To reproduce the data and the workflow, we provide a .zip with all the initial data used, as well as the labeling and the mapping to nominal geometries (see all_rawdata.zip).
Global Effort dataset (IRD Level 0)
We compiled a comprehensive dataset of geo-referenced fishing effort observations from global tuna fisheries, covering the period from 1950 to 2023. These data are collected from the public-domain datasets released by the five tuna Regional Fisheries Management Organizations (t-RFMOs): CCSBT, IATTC, ICCAT, IOTC, and WCPFC. As with the catch dataset, the effort data were processed using the same data generation workflow as FIRMS-GTA, with a different parametrization, complying with the standardized data structure promoted by the Coordinating Working Party (CWP) standards for (tuna) fisheries statistics.
Unlike catches, effort values are reported using a significant number of measurement units (23). Only a few mappings between similar tRFMO units have been managed, based on fdiwg codelists (see the GitHub repository: https://github.com/fdiwg/fdi-mappings). Each remaining unit reflects different operational aspects depending on the fishing gear, fleet behavior, and the reporting RFMO. The Level 0 global dataset includes all reported units without conversion or aggregation, to preserve the original semantic richness and reflect the heterogeneity in reporting practices.
This IRD Level 0 global effort dataset thus preserves all original effort records from the t-RFMOs and complies with a unified data structure while maintaining the granularity and diversity of reporting. It is not a standardized or simplified effort dataset, and no higher level of processing is currently made available by IRD. Any further aggregation or transformation of effort data should be conducted by the end user, based on specific scientific goals and with careful consideration of the semantics behind each unit.
Both datasets are enriched with "gear_type_label" and "fishing_fleet_label"; following the FDIWG standards, the catch dataset additionally includes "species_group" and the effort dataset includes "measurement_unit_label".
Appendix work:
If you are interested in creating a customized version of this Global Tuna Atlas with specific filters or adjustments based on particular issues, please feel free to reach out to us.