Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Major differences from previous work:

For level 2 catch:
- Catches in tons, raised to match nominal values, now take into account the geographic area of the nominal data for improved accuracy.
- Captures in "Number of fish" are converted to weight based on nominal data. The conversion factors used in the previous version are no longer used, as they did not adequately represent the diversity of captures.
- Numbers of fish without corresponding nominal data are not removed as they were before, creating a large difference for this measurement_unit between the two datasets.
- Nominal data from WCPFC include fishing fleet information, and georeferenced data have been raised on that basis rather than solely on the year/gear/species triplet, to avoid random reallocations.
- Strata for which catches in tons are raised to match nominal data have had their numbers removed.
- Raising only applies to complete years, to avoid overrepresenting specific months, particularly in the early years of georeferenced reporting.
- Strata where georeferenced data exceed nominal data have not been adjusted downward, as it is unclear whether these discrepancies arise from missing nominal data or from different aggregation methods in the two datasets.
- The data are not aggregated to 5-degree squares and thus remain spatially unharmonized. Aggregation can be performed using CWP codes for geographic identifiers. For example, an R function is available: source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/sardara_functions/transform_cwp_code_from_1deg_to_5deg.R")

The level 0 dataset has been modified, creating differences in this new version, notably:
- The species retained are different; only 32 major species are kept.
- Mappings have been somewhat modified based on new standards implemented by FIRMS.
- New rules have been applied for overlapping areas.
- Data are only provided for 1-degree and 5-degree square areas.
- The data are enriched with "Species group" and "Gear labels" using the fdiwg standards.
These main differences are summarized in Differences_v2018_v2024.zip

Recommendations:
- To avoid converting data from numbers using nominal strata, we recommend the use of conversion factors, which could be provided by tRFMOs.
- In some strata, nominal data appear higher than georeferenced data, as observed during level 2 processing. These discrepancies may result from errors or from differences in aggregation methods. Further analysis will examine these differences in detail to refine treatments accordingly. A summary of differences by tRFMO, based on the number of strata, is included in the appendix.
- Some nominal data have no equivalent in georeferenced data and therefore cannot be disaggregated. A possible improvement would be to check, for each nominal stratum without an equivalent, whether georeferenced data exist within different buffers, average the distribution of this footprint, and then disaggregate the nominal data based on the georeferenced data. This would lead to the creation of data (approximately 3%) and would require reducing or removing all georeferenced data without a nominal equivalent, or with a lesser one. Tests are currently being conducted with and without this option. It would help improve the footprint of captured biomass but could lead to unexpected discrepancies with current datasets.
For level 0 effort: In some datasets, namely those from ICCAT and the purse seine (PS) data from WCPFC, the same effort has been reported multiple times using different units. These records have been kept as is, since no official mapping allows conversion between the units. As a result, users should be reminded that some ICCAT and WCPFC effort data are deliberately duplicated:
- In the case of ICCAT data, lines with identical strata but different effort units are duplicates reporting the same fishing activity with different measurement units. It is indeed not possible to infer strict equivalence between units, as some contain information about others (e.g., Hours.FAD and Hours.FSC may inform Hours.STD).
- In the case of WCPFC data, effort records were also kept in all originally reported units. Here, duplicates do not necessarily share the same "fishing_mode", as SETS for purse seiners are reported with an explicit association to fishing_mode, while DAYS are not. This distinction allows SETS records to be separated by fishing mode, whereas DAYS records remain aggregated.
Some limited harmonization, particularly between units such as NET-days and Nets, has not been implemented in the current version of the dataset, but may be considered in future releases if a consistent relationship can be established.
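As a hedged illustration of the warning above, the sketch below shows one way a user might avoid double counting when the same activity is reported in several effort units: keep a single preferred unit per stratum. The column names (the strata columns, measurement_unit, measurement_value) and the preference order are assumptions for illustration, not part of the published dataset specification.

```python
import pandas as pd

# Hypothetical column names; adjust to the actual effort dataset schema.
STRATA_COLS = ["year", "month", "gear", "fishing_fleet", "geographic_identifier"]
UNIT_PREFERENCE = ["Hours.STD", "DAYS", "SETS"]  # assumed, illustrative order of preference

def deduplicate_effort(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one effort unit per stratum so deliberately duplicated reports of the
    same activity (e.g., ICCAT strata reported in several units) are not summed."""
    rank = {u: i for i, u in enumerate(UNIT_PREFERENCE)}
    df = df.copy()
    df["unit_rank"] = df["measurement_unit"].map(rank).fillna(len(rank))
    # Within each stratum, keep only rows whose unit has the best (lowest) rank.
    best = df.groupby(STRATA_COLS)["unit_rank"].transform("min")
    return df[df["unit_rank"] == best].drop(columns="unit_rank")

# Example usage (hypothetical file name):
# effort = pd.read_csv("global_effort_level0.csv")
# effort_single_unit = deduplicate_effort(effort)
```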
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the supplemental data set for "Instantaneous habitable windows in the parameter space of Enceladus' ocean".
- nominal_salts_case.xlsx contains the output from the chemical speciation model described in the main text for the nominal salt case, with [Cl] = 0.1m and [DIC] = 0.03m. DIC is the sum of the molalities of CO2(aq), HCO3- (aq) and CO32-. The speciation was performed in intervals of 10 K and 0.5 pH units, between pH 7-12 and 273-473 K.
- high_salts_case.xlsx contains the output from the chemical speciation model described in the main text for the high salt case, with [Cl] = 0.2m and [DIC] = 0.1m. DIC is the sum of the molalities of CO2(aq), HCO3- (aq) and CO32-. The speciation was performed in intervals of 10 K and 0.5 pH units, between pH 7-12 and 273-473 K.
- low_salts_case.xlsx contains the output from the chemical speciation model described in the main text for the low salt case, with [Cl] = 0.05m and [DIC] = 0.01m. DIC is the sum of the molalities of CO2(aq), HCO3- (aq) and CO32-. The speciation was performed in intervals of 10 K and 0.5 pH units, between pH 7-12 and 273-473 K.
- CO2_activity_uncertainty.xlsx collects the activity of CO2 from the three files above into a single sheet. This is plotted in supplemental figure S2.
- independent_samples.zip contains a further 20 figures which show the variance caused solely by each of [CH4], [H2], n_ATP and k at a fixed temperature or pH as indicated by the file name. These show the deviation from the nominal log10(Power supply), e.g. Figure 3 in the main text, if the named parameter were allowed to vary within its uncertainty defined in Table 1 in the main text.
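For convenience, here is a minimal Python sketch for loading one of the speciation workbooks and selecting a single temperature/pH grid point. The sheet layout and the column names ("T" in kelvin, "pH") are assumptions for illustration and should be checked against the actual files.

```python
import pandas as pd

# Load the nominal salt case ([Cl] = 0.1 m, [DIC] = 0.03 m).
# Column names 'T' and 'pH' are assumed; verify against the workbook header.
spec = pd.read_excel("nominal_salts_case.xlsx")

# Select the grid point at 323 K and pH 9.0
# (the grid is spaced at 10 K and 0.5 pH units over 273-473 K and pH 7-12).
point = spec[(spec["T"] == 323) & (spec["pH"] == 9.0)]
print(point)
```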
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Major differences from v1:

For level 2 catch:
- Catches and numbers raised to nominal are only raised to exactly matching strata or, if none exist, to a stratum corresponding to UNK/NEI or 99.9. (new feature in v4)
- When nominal strata lack specific dimensions (e.g., fishing_mode always UNK) but georeferenced strata include them, the nominal data are "upgraded" to match, preventing loss of detail. Currently this adjustment aligns nominal values to georeferenced totals; future versions may apply proportional scaling. This does not create a direct raising but rather allows more precise reallocation. (new feature in v4)
- IATTC purse seine catch-and-effort data are available in 3 separate files according to the group of species: tuna, billfishes, sharks. This is because PS data are collected from 2 sources: observers and fishing vessel logbooks. Observer records are used when available, and logbooks are used for unobserved trips. Both sources collect tuna data, but only observers collect shark and billfish data. As an example, a stratum may have observer effort, and the number of sets from the observed trips would be counted for tuna, shark and billfish; but there may also have been logbook data for unobserved sets in the same stratum, so the tuna catch and number of sets for a cell would be added. This would make the total number of sets higher for tuna catch than for shark or billfish. Effort in the billfish and shark datasets may hence represent only a proportion of the total effort allocated in some strata, since it is the observed effort, i.e. effort for which there was an observer onboard. As a result, catch in the billfish and shark datasets may represent only a proportion of the total catch allocated in some strata. Hence, shark and billfish catch were raised to the fishing effort reported in the tuna dataset. (new feature in v4; was done in FIRMS Level 0 before)
- Data with a resolution of 10deg x 10deg are removed; disaggregating them is being considered for future versions.
- Catches in tons, raised to match nominal values, now take into account the geographic area of the nominal data for improved accuracy. (as v3)
- Captures in "Number of fish" are converted to weight based on nominal data. The conversion factors used in the previous version are no longer used, as they did not adequately represent the diversity of captures. (as v3)
- Numbers of fish without corresponding nominal data are not removed as they were before, creating a large difference for this measurement_unit between the two datasets. (as v3)
- Strata for which catches in tons are raised to match nominal data have had their numbers removed. (as v3)
- Raising only applies to complete years, to avoid overrepresenting specific months, particularly in the early years of georeferenced reporting. (as v3)
- Strata where georeferenced data exceed nominal data have not been adjusted downward, as it is unclear whether these discrepancies arise from missing nominal data or from different aggregation methods in the two datasets. (as v3)
- The data are not aggregated to 5-degree squares and thus remain spatially unharmonized. Aggregation can be performed using CWP codes for geographic identifiers. For example, an R function is available: source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/sardara_functions/transform_cwp_code_from_1deg_to_5deg.R") (as v3)
This results in a raising of the data compared to v3 for IOTC, ICCAT, IATTC and WCPFC. However, as the raising is more specific for CCSBT, the raising is 22% less than in the previous version.
The level 0 dataset has been modified, creating differences in this new version, notably:
- The species retained are different; only 32 major species are kept.
- Mappings have been somewhat modified based on new standards implemented by FIRMS.
- New rules have been applied for overlapping areas.
- Data are only provided for 1-degree and 5-degree square areas.
- The data are enriched with "Species group" and "Gear labels" using the fdiwg standards.
These main differences are summarized in Differences_v2018_v2024.zip

Recommendations: To avoid converting data from numbers using nominal strata, we recommend the use of conversion factors, which could be provided by tRFMOs. In some strata, nominal data appear higher than georeferenced data, as observed during level 2 processing. These discrepancies may result from errors or from differences in aggregation methods. Further analysis will examine these differences in detail to refine treatments accordingly. A summary of differences by tRFMO, based on the number of strata, is included in the appendix.

For level 0 effort: In some datasets, namely those from ICCAT and the purse seine (PS) data from WCPFC, the same effort has been reported multiple times using different units, which have been kept as is, since no official mapping allows conversion between these units. As a result, users should be reminded that some ICCAT and WCPFC effort data are deliberately duplicated: in the case of ICCAT data, lines with identical strata but different effort units are duplicates reporting the same fishing activity with different measurement units.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
### On-/Off-Axis Data Release (Version 1.0, dated 2023/03/24)
This tar archive contains the data release for ‘First measurement of muon neutrino charged-current interactions on hydrocarbon without pions in the final state using multiple detectors with correlated energy spectra at T2K’. It contains the cross-section data points and supporting information in ROOT and text format, which are detailed below:
+ `onoffaxis_xsec_data.root`
This ROOT file contains the extracted cross section and the nominal MC prediction as TH1D histograms for both the flattened 1D array of bins and in the angle binning for the analysis. The ROOT file also contains both the covariance and inverted covariance matrix for the result stored as TH2D histograms. The angle bin numbering and the corresponding bin edges are detailed at the end of the README.
+ `flux_analysis.root`
This ROOT file contains the nominal and post-fit flux histograms for ND280 and INGRID. Two different binnings are included: a fine binned histogram (220 bins) and a coarse binned histogram (20 bins). The coarse binned histogram corresponds to the flux parameters detailed in the paper (and bin edges listed in the appendix).
+ `xsec_data_mc.csv`
The extracted cross-section data points and the nominal MC prediction for each bin is stored as a comma-separated value (CSV) file with header row.
+ `cov_matrix.csv` and `inv_matrix.csv`
The covariance matrix and the inverted covariance matrix are both stored as CSV files with each row stored as a single line and columns separated by commas (there is no header row). Matrix element (0,0) corresponds to the first number in the file.
+ `nd280_analysis_binning.csv` and `ingrid_analysis_binning.csv`
The analysis bin edges are included as CSV files. The columns are labeled with a header row and denote the linear bin index and the lower and upper bin edge for the angle and momentum bins. The units are in cos(angle) for the angle bins and in MeV/c for the momentum bins.
+ `calc_chisq.cxx`
This is an example ROOT script to calculate the chi-square between the data and the nominal MC prediction using the ROOT file in the data release. To run, open ROOT and load the script (`.L calc_chisq.cxx`) and execute the function `calc_chisq("/path/to/file.root")`.
+ `calc_chisq.py`
This is an example Python script to calculate the chi-square between the data and the nominal MC prediction using the text/CSV files in the data release. The code requires NumPy as an external dependency, but otherwise uses built-in modules. To run, execute using a Python3 interpreter and give the file paths to the data/MC text file and the inverse covariance text file as the first and second arguments respectively -- e.g. `python3 calc_chisq.py /path/to/xsec_data_mc.csv /path/to/inv_matrix.csv`
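As a hedged sketch of the kind of computation the scripts above perform (not the released scripts themselves), the snippet below evaluates the chi-square between the data and the nominal MC prediction from the CSV files, χ² = (d − m)ᵀ V⁻¹ (d − m). The column names "data" and "mc" are assumptions; check the header row of `xsec_data_mc.csv`.

```python
import numpy as np

# Load the data/MC table (header row assumed) and the inverted covariance matrix.
table = np.genfromtxt("xsec_data_mc.csv", delimiter=",", names=True)
data = table["data"]   # assumed column name for the extracted cross section
mc = table["mc"]       # assumed column name for the nominal MC prediction
inv_cov = np.loadtxt("inv_matrix.csv", delimiter=",")

# Chi-square between data and nominal MC: (d - m)^T V^{-1} (d - m)
diff = data - mc
chisq = diff @ inv_cov @ diff
print(f"chi2 = {chisq:.2f} for {diff.size} bins")
```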
+ ND280 angle bin numbering
- 0: `-1.0 < cos(#theta) < 0.20`
- 1: `0.20 < cos(#theta) < 0.60`
- 2: `0.60 < cos(#theta) < 0.70`
- 3: `0.70 < cos(#theta) < 0.80`
- 4: `0.80 < cos(#theta) < 0.85`
- 5: `0.85 < cos(#theta) < 0.90`
- 6: `0.90 < cos(#theta) < 0.94`
- 7: `0.94 < cos(#theta) < 0.98`
- 8: `0.98 < cos(#theta) < 1.00`
+ INGRID angle bin numbering
- 0: `0.50 < cos(#theta) < 0.82`
- 1: `0.82 < cos(#theta) < 0.94`
- 2: `0.94 < cos(#theta) < 1.00`
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Key information about Philippines Nominal GDP
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions on the itemsets that a customer is most likely to purchase. I was given a dataset containing data of a retailer; the transaction data covers all the transactions that have happened over a period of time. The retailer will use the results to grow in its industry and to provide customers with suggestions on itemsets; by doing so, we will be able to increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.
Association Rules are most useful when you are planning to build associations between different objects in a set. They work well when you want to find frequent patterns in a transaction database. They can tell you which items customers frequently buy together and allow the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought Computer Mouse => bought Mat for Mouse":
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support / P(Computer Mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(Mat for Mouse) = 0.80/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
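The same arithmetic, written out as a small Python check of the rule metrics using the toy numbers above:

```python
# 100 customers: 10 bought a computer mouse, 9 bought a mouse mat, 8 bought both.
n_total, n_mouse, n_mat, n_both = 100, 10, 9, 8

p_mouse = n_mouse / n_total        # P(Computer Mouse) = 0.10
p_mat = n_mat / n_total            # P(Mat for Mouse)  = 0.09
support = n_both / n_total         # P(Mouse & Mat)    = 0.08

# Rule: bought Computer Mouse => bought Mat for Mouse
confidence = support / p_mouse     # 0.08 / 0.10 = 0.80
lift = confidence / p_mat          # 0.80 / 0.09 ≈ 8.9

print(support, confidence, round(lift, 1))
```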
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries. Below, I briefly describe each library.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
Next, we will clean our data frame and remove missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply Association Rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
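The remaining steps of the original write-up (which uses R and the arules package) are truncated above. As a rough, alternative sketch of the same workflow in Python, the snippet below uses the mlxtend package; the file name and the column names (BillNo, Itemname) are assumptions about the Assignment-1 data layout, not confirmed by the original.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Read the retail data; column names are assumed for illustration.
df = pd.read_excel("Assignment-1_Data.xlsx")
df = df.dropna(subset=["BillNo", "Itemname"])           # remove missing values

# One-hot encode items per invoice ("transaction data").
basket = (df.groupby(["BillNo", "Itemname"]).size().unstack(fill_value=0) > 0)

# Mine frequent itemsets and derive association rules.
frequent = apriori(basket, min_support=0.01, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]].head())
```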
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
🌍 Global GDP by Country — 2024 Edition
The Global GDP by Country (2024) dataset provides an up-to-date snapshot of worldwide economic performance, summarizing each country’s nominal GDP, growth rate, population, and global economic contribution.
This dataset is ideal for economic analysis, data visualization, policy modeling, and machine learning applications related to global development and financial forecasting.
🎯 Target Use-Cases:
- Economic growth trend analysis
- GDP-based country clustering
- Per capita wealth comparison
- Share of world economy visualization
| Feature Name | Description |
|---|---|
| Country | Official country name |
| GDP (nominal, 2023) | Total nominal GDP in USD |
| GDP (abbrev.) | Simplified GDP format (e.g., “$25.46 Trillion”) |
| GDP Growth | Annual GDP growth rate (%) |
| Population 2023 | Estimated population for 2023 |
| GDP per capita | Average income per person (USD) |
| Share of World GDP | Percentage contribution to global GDP |
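A minimal Python loading sketch based on the feature table above. The CSV file name is hypothetical, and the snippet assumes the numeric columns are already parsed as numbers (the abbreviated GDP strings such as "$25.46 Trillion" would need cleaning first).

```python
import pandas as pd

# Hypothetical file name; column names follow the feature table above.
gdp = pd.read_csv("global_gdp_2024.csv")

# Re-derive GDP per capita and each country's share of world GDP as consistency checks.
gdp["gdp_per_capita_check"] = gdp["GDP (nominal, 2023)"] / gdp["Population 2023"]
gdp["share_check"] = 100 * gdp["GDP (nominal, 2023)"] / gdp["GDP (nominal, 2023)"].sum()

# Top 5 economies by nominal GDP.
print(gdp.nlargest(5, "GDP (nominal, 2023)")[["Country", "gdp_per_capita_check", "share_check"]])
```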
💰 Top Economies (Nominal GDP):
United States, China, Japan, Germany, India
📈 Fastest Growing Economies:
India, Bangladesh, Vietnam, and Rwanda
🌐 Global Insights:
- The dataset covers 181 countries representing 100% of global GDP.
- Suitable for data visualization dashboards, AI-driven economic forecasting, and educational research.
Source: Worldometers — GDP by Country (2024)
Dataset compiled and cleaned by: Asadullah Shehbaz
For open research and data analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A set of MATLAB functions (HSI_PSFS, SC_RS_Analysis_NAD.m, SC_RS_Analysis_sim.m) were developed to assess the spatial coverage of pushbroom hyperspectral imaging (HSI) data. HSI_PSFs derives the net point spread function of HSI data based on nominal data acquisition and sensor parameters (sensor speed, sensor heading, sensor altitude, number of cross track pixels, sensor field of view, integration time, frame time and pixel summing level). SC_RS_Analysis_sim calculates a theoretical spatial coverage map for HSI data based on nominal data acquisition and sensor parameters. The spatial coverage map is the sum of the point spread functions of all the pixels collected within an HSI dataset. Practically, the spatial coverage map quantifies how HSI data spatially samples spectral information across an imaged scene. A secondary theoretical spatial coverage map is also calculated for spatially resampled (nearest neighbour approach) HSI data. The function also calculates theoretical resampling errors such as pixel duplication (%), pixel loss (%) and pixel shifting (m). SC_RS_Analysis_NAD calculates an empirical spatial coverage map for collected HSI data (before and after spatial resampling) based on its nominal data acquisition and sensor parameters. The function also calculates empirical resampling errors. The current implementation of SC_RS_Analysis_NAD only works for ITRES (Calgary, Alberta, Canada) data products as it uses auxiliary information generated during the ITRES data processing workflow. This auxiliary information includes a ground look-up table that specifies the location (easting and northing) of each pixel of the HSI data in its raw sensor geometry. This auxiliary information also includes the pixel-to-pixel mapping between the HSI data in its raw sensor geometry and the spatially resampled HSI data. SC_RS_Analysis_NAD can readily be modified to work with HSI data collected by sensors from other manufacturers so long as the required auxiliary information can be extracted during data processing.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General information: This dataset is meant to serve as a benchmark problem for fault detection and isolation in dynamic systems. It contains preprocessed sensor data from the adaptive high-rise demonstrator building D1244, built in the scope of the CRC1244. Parts of the measurements have been artificially corrupted and labeled accordingly. Please note that although the measurements are stored in Matlab's .mat format (Version 7.0), they can easily be processed using free software such as the SciPy library in Python.

Structure of the dataset:
- train contains training data (only nominal)
- validation contains validation data (nominal and faulty). Faulty samples were obtained by manipulating a single signal in a random nominal sample from the validation data.
- test contains test data (nominal and faulty). Faulty samples were obtained by manipulating a single signal in a random nominal sample from the test data.
- meta contains textual labels for all signals as well as additional information on the considered fault classes

File contents: Each file contains the following data from 1200 timesteps (60 seconds sampled at 20 Hz):
- t: time in seconds
- u: actuator forces (obtained from pressure measurements) in newtons
- y: relative elongations as well as bending curvatures of structural elements obtained from strain gauge measurements, and actuator displacements measured by position encoders
- label: categorical label of the present fault class, where 0 denotes the nominal class and faults in the different signals are encoded according to their index in the list of fault types in meta/labels.mat

Faulty samples additionally include the corresponding nominal values for reference:
- u_true: actuator forces without faults
- y_true: measured outputs without faults

Textual labels for all in- and output signals as well as all faults are given in the struct labels. Each sample's textual fault label is additionally contained in its filename (between the first and second underscore).
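Since the description points to SciPy for reading the .mat files, here is a minimal sketch of loading one sample in Python. The file name is purely illustrative; the keys (t, u, y, label) follow the description above.

```python
from scipy.io import loadmat

# Illustrative path; actual file names follow the dataset's own naming scheme.
sample = loadmat("validation/sample_nominal_0001.mat")

t = sample["t"].squeeze()              # 1200 time steps (60 s sampled at 20 Hz)
u = sample["u"]                        # actuator forces in newtons
y = sample["y"]                        # elongations, curvatures, actuator displacements
label = int(sample["label"].squeeze()) # 0 = nominal, otherwise index of the fault type

print(t.shape, u.shape, y.shape, label)
```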
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and probability for an incomplete 2×2 table.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cross sectional data, all countries, for the statistic Nominal_Exchange_Rate_3_Year_Change_In_Percent.
Indicator Definition: Nominal Exchange Rate 3 Year Change In Percent. The Exchange Rate is defined according to the Quantity Notation, that is, foreign currency (here always the USD) per domestic currency (for example the euro for Germany). Hence, a higher value means that the domestic currency appreciated, as more foreign currency units can be purchased for one unit of domestic currency.
Indicator Unit: The statistic is measured in Percent.
Descriptive statistics for the indicator "Nominal Exchange Rate 3 Year Change In Percent":
- Number of countries with data: 153.
- Average value across those countries: -4.50.
- Standard deviation across those countries: 25.54.
- Lowest value: -98.32, observed in Lebanon (LBP), the country that ranks last.
- Highest value: 36.51, observed in Albania (ALL), the country that ranks first.
Top 5 countries by value:
1. Albania, value 36.51, rank 1.
2. Costa Rica, value 36.30, rank 2.
3. Afghanistan, value 24.75, rank 3.
4. Poland, value 23.95, rank 4.
5. Sri Lanka, value 20.48, rank 5.
Bottom 5 countries by value:
1. Lebanon, value -98.32, rank 153.
2. Venezuela, RB, value -94.88, rank 152.
3. Iran, Islamic Rep., value -93.01, rank 151.
4. Argentina, value -89.58, rank 150.
5. South Sudan, value -89.00, rank 149.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As the field of human-computer interaction continues to evolve, there is a growing need for robust datasets that can enable the development of gesture recognition systems that operate reliably in diverse real-world scenarios. To address this need, we present a radar-based gesture dataset recorded using the BGT60TR13C XENSIV™ 60 GHz Frequency Modulated Continuous Wave (FMCW) radar sensor. This dataset includes both nominal gestures and anomalous gestures, providing a diverse and challenging benchmark for understanding and improving gesture recognition systems.
The dataset contains a total of 49,000 gesture recordings, with 25,000 nominal gestures and 24,000 anomalous gestures. Each recording consists of 100 frames of raw radar data, accompanied by a label file that provides annotations for every individual frame in each gesture sequence. This frame-based annotation allows for high-resolution temporal analysis and evaluation.
The nominal gestures represent standard, correctly performed gestures. These gestures were collected to serve as the baseline for gesture recognition tasks. The details of the nominal data are as follows:
Gesture Types: The dataset includes five nominal gesture types:
Total Samples: 25,000 nominal gestures.
Participants: The nominal gestures were performed by 12 participants (p1 through p12).
Each nominal gesture has a corresponding label file that annotates every frame with the nominal gesture type, providing a detailed temporal profile for training and evaluation purposes.
The anomalous gestures represent deviations from the nominal gestures. These anomalies were designed to simulate real-world conditions in which gestures might be performed incorrectly, under varying speeds, or with modified execution patterns. The anomalous data introduces additional challenges for gesture recognition models, testing their ability to generalize and handle edge cases effectively.
Total Samples: 24,000 anomalous gestures.
Anomaly Types: The anomalous gestures include three distinct types of anomalies: fast, slow, and wrist execution (see the anomaly labels below).
Participants: The anomalous gestures involved contributions from eight participants, including p1, p2, p6, p7, p9, p10, p11, and p12.
Locations: All anomalous gestures were collected in location e1 (a closed-space meeting room).
The radar system was configured with an operational frequency range spanning from 58.5 GHz to 62.5 GHz. This configuration provides a range resolution of 37.5 mm and the ability to resolve targets at a maximum range of 1.2 meters. For signal transmission, the radar employed a burst configuration comprising 32 chirps per burst with a frame rate of 33 Hz and a pulse repetition time of 300 µs.
The data for each user, categorized by location and anomaly type, is saved in compressed .npz files. Each .npz file contains key-value pairs for the data and its corresponding labels. The file naming convention is as follows: UserLabel_EnvironmentLabel(_AnomalyLabel).npz. For nominal gestures, the anomaly label is omitted.
The .npz file contains two primary keys:
- inputs: the raw radar data.
- targets: the corresponding label vector for the raw data.

The raw radar data (inputs) is stored as a NumPy array with 5 dimensions, structured as n_recordings x n_frames x n_antennas x n_chirps x n_samples, where:
- n_recordings: the number of gesture sequence instances (i.e., recordings)
- n_frames: the frame length of each gesture (100 frames per gesture)
- n_antennas: the number of virtual antennas (3 antennas)
- n_chirps: the number of chirps per frame (32 chirps)
- n_samples: the number of samples per chirp (64 samples)

The labels (targets) are stored as a NumPy array with 2 dimensions, structured as n_recordings x n_frames, where:
- n_recordings: the number of gesture sequence instances (i.e., recordings)
- n_frames: the frame length of each gesture (100 frames per gesture)
Each entry in the targets matrix corresponds to the frame-level label for the associated raw radar data in inputs.
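A minimal Python sketch for loading one of the .npz files and checking the shapes described above. The particular file name is illustrative of the stated naming convention, not an actual file from the release.

```python
import numpy as np

# Illustrative name following UserLabel_EnvironmentLabel(_AnomalyLabel).npz
rec = np.load("p1_e1_fast.npz")

inputs = rec["inputs"]    # (n_recordings, 100 frames, 3 antennas, 32 chirps, 64 samples)
targets = rec["targets"]  # (n_recordings, 100 frames) frame-level labels

assert inputs.shape[1:] == (100, 3, 32, 64)
assert targets.shape == inputs.shape[:2]
print(inputs.shape, targets.shape)
```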
The total size of the dataset is approximately 48.1 GB, provided as a compressed file named radar_dataset.zip.
The user labels are defined as follows:
- p1: Male
- p2: Female
- p3: Female
- p4: Male
- p5: Male
- p6: Male
- p7: Male
- p8: Male
- p9: Male
- p10: Female
- p11: Male
- p12: Male

The environmental labels included in the dataset are defined as follows:
- e1: Closed-space meeting room
- e2: Open-space office room
- e3: Library
- e4: Kitchen
- e5: Exercise room
- e6: Bedroom

The anomaly labels included in the dataset are defined as follows:
- fast: Fast gesture execution
- slow: Slow gesture execution
- wrist: Wrist gesture execution

This dataset represents a robust and diverse set of radar-based gesture data, enabling researchers and developers to explore novel models and evaluate their robustness in a variety of scenarios. The inclusion of frame-based labeling provides an additional level of detail to facilitate the design of advanced gesture recognition systems that can operate with high temporal resolution.
This dataset builds upon the version previously published on IEEE DataExplorer (https://ieee-dataport.org/documents/60-ghz-fmcw-radar-gesture-dataset), which included only one label per recording. In contrast, this version includes frame-based labels, providing individual annotations for each frame of the recorded gestures. By offering more granular labeling, this dataset further supports the development and evaluation of gesture recognition models with enhanced temporal precision. However, the raw radar data remains unchanged compared to the dataset available on IEEE DataExplorer.
Terms and Conditions for the use of ESA Data: https://earth.esa.int/eogateway/documents/20142/1564626/Terms-and-Conditions-for-the-use-of-ESA-Data.pdf
The Fundamental Data Record (FDR) for Atmospheric Composition UVN v.1.0 dataset is a cross-instrument Level-1 product [ATMOS_L1B] generated in 2023 and resulting from the ESA FDR4ATMOS project. The FDR contains selected Earth Observation Level 1b parameters (irradiance/reflectance) from the nadir-looking measurements of the ERS-2 GOME and Envisat SCIAMACHY missions for the period ranging from 1995 to 2012. The data record offers harmonised cross-calibrated spectra, focusing on spectral windows in the Ultraviolet-Visible-Near Infrared regions for the retrieval of critical atmospheric constituents such as ozone (O3), sulphur dioxide (SO2) and nitrogen dioxide (NO2) column densities, alongside cloud parameters. The FDR4ATMOS products should be regarded as experimental due to the innovative approach and the current use of a limited-sized test dataset to investigate the impact of harmonization on the Level 2 target species, specifically SO2, O3 and NO2. Presently, this analysis is being carried out within follow-on activities. The FDR4ATMOS V1 is currently being extended to include the MetOp GOME-2 series.

Product format
In many respects, the FDR product has improved on the existing individual mission datasets:
- GOME solar irradiances are harmonised using a validated SCIAMACHY solar reference spectrum, solving the problem of the fast-changing etalon present in the original GOME Level 1b data;
- Reflectances for both GOME and SCIAMACHY are provided in the FDR product. GOME reflectances are harmonised to degradation-corrected SCIAMACHY values, using collocated data from the CEOS PIC sites;
- SCIAMACHY data are scaled to the lowest integration time within the spectral band using high-frequency PMD measurements from the same wavelength range. This simplifies the use of the SCIAMACHY spectra, which were split into a complex cluster structure (with its own integration time) in the original Level 1b data;
- The harmonization process applied mitigates the viewing-angle dependency observed in the UV spectral region for GOME data;
- Uncertainties are provided.
Each FDR product provides, within the same file, irradiance/reflectance data for the UV-VIS-NIR spectral regions across all orbits on a single day, including therein information from the individual ERS-2 GOME and Envisat SCIAMACHY measurements. The FDR has been generated in two formats, Level 1A and Level 1B, targeting expert users and nominal applications respectively. The Level 1A [ATMOS_L1A] data include additional parameters such as harmonisation factors, PMD, and polarisation data extracted from the original mission Level 1 products. The ATMOS_L1A dataset is not part of the nominal dissemination to users. In case of specific requirements, please contact EOHelp. Please refer to the README file for essential guidance before using the data. All the new products are conveniently formatted in NetCDF. Free standard tools, such as Panoply, can be used to read NetCDF data. Panoply is sourced and updated by external entities. For further details, please consult our Terms and Conditions page.

Uncertainty characterisation
One of the main aspects of the project was the characterization of Level 1 uncertainties for both instruments, based on metrological best practices.
The following documents are provided:
- General guidance on a metrological approach to Fundamental Data Records (FDR)
- Uncertainty Characterisation document
- Effect tables
- NetCDF files containing example uncertainty propagation analysis and spectral error correlation matrices for SCIAMACHY (Atlantic and Mauretania scene for 2003 and 2010) and GOME (Atlantic scene for 2003): reflectance_uncertainty_example_FDR4ATMOS_GOME.nc, reflectance_uncertainty_example_FDR4ATMOS_SCIA.nc

Known Issues
Non-monotonous wavelength axis for SCIAMACHY in FDR data version 1.0: In the SCIAMACHY OBSERVATION group of the atmospheric FDR v1.0 dataset (DOI: 10.5270/ESA-852456e), the wavelength axis (lambda variable) is not monotonically increasing. This issue affects all spectral channels (UV, VIS, NIR) in the SCIAMACHY group, while GOME OBSERVATION data remain unaffected. The root cause of the issue lies in the incorrect indexing of the lambda variable during the NetCDF writing process. Notably, the wavelength values themselves are calculated correctly within the processing chain.

Temporary Workaround
The wavelength axis is correct in the first record of each product. As a workaround, users can extract the wavelength axis from the first record and apply it to all subsequent measurements within the same product. The first record can be retrieved by setting the first two indices (time and scanline) to 0 (assuming counting of array indices starts at 0). Note that this process must be repeated separately for each spectral range (UV, VIS, NIR) and every daily product. Since the wavelength axis of SCIAMACHY is highly stable over time, using the first record introduces no expected impact on retrieval results. Python pseudo-code example: lambda_...
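The official pseudo-code example is truncated in this copy. As a hedged reconstruction of the described workaround (not the original snippet), the sketch below uses the netCDF4 package; the group and variable paths (SCIAMACHY/OBSERVATION, per-band lambda variable) and the file name are assumptions taken from the description and must be verified against the actual product.

```python
import numpy as np
import netCDF4

# Hypothetical file name; group/variable paths are assumed from the description above.
with netCDF4.Dataset("FDR4ATMOS_daily_product.nc") as nc:
    obs = nc["SCIAMACHY"]["OBSERVATION"]
    for band in ("UV", "VIS", "NIR"):          # repeat per spectral range and per daily product
        lam = np.array(obs[band]["lambda"][:])
        # The first record (time = 0, scanline = 0) carries the correct wavelength axis;
        # broadcast it over all other records of this band.
        lam_fixed = np.broadcast_to(lam[0, 0], lam.shape).copy()
        # ... use lam_fixed instead of the stored axis in subsequent processing
```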
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General information: This dataset is meant to serve as a benchmark problem for fault detection and isolation in dynamical systems. It contains pre-processed sensor data from the adaptive high-rise demonstrator building D1244, built in the scope of the CRC1244. Parts of the measurements have been artificially corrupted and labeled accordingly. Please note that although the measurements are stored in Matlab's .mat format (Version 7.0), they can easily be processed using free software such as the SciPy library in Python.

Structure of the dataset:
- train contains the training data (only nominal)
- test_easy contains test data (nominal and faulty with high fault amplitude). Faulty samples were obtained by manipulating a single signal in a random nominal sample from the test data.
- test_hard contains test data (nominal and faulty with low fault amplitude)
- meta contains textual labels for all signals and fault types

File contents: Each file contains the following data from 16384 timesteps:
- t: time in seconds
- u: demanded actuator forces in newtons
- y: measured outputs (relative elongations measured by strain gauges and actuator displacements in meters measured by position encoders)
- label: categorical label of the present fault class, where 0 denotes the nominal class and faults in the different signals are encoded according to their index in the list of fault types in meta/labels.txt

Faulty samples additionally include the corresponding nominal values for reference:
- u_true: delivered actuator forces
- y_true: measured outputs without faults

A sample's textual fault label is also contained in its filename (between the first and second underscore).
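Since the fault label is encoded in the file name, a small Python helper like the following could recover it; the example file name is hypothetical and only illustrates the stated convention.

```python
from pathlib import Path

def fault_label_from_filename(path: str) -> str:
    """The textual fault label sits between the first and second underscore
    of a sample's file name, as described above."""
    return Path(path).stem.split("_")[1]

# Hypothetical file name purely to illustrate the convention:
print(fault_label_from_filename("test_easy/sample_somefault_0001.mat"))  # -> "somefault"
```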
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Melded confidence intervals were proposed as a way to combine two independent one-sample confidence intervals to obtain a two-sample confidence interval for a quantity like a difference or a ratio. Simulation-based work has suggested that melded confidence intervals always provide at least the nominal coverage. However, we show here that for the case of melded confidence intervals for a difference in population quantiles, the confidence intervals do not guarantee the nominal coverage. We derive a lower bound on the coverage for a one-sided confidence interval, and we show that there are pairs of distributions that make the coverage arbitrarily close to this lower bound. One specific example of our results is that the 95% melded upper bound on the difference between two population medians offers a guaranteed coverage of only 88.3% when both samples are of size 20.
Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine
Mark Schwabacher (NASA Ames Research Center), Robert Aguilar (Pratt & Whitney Rocketdyne), Fernando Figueroa (NASA Stennis Space Center)
Abstract
The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise and a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
Introduction
The J-2X rocket engine will be tested on Test Stand A-1 at NASA Stennis Space Center (SSC) in Mississippi. A team including people from SSC, NASA Ames Research Center (ARC), and Pratt & Whitney Rocketdyne (PWR) is developing a prototype end-to-end integrated systems health management (ISHM) system that will be used to monitor the test stand and the engine while the engine is on the test stand [1]. The prototype will use several different methods for detecting and diagnosing faults in the test stand and the engine, including rule-based, model-based, and data-driven approaches. SSC is currently using the G2 tool (http://www.gensym.com) to develop rule-based and model-based fault detection and diagnosis capabilities for the A-1 test stand. This paper describes preliminary results in applying the data-driven approach to detecting and diagnosing faults in the J-2X engine. The conventional approach to detecting and diagnosing faults in complex engineered systems such as rocket engines and test stands is to use large numbers of human experts. Test controllers watch the data in near-real time during each engine test. Engineers study the data after each test. These experts are aided by limit checks that signal when a particular variable goes outside of a predetermined range. The conventional approach is very labor intensive. Also, humans may not be able to recognize faults that involve the relationships among large numbers of variables.
Further, some potential faults could happen too quickly for humans to detect them and react before they become catastrophic. Automated fault detection and diagnosis is therefore needed. One approach to automation is to encode human knowledge into rules or models. Another approach is to use data-driven methods to automatically learn models from historical data or simulated data. Our prototype will combine the data-driven approach with the model-based and rule-based approaches.
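The abstract describes training a C4.5 decision tree on labeled DRTM simulations to classify each sensor snapshot as nominal or as one of five leak locations. As a rough, generic illustration of that idea (not the authors' code), here is a sketch using scikit-learn's DecisionTreeClassifier, which is CART-based rather than C4.5; the feature matrix and labels are random placeholders standing in for the DRTM data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for DRTM sensor snapshots:
# rows = time samples, columns = sensor channels; labels 0 = nominal, 1..5 = leak location.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 12))
y_train = rng.integers(0, 6, size=2000)

# An entropy-based tree is the closest scikit-learn analogue to C4.5's information-gain splits.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=6, random_state=0)
tree.fit(X_train, y_train)

X_test = rng.normal(size=(500, 12))
print(tree.predict(X_test)[:10])   # predicted class per snapshot: no leak or one of five locations
```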
In every data analysis project, we need a solid foundation to succeed. Such a foundation consists of specific steps through which we perform the necessary actions, gather the required information, and finally carry out a well-structured and holistic data analysis.
As the first step, we should understand what exactly the problem is, what logic lies behind it, how it affects all the involved parties, and what the clarified objectives of the current data science project are.
The main idea of this project is to extract actionable insights from the given company data in order to improve its decision-making process. Furthermore, we want to provide the best possible predictive model for the marketing campaign of its new product, which predicts whether a customer will buy the new product and how likely the purchase is.
The provided data is synthetic. So, it does not include any sensitive or real customer information.
The data is split into two CSV files containing the training data (train.csv) and the test data (test.csv). The training data set includes 31480 records containing customer and operational features. Customer features cover customer master data such as age, gender, occupation, marital status, education level and account balance, while operational features relate to the last campaign activities, including the last campaign result, contact date, contact duration, etc. The test data set consists of 13732 samples containing all the features provided in the training data except the target value. In general, we have 19 features and one target variable that should be predicted. These features are described in the table below; a short loading sketch follows the table.
| Feature | Type | Description |
|---|---|---|
| id | Numerical | record ID |
| target | Nominal | target value (customer response to the marketing campaign) |
| day | Numerical | contact day in previous campaign |
| month | Nominal | contact month in previous campaign |
| duration | Numerical | contact duration in previous campaign |
| contactId | Numerical | contact ID |
| age | Numerical | age of the customer |
| gender | Nominal | customer gender |
| job | Nominal | customer occupation |
| maritalStatus | Nominal | customer marital status |
| education | Nominal | customer educational degree |
| craditFailure | Nominal | if the customer has a default credit |
| accountBalance | Numerical | customer account balance |
| house | Nominal | if the customer owns a house |
| credit | Nominal | if the customer has a credit |
| contactType | Nominal | contact media |
| numberOfContacts | Numerical | number of contacts during the current campaign |
| daySinceLastCampaign | Numerical | days after the last contact of the previous campaign |
| numberOfContactsLastCampaign | Numerical | number of contacts during the previous campaign |
| lastCampaignResult | Nominal | result of the previous campaign |
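As referenced above, here is a minimal Python loading sketch based on the feature table. The file name matches the description (train.csv); the split into nominal and numerical columns mirrors the table, while the presence of a header row is an assumption.

```python
import pandas as pd

# Column names follow the feature table above; a header row is assumed.
train = pd.read_csv("train.csv")

nominal_cols = ["target", "month", "gender", "job", "maritalStatus", "education",
                "craditFailure", "house", "credit", "contactType", "lastCampaignResult"]
numerical_cols = ["id", "day", "duration", "contactId", "age", "accountBalance",
                  "numberOfContacts", "daySinceLastCampaign", "numberOfContactsLastCampaign"]

X = train.drop(columns=["target", "id", "contactId"])   # drop identifiers and the label
y = train["target"]

print(train.shape)                      # expected (31480, 20): 19 features + target
print(X[nominal_cols[1:]].nunique())    # category counts for the nominal features
```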
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Every politician lies, but data doesn't. So I collected data on some of the important metrics of all the Indian states to check what is good and bad in each of them. The data is mostly scraped from Wikipedia, so it can be a little inconsistent; however, I will improve that in subsequent versions.
The dataset contains data about metrics like HDI (Human Development Index), nominal GDP, crime rate, percentage of the population below the poverty line, and unemployment rate for all the states of India.
Most of the data is scraped from Wikipedia, so thanks to them for providing the data; however, I wish they would improve its accuracy.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Different voters behave differently at the polls, different students make different university choices, or different countries choose different health care systems. Many research questions important to social scientists concern choice behavior, which involves dealing with nominal dependent variables. Drawing on the principle of maximum random utility, we propose applying a flexible and general heterogeneous multinomial logit model to study differences in choice behavior. The model systematically accounts for heterogeneity that classical models do not capture, indicates the strength of heterogeneity, and permits examining which explanatory variables cause heterogeneity. As the proposed approach allows incorporating theoretical expectations about heterogeneity into the analysis of nominal dependent variables, it can be applied to a wide range of research problems. Our empirical example uses individual-level survey data to demonstrate the benefits of the model in studying heterogeneity in electoral decisions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United Kingdom Nominal Average Weekly Earnings: sa: Total Pay (TP): Whole Economy data was reported at 716.000 GBP in Feb 2025. This records an increase from the previous number of 711.000 GBP for Jan 2025. United Kingdom Nominal Average Weekly Earnings: sa: Total Pay (TP): Whole Economy data is updated monthly, averaging 461.000 GBP from Jan 2000 (Median) to Feb 2025, with 302 observations. The data reached an all-time high of 716.000 GBP in Feb 2025 and a record low of 299.809 GBP in Feb 2000. United Kingdom Nominal Average Weekly Earnings: sa: Total Pay (TP): Whole Economy data remains in active status in CEIC and is reported by the Office for National Statistics. The data is categorized under Global Database’s United Kingdom – Table UK.G083: Average Weekly Earnings: Seasonally Adjusted: SIC 2007. Labour Force Estimates are shown for the mid-month of the three-month average time periods. For example, estimates for January to March 2012 are shown as 'February 2012', estimates for February to April 2012 are shown as 'March 2012', etc. [COVID-19-IMPACT]