31 datasets found
  1. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data; it records all the transactions that occurred over a period of time. The retailer will use the results to grow its business: by suggesting relevant itemsets to customers, it can increase customer engagement, improve the customer experience, and identify customer behaviour. I will solve this problem with Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association Rule mining is most often used when you are planning to discover associations between different objects in a set. It works well when you want to find frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule bought computer mouse => bought mouse mat:

    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(mouse) = 0.08/0.10 = 0.80
    • lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
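    These quantities can be checked with a few lines of arithmetic (a minimal Python sketch of the counts in the example above):

```python
# Worked example: 100 customers, 10 bought a mouse, 9 a mat, 8 bought both.
n_customers = 100
n_mouse, n_mat, n_both = 10, 9, 8

support = n_both / n_customers                  # P(mouse & mat) = 0.08
confidence = support / (n_mouse / n_customers)  # support / P(mouse) = 0.80
lift = confidence / (n_mat / n_customers)       # confidence / P(mat) ~= 8.9

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.1f}")
```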

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    [Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]

    Libraries in R

    First, we need to load the required libraries. Each library is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science; the package makes it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

    [Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    [Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
    [Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]

    Next, we will clean our data frame by removing missing values.

    [Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]

    To apply Association Rule mining, we need to convert the dataframe into transaction data, so that all items bought together in one invoice will be in ...
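    The steps above (one invoice per row, items one-hot encoded, then Apriori) can be sketched as follows. The original write-up does this in R with arules, as the screenshots show; purely as an illustration, here is an equivalent Python sketch using pandas and mlxtend, with column names taken from the dataset description:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

df = pd.read_excel("Assignment-1_Data.xlsx")   # columns per the description above
df = df.dropna(subset=["BillNo", "Itemname"])  # drop rows with missing keys

# One-hot encode items per invoice: rows = BillNo, columns = Itemname.
basket = (df.groupby(["BillNo", "Itemname"])["Quantity"].sum()
            .unstack(fill_value=0) > 0)

# Mine frequent itemsets, then derive and filter association rules.
itemsets = apriori(basket, min_support=0.01, use_colnames=True)
rules = association_rules(itemsets, metric="lift", min_threshold=1.0)
print(rules.sort_values("lift", ascending=False).head())
```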

  2. SEM/EDS hyperspectral data set from a Famatinite sample

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Jul 29, 2022
    + more versions
    Cite
    National Institute of Standards and Technology (2022). SEM/EDS hyperspectral data set from a Famatinite sample [Dataset]. https://catalog.data.gov/dataset/sem-eds-hyperspectral-data-set-from-a-famatinite-sample
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Famatinite is a mineral with nominal chemical formula Cu3SbS4. This electron-excited X-ray data set was collected from a natural flat-polished sample and the surrounding silicate mineral.

    • Live time/pixel: 0.704.00.953600.0/(512512 # 0.95 hours on 4 detectors
    • Probe current: 1.0 nA
    • Beam energy: 20 keV
    • Energy scale: 10 eV/ch, 0.0 eV offset

  3. Global Indicators 2015 Dataset (Cross-Sectional)

    • dataverse.harvard.edu
    Updated Dec 18, 2017
    Cite
    Miguel Centellas (2017). Global Indicators 2015 Dataset (Cross-Sectional) [Dataset]. http://doi.org/10.7910/DVN/ZN6MWY
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 18, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Miguel Centellas
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a small dataset of various global indicators developed for use in a course teaching research methods at the Croft Institute for International Studies at the University of Mississippi. The data is ready to be directly imported into SPSS, Stata, or other statistical packages. A brief codebook includes descriptions of each variable, the indicator's reference year(s), and links to the original sources. The data is cross-sectional, country-level data centered on 2015 as the primary reference year. Some data come from the most recent election or averages from a handful of years. The dataset includes socioeconomic and political data drawn from sources and indicators from the World Bank, the UNDP, and International IDEA. It also includes popular indexes (and some key components) from Freedom House, Polity IV, the Economist's Democracy Index, the Heritage Foundation's Index of Economic Freedom, and the Fund for Peace's Fragile States Index. The dataset also includes various types of data (nominal, ordinal, interval, and ratio), useful for pedagogical examples of how to handle statistical data.

  4. DXC'10 Industrial Track Sample Data

    • data.nasa.gov
    • gimi9.com
    • +1more
    Updated Mar 31, 2025
    + more versions
    Cite
    nasa.gov (2025). DXC'10 Industrial Track Sample Data [Dataset]. https://data.nasa.gov/dataset/dxc10-industrial-track-sample-data
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Sample data, including nominal and faulty scenarios, for Diagnostic Problems I and II of the Second International Diagnostic Competition. Three file formats are provided: tab-delimited .txt files, Matlab .mat files, and tab-delimited .scn files. The scenario (.scn) files are read by the DXC framework. See the Second International Diagnostic Competition project page for more information.

  5. DXC'09 Industrial Track Sample Data

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Apr 11, 2025
    + more versions
    Cite
    Dashlink (2025). DXC'09 Industrial Track Sample Data [Dataset]. https://catalog.data.gov/dataset/dxc09-industrial-track-sample-data
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Sample data, including nominal and faulty scenarios, for Tier 1 and Tier 2 of the First International Diagnostic Competition. Three file formats are provided: tab-delimited .txt files, Matlab .mat files, and tab-delimited .scn files. The scenario (.scn) files are read by the DXC framework. See the Support/Documentation section below and the First International Diagnostic Competition project page for more information.

  6. Supplementary Data and Sample Figures for "Instantaneous habitable windows...

    • figshare.com
    zip
    Updated Aug 13, 2021
    Cite
    Peter M Higgins; Christopher R. Glein; Charles S. Cockell (2021). Supplementary Data and Sample Figures for "Instantaneous habitable windows in the parameter space of Enceladus' Ocean" [Dataset]. http://doi.org/10.6084/m9.figshare.14562144.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 13, 2021
    Dataset provided by
    figshare
    Authors
    Peter M Higgins; Christopher R. Glein; Charles S. Cockell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the supplemental data set for "Instantaneous habitable windows in the parameter space of Enceladus' ocean".

    • nominal_salts_case.xlsx contains the output from the chemical speciation model described in the main text for the nominal salt case, with [Cl] = 0.1m and [DIC] = 0.03m.
    • high_salts_case.xlsx contains the output from the chemical speciation model described in the main text for the high salt case, with [Cl] = 0.2m and [DIC] = 0.1m.
    • low_salts_case.xlsx contains the output from the chemical speciation model described in the main text for the low salt case, with [Cl] = 0.05m and [DIC] = 0.01m.
    • CO2_activity_uncertainty.xlsx collects the activity of CO2 from the three files above into a single sheet. This is plotted in supplemental figure S2.
    • independent_samples.zip contains a further 20 figures which show the variance caused solely by each of [CH4], [H2], n_ATP and k at a fixed temperature or pH, as indicated by the file name. These show the deviation from the nominal log10(Power supply) (e.g. Figure 3 in the main text) if the named parameter were allowed to vary within its uncertainty as defined in Table 1 in the main text.

    In each of the three salt cases, DIC is the sum of the molalities of CO2(aq), HCO3-(aq) and CO32-, and the speciation was performed in intervals of 10 K and 0.5 pH units, between pH 7-12 and 273-473 K.

  7. 💰 Global GDP Dataset (Latest)

    • kaggle.com
    zip
    Updated Oct 17, 2025
    Cite
    Asadullah Shehbaz (2025). 💰 Global GDP Dataset (Latest) [Dataset]. https://www.kaggle.com/datasets/asadullahcreative/global-gdp-explorer-2024-world-bank-un-data
    Explore at:
    Available download formats: zip (6672 bytes)
    Dataset updated
    Oct 17, 2025
    Authors
    Asadullah Shehbaz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🧾 About Dataset

    🌍 Global GDP by Country — 2024 Edition

    📖 Overview

    The Global GDP by Country (2024) dataset provides an up-to-date snapshot of worldwide economic performance, summarizing each country’s nominal GDP, growth rate, population, and global economic contribution.

    This dataset is ideal for economic analysis, data visualization, policy modeling, and machine learning applications related to global development and financial forecasting.

    📊 Dataset Information

    • Total Records: 181 countries
    • Time Period: 2024 (latest available global data)
    • Geographic Coverage: Worldwide
    • File Format: CSV
    • File Size: ~10 KB
    • Missing Values: None (100% complete dataset)

    🎯 Target Use-Cases:
    - Economic growth trend analysis
    - GDP-based country clustering
    - Per capita wealth comparison
    - Share of world economy visualization

    🧩 Key Features

    • Country: Official country name
    • GDP (nominal, 2023): Total nominal GDP in USD
    • GDP (abbrev.): Simplified GDP format (e.g., "$25.46 Trillion")
    • GDP Growth: Annual GDP growth rate (%)
    • Population 2023: Estimated population for 2023
    • GDP per capita: Average income per person (USD)
    • Share of World GDP: Percentage contribution to global GDP

    📈 Statistical Summary

    Population Overview

    • Mean Population: 43.6 million
    • Standard Deviation: 155.5 million
    • Minimum Population: 9,816 (small island nations)
    • Median Population: 9.1 million
    • Maximum Population: 1.43 billion (China)

    🌟 Highlights

    💰 Top Economies (Nominal GDP):
    United States, China, Japan, Germany, India

    📈 Fastest Growing Economies:
    India, Bangladesh, Vietnam, and Rwanda

    🌐 Global Insights:
    - The dataset covers 181 countries representing 100% of global GDP.
    - Suitable for data visualization dashboards, AI-driven economic forecasting, and educational research.

    💡 Example Use-Cases

    • Build a choropleth map showing GDP distribution across continents.
    • Train a regression model to predict GDP per capita based on population and growth.
    • Compare economic inequality using population vs GDP share.
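    As a starting point for these use-cases, a minimal pandas sketch (the file name and exact column spellings are assumptions based on the Key Features table above):

```python
import pandas as pd

# Assumed file and column names, following the "Key Features" table.
gdp = pd.read_csv("global_gdp_2024.csv")

# Rank countries by GDP per capita and show their share of world GDP.
top = (gdp.sort_values("GDP per capita", ascending=False)
          .loc[:, ["Country", "GDP per capita", "Share of World GDP"]]
          .head(10))
print(top.to_string(index=False))
```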

    📚 Dataset Citation

    Source: Worldometers — GDP by Country (2024)
    Dataset compiled and cleaned by: Asadullah Shehbaz
    For open research and data analysis.

  8. Global monthly catch of tuna, tuna-like and shark species (1950-2021) by 1°...

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Global monthly catch of tuna, tuna-like and shark species (1950-2021) by 1° or 5° squares (IRD level 2) - and efforts level 0 (1950-2023) [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-15221705?locale=da
    Explore at:
    Available download formats: unknown (21391)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Major differences from previous work:

    For level 2 catch:

    • Catches in tons, raised to match nominal values, now consider the geographic area of the nominal data for improved accuracy.
    • Captures in "Number of fish" are converted to weight based on nominal data. The conversion factors used in the previous version are no longer used, as they did not adequately represent the diversity of captures. Numbers of fish without corresponding data in nominal are not removed as they were before, creating a large difference for this measurement_unit between the two datasets.
    • Nominal data from WCPFC includes fishing fleet information, and georeferenced data has been raised based on this instead of solely on the triplet year/gear/species, to avoid random reallocations.
    • Strata for which catches in tons are raised to match nominal data have had their numbers removed.
    • Raising only applies to complete years, to avoid overrepresenting specific months, particularly in the early years of georeferenced reporting.
    • Strata where georeferenced data exceed nominal data have not been adjusted downward, as it is unclear whether these discrepancies arise from missing nominal data or from different aggregation methods in the two datasets.
    • The data is not aggregated to 5-degree squares and thus remains unharmonized spatially. Aggregation can be performed using CWP codes for geographic identifiers. For example, an R function is available: source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/sardara_functions/transform_cwp_code_from_1deg_to_5deg.R")

    The level 0 dataset has been modified, creating differences in this new version, notably:

    • The species retained are different; only 32 major species are kept.
    • Mappings have been somewhat modified based on new standards implemented by FIRMS.
    • New rules have been applied for overlapping areas.
    • Data is only displayed in 1-degree and 5-degree square areas.
    • The data is enriched with "Species group" and "Gear labels" using the fdiwg standards.

    These main differences are recapped in Differences_v2018_v2024.zip.

    Recommendations:

    • To avoid converting data from numbers using nominal strata, we recommend the use of conversion factors, which could be provided by tRFMOs.
    • In some strata, nominal data appears higher than georeferenced data, as observed during level 2 processing. These discrepancies may result from errors or differences in aggregation methods. Further analysis will examine these differences in detail to refine treatments accordingly. A summary of differences by tRFMOs, based on the number of strata, is included in the appendix.
    • Some nominal data have no equivalent in georeferenced data and therefore cannot be disaggregated. What could be done is to check, for each nominal datum without equivalence, whether georeferenced data exist within different buffers, and to average the distribution of this footprint, then disaggregate the nominal data based on the georeferenced data. This would lead to the creation of data (approximately 3%) and would necessitate reducing or removing all georeferenced data without a nominal equivalent or with a lesser equivalent. Tests are currently being conducted with and without this. It would help improve the biomass-captured footprint but could lead to unexpected discrepancies with current datasets.

    For level 0 effort: In some datasets, namely those from ICCAT and the purse seine (PS) data from WCPFC, the same effort data has been reported multiple times using different units; these have been kept as is, since no official mapping allows conversion between the units. As a result, users should be reminded that some ICCAT and WCPFC effort data are deliberately duplicated:

    • In the case of ICCAT data, lines with identical strata but different effort units are duplicates reporting the same fishing activity with different measurement units. It is indeed not possible to infer strict equivalence between units, as some contain information about others (e.g., Hours.FAD and Hours.FSC may inform Hours.STD).
    • In the case of WCPFC data, effort records were also kept in all originally reported units. Here, duplicates do not necessarily share the same "fishing_mode", as SETS for purse seiners are reported with an explicit association to fishing_mode, while DAYS are not. This distinction allows SETS records to be separated by fishing mode, whereas DAYS records remain aggregated.

    Some limited harmonization, particularly between units such as NET-days and Nets, has not been implemented in the current version of the dataset, but may be considered in future releases if a consistent relationship can be established.
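    The linked R function implements the CWP-code transform itself; purely as an illustration of the aggregation idea (not of CWP codes), here is a simplified Python sketch that snaps 1-degree cells to 5-degree cells using plain latitude/longitude columns:

```python
import pandas as pd

def to_5deg(coord: int) -> int:
    """Snap a 1-degree cell corner coordinate to its enclosing 5-degree cell."""
    return (coord // 5) * 5  # floor division also handles negative coordinates

# Hypothetical catch records on a 1-degree grid (lat/lon of cell corners).
catch = pd.DataFrame({"lat": [10, 11, 14, 23], "lon": [40, 42, 44, 51],
                      "tons": [1.0, 2.5, 0.5, 3.0]})
catch["lat5"] = catch["lat"].apply(to_5deg)
catch["lon5"] = catch["lon"].apply(to_5deg)
print(catch.groupby(["lat5", "lon5"])["tons"].sum())
```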

  9. Global monthly catch of tuna, tuna-like and shark species (1950-2023) by 1°...

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Global monthly catch of tuna, tuna-like and shark species (1950-2023) by 1° or 5° squares (IRD level 2) - and efforts level 0 (1950-2023) [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-15405414?locale=fi
    Explore at:
    Available download formats: unknown (2677816)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Major differences from v1:

    For level 2 catch:

    • Catches and numbers raised to nominal are only raised to exactly matching strata or, if none exist, to a stratum corresponding to UNK/NEI or 99.9. (new feature in v4)
    • When nominal strata lack specific dimensions (e.g., fishing_mode always UNK) but georeferenced strata include them, the nominal data are "upgraded" to match, preventing loss of detail. Currently this adjustment aligns nominal values to georeferenced totals; future versions may apply proportional scaling. This does not create a direct raising but rather allows more precise reallocation. (new feature in v4)
    • IATTC purse seine catch-and-effort data are available in 3 separate files according to the group of species: tunas, billfishes, and sharks. This is because PS data is collected from 2 sources: observers and fishing vessel logbooks. Observer records are used when available, and logbooks are used for unobserved trips. Both sources collect tuna data, but only observers collect shark and billfish data. As an example, a stratum may have observer effort, and the number of sets from the observed trips would be counted for tuna, shark, and billfish; but there may also have been logbook data for unobserved sets in the same stratum, so the tuna catch and number of sets for a cell would be added. This would make a higher total number of sets for tuna catch than for shark or billfish. Efforts in the billfish and shark datasets might hence represent only a proportion of the total effort allocated in some strata, since it is the observed effort, i.e. effort for which there was an observer onboard. As a result, catch in the billfish and shark datasets might represent only a proportion of the total catch allocated in some strata. Hence, shark and billfish catch were raised to the fishing effort reported in the tuna dataset. (new feature in v4; was done in FIRMS Level 0 before)
    • Data with a resolution of 10deg x 10deg is removed; disaggregating it is being considered for future versions.
    • Catches in tons, raised to match nominal values, now consider the geographic area of the nominal data for improved accuracy. (as v3)
    • Captures in "Number of fish" are converted to weight based on nominal data. The conversion factors used in the previous version are no longer used, as they did not adequately represent the diversity of captures. (as v3)
    • Numbers of fish without corresponding data in nominal are not removed as they were before, creating a large difference for this measurement_unit between the two datasets. (as v3)
    • Strata for which catches in tons are raised to match nominal data have had their numbers removed. (as v3)
    • Raising only applies to complete years, to avoid overrepresenting specific months, particularly in the early years of georeferenced reporting. (as v3)
    • Strata where georeferenced data exceed nominal data have not been adjusted downward, as it is unclear whether these discrepancies arise from missing nominal data or different aggregation methods in the two datasets. (as v3)
    • The data is not aggregated to 5-degree squares and thus remains unharmonized spatially. Aggregation can be performed using CWP codes for geographic identifiers. For example, an R function is available: source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/sardara_functions/transform_cwp_code_from_1deg_to_5deg.R") (as v3)

    This results in a raising of the data compared to v3 for IOTC, ICCAT, IATTC and WCPFC. However, as the raising is more specific for CCSBT, the raising is 22% less than in the previous version.

    The level 0 dataset has been modified, creating differences in this new version, notably:

    • The species retained are different; only 32 major species are kept.
    • Mappings have been somewhat modified based on new standards implemented by FIRMS.
    • New rules have been applied for overlapping areas.
    • Data is only displayed in 1-degree and 5-degree square areas.
    • The data is enriched with "Species group" and "Gear labels" using the fdiwg standards.

    These main differences are recapped in Differences_v2018_v2024.zip.

    Recommendations:

    • To avoid converting data from numbers using nominal strata, we recommend the use of conversion factors, which could be provided by tRFMOs.
    • In some strata, nominal data appears higher than georeferenced data, as observed during level 2 processing. These discrepancies may result from errors or differences in aggregation methods. Further analysis will examine these differences in detail to refine treatments accordingly. A summary of differences by tRFMOs, based on the number of strata, is included in the appendix.

    For level 0 effort: In some datasets, namely those from ICCAT and the purse seine (PS) data from WCPFC, the same effort data has been reported multiple times using different units; these have been kept as is, since no official mapping allows conversion between the units. As a result, users should be reminded that some ICCAT and WCPFC effort data are deliberately duplicated: in the case of ICCAT data, lines with identical strata but different effort units are duplicates reporting the same fishing activity with different measurement units. ...

  10. Controlled Anomalies Time Series (CATS) Dataset

    • zenodo.org
    bin
    Updated Jul 12, 2024
    + more versions
    Cite
    Patrick Fleith; Patrick Fleith (2024). Controlled Anomalies Time Series (CATS) Dataset [Dataset]. http://doi.org/10.5281/zenodo.7646897
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Solenix Engineering GmbH
    Authors
    Patrick Fleith; Patrick Fleith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.

    The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:

    • Multivariate (17 variables), including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system, including:
      • 4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.
      • 3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.
      • 10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, current, humidity, velocity, acceleration, etc.
    • 5 million timestamps. Sensor readings are at a 1 Hz sampling frequency.
      • 1 million nominal observations (the first 1 million datapoints). This is suitable to start learning the "normal" behaviour.
      • 4 million observations that include both nominal and anomalous segments. This is suitable to evaluate both semi-supervised approaches (novelty detection) as well as unsupervised approaches (outlier detection).
    • 200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.
    • Different types of anomalies to understand what anomaly types can be detected by different approaches.
    • Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end time of the anomalous behaviour is known very precisely. In contrast to real world datasets, there is no risk that the ground truth contains mislabelled segments which is often the case for real data.
    • Obvious anomalies. The simulated anomalies have been designed to be "easy" to detect for human eyes (i.e., there are very large spikes or oscillations), and hence detectable for most algorithms. This makes this synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that are not capable of detecting those obvious anomalies). However, during our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable also for regular benchmark studies.
    • Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example consists of a light and switch pair. The light being either on or off is nominal, the same goes for the switch, but having the switch on and the light off shall be considered anomalous. In the CATS dataset, users can choose (or not) to use the available context, and external stimuli, to test the usefulness of the context for detecting anomalies in this simulation.
    • Pure signal ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage since users of the dataset can decide to add on top of the provided series any type of noise and choose an amplitude. This makes it well suited to test how sensitive and robust detection algorithms are against various levels of noise.
    • No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.

    [1] Example Benchmark of Anomaly Detection in Time Series: “Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779 - 1797, 2022. doi:10.14778/3538598.3538602”
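    Given this layout (first 1 million timestamps nominal, remaining 4 million mixed), a semi-supervised evaluation split can be sketched as follows; this assumes the 17 channels have already been loaded into a NumPy array, since the loading step depends on the distributed file format:

```python
import numpy as np

# Placeholder for the real CATS telemetry: (n_timestamps, 17 channels).
# In the real dataset n_timestamps = 5_000_000 and the first 1_000_000 are nominal.
data = np.random.rand(500_000, 17)
n_nominal = len(data) // 5  # first fifth nominal, mirroring the 1M/5M split

train = data[:n_nominal]  # nominal-only segment: learn "normal" behaviour
test = data[n_nominal:]   # mixed nominal/anomalous segment: evaluate detection

# Toy detector: flag test points more than 4 std devs from the training mean.
mu, sigma = train.mean(axis=0), train.std(axis=0)
flags = np.abs(test - mu) > 4 * sigma
print("flagged timestamps:", int(flags.any(axis=1).sum()))
```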

    About Solenix

    Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.

  11. Fundamental Data Record for Atmospheric Composition [ATMOS_L1B]

    • earth.esa.int
    Updated Jul 1, 2024
    + more versions
    Cite
    European Space Agency (2024). Fundamental Data Record for Atmospheric Composition [ATMOS_L1B] [Dataset]. https://earth.esa.int/eogateway/catalog/fdr-for-atmospheric-composition
    Explore at:
    Dataset updated
    Jul 1, 2024
    Dataset authored and provided by
    European Space Agency (http://www.esa.int/)
    License

    Terms and Conditions for the use of ESA Data: https://earth.esa.int/eogateway/documents/20142/1564626/Terms-and-Conditions-for-the-use-of-ESA-Data.pdf

    Time period covered
    Jun 28, 1995 - Apr 7, 2012
    Description

    The Fundamental Data Record (FDR) for Atmospheric Composition UVN v.1.0 dataset is a cross-instrument Level-1 product [ATMOS_L1B] generated in 2023 and resulting from the ESA FDR4ATMOS project. The FDR contains selected Earth Observation Level 1b parameters (irradiance/reflectance) from the nadir-looking measurements of the ERS-2 GOME and Envisat SCIAMACHY missions for the period ranging from 1995 to 2012. The data record offers harmonised cross-calibrated spectra with focus on spectral windows in the Ultraviolet-Visible-Near Infrared regions for the retrieval of critical atmospheric constituents like ozone (O3), sulphur dioxide (SO2) and nitrogen dioxide (NO2) column densities, alongside cloud parameters.

    The FDR4ATMOS products should be regarded as experimental due to the innovative approach and the current use of a limited-sized test dataset to investigate the impact of harmonization on the Level 2 target species, specifically SO2, O3 and NO2. Presently, this analysis is being carried out within follow-on activities. The FDR4ATMOS V1 is currently being extended to include the MetOp GOME-2 series.

    Product format

    In many respects, the FDR product has improved compared to the existing individual mission datasets:

    • GOME solar irradiances are harmonised using a validated SCIAMACHY solar reference spectrum, solving the problem of the fast-changing etalon present in the original GOME Level 1b data.
    • Reflectances for both GOME and SCIAMACHY are provided in the FDR product. GOME reflectances are harmonised to degradation-corrected SCIAMACHY values, using collocated data from the CEOS PIC sites.
    • SCIAMACHY data are scaled to the lowest integration time within the spectral band using high-frequency PMD measurements from the same wavelength range. This simplifies the use of the SCIAMACHY spectra, which were split into a complex cluster structure (each with its own integration time) in the original Level 1b data.
    • The harmonization process applied mitigates the viewing angle dependency observed in the UV spectral region for GOME data.
    • Uncertainties are provided.

    Each FDR product provides, within the same file, irradiance/reflectance data for the UV-VIS-NIR spectral regions across all orbits on a single day, including therein information from the individual ERS-2 GOME and Envisat SCIAMACHY measurements. The FDR has been generated in two formats: Level 1A and Level 1B, targeting expert users and nominal applications respectively. The Level 1A [ATMOS_L1A] data include additional parameters such as harmonisation factors, PMD, and polarisation data extracted from the original mission Level 1 products. The ATMOS_L1A dataset is not part of the nominal dissemination to users; in case of specific requirements, please contact EOHelp.

    Please refer to the README file for essential guidance before using the data. All the new products are conveniently formatted in NetCDF. Free standard tools, such as Panoply, can be used to read NetCDF data. Panoply is sourced and updated by external entities. For further details, please consult our Terms and Conditions page.

    Uncertainty characterisation

    One of the main aspects of the project was the characterization of Level 1 uncertainties for both instruments, based on metrological best practices. The following documents are provided:

    • General guidance on a metrological approach to Fundamental Data Records (FDR)
    • Uncertainty Characterisation document
    • Effect tables
    • NetCDF files containing example uncertainty propagation analysis and spectral error correlation matrices for SCIAMACHY (Atlantic and Mauretania scenes for 2003 and 2010) and GOME (Atlantic scene for 2003): reflectance_uncertainty_example_FDR4ATMOS_GOME.nc, reflectance_uncertainty_example_FDR4ATMOS_SCIA.nc

    Known Issues

    Non-monotonous wavelength axis for SCIAMACHY in FDR data version 1.0: In the SCIAMACHY OBSERVATION group of the atmospheric FDR v1.0 dataset (DOI: 10.5270/ESA-852456e), the wavelength axis (lambda variable) is not monotonically increasing. This issue affects all spectral channels (UV, VIS, NIR) in the SCIAMACHY group, while GOME OBSERVATION data remain unaffected. The root cause of the issue lies in the incorrect indexing of the lambda variable during the NetCDF writing process. Notably, the wavelength values themselves are calculated correctly within the processing chain.

    Temporary Workaround: The wavelength axis is correct in the first record of each product. As a workaround, users can extract the wavelength axis from the first record and apply it to all subsequent measurements within the same product. The first record can be retrieved by setting the first two indices (time and scanline) to 0 (assuming counting of array indices starts at 0). Note that this process must be repeated separately for each spectral range (UV, VIS, NIR) and every daily product. Since the wavelength axis of SCIAMACHY is highly stable over time, using the first record introduces no expected impact on retrieval results. Python pseudo-code example: lambda_...
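    The truncated pseudo-code can be fleshed out roughly as follows (a sketch with the netCDF4 package; the file name is illustrative and the exact group/variable layout should be verified against the product):

```python
from netCDF4 import Dataset

# File name is illustrative; group/variable names follow the description above.
ds = Dataset("FDR4ATMOS_daily_product.nc")
obs = ds.groups["SCIAMACHY"]      # SCIAMACHY OBSERVATION group
lam = obs.variables["lambda"][:]  # wavelength axis, first two dims: time, scanline

# Workaround: only the first record carries a correct wavelength axis,
# so take indices time=0, scanline=0 and reuse it for all measurements.
lam_ref = lam[0, 0, ...]
# ... use lam_ref as the wavelength axis for every spectrum in this band,
# repeating the procedure per spectral range (UV, VIS, NIR) and per daily product.
```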

  12. Frame-Labeled 60 GHz FMCW Radar Gesture Dataset

    • zenodo.org
    zip
    Updated May 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sarah Seifi; Sarah Seifi; Tobias Sukianto; Cecilia Carbonelli; Tobias Sukianto; Cecilia Carbonelli (2025). Frame-Labeled 60 GHz FMCW Radar Gesture Dataset [Dataset]. http://doi.org/10.5281/zenodo.15178095
    Explore at:
    Available download formats: zip
    Dataset updated
    May 7, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sarah Seifi; Sarah Seifi; Tobias Sukianto; Cecilia Carbonelli; Tobias Sukianto; Cecilia Carbonelli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As the field of human-computer interaction continues to evolve, there is a growing need for robust datasets that can enable the development of gesture recognition systems that operate reliably in diverse real-world scenarios. We present a radar-based gesture dataset, recorded using the BGT60TR13C XENSIV™ 60GHz Frequency Modulated Continuous Radar sensor to address this need. This dataset includes both nominal gestures and anomalous gestures, providing a diverse and challenging benchmark for understanding and improving gesture recognition systems.

    The dataset contains a total of 49,000 gesture recordings, with 25,000 nominal gestures and 24,000 anomalous gestures. Each recording consists of 100 frames of raw radar data, accompanied by a label file that provides annotations for every individual frame in each gesture sequence. This frame-based annotation allows for high-resolution temporal analysis and evaluation.

    Nominal Gesture Data

    The nominal gestures represent standard, correctly performed gestures. These gestures were collected to serve as the baseline for gesture recognition tasks. The details of the nominal data are as follows:

    • Gesture Types: The dataset includes five nominal gesture types:

      1. Swipe Left
      2. Swipe Right
      3. Swipe Up
      4. Swipe Down
      5. Push
    • Total Samples: 25,000 nominal gestures.

    • Participants: The nominal gestures were performed by 12 participants (p1 through p12).

    Each nominal gesture has a corresponding label file that annotates every frame with the nominal gesture type, providing a detailed temporal profile for training and evaluation purposes.

    Anomalous Gesture Data

    The anomalous gestures represent deviations from the nominal gestures. These anomalies were designed to simulate real-world conditions in which gestures might be performed incorrectly, under varying speeds, or with modified execution patterns. The anomalous data introduces additional challenges for gesture recognition models, testing their ability to generalize and handle edge cases effectively.

    • Total Samples: 24,000 anomalous gestures.

    • Anomaly Types: The anomalous gestures include three distinct types of anomalies:

      1. Fast Executions: Gestures performed at a rapid pace, lasting approximately 0.1 seconds (much faster than the nominal average of 0.5 seconds).
      2. Slow Executions: Gestures performed at a significantly slower pace, lasting approximately 3 seconds (much slower than the nominal average).
      3. Wrist Executions: Gestures performed using the wrist instead of a fully extended arm, significantly altering the execution pattern.
    • Participants: The anomalous gestures involved contributions from eight participants, including p1, p2, p6, p7, p9, p10, p11, and p12.

    • Locations: All anomalous gestures were collected in location e1 (a closed-space meeting room).

    Radar Configuration Details

    The radar system was configured with an operational frequency range spanning from 58.5 GHz to 62.5 GHz. This configuration provides a range resolution of 37.5 mm and the ability to resolve targets at a maximum range of 1.2 meters. For signal transmission, the radar employed a burst configuration comprising 32 chirps per burst with a frame rate of 33 Hz and a pulse repetition time of 300 µs.
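    As a quick consistency check, the quoted 37.5 mm range resolution and 1.2 m maximum range follow from the standard FMCW relations (a small sketch; treating half of the 64 real samples per chirp as usable range bins is our assumption):

```python
# FMCW sanity check: 58.5-62.5 GHz sweep, 64 samples per chirp.
c = 3e8                      # speed of light, m/s
bandwidth = 62.5e9 - 58.5e9  # 4 GHz sweep bandwidth

range_resolution = c / (2 * bandwidth)       # = 0.0375 m = 37.5 mm
n_range_bins = 64 // 2                       # 64 real samples -> 32 usable range bins
max_range = n_range_bins * range_resolution  # = 1.2 m

print(range_resolution, max_range)  # 0.0375 1.2
```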

    Data Format

    The data for each user, categorized by location and anomaly type, is saved in compressed .npz files. Each .npz file contains key-value pairs for the data and its corresponding labels. The file naming convention is as follows:
    UserLabel_EnvironmentLabel(_AnomalyLabel).npy. For nominal gestures, the anomaly label is omitted.

    The .npz file contains two primary keys:

    1. inputs: Represents the raw radar data.
    2. targets: Refers to the corresponding label vector for the raw data.

    The raw radar data (inputs) is stored as a NumPy array with 5 dimensions, structured as follows:
    n_recordings x n_frames x n_antennas x n_chirps x n_samples, where:

    1. n_recordings: The number of gesture sequence instances (i.e., recordings).
    2. n_frames: The frame length of each gesture (100 frames per gesture).
    3. n_antennas: The number of virtual antennas (3 antennas).
    4. n_chirps: The number of chirps per frame (32 chirps).
    5. n_samples: The number of samples per chirp (64 samples).

    The labels (targets) are stored as a NumPy array with 2 dimensions, structured as follows:
    n_recordings x n_frames, where:

    1. n_recordings: The number of gesture sequence instances (i.e., recordings).
    2. n_frames: The frame length of each gesture (100 frames per gesture).

    Each entry in the targets matrix corresponds to the frame-level label for the associated raw radar data in inputs.

    The total size of the dataset is approximately 48.1 GB, provided as a compressed file named radar_dataset.zip.
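    Loading one of the archives can be sketched as follows (the file name is illustrative, following the naming convention above):

```python
import numpy as np

# e.g. nominal gestures of participant p1 in environment e1 (illustrative name).
archive = np.load("p1_e1.npz")
inputs, targets = archive["inputs"], archive["targets"]

# Expected shapes per the description:
# inputs:  (n_recordings, 100 frames, 3 antennas, 32 chirps, 64 samples)
# targets: (n_recordings, 100 frames) frame-level labels
print(inputs.shape, targets.shape)
```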

    Metadata

    The user labels are defined as follows:

    • p1: Male
    • p2: Female
    • p3: Female
    • p4: Male
    • p5: Male
    • p6: Male
    • p7: Male
    • p8: Male
    • p9: Male
    • p10: Female
    • p11: Male
    • p12: Male

    The environmental labels included in the dataset are defined as follows:

    • e1: Closed-space meeting room
    • e2: Open-space office room
    • e3: Library
    • e4: Kitchen
    • e5: Exercise room
    • e6: Bedroom

    The anomaly labels included in the dataset are defined as follows:

    • fast: Fast gesture execution
    • slow: Slow gesture execution
    • wrist: Wrist gesture execution

    This dataset represents a robust and diverse set of radar-based gesture data, enabling researchers and developers to explore novel models and evaluate their robustness in a variety of scenarios. The inclusion of frame-based labeling provides an additional level of detail to facilitate the design of advanced gesture recognition systems that can operate with high temporal resolution.

    Disclaimer

    This dataset builds upon the version previously published on IEEE DataExplorer (https://ieee-dataport.org/documents/60-ghz-fmcw-radar-gesture-dataset), which included only one label per recording. In contrast, this version includes frame-based labels, providing individual annotations for each frame of the recorded gestures. By offering more granular labeling, this dataset further supports the development and evaluation of gesture recognition models with enhanced temporal precision. However, the raw radar data remains unchanged compared to the dataset available on IEEE DataExplorer.

  13. Replication data for: Kinetic solubility: experimental and...

    • entrepot.recherche.data.gouv.fr
    bin, txt
    Updated Aug 22, 2023
    Cite
    Shamkhal Baybekov; Shamkhal Baybekov; Pierre Llompart; Pierre Llompart; Gilles Marcou; Gilles Marcou; Patrick Gizzi; Patrick Gizzi; Jean-Luc Galzi; Jean-Luc Galzi; Pascal Ramos; Olivier Saurel; Olivier Saurel; Claire Bourban; Claire Bourban; Claire Minoletti; Claire Minoletti; Alexandre Varnek; Alexandre Varnek; Pascal Ramos (2023). Données de réplication pour : Kinetic solubility: experimental and machine-learning modeling perspectives [Dataset]. http://doi.org/10.57745/ZWS0WC
    Explore at:
    Available download formats: bin(15680472), bin(100130621), bin(4166966), txt(9560), bin(16226572), bin(106190591), bin(1214732), bin(1332562), bin(597827), bin(578879), bin(709579), bin(1604433), bin(914009), bin(1716883), bin(30127191), bin(118707981), bin(610066), bin(99457), bin(519947)
    Dataset updated
    Aug 22, 2023
    Dataset provided by
    Recherche Data Gouv
    Authors
    Shamkhal Baybekov; Shamkhal Baybekov; Pierre Llompart; Pierre Llompart; Gilles Marcou; Gilles Marcou; Patrick Gizzi; Patrick Gizzi; Jean-Luc Galzi; Jean-Luc Galzi; Pascal Ramos; Olivier Saurel; Olivier Saurel; Claire Bourban; Claire Bourban; Claire Minoletti; Claire Minoletti; Alexandre Varnek; Alexandre Varnek; Pascal Ramos
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    Kinetic aqueous or buffer solubility is an important parameter measuring the suitability of compounds for high-throughput assays in early drug discovery, while thermodynamic solubility is reserved for later stages of drug discovery and development. Kinetic solubility is also considered to have low inter-laboratory reproducibility because of its sensitivity to protocol parameters. Presumably, this is why little effort has been put into building QSPR models for kinetic solubility, in comparison to thermodynamic aqueous solubility. Here, we investigate the reproducibility and modelability of kinetic solubility assays. We first analyzed the relationship between kinetic and thermodynamic solubility data, and then examined the consistency of data from different kinetic assays. In this contribution, we report differences between kinetic and thermodynamic solubility data that are consistent with those reported by others, and good agreement between data from different kinetic solubility campaigns, in contrast to general expectations. The latter is confirmed by achieving high-performing QSPR models trained on merged kinetic solubility datasets. This encourages the building of predictive models for kinetic solubility. The kinetic solubility QSPR model developed in this study is freely accessible through the Predictor web service of the Laboratory of Chemoinformatics (https://chematlas.chimie.unistra.fr/cgi-bin/predictor2.cgi).

    PICT

    The dataset was provided by the Plateforme Intégrée de Criblage de Toulouse (PICT) screening platform. It consists of kinetic solubility measurements for 939 fragments (small organic molecules). The measurements were performed in PBS buffer solution (pH 7.2) (with 1% DMSO from stock solution) using the NMR technique for detection. Allowing for uncertainties in sample preparation and detection, experts recommend interpreting a fragment of this dataset as "Insoluble" if the reported concentration is < 780 μM and "Soluble" if the concentration is > 880 μM; in between, the solubility label is undecided. Other curation steps included removal of data points reporting a concentration greater than the nominal sample concentration (1 mM) or greater than the concentration in the stock solution, indicative of an error. After the curation and removal of 46 confirmed outliers and suspicious data points, the total number of compounds in the dataset was 606 (513 "Soluble" and 93 "Insoluble").

    Prestwick

    This dataset originates from the former Prestwick Chemicals company. Kinetic solubility was measured for 1049 fragments in a buffer solution (pH 7.4) using static light scattering (SLS). Compounds are categorized as "Soluble" or "Insoluble" at 1 mM PBS (with 1% DMSO from stock solution). Data curation involved removal of identical duplicate measurements, as well as of molecules found soluble at higher concentrations (5 mM and/or 10 mM) but not at 1 mM, implying an error. The curated dataset consists of 989 compounds (900 "Soluble" and 89 "Insoluble").

    Life Chemicals

    The Life Chemicals company provided kinetic solubility data for one of its fragment libraries (https://lifechemicals.com/fragment-libraries/soluble-fragment-library). Solubility of 11457 fragments was visually determined based on scattering observed in solutions at 1 mM concentration in PBS (pH 7.4) with 0.5% DMSO. After removal of data points with no kinetic solubility, the curated dataset consists of 9276 "Soluble" molecules.

    MLSMR

    The Molecular Libraries Small Molecule Repository (MLSMR - https://pubchem.ncbi.nlm.nih.gov/bioassay/1996) is a collection of small molecules compiled under the initiative of the National Institutes of Health (NIH) and screened by the Sanford-Burnham Center for Chemical Genomics (SBCCG). To our knowledge, MLSMR is the largest kinetic solubility dataset available in PubChem; it is composed of 57824 data points measured in PBS (pH 7.4) using quantitative chemiluminescent nitrogen detection (CLND). Although 0.2 mM was reported as the nominal concentration of a sample, a large fraction of the reported concentrations (about 31% of the dataset) is in the range (0.15; 0.151]. Based on this observation, we assumed 0.15 mM to be the actual sample nominal concentration and removed data points with a reported concentration greater than or equal to 0.15 mM (13262 data points). Additionally, data curation included removal of duplicate molecules while taking the median of their solubility values. The resulting curated dataset contained 44510 nitrogen-containing compounds which are insoluble at 0.15 mM, and therefore labeled "Insoluble" at 1 mM.

    Boehringer

    Boehringer Ingelheim Pharma GmbH & Co. shared a dataset of 789 kinetic solubility measurements (doi: 10.1002/cmdc.200900205) performed in PBS (pH 7.4) using the nephelometry method. Data points with reported precipitate formation in the DMSO stock solution, and those for which the solubility value was only bounded (relation denoted as ">"), were removed. The curated dataset contained 605...
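    The PICT labelling rule quoted above maps directly to a small helper (a sketch; the function name is ours):

```python
def pict_label(concentration_uM: float) -> str:
    """Label a PICT kinetic solubility measurement per the thresholds above."""
    if concentration_uM < 780:
        return "Insoluble"
    if concentration_uM > 880:
        return "Soluble"
    return "Undecided"

print([pict_label(c) for c in (500, 800, 950)])  # ['Insoluble', 'Undecided', 'Soluble']
```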

  14. Dataset of X-Ray Micro Computed Tomography Measurements of Porosity for...

    • kilthub.cmu.edu
    txt
    Updated Jun 25, 2025
    + more versions
    Cite
    Justin Miner; Sneha Prabha Narra (2025). Dataset of X-Ray Micro Computed Tomography Measurements of Porosity for Nominal and Low Power Coupons Fabricated by Powder Bed Fusion-Laser Beam [Dataset]. http://doi.org/10.1184/R1/28152209.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    Carnegie Mellon University
    Authors
    Justin Miner; Sneha Prabha Narra
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Dataset of porosity data in Powder Bed Fusion - Laser Beam of Ti-6Al-4V obtained via X-ray Micro Computed Tomography. This work was conducted on an EOS M290. The coupons in this dataset are fabricated at 150 W and 280 W.

    Contents

    • poredf.csv: A csv file with pore measurements for each sample scanned.
    • parameters.csv: A csv file containing the process parameters and extreme value statistics (EVS) parameters for each sample scanned.

    WARNING: parameters.csv is too large to open in Excel. Saving it in Excel will cause data loss.

  15. Degradation Measurement of Robot Arm Position Accuracy

    • data.nist.gov
    • catalog.data.gov
    Updated Sep 7, 2018
    Cite
    Helen Qiao (2018). Degradation Measurement of Robot Arm Position Accuracy [Dataset]. http://doi.org/10.18434/M31962
    Explore at:
    Dataset updated
    Sep 7, 2018
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Helen Qiao
    License

    https://www.nist.gov/open/license

    Description

    The dataset contains both the robot's high-level tool center position (TCP) health data and controller-level component information (i.e., joint positions, velocities, currents, and temperatures). The datasets can be used by users (e.g., software developers, data scientists) who work on robot health management (including accuracy) but have limited or no access to robots that can capture real data. The datasets can support the:

    • Development of robot health monitoring algorithms and tools
    • Research of technologies and tools to support robot monitoring, diagnostics, prognostics, and health management (collectively called PHM)
    • Validation and verification of the industrial PHM implementation. For example, the verification of a robot's TCP accuracy after the work cell has been reconfigured, or whenever a manufacturer wants to determine if the robot arm has experienced a degradation.

    For data collection, a trajectory is programmed for the Universal Robot (UR5), approaching and stopping at randomly-selected locations in its workspace. The robot moves along this preprogrammed trajectory under different conditions of temperature, payload, and speed. The TCP position (x, y, z) of the robot is measured by a 7-D measurement system developed at NIST. Differences are calculated between the measured positions from the 7-D measurement system and the nominal positions calculated from the nominal robot kinematic parameters, and the results are recorded within the dataset. Controller-level sensing data are also collected from each joint (direct output from the controller of the UR5) to understand the influence of temperature, payload, and speed on position degradation. Controller-level data can be used for root cause analysis of robot performance degradation, by providing joint positions, velocities, currents, accelerations, torques, and temperatures. For example, the cold-start temperatures of the six joints were approximately 25 degrees Celsius; after two hours of operation, the joint temperatures increased to approximately 35 degrees Celsius. Control variables are listed in the header file in the data set (UR5TestResult_header.xlsx). If you'd like to comment on this data and/or offer recommendations on future datasets, please email guixiu.qiao@nist.gov.
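    The accuracy metric described here reduces to a per-timestamp distance between measured and nominal TCP positions, e.g. (a minimal sketch with illustrative numbers):

```python
import numpy as np

# measured: (n, 3) TCP positions from the 7-D measurement system;
# nominal:  (n, 3) positions from the nominal kinematic parameters.
measured = np.array([[0.500, 0.200, 0.300], [0.501, 0.199, 0.302]])
nominal = np.array([[0.500, 0.200, 0.300], [0.500, 0.200, 0.300]])

tcp_error = np.linalg.norm(measured - nominal, axis=1)  # metres
print(tcp_error)  # [0.         0.00244949]
```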

  16. Yield Curve Models and Data - Three-Factor Nominal Term Structure Model

    • catalog.data.gov
    • s.cnmilf.com
    Updated Dec 18, 2024
    Cite
    Board of Governors of the Federal Reserve System (2024). Yield Curve Models and Data - Three-Factor Nominal Term Structure Model [Dataset]. https://catalog.data.gov/dataset/yield-curve-models-and-data-three-factor-nominal-term-structure-model
    Explore at:
    Dataset updated
    Dec 18, 2024
    Dataset provided by
    Federal Reserve Board of Governors
    Federal Reserve System (http://www.federalreserve.gov/)
    Description

    This is a no-arbitrage dynamic term structure model, implemented as in Kim and Wright using the methodology of Kim and Orphanides. The underlying model is the standard affine Gaussian model with three latent factors (i.e., the factors are defined only statistically and do not have a specific economic meaning). The model is parameterized in a maximally flexible way (i.e., it is the most general model of its kind with three factors that are econometrically identified). In estimating the model's parameters, data on survey forecasts of the 3-month Treasury bill (T-bill) rate are used in addition to yields data, in order to help address the small-sample problems that often pervade econometric estimation with persistent time series such as bond yields. A generic sketch of the model class follows below.
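    The following is a sketch of the standard three-factor affine Gaussian setup in textbook notation, not the Board's exact parameterization:

```latex
x_{t+1} = \mu + \Phi x_t + \Sigma \varepsilon_{t+1},
\qquad \varepsilon_{t+1} \sim N(0, I_3)
\quad \text{(latent three-factor VAR)}

r_t = \delta_0 + \delta_1^\top x_t
\quad \text{(short rate, affine in the factors)}

y_t^{(n)} = -\tfrac{1}{n}\left(A_n + B_n^\top x_t\right)
\quad \text{(zero-coupon yields)}
```

    Here $A_n$ and $B_n$ satisfy the standard no-arbitrage recursions, and risk premia are typically specified as essentially affine, $\lambda_t = \lambda_0 + \lambda_1 x_t$.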

  17. Data from: bicycle store dataset

    • kaggle.com
    zip
    Updated Sep 11, 2020
    Cite
    Rohit Sahoo (2020). bicycle store dataset [Dataset]. https://www.kaggle.com/rohitsahoo/bicycle-store-dataset
    Explore at:
    Available download formats: zip (682639 bytes)
    Dataset updated
    Sep 11, 2020
    Authors
    Rohit Sahoo
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    Perform Exploratory Data Analysis on the Bicycle Store Dataset!

    DATA EXPLORATION: Understand the characteristics of the given fields in the underlying data, such as variable distributions, whether the dataset is skewed towards a certain demographic, and the validity of the fields. For example, a training dataset may be highly skewed towards the younger age bracket; if so, how will this impact your results when using it to predict over the remaining customer base? Identify limitations surrounding the data and gather external data which may be useful for modelling purposes. This may include bringing in ABS data at different geographic levels and creating additional features for the model. For example, the geographic remoteness of different postcodes may be used as an indicator of proximity, to consider whether a customer needs a bike to ride to work.

    MODEL DEVELOPMENT: Determine a hypothesis related to the business question that can be answered with the data, and perform statistical testing to determine whether the hypothesis is valid. Create calculated fields based on existing data; for example, convert the D.O.B into an age bracket. Other fields that may be engineered include a 'High Margin Product' indicator of whether the product purchased by the customer was in a high-margin category in the past three months, based on the fields 'list_price' and 'standard cost'. Other examples include calculating the distance from office to home address as a factor in determining whether customers may purchase a bicycle for transportation purposes. This may also include thoughts around what the predicted variable actually is: for example, are results predicted in ordinal buckets, nominal categories, binary outcomes, or continuous values? Test the performance of the model using factors relevant to the chosen model (i.e., residual deviance, AIC, ROC curves, R squared). Appropriately document model performance, assumptions, and limitations. A sketch of such feature engineering follows below.
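    As a concrete illustration of the feature engineering described above, here is a hedged R sketch; the table name (customers) and column names (DOB, list_price, standard_cost) are assumptions about the dataset's layout, not confirmed field names:

```r
library(dplyr)
library(lubridate)

customers <- customers %>%
  mutate(
    # Convert D.O.B into an age bracket (assumes DOB is already a Date)
    age         = floor(interval(DOB, Sys.Date()) / years(1)),
    age_bracket = cut(age, breaks = c(0, 25, 35, 45, 55, 65, Inf),
                      labels = c("<25", "25-34", "35-44", "45-54", "55-64", "65+")),
    # Flag high-margin products from list price vs standard cost
    margin      = list_price - standard_cost,
    high_margin = margin > quantile(margin, 0.75, na.rm = TRUE)
  )
```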

    INTERPRETATION AND REPORTING: Visualisation and presentation of findings. This may involve interpreting the significant variables and coefficients from a business perspective. These slides should tell a compelling story around the business issue and support your case with quantitative and qualitative observations. Please refer to the module below for further details.

    Content

    The dataset is easy to understand and self-explanatory!

    Inspiration

    It is important to keep in mind the business context when presenting your findings: 1. What are the trends in the underlying data? 2. Which customer segment has the highest customer value? 3. What do you propose should be the marketing and growth strategy?

  18. ACTIVATE Falcon Aircraft Merge Data Files - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). ACTIVATE Falcon Aircraft Merge Data Files - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/activate-falcon-aircraft-merge-data-files-69f5c
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    ACTIVATE_Merge_Data comprises the pre-generated merge data files created from data collected onboard the HU-25 Falcon aircraft during the ACTIVATE project. ACTIVATE was a five-year (January 2019-December 2023) NASA Earth-Venture Sub-Orbital (EVS-3) field campaign. Marine boundary layer clouds play a critical role in Earth's energy balance and water cycle. These clouds cover more than 45% of the ocean surface and exert a net cooling effect. The Aerosol Cloud meTeorology Interactions oVer the western Atlantic Experiment (ACTIVATE) project provided important globally-relevant data about changes in marine boundary layer cloud systems, atmospheric aerosols, and the multiple feedbacks that warm or cool the climate. ACTIVATE studied the atmosphere over the western North Atlantic and sampled its broad range of aerosol, cloud, and meteorological conditions using two aircraft: the UC-12 King Air, primarily used for remote sensing measurements, and the HU-25 Falcon, which carried a comprehensive instrument payload for detailed in-situ measurements of aerosol, cloud properties, and atmospheric state. A few trace gas measurements were also onboard the HU-25 Falcon for the measurement of pollution traces, which contribute to airmass classification analysis. A total of 150 coordinated flights over the western North Atlantic occurred through 6 deployments from 2020-2022. The ACTIVATE science observing strategy intensively targeted the shallow cumulus cloud regime and aimed to collect sufficient statistics over a broad range of aerosol and weather conditions, enabling robust characterization of aerosol-cloud-meteorology interactions. This strategy was implemented with two nominal flight patterns: Statistical Survey and Process Study. The statistical survey pattern involved close coordination between the remote sensing and in-situ aircraft to conduct near-coincident sampling at and below cloud base as well as above and within cloud top. The process study pattern involved extensive vertical profiling to characterize the target cloud and the surrounding aerosol and meteorological conditions.

  19. Experimental Data for Fault Diagnosis in the Adaptive High-Rise D1244

    • darus.uni-stuttgart.de
    Updated Feb 24, 2025
    Cite
    Jonas Stiefelmaier (2025). Experimental Data for Fault Diagnosis in the Adaptive High-Rise D1244 [Dataset]. http://doi.org/10.18419/DARUS-4784
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    DaRUS
    Authors
    Jonas Stiefelmaier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    DFG
    Description

    General information: This dataset is meant to serve as a benchmark problem for fault detection and isolation in dynamic systems. It contains preprocessed sensor data from the adaptive high-rise demonstrator building D1244, built in the scope of the CRC 1244. Parts of the measurements have been artificially corrupted and labeled accordingly. Please note that although the measurements are stored in Matlab's .mat format (version 7.0), they can easily be processed using free software such as the SciPy library in Python.

    Structure of the dataset:
    • train: training data (only nominal)
    • validation: validation data (nominal and faulty). Faulty samples were obtained by manipulating a single signal in a random nominal sample from the validation data.
    • test: test data (nominal and faulty). Faulty samples were obtained by manipulating a single signal in a random nominal sample from the test data.
    • meta: textual labels for all signals as well as additional information on the considered fault classes

    File contents: Each file contains the following data from 1200 timesteps (60 seconds sampled at 20 Hz):
    • t: time in seconds
    • u: actuator forces (obtained from pressure measurements) in newtons
    • y: relative elongations and bending curvatures of structural elements obtained from strain gauge measurements, and actuator displacements measured by position encoders
    • label: categorical label of the present fault class, where 0 denotes the nominal class and faults in the different signals are encoded according to their index in the list of fault types in meta/labels.mat

    Faulty samples additionally include the corresponding nominal values for reference:
    • u_true: actuator forces without faults
    • y_true: measured outputs without faults

    Textual labels for all in- and output signals as well as all faults are given in the struct labels. Each sample's textual fault label is additionally contained in its filename (between the first and second underscore).
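    The description points to SciPy, but in keeping with this document's R examples, the files can also be read with the R.matlab package (readMat supports .mat version 7.0); the sample filename below is hypothetical:

```r
library(R.matlab)

# Hypothetical filename; the textual fault label of a real file sits
# between its first and second underscore
sample <- readMat("test/sample_nominal_0001.mat")

t_sec <- sample$t      # time in seconds (1200 steps at 20 Hz)
u     <- sample$u      # actuator forces in newtons
y     <- sample$y      # elongations, curvatures, actuator displacements
label <- sample$label  # 0 = nominal, otherwise fault-class index
```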

  20. Chen, C., Kyathanahally, S., Reyes, M., Merkli, S., Merz, E., Francazi, E., et al. (2024). Data for: Producing Plankton Classifiers that are Robust to Dataset Shift

    • opendata.eawag.ch
    • opendata-stage.eawag.ch
    Updated Nov 27, 2024
    Cite
    Chen, C., Kyathanahally, S., Reyes, M., Merkli, S., Merz, E., Francazi, E., et al. (2024). Data for: Producing Plankton Classifiers that are Robust to Dataset Shift (Version 1.0). Eawag: Swiss Federal Institute of Aquatic Science and Technology. https://doi.org/10.25678/000C6M [Dataset]. https://opendata.eawag.ch/dataset/data-for-producing-plankton-classifiers-that-are-robust-to-dataset-shift
    Explore at:
    Dataset updated
    Nov 27, 2024
    Description

    Modern plankton high-throughput monitoring relies on deep learning classifiers for species recognition in water ecosystems. Despite satisfactory nominal performances, a significant challenge arises from dataset shift: performance drops during real-world deployment compared to ideal testing conditions. In our study, we integrate the ZooLake dataset, which consists of dark-field images of lake plankton, with manually annotated images from 10 independent days of deployment, serving as test cells to benchmark out-of-dataset (OOD) performance. Our analysis reveals instances where classifiers, initially performing well in ideal conditions, encounter notable failures in real-world scenarios. For example, a MobileNet with a 92% nominal test accuracy shows a 77% OOD accuracy. We systematically investigate conditions leading to OOD performance drops, propose a preemptive assessment method to identify potential pitfalls when classifying new data, and pinpoint features in OOD images that adversely impact classification.

    We present a three-step pipeline: (i) identifying OOD degradation compared to nominal test performance, (ii) conducting a diagnostic analysis of degradation causes, and (iii) providing solutions. We find that ensembles of BEiT vision transformers, with targeted augmentations addressing OOD robustness, geometric ensembling, and rotation-based test-time augmentation, constitute the most robust model. It achieves an 83% OOD accuracy, with errors concentrated on container classes. Moreover, it exhibits lower sensitivity to dataset shift and reproduces the plankton abundances well.

    Our proposed pipeline is applicable to generic plankton classifiers, contingent on the availability of suitable test cells. Implementation of this pipeline is anticipated to usher in a new era of robust classifiers, resilient to dataset shift and capable of delivering reliable plankton abundance data. By identifying critical shortcomings and offering practical procedures to fortify models against dataset shift, our study contributes to the development of more reliable plankton classification technologies.

Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (23875170 bytes)
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all of the transactions that happened over a period of time. The retailer will use the results to grow the business and to provide customers with itemset suggestions, so that we can increase customer engagement, improve customer experience, and identify customer behavior. I will solve this problem using Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association rules are most useful when you are planning to find associations between different objects in a set, or frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between items.

An Example of Association Rules

Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat": - support = P(mouse & mat) = 8/100 = 0.08 - confidence = support/P(computer mouse) = 0.08/0.10 = 0.80 - lift = confidence/P(mouse mat) = 0.80/0.09 ≈ 8.9 (The same figures are computed in the short R check below.) This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
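The same arithmetic written out in R, so the three metrics stay explicit (toy counts from the example above):

```r
n    <- 100  # customers
n_A  <- 10   # bought a computer mouse (antecedent)
n_B  <- 9    # bought a mouse mat (consequent)
n_AB <- 8    # bought both

support    <- n_AB / n               # 0.08
confidence <- support / (n_A / n)    # P(B | A) = 0.08 / 0.10 = 0.80
lift       <- confidence / (n_B / n) # 0.80 / 0.09 ~= 8.9
```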

Strategy

  • Data Import
  • Data Understanding and Exploration
  • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
  • Running association rules
  • Exploring the rules generated
  • Filtering the generated rules
  • Visualization of Rule

Dataset Description

  • File name: Assignment-1_Data
  • List name: retaildata
  • File format: .xlsx
  • Number of Rows: 522065
  • Number of Attributes: 7

    • BillNo: 6-digit number assigned to each transaction. Nominal.
    • Itemname: Product name. Nominal.
    • Quantity: The quantities of each product per transaction. Numeric.
    • Date: The day and time when each transaction was generated. Numeric.
    • Price: Product price. Numeric.
    • CustomerID: 5-digit number assigned to each customer. Nominal.
    • Country: Name of the country where each customer resides. Nominal.

Screenshot: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

Libraries in R

First, we need to load the required libraries. Below, I briefly describe each of them.

  • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
  • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
  • tidyverse - An opinionated collection of R packages designed for data science; the package makes it easy to install and load these packages in a single step.
  • readxl - Read Excel Files in R.
  • plyr - Tools for Splitting, Applying and Combining Data.
  • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
  • knitr - Dynamic Report generation in R.
  • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
  • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Screenshot: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
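In plain text, the loading step shown in the screenshot is simply the library calls mirroring the list above (tidyverse already attaches ggplot2 and dplyr; they are repeated here only to match the list):

```r
library(arules)     # association rule mining
library(arulesViz)  # rule visualization
library(tidyverse)  # data-science package collection
library(readxl)     # Excel import
library(plyr)       # split-apply-combine tools (load before dplyr)
library(ggplot2)    # grammar-of-graphics plotting
library(knitr)      # dynamic report generation
library(magrittr)   # forward-pipe operator %>%
library(dplyr)      # data frame manipulation
```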

Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

Screenshots: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
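A minimal sketch of the import step, assuming Assignment-1_Data.xlsx sits in the working directory:

```r
library(readxl)

# Read the transaction data described above (522065 rows, 7 attributes)
retaildata <- read_excel("Assignment-1_Data.xlsx")
str(retaildata)
```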

Next, we will clean our data frame by removing missing values.

Screenshot: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
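A hedged sketch of that cleaning step; depending on the analysis, dropping only rows with a missing Itemname may be preferable to dropping all incomplete rows:

```r
# Remove rows with any missing values before mining
retaildata <- retaildata[complete.cases(retaildata), ]

# Narrower alternative: drop only rows without an item name
# retaildata <- retaildata[!is.na(retaildata$Itemname), ]
```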

To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
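One plausible way to finish that conversion with arules, using the BillNo and Itemname columns from the dataset description (the support and confidence thresholds below are illustrative, not tuned):

```r
library(arules)

# Group item names by invoice and drop duplicate items within an invoice
items_by_bill <- lapply(split(retaildata$Itemname, retaildata$BillNo), unique)
trans <- as(items_by_bill, "transactions")

# Mine association rules and show the strongest ones by lift
rules <- apriori(trans, parameter = list(supp = 0.001, conf = 0.8))
inspect(head(sort(rules, by = "lift"), 5))
```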
