License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Due to the increasing use of technology-enhanced educational assessment, data mining methods have been explored for analysing the process data in log files from such assessments. However, most studies have been limited to one data mining technique under one specific scenario. The current study demonstrates the use of four frequently used supervised techniques (Classification and Regression Trees (CART), gradient boosting, random forest, and support vector machines (SVM)) and two unsupervised methods (self-organizing maps (SOM) and k-means), all fitted to a single assessment dataset. The USA sample (N = 426) from the 2012 Program for International Student Assessment (PISA) responses to problem-solving items is used to demonstrate the methods. After feature generation and feature selection, classifier development procedures are implemented with each of the illustrated techniques. Results show satisfactory classification accuracy for all the techniques. Suggestions for selecting a classifier are presented based on the research questions and on the interpretability and simplicity of the classifiers. Interpretations of the results from both the supervised and unsupervised learning methods are provided.
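As an illustration, the supervised pipeline described above can be sketched with scikit-learn (which has no SOM implementation, so only k-means is shown for the unsupervised side); the features and labels below are synthetic stand-ins, not the actual PISA process-data features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier          # CART
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.cluster import KMeans

# Synthetic stand-in for features extracted from assessment log files.
X, y = make_classification(n_samples=426, n_features=10,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

supervised = {
    "CART": DecisionTreeClassifier(random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(kernel="rbf", random_state=0),
}
for name, clf in supervised.items():
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)  # classification accuracy
    print(f"{name}: {acc:.2f}")

# Unsupervised side: k-means cluster assignments (SOM omitted).
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

Comparing the held-out accuracies across the four fitted classifiers mirrors the study's side-by-side evaluation, while the cluster labels can be cross-tabulated against known groups.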
This chapter presents theoretical and practical aspects associated with the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current estimate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of estimating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: a prediction step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows the probability of failure at future time instants (the RUL PDF) to be estimated in real time, providing information about time-to-failure (TTF) expectations, statistical confidence intervals, and long-term predictions, using for this purpose empirical knowledge about critical conditions for the system (also referred to as hazard zones). This information is of paramount significance for improving system reliability and the cost-effective operation of critical assets, as shown in a case study where feedback correction strategies (based on uncertainty measures) were implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feedback loop is implemented using simple linear relationships, it provides quick insight into the manner in which the system reacts to changes in its input signals, in terms of its predicted RUL. The method is able to handle non-Gaussian PDFs, since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation.
Real data from a fault-seeded test showed that the proposed framework was able to anticipate modifications to the system input that lengthen its RUL. Results of this test indicate that the method successfully suggested the correction that the system required. In this sense, future work will focus on the development and testing of similar strategies using different input-output uncertainty metrics.
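The predict/update cycle described in the chapter can be sketched with a minimal bootstrap particle filter; the random-walk state model, noise levels, and constant fault indicator below are illustrative assumptions, not the chapter's rotorcraft model:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, measurement,
                         process_std=0.05, meas_std=0.2):
    """One predict/update cycle of a bootstrap particle filter."""
    # Prediction step: propagate each particle through the state model
    # (here an illustrative random walk on the fault indicator).
    particles = particles + rng.normal(0.0, process_std, size=particles.shape)
    # Update step: reweight by the Gaussian likelihood of the measurement.
    weights = weights * np.exp(-0.5 * ((measurement - particles) / meas_std) ** 2)
    weights /= weights.sum()
    # Resample to avoid weight degeneracy; weights return to uniform.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Track a fault indicator (true value 1.0) from noisy measurements.
true_fault = 1.0
particles = rng.normal(0.0, 1.0, 1000)   # initial state PDF
weights = np.full(1000, 1.0 / 1000)
for _ in range(50):
    z = true_fault + rng.normal(0.0, 0.2)
    particles, weights = particle_filter_step(particles, weights, z)

estimate = np.mean(particles)  # posterior mean of the fault indicator
```

In a prognostic setting, the particle cloud would then be propagated forward without measurements until it crosses the hazard zone, yielding the RUL PDF.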
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset includes all experimental data used for the PhD thesis of Cong Liu, entitled "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection". These data were generated by instrumenting both synthetic and real-life software systems, and are formatted according to the IEEE XES standard. See http://www.xes-standard.org/ and https://www.win.tue.nl/ieeetfpm/lib/exe/fetch.php?media=shared:downloads:2017-06-22-xes-software-event-v5-2.pdf for more details.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
As an important technique for data pre-processing, outlier detection plays a crucial role in various real applications and has gained substantial attention, especially in medical fields. Despite the importance of outlier detection, many existing methods are vulnerable to the distribution of outliers and require prior knowledge, such as the outlier proportion. To address this problem to some extent, this article proposes an adaptive mini-minimum spanning tree-based outlier detection (MMOD) method, which utilizes a novel distance measure by scaling the Euclidean distance. For datasets containing different densities and taking on different shapes, our method can identify outliers without prior knowledge of outlier percentages. The results on both real-world medical data corpora and intuitive synthetic datasets demonstrate the effectiveness of the proposed method compared to state-of-the-art methods.
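The general idea of MST-based outlier detection can be sketched as follows; note this is a plain-Euclidean simplification for illustration, not MMOD's scaled distance measure or mini-MST construction:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def mst_outliers(X, cut_factor=2.0, min_cluster=3):
    """Flag points left in small components after cutting long MST edges.

    Simplified sketch: MMOD itself scales the Euclidean distance and
    builds mini-MSTs, which removes the need for a fixed outlier ratio.
    """
    dist = squareform(pdist(X))
    mst = minimum_spanning_tree(dist).toarray()
    edges = mst[mst > 0]
    # Cut edges much longer than the mean MST edge length.
    mst[mst > cut_factor * edges.mean()] = 0
    n_comp, labels = connected_components((mst + mst.T) > 0, directed=False)
    sizes = np.bincount(labels, minlength=n_comp)
    return sizes[labels] < min_cluster   # True marks an outlier

# Dense cluster plus one far-away point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (30, 2)), [[5.0, 5.0]]])
flags = mst_outliers(X)
```

No outlier proportion is supplied: the isolated point ends up in its own tiny component and is flagged, while the dense cluster survives intact.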
Data Science Platform Market Size 2025-2029
The data science platform market is projected to grow by USD 763.9 million, at a CAGR of 40.2%, from 2024 to 2029. Integration of AI and ML technologies with data science platforms will drive the market.
Major Market Trends & Insights
North America dominated the market and is expected to account for 48% of growth during the forecast period.
By Deployment - On-premises segment was valued at USD 38.70 million in 2023
By Component - Platform segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 1.00 million
Market Future Opportunities: USD 763.90 million
CAGR : 40.2%
North America: Largest market in 2023
Market Summary
The market represents a dynamic and continually evolving landscape, underpinned by advancements in core technologies and applications. Key technologies, such as machine learning and artificial intelligence, are increasingly integrated into data science platforms to enhance predictive analytics and automate data processing. Additionally, the emergence of containerization and microservices in data science platforms enables greater flexibility and scalability. However, the market also faces challenges, including data privacy and security risks, which necessitate robust compliance with regulations.
According to recent estimates, the market is expected to account for over 30% of the overall big data analytics market by 2025, underscoring its growing importance in the data-driven business landscape.
What will be the Size of the Data Science Platform Market during the forecast period?
How is the Data Science Platform Market Segmented and what are the key trends of market segmentation?
The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Deployment
On-premises
Cloud
Component
Platform
Services
End-user
BFSI
Retail and e-commerce
Manufacturing
Media and entertainment
Others
Sector
Large enterprises
SMEs
Application
Data Preparation
Data Visualization
Machine Learning
Predictive Analytics
Data Governance
Others
Geography
North America
US
Canada
Europe
France
Germany
UK
Middle East and Africa
UAE
APAC
China
India
Japan
South America
Brazil
Rest of World (ROW)
By Deployment Insights
The on-premises segment is estimated to witness significant growth during the forecast period.
In this dynamic and evolving market, big data processing is a key focus, enabling advanced model accuracy metrics through various data mining methods. Distributed computing and algorithm optimization are integral components, ensuring efficient handling of large datasets. Data governance policies are crucial for managing data security protocols and ensuring data lineage tracking. Software development kits, model versioning, and anomaly detection systems facilitate seamless development, deployment, and monitoring of predictive modeling techniques, including machine learning algorithms, regression analysis, and statistical modeling. Real-time data streaming and parallelized algorithms enable real-time insights, while predictive modeling techniques and machine learning algorithms drive business intelligence and decision-making.
Cloud computing infrastructure, data visualization tools, high-performance computing, and database management systems support scalable data solutions and efficient data warehousing. ETL processes and data integration pipelines ensure data quality assessment and feature engineering techniques. Clustering techniques and natural language processing are essential for advanced data analysis. The market is witnessing significant growth, with adoption increasing by 18.7% in the past year, and industry experts anticipate a further expansion of 21.6% in the upcoming period. Companies across various sectors are recognizing the potential of data science platforms, leading to a surge in demand for scalable, secure, and efficient solutions.
API integration services and deep learning frameworks are gaining traction, offering advanced capabilities and seamless integration with existing systems. Data security protocols and model explainability methods are becoming increasingly important, ensuring transparency and trust in data-driven decision-making. The market is expected to continue unfolding, with ongoing advancements in technology and evolving business needs shaping its future trajectory.
The On-premises segment was valued at USD 38.70 million in 2019.
License: Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
Citation Request: This dataset is public available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib
Title: Wine Quality
Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009
Past Usage:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
In the above reference, two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (the median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, a confusion matrix for a fixed error tolerance (T), etc. The relative importances of the input variables were also plotted (as measured by a sensitivity analysis procedure).
Relevant Information:
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure whether all input variables are relevant, so it could be interesting to test feature selection methods.
Number of Instances: red wine - 1599; white wine - 4898.
Number of Attributes: 11 + output attribute
Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.
Attribute information:
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Missing Attribute Values: None
Description of attributes:
1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste
3 - citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines
4 - residual sugar: the amount of sugar remaining after fermentation stops; it's rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet
5 - chlorides: the amount of salt in the wine
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11 - alcohol: the percent alcohol content of the wine
Output variable (based on sensory data): 12 - quality (score between 0 and 10)
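A minimal sketch of the regression setup described above. The real UCI files (e.g. winequality-red.csv) are semicolon-separated CSVs; a synthetic frame with the same column layout stands in here so the snippet is self-contained:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# The real files are semicolon-separated:
# df = pd.read_csv("winequality-red.csv", sep=";")
# Synthetic stand-in with the same column layout for illustration.
rng = np.random.default_rng(0)
cols = ["fixed acidity", "volatile acidity", "citric acid", "residual sugar",
        "chlorides", "free sulfur dioxide", "total sulfur dioxide",
        "density", "pH", "sulphates", "alcohol"]
df = pd.DataFrame(rng.normal(size=(200, 11)), columns=cols)
df["quality"] = (5 + df["alcohol"] - df["volatile acidity"]
                 + rng.normal(0, 0.5, 200)).round().clip(0, 10)

X, y = df[cols], df["quality"]
# SVM under a regression approach, as in [Cortez et al., 2009];
# mean absolute error stands in for the MAD metric reported there.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
mad = -cross_val_score(model, X, y, cv=5,
                       scoring="neg_mean_absolute_error").mean()
```

The same pipeline applied to the real file reproduces the paper's regression framing; treating `quality` as a categorical target instead gives the classification variant mentioned above.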
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The full text of this article can be freely accessed on the publisher's website.
This dataset contains the key elements used in the paper Collective Intelligence Architecture for IoT Using Federated Process Mining, which range from complex event processing to process mining applied over multiple datasets. The information included is organized into the following sections:
1.- CEPApp.siddhi: It contains the rules and configurations used for pattern detection and real-time event processing.
2.- ProcessStorage.sol: Smart contract code used in the case study, implemented in Solidity on the Polygon blockchain platform.
3.- Datasets Used ({adlinterweave_dataset, adlmr_dataset, twor_dataset}.zip): Three datasets used in the study, each with events that have been processed using the CEP engine. The datasets are divided according to the rooms of the house:
_room.csv: CSV file with the data related to the interactions of the room stay.
_bathroom.csv: CSV file with the data related to the interactions of the bathroom stay.
_other.csv: CSV file with the data related to the interactions of the rest of the rooms.
4.- CEP Engine Processing Results ({cepresult_adlinterweave, cepresult_adlmr, cepresult_twor}.json): Output generated by the Siddhi CEP engine, stored in JSON format. The data is categorized into different files based on the type of detected activity:
_room.json: Contains the events related to the stay in the room.
_bathroom.json: Contains the events related to the bathing stay.
_other.json: Contains the events related to the rest of the rooms.
5.- Federated Event Logs ({xesresult_adlinterweave, xesresult_adlmr, xesresult_twor}.xes): Federated event logs in XES format, standard in process mining. Contains event traces obtained after the execution of the Event Log Integrator.
6.- Process Mining Results: Models generated from the processed event logs:
Process Trees ({procestree_adlinterweave, procestree_adlmr, procestree_twor}.svg): Structured representation of the detected workflows.
Petri Nets ({petrinet_adlinterweave, petrinet_adlmr, petrinet_twor}.svg): Mathematical model of the discovered processes, useful for compliance analysis and simulations.
Disco Results ({disco_adlinterweave, disco_adlmr, disco_twor}.pdf): Process models discovered with the Disco tool.
ProM Results ({prom_adlinterweave, prom_adlmr, prom_twor}.pdf): Models generated with ProM tool.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
This dataset contains a collection of over 2,000 company documents, categorized into four main types: invoices, inventory reports, purchase orders, and shipping orders. Each document is provided in PDF format, accompanied by a CSV file that includes the text extracted from these documents, their respective labels, and the word count of each document. This dataset is ideal for various natural language processing (NLP) tasks, including text classification, information extraction, and document clustering.
PDF Documents: The dataset includes 2,677 PDF files, each representing a unique company document. These documents are derived from the Northwind dataset, which is commonly used for demonstrating database functionalities.
The document types are:
Here are a few example entries from the CSV file:
This dataset can be used for:
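A minimal text-classification sketch of the kind this dataset supports; the toy documents and the label names below are assumptions for illustration, not the dataset's actual contents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the extracted document text; in practice these
# would come from the dataset's CSV of extracted text and labels.
texts = [
    "invoice total amount due payment terms net 30",
    "inventory report stock level warehouse units on hand",
    "purchase order quantity ordered supplier delivery date",
    "shipping order carrier tracking number destination address",
] * 10
labels = ["invoice", "inventory_report",
          "purchase_order", "shipping_order"] * 10

# TF-IDF features plus a linear classifier: a common baseline for
# four-way document-type classification.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
pred = clf.predict(["invoice payment due net 30"])[0]
```

Swapping the toy lists for the CSV's extracted-text and label columns turns this into a working baseline for the classification task the description mentions.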
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Although bacterial population behavior has been investigated in a variety of foods over the past 40 years, it is difficult to obtain the desired information from the mere juxtaposition of experimental data. We predicted changes in the number of bacteria and visualized the effects of pH, aw, and temperature using a data mining approach. Population growth and inactivation data on eight pathogenic and food-spoilage bacteria under 5,025 environmental conditions were obtained from the ComBase database (www.combase.cc), covering 15 food categories and temperatures ranging from 0°C to 25°C. The eXtreme gradient boosting (XGBoost) tree was used to predict population behavior. The root mean square error between the observed and predicted values was 1.23 log CFU/g. The data mining model extracted the growth inhibition of the investigated bacteria with respect to aw, temperature, and pH using SHapley Additive exPlanations (SHAP) values. A data mining approach provides information concerning bacterial population behavior and how food ecosystems affect bacterial growth and inactivation.
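The study's pipeline (XGBoost plus SHAP values) can be approximated with scikit-learn's gradient boosting and permutation importance as dependency-light stand-ins; the records below are synthetic, not ComBase data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

# Synthetic stand-in for ComBase records: temperature, pH, and aw
# predicting the log-change in population (log CFU/g).
rng = np.random.default_rng(0)
n = 500
temp = rng.uniform(0, 25, n)      # °C, matching the study's range
ph = rng.uniform(3, 8, n)
aw = rng.uniform(0.85, 1.0, n)
X = np.column_stack([temp, ph, aw])
# Toy model: growth increases with temperature and aw, ignores pH.
y = 0.2 * temp + 10 * (aw - 0.9) + rng.normal(0, 0.5, n)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
rmse = np.sqrt(np.mean((model.predict(X) - y) ** 2))

# Permutation importance stands in for SHAP attribution here;
# feature order: [temperature, pH, aw].
imp = permutation_importance(model, X, y, random_state=0).importances_mean
```

On the synthetic data, temperature dominates the importances and pH contributes nothing, mirroring how the SHAP analysis in the study separates influential from inert environmental factors.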
The COVID-19 Open Research Dataset is “a free resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community.”
in-the-news: On March 16, 2020, the White House issued a “call to action to the tech community” regarding the dataset, asking experts “to develop new text and data mining techniques that can help the science community answer high-priority scientific questions related to COVID-19.”
Included in this dataset:
Commercial use subset (includes PMC content) -- 9000 papers, 186 MB
Non-commercial use subset (includes PMC content) -- 1973 papers, 36 MB
PMC custom license subset -- 1426 papers, 19 MB
bioRxiv/medRxiv subset (pre-prints that are not peer reviewed) -- 803 papers, 13 MB
Each paper is represented as a single JSON object. The schema is available here.
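A minimal sketch of reading one per-paper JSON object; the field names follow the published CORD-19 schema ("paper_id", "metadata", "body_text"), demonstrated on an inline toy record rather than a real file:

```python
import json

# Toy record in the shape of a CORD-19 per-paper JSON object.
record = json.loads("""
{
  "paper_id": "abc123",
  "metadata": {"title": "A toy coronavirus paper"},
  "body_text": [{"text": "Full-text paragraph one."},
                {"text": "Full-text paragraph two."}]
}
""")

title = record["metadata"]["title"]
full_text = " ".join(p["text"] for p in record["body_text"])
```

For the real corpus, the same two lookups would run inside a loop over the JSON files in each subset directory, with `json.load` on each open file.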
We also provide a comprehensive metadata file of 29,000 coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text):
Metadata file (readme) -- 47 MB
Source: https://pages.semanticscholar.org/coronavirus-research
Updated: Weekly
License: https://data.world/kgarrett/covid-19-open-research-dataset/workspace/file?filename=COVID.DATA.LIC.AGMT.pdf
This data is for practicing data analysis 🤝🎉
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
Data Information: WISDM (Wireless Sensor Data Mining) smartphone-based sensor data, collected from 36 different users performing six different activities.
Number of examples: 1,098,207
Number of attributes: 6
Missing attribute values: None
Data processing:
1. Replace the nanoseconds with seconds in the timestamp column, and remove the user column, because each user performs the same actions.
2. Use the sliding window method to transform the data into sequences, then split each label into training and testing sets, ensuring an 8:2 ratio for each label across the training and testing sets.
3. Shuffle the order of the labels in both the training and testing sets and interleave them, to prevent two sequences with the same label from lining up consecutively.
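Step 2's sliding-window segmentation can be sketched as follows; the window width and step are illustrative assumptions, since the description above does not fix them:

```python
import numpy as np

def sliding_windows(samples, labels, width=80, step=40):
    """Segment a signal into fixed-width windows (step < width -> overlap).

    Each window takes the label of its last sample. The width/step
    values here are assumptions, not the dataset's documented settings.
    """
    xs, ys = [], []
    for start in range(0, len(samples) - width + 1, step):
        xs.append(samples[start:start + width])
        ys.append(labels[start + width - 1])
    return np.array(xs), np.array(ys)

# Toy tri-axial accelerometer stream: 200 samples of one activity.
rng = np.random.default_rng(0)
stream = rng.normal(size=(200, 3))   # x, y, z acceleration
acts = np.full(200, 5)               # label 5 = Walking
X, y = sliding_windows(stream, acts)

# 8:2 train/test split, as in step 2 (per label in the full pipeline).
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
```

On the real data this would run per activity label, so each label keeps the 8:2 ratio in both sets before the shuffle-and-interleave step.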
Activity:
0 = Downstairs 100,427 (9.1%)
1 = Jogging 342,177 (31.2%)
2 = Sitting 59,939 (5.5%)
3 = Standing 48,395 (4.4%)
4 = Upstairs 122,869 (11.2%)
5 = Walking 424,400 (38.6%)
Resource:
The dataset was collected by the WISDM Lab [https://www.cis.fordham.edu/wisdm/dataset.php]
Jeffrey W. Lockhart, Gary M. Weiss, Jack C. Xue, Shaun T. Gallagher, Andrew B. Grosner, and Tony T. Pulickal (2011). "Design Considerations for the WISDM Smart Phone-Based Sensor Mining Architecture," Proceedings of the Fifth International Workshop on Knowledge Discovery from Sensor Data (at KDD-11), San Diego, CA. [https://www.cis.fordham.edu/wisdm/includes/files/Lockhart-Design-SensorKDD11.pdf]
These data accompany the 2018 manuscript published in PLOS One titled "Mapping the yearly extent of surface coal mining in Central Appalachia using Landsat and Google Earth Engine". In this manuscript, researchers used the Google Earth Engine platform and freely accessible Landsat imagery to create a yearly dataset (1985 through 2015) of surface coal mining in the Appalachian region of the United States of America. This specific dataset is a GeoTIFF file depicting when an area was most recently mined, over the period 1985 through 2015. The raster values give the year in which mining was most recently detected by the paper's processing model. A value of "1984" indicates an area that was most recently mined at some point prior to 1985. These pre-1985 mining data are derived from a prior study; see https://skytruth.org/wp/wp-content/uploads/2017/03/SkyTruth-MTR-methodology.pdf for more information. This dataset does not indicate for how long an area was a mine or when mining began in a given area.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Bacterial small RNAs (sRNAs) play a vital role in pathogenesis by enabling rapid, efficient networks of gene attenuation during infection. In recent decades, there has been a surge in the number of proposed and biochemically confirmed sRNAs in both Gram-positive and Gram-negative pathogens. However, limited homology, network complexity, and the condition specificity of sRNAs have stunted complete characterization of the activity and regulation of these RNA regulators. To streamline the discovery of sRNA expression and post-transcriptional activity, we propose an integrative in vivo data-mining approach that couples DNA protein occupancy, RNA-seq, and RNA accessibility data with motif identification and target prediction algorithms. We benchmark the approach against a subset of well-characterized E. coli sRNAs for which a degree of in vivo transcriptional regulation and post-transcriptional activity has been previously reported, finding support for known regulation in a large proportion of this sRNA set. We showcase the ability of our method to expand understanding of the sRNA RseX, a known envelope-stress-linked sRNA whose cellular role has been elusive due to a lack of detected native expression. Using the presented approach, we identify a small set of putative RseX regulators and targets for experimental investigation. These findings have allowed us to confirm native RseX expression under conditions that eliminate H-NS repression, as well as to uncover a post-transcriptional role of RseX in fimbrial regulation. Beyond RseX, we uncover 163 putative regulatory DNA-binding protein sites, corresponding to the regulation of 62 sRNAs, that could lead to new understanding of sRNA transcriptional regulation. For 32 sRNAs, we also propose a subset of top targets, filtered by engagement of regions that exhibit binding-site accessibility behavior in vivo.
We broadly anticipate that the proposed approach will be useful for sRNA-reliant network characterization in bacteria. Such investigations under pathogenesis-relevant environmental conditions will enable us to deduce complex rapid-regulation schemes that support infection.
Source: https://archive.ics.uci.edu/ml/datasets/forest+fires
Citation Request: This dataset is public available for research. The details are described in [Cortez and Morais, 2007]. Please include this citation if you plan to use this database:
P. Cortez and A. Morais. A Data Mining Approach to Predict Forest Fires using Meteorological Data. In J. Neves, M. F. Santos and J. Machado Eds., New Trends in Artificial Intelligence, Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, December, Guimaraes, Portugal, pp. 512-523, 2007. APPIA, ISBN-13 978-989-95618-0-9. Available at: http://www.dsi.uminho.pt/~pcortez/fires.pdf
Title: Forest Fires
Sources Created by: Paulo Cortez and Aníbal Morais (Univ. Minho) @ 2007
Past Usage:
P. Cortez and A. Morais. A Data Mining Approach to Predict Forest Fires using Meteorological Data. In Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, December, 2007. (http://www.dsi.uminho.pt/~pcortez/fires.pdf)
In the above reference, the output "area" was first transformed with a ln(x+1) function. Then, several data mining methods were applied. After fitting the models, the outputs were post-processed with the inverse of the ln(x+1) transform. Four different input setups were used. The experiments were conducted using 10-fold cross-validation, repeated over 30 runs. Two regression metrics were measured: MAD and RMSE. A Gaussian support vector machine (SVM) fed with only four direct weather conditions (temp, RH, wind and rain) obtained the best MAD value: 12.71 +- 0.01 (mean and 95% confidence interval using a Student's t-distribution). The best RMSE was attained by the naive mean predictor. An analysis of the regression error characteristic (REC) curve shows that the SVM model predicts more examples within a lower admitted error. In effect, the SVM model predicts small fires, which are the majority, better.
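The ln(x+1) target transform around an SVM described above can be sketched with scikit-learn's TransformedTargetRegressor; the weather inputs and burned-area output below are synthetic stand-ins for the real data:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the four direct weather inputs
# (temp, RH, wind, rain) and the skewed burned "area" output.
rng = np.random.default_rng(0)
n = 517
X = np.column_stack([rng.uniform(0, 35, n),    # temp
                     rng.uniform(15, 100, n),  # RH
                     rng.uniform(0, 10, n),    # wind
                     rng.uniform(0, 6, n)])    # rain
area = np.expm1(rng.normal(0.01 * X[:, 0], 1.0, n)).clip(min=0)

# ln(x+1) transform on the output, inverted after prediction,
# matching the setup described above.
svm = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    func=np.log1p, inverse_func=np.expm1)
mad = -cross_val_score(svm, X, area, cv=10,
                       scoring="neg_mean_absolute_error").mean()
```

Repeating the 10-fold run 30 times with different shuffles, as in the paper, would give the mean-and-confidence-interval form of the reported MAD.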
Relevant Information:
This is a very difficult regression task. It can be used to test regression methods. It could also be used to test outlier detection methods, since it is not clear how many outliers there are. Note that the number of examples of fires with a large burned area is very small.
Number of Instances: 517
Number of Attributes: 12 + output attribute
Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.
Attribute information:
For more information, read [Cortez and Morais, 2007].
Missing Attribute Values: None
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
The Multi-aspect Reviews dataset primarily encompasses beer review data from RateBeer and BeerAdvocate, with a focus on multiple rated dimensions providing a comprehensive insight into sensory aspects such as taste, look, feel, and smell. This dataset facilitates the analysis of different facets of reviews, thus aiding in a deeper understanding of user preferences and product characteristics.
Basic Statistics (RateBeer):
- Number of users: 40,213
- Number of items: 110,419
- Number of ratings/reviews: 2,855,232
- Timespan: Apr 2000 - Nov 2011
Metadata:
- Reviews: Textual reviews provided by users.
- Aspect-specific ratings: Ratings on taste, look, feel, smell, and overall impression.
- Product Category: Categories of beer products.
- ABV (Alcohol By Volume): Indicates the alcohol content in the beer.
Examples:
- RateBeer Example
```json
{
  "beer/name": "John Harvards Simcoe IPA",
  "beer/beerId": "63836",
  "beer/brewerId": "8481",
  "beer/ABV": "5.4",
  "beer/style": "India Pale Ale (IPA)",
  "review/appearance": "4/5",
  "review/aroma": "6/10",
  "review/palate": "3/5",
  "review/taste": "6/10",
  "review/overall": "13/20",
  "review/time": "1157587200",
  "review/profileName": "hopdog",
  "review/text": "On tap at the Springfield, PA location. Poured a deep and cloudy orange (almost a copper) color with a small sized off white head. Aromas or oranges and all around citric. Tastes of oranges, light caramel and a very light grapefruit finish. I too would not believe the 80+ IBUs - I found this one to have a very light bitterness with a medium sweetness to it. Light lacing left on the glass."
}
```
Download Links: - BeerAdvocate Data - RateBeer Data - Sentences with aspect labels (annotator 1) - Sentences with aspect labels (annotator 2)
Citations:
- Learning attitudes and attributes from multi-aspect reviews. Julian McAuley, Jure Leskovec, Dan Jurafsky. International Conference on Data Mining (ICDM), 2012.
- From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. Julian McAuley, Jure Leskovec. WWW, 2013.
Use Cases:
1. Aspect-Based Sentiment Analysis (ABSA): analyzing sentiments on different aspects of beers such as taste, look, feel, and smell to gain deeper insights into user preferences and opinions.
2. Recommendation Systems: developing personalized recommendation systems that consider multiple aspects of user preferences.
3. Product Development: utilizing feedback on various aspects to improve the product.
4. Consumer Behavior Analysis: studying how different aspects influence consumer choice and satisfaction.
5. Competitor Analysis: comparing ratings on different aspects with competitors to identify strengths and weaknesses.
6. Trend Analysis: identifying trends in consumer preferences over time across different aspects.
7. Marketing Strategies: formulating marketing strategies based on insights drawn from aspect-based reviews.
8. Natural Language Processing (NLP): developing and enhancing NLP models to understand and categorize multi-aspect reviews.
9. Learning User Expertise Evolution: studying how user expertise evolves through reviews and ratings over time.
10. Training Machine Learning Models: training supervised learning models to predict aspect-based ratings from review text.
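Note that the aspect ratings in each record arrive as string fractions on different scales ("4/5", "6/10", "13/20"), so most analyses start by normalizing them to a common range. A minimal sketch, using the example record above:

```python
# Normalize RateBeer's string fractions ("4/5", "6/10", "13/20")
# to a common 0-1 scale so the aspects become comparable.
def normalize(rating: str) -> float:
    num, den = rating.split("/")
    return float(num) / float(den)

record = {
    "review/appearance": "4/5",
    "review/aroma": "6/10",
    "review/palate": "3/5",
    "review/taste": "6/10",
    "review/overall": "13/20",
}
# Strip the "review/" prefix and keep the normalized score per aspect.
scores = {k.split("/")[1]: normalize(v) for k, v in record.items()}
```

After this step, `scores["appearance"]` is 0.8 and `scores["overall"]` is 0.65, which can feed directly into aspect-based sentiment or rating-prediction models.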
This dataset is extremely valuable for researchers, marketers, product developers, and machine learning practitioners looking to delve into multi-dimensional review analysis and understand user-product interaction on a granular level.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources, with corrections for systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the standard attributes of official data sources, such as daily mortality and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), the World Health Organization (WHO) and the European Centre for Disease Prevention and Control (ECDC). The data were collected by using text mining techniques and reviewing PDF reports, metadata, and reference data. The combined dataset includes complete spatial data such as country area, international country number, Alpha-2 code, Alpha-3 code, latitude, longitude, and additional attributes such as population. The improved dataset benefits from major corrections to the referenced datasets and official reports, such as adjustments to the reporting dates, which suffered from a one- to two-day lag, removal of negative values, detection of unreasonable changes in historical data in new reports, and corrections of systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data to the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail, and it has been extracted from the attached reports available on the main page of the Chinese CDC website.
This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline for confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, the pandemic’s turning point or in economic and social impact analysis, helping to inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-open schools, alleviate business and social distancing restrictions, design economic programs or allow sports events to resume.
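The paired-comparison idea mentioned above, using root mean square error to flag disagreements between official sources, can be sketched as follows. The daily counts below are invented for illustration; the real inputs would be the date-aligned series for the same attribute from, say, WHO and ECDC.

```python
# RMSE between the same attribute (e.g. daily confirmed cases)
# as reported by two official sources, aligned by date.
import math

def rmse(a, b):
    assert len(a) == len(b), "series must be aligned to the same dates"
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

who  = [120, 135, 150, 180]   # hypothetical daily counts, source A
ecdc = [118, 135, 155, 179]   # hypothetical daily counts, source B
error = rmse(who, ecdc)
```

A large RMSE between two sources on the same attribute points at reporting-date lags, sign errors, or retroactive revisions, which is exactly what the corrections described above target.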
Automatic Identification And Data Capture Market Size 2024-2028
The automatic identification and data capture market is forecast to grow by USD 21.52 billion, at a CAGR of 8.1%, from 2023 to 2028. Increasing applications of RFID will drive the automatic identification and data capture market.
Market Insights
North America dominated the market and is expected to account for 47% of global market growth during 2024-2028.
By Product - RFID products segment was valued at USD 18.41 billion in 2022
Market Size & Forecast
Market Opportunities: USD 79.34 million
Market Future Opportunities 2023: USD 21,520.40 million
CAGR from 2023 to 2028: 8.1%
Market Summary
The Automatic Identification and Data Capture (AIDC) market encompasses technologies and solutions that enable businesses to capture and process data in real time. This market is driven by the increasing adoption of RFID technology, which offers benefits such as improved supply chain visibility, inventory management, and operational efficiency. The growing popularity of smart factories, where automation and data-driven processes are integral, further fuels the demand for AIDC solutions. However, the market also faces challenges, including security concerns. With the increasing use of AIDC technologies, there is a growing need to ensure data privacy and security. This has led to the development of advanced encryption techniques and access control mechanisms to mitigate potential risks. A real-world business scenario illustrating the importance of AIDC is in the retail industry. Retailers use AIDC technologies such as RFID tags and barcode scanners to manage inventory levels, track stock movements, and optimize supply chain operations. By automating data capture processes, retailers can reduce manual errors, improve order fulfillment accuracy, and enhance the overall customer experience. Despite the challenges, the AIDC market continues to grow, driven by the need for real-time data processing and automation across various industries.
What will be the size of the Automatic Identification And Data Capture Market during the forecast period?
Get Key Insights on Market Forecast (PDF) Request Free SampleThe Automatic Identification and Data Capture (AIDC) market continues to evolve, driven by advancements in technology and increasing business demands. AIDC solutions, including barcode scanners, RFID systems, and OCR technology, enable organizations to streamline processes, enhance data accuracy, and improve operational efficiency. According to recent research, the use of RFID technology in the retail sector has surged by 25% over the past five years, underpinning its significance in inventory management and supply chain optimization. Moreover, the integration of AIDC technologies with cloud computing services and data visualization dashboards offers real-time data access and analysis, empowering businesses to make informed decisions. For instance, a manufacturing firm can leverage RFID data to monitor production lines, optimize workflows, and ensure compliance with industry regulations. AIDC systems are also instrumental in enhancing data security and privacy, with advanced encryption protocols and access control features ensuring data integrity and confidentiality. By adopting AIDC technologies, organizations can not only improve their operational efficiency but also gain a competitive edge in their respective industries.
Unpacking the Automatic Identification And Data Capture Market Landscape
The market encompasses technologies such as RFID tag identification, data stream management, and data mining techniques. These solutions enable businesses to efficiently process and analyze vast amounts of data from various sources, leading to significant improvements in data quality metrics and workflow optimization strategies. For instance, RFID implementation can result in a 30% increase in inventory accuracy, while data mining techniques can uncover hidden patterns and trends, driving ROI improvement and compliance alignment. Real-time data processing, facilitated by technologies like document understanding AI and image recognition algorithms, ensures swift decision-making and error reduction. Data capture pipelines and database management systems provide a solid foundation for data aggregation and analysis, while semantic web technologies and natural language processing enhance information retrieval and understanding. By integrating sensor data and applying machine vision systems, businesses can achieve high-throughput imaging and object detection, further enhancing their data processing capabilities.
Key Market Drivers Fueling Growth
The significant expansion of RFID (Radio-Frequency Identification) technology applications is the primary market growth catalyst.
Africa is a continent that covers 6% of the Earth's surface and 20% of its land area. Including its islands, it spans 30,415,873 km², making it the third largest continent in the world if the Americas are counted as a single continent. With more than 1.3 billion inhabitants, Africa is the second most populous continent after Asia, representing 17.2% of the world population in 2020.
Africa abounds in very varied energy sources, distributed across distinct zones: abundant fossil fuels (gas in North Africa, oil in the Gulf of Guinea, and coal in southern Africa), hydraulic basins in Central Africa, uranium deposits, solar radiation in Sahelian countries, and geothermal capacity in East Africa. Despite this, it has been prey to conflicts (socio-political, political, social, civil wars, government mismanagement, etc.) since its countries gained independence, and a land fiercely coveted by powerful countries and large multinational corporations.
The data were acquired from the ACLED (Armed Conflict Location & Event Data) project. ACLED reports information on the type, agents, location, date, and other characteristics of political violence events, demonstrations, and selected politically relevant non-violent events. ACLED also tracks a range of violent and non-violent actions by political agents, including governments, rebels, militias, identity groups, political parties, external actors, rioters, protesters, and civilians. The Africa Conflict 1997-2020 dataset is one of the databases of the ACLED project.
For details, see acleddata.com: the ACLED Codebook and the User Quick Guide.
Thanks to “Armed Conflict Location & Event Data Project (ACLED); https://www.acleddata.com.”
Can you understand how conflicts evolved in Africa from 1997 to 2020, and what link there is between the energy resources of certain regions of Africa and conflicts? (Put your geopolitics, geo-economics, and geo-energy skills into practice.)
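A first exploratory cut at the question above, counting events per year and summing fatalities per country, might look like the following sketch. The four rows are synthetic stand-ins for the real 1997-2020 file, and while the column names follow the ACLED Codebook (event_date, event_type, country, fatalities), you should check them against your actual export.

```python
# Exploratory aggregation over a tiny synthetic stand-in for the
# ACLED Africa 1997-2020 data. Column names assumed from the Codebook.
import pandas as pd

events = pd.DataFrame({
    "event_date": ["1997-03-01", "2011-06-15", "2020-01-09", "2020-07-22"],
    "event_type": ["Battles", "Riots", "Battles", "Protests"],
    "country":    ["Angola", "Egypt", "Libya", "Nigeria"],
    "fatalities": [12, 3, 25, 0],
})
events["year"] = pd.to_datetime(events["event_date"]).dt.year

per_year = events.groupby("year").size()  # conflict event counts over time
deadliest = events.groupby("country")["fatalities"].sum().idxmax()
```

Joining aggregates like these against a table of regional energy resources (oil, gas, coal, uranium) is one way to start probing the resource-conflict link the prompt asks about.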
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Due to the increasing use of technology-enhanced educational assessment, data mining methods have been explored to analyse process data in log files from such assessments. However, most studies were limited to one data mining technique under one specific scenario. The current study demonstrates the usage of four frequently used supervised techniques, namely Classification and Regression Trees (CART), gradient boosting, random forest, and support vector machines (SVM), and two unsupervised methods, self-organizing maps (SOM) and k-means, fitted to a single assessment dataset. The USA sample (N = 426) from the 2012 Program for International Student Assessment (PISA), responding to problem-solving items, is extracted to demonstrate the methods. After concrete feature generation and feature selection, classifier development procedures are implemented using the illustrated techniques. Results show satisfactory classification accuracy for all the techniques. Suggestions for the selection of classifiers are presented based on the research questions, the interpretability, and the simplicity of the classifiers. Interpretations of the results from both supervised and unsupervised learning methods are provided.
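The comparison setup described in this abstract can be sketched as below. This is a minimal illustration on synthetic features, not the study's actual pipeline: the real work engineers features from PISA 2012 log files (N = 426), which are not reproduced here, and the SOM step would need a separate package (e.g. minisom), so only k-means is shown for the unsupervised side.

```python
# Sketch: compare the four supervised classifiers named above, plus
# k-means, on synthetic stand-in features (426 samples to match N).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier            # CART
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC                                # SVM
from sklearn.cluster import KMeans                         # unsupervised

X, y = make_classification(n_samples=426, n_features=10, random_state=0)

supervised = {
    "CART": DecisionTreeClassifier(random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
}
# Mean 5-fold cross-validated accuracy per classifier.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in supervised.items()}

# Unsupervised view: cluster the same features into two groups.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

Comparing the accuracy table against the cluster assignments mirrors the study's contrast between classifier performance and the structure the unsupervised methods recover.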