Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring systems. For example, consider an aircraft manufacturer [tex]$\mathcal{C}$[/tex] manufacturing an aircraft model [tex]$A$[/tex] and selling it to five different airline operating companies [tex]$\mathcal{V}_1 \dots \mathcal{V}_5$[/tex]. These aircraft, during their operation, generate huge amounts of data. Mining this data can reveal useful information regarding the health and operability of the aircraft, which can be useful for disaster management and prediction of efficient operating regimes. Now, if the manufacturer [tex]$\mathcal{C}$[/tex] wants to analyze the performance data collected from different aircraft of model type [tex]$A$[/tex] belonging to different airlines, then central collection of the data for subsequent analysis may not be an option. It should be noted that the result of this analysis may be statistically more significant if the data for aircraft model [tex]$A$[/tex] across all companies were available to [tex]$\mathcal{C}$[/tex]. The potential problems arising out of such a data mining scenario are:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of the running time (in ms) of the three algorithms.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Descriptions of the datasets.
http://rightsstatements.org/vocab/InC/1.0/
The dataset and source code for paper "Automating Intention Mining".
The code is based on dennybritz's implementation of Yoon Kim's paper Convolutional Neural Networks for Sentence Classification.
By default, the code uses TensorFlow 0.12. Errors might be reported with other versions of TensorFlow because some APIs are incompatible.
Running 'online_prediction.py', you can input any sentence and check the classification result produced by a pre-trained CNN model. The model uses all sentences of the four GitHub projects as training data.
Running 'play.py', you can get the evaluation result of cross-project prediction. Please check the code for more details of the configuration. By default, it uses the four GitHub projects as training data to predict the sentences in the DECA dataset. In this setting, the categories 'aspect evaluation' and 'others' are dropped, since the DECA dataset does not contain these two categories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Iris data aggregation class effect.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Traumatic brain injury is highly prevalent in the United States. However, despite its frequency and significance, there is little understanding of how the brain responds during injurious loading. A confounding problem is that because testing conditions vary between assessment methods, brain biomechanics cannot be fully understood. Data mining techniques, which are commonly used to determine patterns in large datasets, were applied to discover how changes in testing conditions affect the mechanical response of the brain. Data at various strain rates were collected from published literature and sorted into datasets based on strain rate and tension vs. compression. Self-organizing maps were used to conduct a sensitivity analysis to rank the testing condition parameters by importance. Fuzzy C-means clustering was applied to determine if there were any patterns in the data. The parameter rankings and clustering for each dataset varied, indicating that the strain rate and type of deformation influence the role of these parameters in the datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset contains the content of the subset of all files with a correct publication date from the 2017 release of files related to the JFK case (retrieved from https://www.archives.gov/research/jfk/2017-release). This content was extracted from the source PDF files using the R OCR libraries tesseract and pdftools.
The code to derive the dataset is given as follows:
### BEGIN R DATA PROCESSING SCRIPT
library(tesseract)
library(pdftools)

pdfs <- list.files("[path to your output directory containing all PDF files]")
# The meta file containing all metadata for the PDF files (e.g. publication date)
meta <- read.csv2("[path to your input directory]/jfkrelease-2017-dce65d0ec70a54d5744de17d280f3ad2.csv", header = TRUE, sep = ",")
meta$Doc.Date <- as.character(meta$Doc.Date)

# Drop records with an empty date or an unknown year ("/0000").
# setdiff() keeps all rows when nothing matches (meta[-which(...), ] would return zero rows).
drop <- which(meta$Doc.Date == "" | grepl("/0000", meta$Doc.Date))
meta.clean <- meta[setdiff(seq_len(nrow(meta)), drop), ]

# Replace every "00" (unknown day or month) with "01" and expand short dates to mm/dd/yyyy
for (i in seq_len(nrow(meta.clean))) {
  meta.clean$Doc.Date[i] <- gsub("00", "01", meta.clean$Doc.Date[i])
  if (nchar(meta.clean$Doc.Date[i]) < 10) {
    meta.clean$Doc.Date[i] <- format(strptime(meta.clean$Doc.Date[i], format = "%d/%m/%y"), "%m/%d/%Y")
  }
}
meta.clean$Doc.Date <- strptime(meta.clean$Doc.Date, format = "%m/%d/%Y")
meta.clean <- meta.clean[order(meta.clean$Doc.Date), ]

docs <- data.frame(content = character(0), dpub = character(0), stringsAsFactors = FALSE)
for (i in seq_len(nrow(meta.clean))) {
  pdf_path <- paste0("[path to your output directory]/", tolower(meta.clean$File.Name[i]))
  pdf_prop <- pdftools::pdf_info(pdf_path)
  # One temporary image file per page; adjust this directory to your own environment
  tmp_files <- paste0("/home/STAFF/luczakma/RProjects/JFK/data/tmp/", seq_len(pdf_prop$pages))
  img_file <- pdftools::pdf_convert(pdf_path, format = "tiff", pages = NULL, dpi = 700, filenames = tmp_files)
  # OCR each page image and concatenate the recognized text
  txt <- ""
  for (j in seq_along(img_file)) {
    extract <- ocr(img_file[j], engine = tesseract("eng"))
    txt <- paste(txt, extract, collapse = " ")
  }
  # Strip punctuation and line breaks, collapse whitespace, lower-case, and store with the publication date
  content <- iconv(tolower(gsub("\\s+", " ", gsub("[[:punct:]]|[\n]", " ", txt))), to = "UTF-8")
  docs <- rbind(docs, data.frame(content = content, dpub = format(meta.clean$Doc.Date[i], "%Y/%m/%d"), stringsAsFactors = FALSE), stringsAsFactors = FALSE)
}
write.table(docs, "[path to your output directory]/documents.csv", row.names = FALSE)
### END R DATA PROCESSING SCRIPT
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Variables relating to the recurrence of breast cancer in a dataset.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
A list of the 29 journals under the Web of Science heading "Mycology" as of November 2020
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset includes replication data for the paper: Sann, R. and Lai, P.-C. (2023), "Topic modeling of the quality of guest’s experience using latent Dirichlet allocation: western versus eastern perspectives", Consumer Behavior in Tourism and Hospitality, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/CBTH-04-2022-0084
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Japan TSE: C: PB Ratio: 1st Sec: Mining data was reported at 0.400 Unit in Feb 2019, unchanged from the previous figure of 0.400 Unit for Jan 2019. Japan TSE: C: PB Ratio: 1st Sec: Mining data is updated monthly, with a median of 0.500 Unit from Jan 2013 to Feb 2019 across 74 observations. The data reached an all-time high of 14.000 Unit in Apr 2014 and a record low of 0.400 Unit in Feb 2019. Japan TSE: C: PB Ratio: 1st Sec: Mining data remains active in CEIC and is reported by Japan Exchange Group. The data is categorized under Global Database’s Japan – Table JP.Z011: Tokyo Stock Exchange: PB Ratio.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The most important feature with the highest accuracy in the diagnosis of breast cancer.
The Preliminary Interpretive Report 2004-3B, "Bedrock geologic map of the Livengood SW C-3 and SE C-4 quadrangles, Tolovana mining district, Alaska," is the bedrock geologic map of an approximately 123-square-mile area in the central Livengood Quadrangle, Alaska.
This report presents 40Ar/39Ar step-heating geochronology results for igneous and metamorphic rocks from the eastern Moran area. Field samples were collected by the DGGS Mineral Resources section during detailed geologic mapping campaigns in 2011. The data provided in this report add significant detail to the thermal history of the Moran area. These new data indicate that the minimum age of prograde metamorphism of Ruby terrane rocks ranges from 148.5 +/- 1.7 to 140.4 +/- 1.7 Ma, and that retrograde greenschist metamorphism occurred at 122.6 +/- 2.3 Ma. The retrograde metamorphism is roughly coeval with fabric development parallel to the Kaltag fault (128.3 +/- 1.7 Ma) and the Tozitna thrust/detachment fault (123.2 +/- 1.5 Ma). The new data also indicate that the Melozitna pluton is composite, with a biotite cooling age of 116.5 +/- 1.3 Ma from coarse-grained granite, while cooling ages for dikes cutting the granite range from 110.1 +/- 1.3 to 102.8 +/- 1.2 Ma. The ages of mineralized veins in the area are variable and include 119.0 +/- 1.3 Ma for galena veins in the Tozimoran drainage and an interpreted age of 66.5 +/- 2.6 Ma for an auriferous vein from the Monday Creek area, which is synchronous with ages of biotite samples from granite and schist from the Ruby mining district. The complete report and digital data are available through the DGGS website: http://doi.org/10.14509/30117.
This data file presents 40Ar/39Ar step-heating geochronology results for a granite sample from the Livengood mining district. The Livengood area is a historically productive placer mining area approximately 80 road miles north of Fairbanks, Alaska. This data is a component of a geologic map and accompanying report that synthesizes recently collected and previously published agency and industry geologic data in a 1:50,000-scale comprehensive geologic map to build a better understanding of the geology and mineral-resource potential of the Livengood area.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Japan TSE: C: Average: PE Ratio: PM: Mining data was reported at 10.200 Times in Apr 2025, a decrease from the previous figure of 11.000 Times for Mar 2025. Japan TSE: C: Average: PE Ratio: PM: Mining data is updated monthly, with a median of 9.800 Times from Apr 2022 to Apr 2025 across 37 observations. The data reached an all-time high of 35.500 Times in May 2022 and a record low of 3.700 Times in Jun 2023. Japan TSE: C: Average: PE Ratio: PM: Mining data remains active in CEIC and is reported by Japan Exchange Group Inc. The data is categorized under Global Database’s Japan – Table JP.Z: Tokyo Stock Exchange: Price Earnings Ratio. [COVID-19-IMPACT]
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Dataset Description: Environmental Sensor Readings from Mars Rover Prototype
Research Hypothesis
A scaled-down Mars Rover prototype can effectively collect temperature and humidity data, demonstrating how real-time environmental monitoring can be used for autonomous navigation, climate analysis, and anomaly detection.
By analyzing the collected data, we aim to identify trends, evaluate sensor accuracy, and explore potential improvements in robotic exploration. This includes assessing response time, consistency, and anomalies caused by external factors like human interference or sudden environmental changes.
What the Data Shows
This dataset contains timestamped temperature and humidity readings collected at regular time intervals by the rover’s onboard DHT22 sensor. The data highlights:
- Gradual fluctuations in environmental conditions.
- Notable temperature spikes (~10°C) introduced using a lighter to test sensor response.
- Stable humidity levels with minor deviations due to air circulation or sensor drift.
Notable Findings
- Controlled Temperature Spikes: Short bursts of heat resulted in clear temperature increases (~10°C), demonstrating the sensor's ability to detect and log transient changes.
- Humidity Stability: Humidity levels remained within a narrow range, confirming minimal impact from applied temperature fluctuations.
- Gradual Environmental Variations: Small temperature and humidity shifts were observed, likely due to ambient conditions and ventilation effects.
How the Data Was Gathered
- Sensor Used: DHT22 (for temperature & humidity).
- Data Collection Frequency: Logged every few seconds.
- Controlled Testing: Heat spikes added using a lighter to simulate external interference.
- Data Transmission: Logged in real-time via wireless communication to a laptop.
How to Interpret and Use the Data
- Identify Trends: Observe temperature and humidity variations over time.
- Detect Anomalies: Locate sharp temperature spikes (~10°C increases) caused by external heating.
- Compare Sensor Performance: Evaluate how quickly temperature normalizes after a spike.
- Develop Predictive Models: Train machine learning models to predict environmental changes.
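The anomaly-detection step above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset authors' method: the function name `detect_spikes`, the window size, the 8 °C threshold, and the sample readings are all hypothetical, chosen only to reflect the roughly 10 °C heat spikes described above. A reading is flagged when it exceeds the median of the preceding readings by at least the threshold.

```python
from statistics import median


def detect_spikes(temps, window=5, threshold=8.0):
    """Return indices of readings that exceed the median of the
    previous `window` readings by at least `threshold` degrees C.

    `window` and `threshold` are illustrative assumptions, not values
    taken from the dataset description.
    """
    spikes = []
    for i in range(window, len(temps)):
        # The median baseline is robust to a short spike inside the window
        baseline = median(temps[i - window:i])
        if temps[i] - baseline >= threshold:
            spikes.append(i)
    return spikes


# Hypothetical temperature readings in degrees C (not from the dataset)
sample = [22.1, 22.0, 22.2, 22.1, 22.3, 22.2, 32.4, 31.8, 24.0, 22.4, 22.3]
spike_indices = detect_spikes(sample)
print(spike_indices)
```

A median baseline is used here rather than a mean so that one or two heated readings inside the window do not pull the baseline upward. With real logs, the window and threshold would need tuning to the actual sampling interval.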
Potential Applications
- Autonomous Environment Monitoring: Detecting and responding to environmental anomalies.
- Sensor Calibration & Validation: Testing DHT22 sensor accuracy under different conditions.
- Climate Simulation & Research: Indoor climate modeling & environmental trend analysis.
- Robotics & AI: Training AI for automated responses to climate fluctuations.
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Surficial geologic map of the Circle mining district, Alaska, Report of Investigation 95-2C, presents results from a geologic investigation of surficial deposits in parts of the Circle B-1, B-2, B-3, B-4, C-2, C-3, and C-4 quadrangles. The surficial deposits of the Circle mining district have long been sources of placer gold and even alluvial diamonds. Throughout most of the map area, the bedrock surface is blanketed by periglacial slope deposits consisting of silty rubble with angular and subangular clasts of local bedrock. This unconsolidated, unsorted material was produced by intense frost shattering of bedrock and was spread downslope by mass movement processes. These processes are still active in the area today. In valley bottoms, terrace and floodplain alluvium consists of coarse, angular to subangular gravels. In upland valleys, the locally auriferous gravels are commonly mixed with slope debris. Radiocarbon dates of organic sediments range from mid-Wisconsinan to Holocene. Although frost is discontinuous, sediments in the valleys are generally frozen and locally contain bones of large extinct vertebrates. Although landforms have been extensively modified by erosion, frost action, and mass movement, recognizable glacially modified valleys, silty gravel, and till provide evidence of at least one former glaciation. Glaciation of North Harrison Creek probably had a significant effect on the distribution of nuggets there. The complete report, geodatabase, and ESRI fonts and style files are available from the DGGS website: http://doi.org/10.14509/2516.
U.S. Government Works https://www.usa.gov/government-works
Update QuakeSim services to integrate and rapidly fuse data from multiple sources to support comprehensive efforts in data mining, analysis, simulation, and forecasting.
Extend QuakeSim infrastructure to include tiered publishing mechanisms and data provenance, trust, and history tracking.
Develop and deploy a Cloud Computing architecture to access and analyze large and heterogeneous data products and integrate them with earthquake models and simulations.