100+ datasets found

Books on Advances in data mining and database management (ADMDM) book series...
workwithdata.com
Updated Nov 9, 2024
Cite
Work With Data (2024). Books on Advances in data mining and database management (ADMDM) book series [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_series&fop0=%3D&fval0=Advances+in+data+mining+and+database+management+%28ADMDM%29+book+series&j=1&j0=book_series
Dataset updated
Nov 9, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books and is filtered where the book series is Advances in data mining and database management (ADMDM) book series, featuring 9 columns including author, BNB id, book, book publisher, and book series. The preview is ordered by publication date (descending).
T10I4D1000K transactional database
data.mendeley.com
Updated Oct 23, 2019
Cite
Uday kiran RAGE (2019). T10I4D1000K transactional database [Dataset]. http://doi.org/10.17632/tykb96s325.1
Unique identifier
https://doi.org/10.17632/tykb96s325.1
Dataset updated
Oct 23, 2019
Authors
Uday kiran RAGE
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a synthetic database widely used for evaluating the scalability of pattern mining algorithms. It was generated using the IBM Quest synthetic data generator.
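The name follows the IBM Quest generator convention: T10 is the average transaction length, I4 the average size of the maximal potentially frequent itemsets, and D1000K the number of transactions (1,000K). As a minimal sketch of the kind of workload this database is used to benchmark, the Python below counts frequent 1- and 2-itemsets with Apriori-style pruning. It assumes the file holds one transaction per line as space-separated integer item IDs; that layout and the function name `frequent_itemsets` are illustrative assumptions, not documented on this page.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def frequent_itemsets(lines: Iterable[str], min_support: int) -> Dict[Tuple[int, ...], int]:
    """Return support counts of frequent 1- and 2-itemsets.

    Assumes one transaction per line, written as space-separated integer item IDs.
    """
    transactions = [sorted(set(map(int, line.split()))) for line in lines if line.strip()]

    # Pass 1: count every single item and keep those meeting the support threshold.
    item_counts = Counter(item for t in transactions for item in t)
    frequent_items = {item for item, count in item_counts.items() if count >= min_support}
    result: Dict[Tuple[int, ...], int] = {(item,): item_counts[item] for item in frequent_items}

    # Pass 2 (Apriori property): a pair can only be frequent if both items are,
    # so infrequent items are dropped before candidate pairs are generated.
    pair_counts: Counter = Counter()
    for t in transactions:
        kept = [item for item in t if item in frequent_items]
        pair_counts.update(combinations(kept, 2))
    result.update({pair: count for pair, count in pair_counts.items() if count >= min_support})
    return result


# Usage with a tiny in-memory sample; for a real file, pass open(path) instead.
sample = ["1 2 3", "1 2", "2 3", "1 2 4"]
print(frequent_itemsets(sample, min_support=2))
```

Only two levels are counted to keep the example short; a full Apriori or FP-growth implementation would continue to larger itemsets, which is where the scale of a million-transaction database starts to matter.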
Data_Sheet_1_BioVDB: biological vector database for high-throughput gene...
figshare.com
frontiersin.figshare.com
pdf
Updated Mar 8, 2024
Cite
Michał J. Winnicki; Chase A. Brown; Hunter L. Porter; Cory B. Giles; Jonathan D. Wren (2024). Data_Sheet_1_BioVDB: biological vector database for high-throughput gene expression meta-analysis.PDF [Dataset]. http://doi.org/10.3389/frai.2024.1366273.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/frai.2024.1366273.s001
Dataset updated
Mar 8, 2024
Dataset provided by
Frontiers
Authors
Michał J. Winnicki; Chase A. Brown; Hunter L. Porter; Cory B. Giles; Jonathan D. Wren
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
High-throughput sequencing has created an exponential increase in the amount of gene expression data, much of which is freely and publicly available in repositories such as NCBI's Gene Expression Omnibus (GEO). Querying this data for patterns such as similarity and distance, however, becomes increasingly challenging as the total amount of data increases. Furthermore, vectorization of the data is commonly required in Artificial Intelligence and Machine Learning (AI/ML) approaches. We present BioVDB, a vector database for storage and analysis of gene expression data, which enhances the potential for integrating biological studies with AI/ML tools. We used a previously developed approach called Automatic Label Extraction (ALE) to extract sample labels from metadata, including age, sex, and tissue/cell-line. BioVDB stores 438,562 samples from eight microarray GEO platforms. We show that it allows efficient querying of data using similarity search, which can also be useful for identifying and inferring missing labels of samples, and for rapid similarity analysis.
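Similarity search with label inference can be sketched as brute-force cosine similarity plus a k-nearest-neighbour majority vote. This is a minimal Python illustration, assuming each sample is already a fixed-length expression vector; the sample IDs, tissue labels, and function names are hypothetical, and BioVDB's actual indexing and label-inference procedure are not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(query: Vector, vectors: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Brute-force similarity search: the k stored samples most similar to `query`."""
    scored = [(sid, cosine(query, vec)) for sid, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def infer_label(query: Vector, vectors: Dict[str, Vector],
                labels: Dict[str, str], k: int = 3) -> Optional[str]:
    """Infer a missing label by majority vote among the k most similar labelled samples."""
    labelled = {sid: vec for sid, vec in vectors.items() if sid in labels}
    votes = Counter(labels[sid] for sid, _ in top_k(query, labelled, k))
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical three-gene expression vectors and tissue labels.
vectors = {
    "s1": [1.0, 0.0, 0.2],
    "s2": [0.9, 0.1, 0.3],
    "s3": [0.0, 1.0, 0.1],
    "s4": [0.1, 0.9, 0.0],
}
labels = {"s1": "liver", "s2": "liver", "s3": "brain", "s4": "brain"}
query = [0.95, 0.05, 0.25]
print(top_k(query, vectors, 2), infer_label(query, vectors, labels, k=3))
```

A database holding hundreds of thousands of samples would normally replace the linear scan with an approximate nearest-neighbour index; the scan keeps this sketch self-contained.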
Africa - PowerMining Projects Database
data.amerigeoss.org
data.subak.org
csv
Updated Jul 24, 2024
Cite
World Bank (2024). Africa - PowerMining Projects Database [Dataset]. https://data.amerigeoss.org/sr/dataset/0d8a2549-2180-400e-ad4e-12936c7436c3
Available download formats: csv (275127)
Dataset updated
Jul 24, 2024
Dataset provided by
World Bank (http://worldbank.org/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
"The Africa Power–Mining Database 2014 shows ongoing and forthcoming mining projects in Africa categorized by the type of mineral, ore grade, size of the project. The database draws on basic mining data from Infomine surveys, the United States Geological Survey, annual reports, technical reports, feasibility studies, investor presentations, sustainability reports on property-owner websites or filed in public domains, and mining websites (Mining Weekly, Mining Journal, Mbendi, Mining-technology, and Miningmx). Comprising 455 projects in 28 SSA countries with each project’s ore reserve value assessed at more than $250 million, the database collates publicly available and proprietary information. It also provides a panoramic view of projects operating in 2000–12 and anticipated demand in 2020. The analysis is presented over three timeframes: pre-2000, 2001–12, and 2020 (each containing the projects from the previous period except for those closing during that previous period)."
Time granularities in databases, data mining, and temporal reasoning
workwithdata.com
Updated Jan 10, 2022
Cite
The citation is currently not available for this dataset.
Dataset updated
Jan 10, 2022
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Time granularities in databases, data mining, and temporal reasoning is a book. It was written by Claudio Bettini and published by Springer in 2000.
Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining
zenodo.org
data.niaid.nih.gov
bin, zip
Updated Jun 7, 2021
Cite
Mariana O. Silva; Laís Mota; Mirella M. Moro (2021). MusicOSet: An Enhanced Open Dataset for Music Data Mining [Dataset]. http://doi.org/10.5281/zenodo.4904639
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.4904639
Dataset updated
Jun 7, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mariana O. Silva; Laís Mota; Mirella M. Moro
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MusicOSet is an open and enhanced dataset of musical elements (artists, songs, and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS, and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical feature sources. Data from all three categories were initially collected between January and May 2019; the data were then updated and enhanced in June 2019.

The attractive features of MusicOSet include:

Integration and centralization of different musical data sources

Calculation of popularity scores and classification of musical elements as hits or non-hits, covering 1962 to 2018

Enriched metadata for music, artists, and albums from the US popular music industry

Availability of acoustic and lyrical resources

Unrestricted access in two formats: SQL database and compressed .csv files

| Data | # Records |
|:-----------------:|:---------:|
| Songs | 20,405 |
| Artists | 11,518 |
| Albums | 26,522 |
| Lyrics | 19,664 |
| Acoustic Features | 20,405 |
| Genres | 1,561 |
Database after processed itemset XEDC.
plos.figshare.com
xls
Updated Feb 3, 2025
Cite
Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo (2025). Database after processed itemset XEDC. [Dataset]. http://doi.org/10.1371/journal.pone.0317427.t009
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0317427.t009
Dataset updated
Feb 3, 2025
Dataset provided by
PLOS ONE
Authors
Loan T. T. Nguyen; Hoa Duong; An Mai; Bay Vo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Privacy is a critical issue in the age of data. Organizations and corporations that publicly share their data are always concerned that their sensitive information may be leaked or extracted by rivals or attackers using data miners. High-utility itemset mining (HUIM) is an extension of frequent itemset mining (FIM) that deals with business data in the form of transaction databases, data that is also in danger of being stolen. To deal with this, a number of privacy-preserving data mining (PPDM) techniques have been introduced. An important topic in PPDM in recent years is privacy-preserving utility mining (PPUM). The goal of PPUM is to protect sensitive information, such as sensitive high-utility itemsets, in transaction databases and to make it undiscoverable by data mining techniques. However, available PPUM methods do not consider the generalization of items in databases (categories, classes, groups, etc.). These algorithms only consider items at a specialized level, leaving item combinations at higher levels vulnerable to attacks. The insights gained from higher abstraction levels are somewhat more valuable than those from lower levels, since they contain the outlines of the data. To address this issue, this work proposes two PPUM algorithms, MLHProtector and FMLHProtector, which operate at all abstraction levels of a transaction database to protect it from data mining algorithms. Empirical experiments showed that both algorithms successfully protect the itemsets from being compromised by attackers.
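To make the utility notions concrete, the sketch below computes the utility of an itemset as in standard HUIM (quantity times unit profit, summed over the transactions that contain the whole itemset) and lifts items to a category level. It does not reproduce MLHProtector or FMLHProtector. The item names, profits, taxonomy, and the rule that a category's utility in a transaction is the sum of its items' utilities are illustrative assumptions.

```python
from typing import Dict, List, Sequence

# A transaction maps each item to a value: purchased quantity, or utility after conversion.
Transaction = Dict[str, int]


def to_utilities(db: List[Transaction], unit_profit: Dict[str, int]) -> List[Transaction]:
    """Item utility in a transaction: u(i, T) = quantity(i, T) * unit_profit(i)."""
    return [{item: qty * unit_profit[item] for item, qty in t.items()} for t in db]


def itemset_utility(itemset: Sequence[str], udb: List[Transaction]) -> int:
    """u(X) = sum over transactions T containing all of X of sum_{i in X} u(i, T)."""
    return sum(
        sum(t[i] for i in itemset)
        for t in udb
        if all(i in t for i in itemset)
    )


def generalize(udb: List[Transaction], taxonomy: Dict[str, str]) -> List[Transaction]:
    """Lift items to their categories; a category's utility in T is the sum of its
    items' utilities in T (an assumed multi-level utility definition)."""
    lifted = []
    for t in udb:
        by_category: Transaction = {}
        for item, utility in t.items():
            category = taxonomy[item]
            by_category[category] = by_category.get(category, 0) + utility
        lifted.append(by_category)
    return lifted


# Hypothetical toy data.
unit_profit = {"apple": 2, "pear": 3, "milk": 1, "cheese": 5}
taxonomy = {"apple": "fruit", "pear": "fruit", "milk": "dairy", "cheese": "dairy"}
db = [{"apple": 2, "milk": 1}, {"pear": 1, "cheese": 1}, {"apple": 1, "pear": 2, "milk": 3}]

udb = to_utilities(db, unit_profit)
print(itemset_utility(("apple", "milk"), udb))                         # 10, item level
print(itemset_utility(("fruit", "dairy"), generalize(udb, taxonomy)))  # 24, category level
```

In this toy database no item-level itemset has a utility above 11, while the category-level itemset (fruit, dairy) reaches 24, which illustrates why protecting only the specialized level can leave higher-level combinations exposed.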
Data Analytics Market By Type (Descriptive Analytics, Predictive Analytics,...
verifiedmarketresearch.com
Updated Oct 14, 2024
Cite
VERIFIED MARKET RESEARCH (2024). Data Analytics Market By Type (Descriptive Analytics, Predictive Analytics, Augmented Analytics), Solution (Data Management, Data Mining, Data Monitoring), Application (Human Resource Management, Supply Chain Management, Database Management), By Geographic Scope And Forecast & Region for 2024-2031 [Dataset]. https://www.verifiedmarketresearch.com/product/data-analytics-market/
Dataset updated
Oct 14, 2024
Dataset provided by
Verified Market Research (https://www.verifiedmarketresearch.com/)
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2031
Area covered
Global
Description
Data Analytics Market Valuation – 2024-2031

Data Analytics Market was valued at USD 68.83 Billion in 2024 and is projected to reach USD 482.73 Billion by 2031, growing at a CAGR of 30.41% from 2024 to 2031.
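The compound annual growth rate r links start and end values by V_end = V_start * (1 + r)^n, so r = (V_end / V_start)^(1/n) - 1. A short Python check is sketched below, assuming n = 7 annual periods for 2024 to 2031. With the stated endpoints this gives roughly 32.1%, and compounding USD 68.83 billion at 30.41% for seven years gives about USD 441.5 billion, so the quoted rate and endpoints may rest on a different base year or compounding convention that this summary does not state.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate r such that end = start * (1 + r) ** periods."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


def project(start_value: float, rate: float, periods: int) -> float:
    """Value after compounding start_value at `rate` for `periods` periods."""
    return start_value * (1 + rate) ** periods


# USD billions as stated above; 7 annual periods (2024 to 2031) is an assumption.
rate = cagr(68.83, 482.73, 2031 - 2024)
print(f"implied CAGR: {rate:.2%}")
print(f"2031 value at the quoted 30.41%: {project(68.83, 0.3041, 7):.1f}")
```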

Data Analytics Market Drivers

Data Explosion: The proliferation of digital devices and the internet has led to an exponential increase in data generation. Businesses are increasingly recognizing the value of harnessing this data to gain competitive insights.

Advancements in Technology: Advancements in data storage, processing power, and analytics tools have made it easier and more cost-effective for organizations to analyze large datasets.

Increased Business Demand: Businesses across various industries are seeking data-driven insights to improve decision-making, optimize operations, and enhance customer experiences.

Data Analytics Market Restraints

Data Quality and Integrity: Ensuring the accuracy, completeness, and consistency of data is crucial for effective analytics. Poor data quality can hinder insights and lead to erroneous conclusions.

Data Privacy and Security Concerns: As organizations collect and analyze sensitive data, concerns about data privacy and security are becoming increasingly important. Breaches can have significant financial and reputational consequences.
Data from: DATA MINING THE GALAXY ZOO MERGERS
data.staging.idas-ds1.appdat.jsc.nasa.gov
data.nasa.gov
Updated Feb 18, 2025
Cite
data.staging.idas-ds1.appdat.jsc.nasa.gov (2025). DATA MINING THE GALAXY ZOO MERGERS [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/data-mining-the-galaxy-zoo-mergers
Dataset updated
Feb 18, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
DATA MINING THE GALAXY ZOO MERGERS, by Steven Baehr, Arun Vedachalam, Kirk Borne, and Daniel Sponseller

Abstract. Collisions between pairs of galaxies usually end in the coalescence (merger) of the two galaxies. Collisions and mergers are rare phenomena, yet they may signal the ultimate fate of most galaxies, including our own Milky Way. With the onset of massive collection of astronomical data, a computerized and automated method will be necessary for identifying those colliding galaxies worthy of more detailed study. This project researches methods to accomplish that goal. Astronomical data from the Sloan Digital Sky Survey (SDSS) and human-provided classifications of merger status from the Galaxy Zoo project are combined and processed with machine learning algorithms. The goal is to determine indicators of merger status based solely on discovering those automated pipeline-generated attributes in the astronomical database that correlate most strongly with the patterns identified through visual inspection by the Galaxy Zoo volunteers. In the end, we aim to provide a new and improved automated procedure for classification of collisions and mergers in future petascale astronomical sky surveys. Both information gain analysis (via the C4.5 decision tree algorithm) and cluster analysis (via the Davies-Bouldin Index) are explored as techniques for finding the strongest correlations between human-identified patterns and existing database attributes. Galaxy attributes measured in the SDSS green waveband images are found to be the most influential attributes for correct classification of collisions and mergers. Only a nominal information gain is noted in this research; however, there is a clear indication of which attributes contribute, so a direction for further study is apparent.
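The abstract names the Davies-Bouldin Index as its cluster-analysis criterion. As a minimal, dependency-free sketch of the standard definition (Euclidean distances; lower means more compact, better-separated clusters), the Python below scores a given clustering. The toy points stand in for hypothetical galaxy attribute values and are not from the study; scikit-learn's `davies_bouldin_score` offers a library implementation of the same index.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def davies_bouldin(clusters: List[List[Point]]) -> float:
    """Davies-Bouldin index of a clustering with at least two clusters.

    S_i: mean Euclidean distance of cluster i's points to its centroid.
    M_ij: Euclidean distance between centroids i and j (assumed non-zero).
    DB = mean over i of max_{j != i} (S_i + S_j) / M_ij.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, cents[i]) for p in c) / len(c) for i, c in enumerate(clusters)]
    k = len(clusters)
    worst_ratios = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst_ratios) / k


# Hypothetical 2-D attribute values for two groups of galaxies.
well_separated = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
close_together = [[[0.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [2.0, 2.0]]]
print(davies_bouldin(well_separated), davies_bouldin(close_together))
```

Used as in the abstract, one would cluster galaxies on a candidate subset of attributes and prefer subsets whose clusters (for example, merger versus non-merger) yield a lower index.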
Data from: Database Management Systems (DBMS)
paper.erudition.co.in
html
Updated Mar 5, 2025
Cite
Einetic (2025). Database Management Systems (DBMS) [Dataset]. https://paper.erudition.co.in/makaut/master-of-business-administration-2023-24/2/management-information-system
Available download formats: html
Dataset updated
Mar 5, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter Database Management Systems (DBMS) of Management Information System, 2nd Semester, Master of Business Administration (2023-24)
Global Wildfire Database for GWIS (2021)
doi.pangaea.de
html, tsv
Updated May 11, 2022
Cite
Tomàs Artés Vivancos (2022). Global Wildfire Database for GWIS (2021) [Dataset]. http://doi.org/10.1594/PANGAEA.943975
Available download formats: html, tsv
Unique identifier
https://doi.org/10.1594/PANGAEA.943975
Dataset updated
May 11, 2022
Dataset provided by
PANGAEA
Authors
Tomàs Artés Vivancos
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2000 - Jan 1, 2021
Variables measured
DATE/TIME, File content, Binary Object, Binary Object (File Size)
Description
Global Wildfire Database for GWIS (2021) is a database focused on individual fire events. It is derived by post-processing the MCD64A1 burned-area product and provides the geometries of final fire perimeters, including the initial and final dates and the corresponding daily active areas of each fire. This dataset is an update of the data related to GlobFire (https://doi.org/10.6084/m9.figshare.10284101). […]
A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...
zenodo.org
data.niaid.nih.gov
csv
Updated Jul 20, 2024
Cite
Nirmalya Thakur; Vanessa Su; Mingchen Shao; Kesha A. Patel; Hongseok Jeong; Victoria Knieling; Andrew Bian (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and other sources about the 2024 outbreak of Measles [Dataset]. http://doi.org/10.5281/zenodo.11711230
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.11711230
Dataset updated
Jul 20, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nirmalya Thakur; Vanessa Su; Mingchen Shao; Kesha A. Patel; Hongseok Jeong; Victoria Knieling; Andrew Bian
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jun 15, 2024
Area covered
YouTube
Description
Please cite the following paper when using this dataset:

N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” Proceedings of the 26th International Conference on Human-Computer Interaction (HCII 2024), Washington, USA, 29 June - 4 July 2024. (Accepted as a Late Breaking Paper, Preprint Available at: https://doi.org/10.48550/arXiv.2406.07693)

Abstract

This dataset contains data on 4011 videos about the ongoing measles outbreak, published on 264 websites between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remaining websites include Instagram and Facebook, as well as the websites of various global and local news organizations. For each video, the URL, the title of the post, the description of the post, and the date of publication are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes, i.e., positive, negative, or neutral, (ii) one of the subjectivity classes, i.e., highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for training and testing machine learning algorithms for sentiment analysis or subjectivity analysis in this field, as well as for other applications. The paper associated with this dataset (see the citation above) also presents a list of open research questions that may be investigated using this dataset.
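VADER produces a normalised compound score in [-1, 1] that is commonly mapped to positive, negative, or neutral with a ±0.05 cut-off, as suggested in the vaderSentiment documentation. The thresholds actually used for this dataset are not stated here, so the mapping below is an assumption; the commented lines show how a score would be obtained with the `vaderSentiment` package, which the sketch itself does not need.

```python
from collections import Counter
from typing import Iterable

# Optional: obtaining compound scores with the vaderSentiment package.
# from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
# compound = SentimentIntensityAnalyzer().polarity_scores(title)["compound"]


def vader_class(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a sentiment class (assumed +/-0.05 cut-off)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def class_distribution(compound_scores: Iterable[float]) -> Counter:
    """Count how many titles or descriptions fall into each sentiment class."""
    return Counter(vader_class(score) for score in compound_scores)


# Hypothetical compound scores for five video titles.
print(class_distribution([0.62, -0.4, 0.0, 0.03, -0.8]))
```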
More than 120,520 Verified Emails and Phone numbers of Dentists From USA |...
datarade.ai
Updated Aug 6, 2021
Cite
DataCaptive (2021). More than 120,520 Verified Emails and Phone numbers of Dentists From USA | Dentists Data | DataCaptive [Dataset]. https://datarade.ai/data-products/more-than-120-520-verified-emails-and-phone-numbers-of-dentis-datacaptive
Available download formats: .json, .xml, .csv, .xls, .txt
Dataset updated
Aug 6, 2021
Dataset authored and provided by
DataCaptive
Area covered
United States
Description
Salient Features of Dentists Email Addresses

So make sure that you don’t find excuses for failing at global marketing campaigns and in reaching targeted medical practitioners and healthcare specialists. With our Dentists Email Leads, you will seldom have a reason not to succeed! So make haste and take action today!

1.2 million phone calls per month as part of data verification

85% telephone and email verified Dentist Mailing Lists

Quarterly SMTP and NCOA verified to keep data fresh and active

15 million verification messages sent every month to validate email addresses

Connect with top Dentists across the US, Canada, UK, Europe, EMEA, Australia, APAC and many more countries.

Regularly updated and cleansed databases, kept free of duplicate and inaccurate data

How Can Our Dentists Data Help You to Market to Dentists?

We provide a variety of methods for marketing your dental appliances or products to the top-rated dentists in the United States. Take a glance at some of the available channels:

• Email blast
• Marketing viability
• Test campaigns
• Direct mail
• Sales leads
• Drift campaigns
• ABM campaigns
• Product launches
• B2B marketing

Data Sources

The contact details of your targeted healthcare professionals are compiled from highly credible resources like:
• Websites
• Medical seminars
• Medical records
• Trade shows
• Medical conferences

What’s in it for you? By choosing us, you gain the following advantages:
• Locate, target, and prospect leads from 170+ countries
• Design and execute ABM and multi-channel campaigns
• Seamless and smooth pre- and post-sale customer service
• Connect with old leads and build a fruitful customer relationship
• Analyze the market for product development and sales campaigns
• Boost sales and ROI with increased customer acquisition and retention

Our security compliance

We comply with globally recognized data laws, such as GDPR, CCPA, ACMA, EDPS, CAN-SPAM, and ANTI CAN-SPAM, to ensure the privacy and security of our database. We engage certified auditors to validate our security and privacy and to provide certificates representing our security compliance.

Our USPs- what makes us your ideal choice?

At DataCaptive™, we strive consistently to improve our services and cater to the needs of businesses around the world while keeping up with industry trends.

• Elaborate data mining from credible sources
• 7-tier verification, including manual quality check
• Strict adherence to global and local data policies
• Guaranteed 95% accuracy or cash-back
• Free sample database available on request

Guaranteed benefits of our Dentists email database!

85% email deliverability and 95% accuracy on other data fields

We understand the importance of data accuracy and employ every avenue to keep our database fresh and updated. We execute a multi-step QC process backed by our patented AI and machine learning tools to prevent anomalies in consistency and data precision. This cycle repeats every 45 days. Although maintaining 100% accuracy is impractical, since data such as email addresses, physical addresses, and phone numbers are subject to change, we guarantee 85% email deliverability and 95% accuracy on other data points.

100% replacement in case of hard bounces

Every data point is meticulously verified and then re-verified to ensure you get the best. Data Accuracy is paramount in successfully penetrating a new market or working within a familiar one. We are committed to precision. However, in an unlikely event where hard bounces or inaccuracies exceed the guaranteed percentage, we offer replacement with immediate effect. If need be, we even offer credits and/or refunds for inaccurate contacts.

Other promised benefits

• Contacts are for perpetual use
• The database comprises consent-based opt-in contacts only
• The list is free of duplicate contacts and generic emails
• Round-the-clock customer service assistance
• 360-degree database solutions
Data from: Integrating Data Mining and Natural Language Processing to...
figshare.com
zip
Updated Oct 1, 2024
Cite
Jinyoung Jeong; Taehyun Park; JunHo Song; Seungpyo Kang; Joonghee Won; Jungim Han; Kyoungmin Min (2024). Integrating Data Mining and Natural Language Processing to Construct a Melting Point Database for Organometallic Compounds [Dataset]. http://doi.org/10.1021/acs.jcim.4c01254.s004
Available download formats: zip
Unique identifier
https://doi.org/10.1021/acs.jcim.4c01254.s004
Dataset updated
Oct 1, 2024
Dataset provided by
ACS Publications
Authors
Jinyoung Jeong; Taehyun Park; JunHo Song; Seungpyo Kang; Joonghee Won; Jungim Han; Kyoungmin Min
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
As semiconductor devices are miniaturized, the importance of atomic layer deposition (ALD) technology is growing. When designing ALD precursors, it is important to consider the melting point, because the precursors should have melting points lower than the process temperature. However, obtaining melting point data is challenging due to experimental sensitivity and high computational costs. As a result, a comprehensive and well-organized melting point database for organometallic compounds (OMCs) has not yet been reported. Therefore, in this study, we constructed a database of melting points for 1,845 OMCs, covering 58 metal and 6 metalloid elements. The database contains CAS numbers, molecular formulas, and structural information, and was constructed through automatic extraction and systematic curation. The melting point information was obtained by two methods: 1) 1,434 materials from 11 chemical vendor databases, and 2) 411 materials identified with natural language processing (NLP) techniques at an accuracy of 86.3%, based on 2,096 scientific papers published over the past 29 years. In our database, the OMCs contain up to around 250 atoms and have melting points ranging from −170 to 1610 °C. The main source is the Chemsrc database, accounting for 607 materials (32.9%), and Fe is the most common central metal or metalloid element (15.0%), followed by Si (11.6%) and B (6.7%). To validate the use of the constructed database, a multimodal neural network model integrating graph-based and feature-based descriptors was developed to predict the melting points of the OMCs, although it showed only moderate performance. We believe the current approach reduces the time and cost associated with manual data collection and processing, contributing to effective screening of potentially promising ALD precursors and providing crucial information for the advancement of the semiconductor industry.
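The text-extraction step can be loosely illustrated with a rule-based pattern that pulls melting-point values or ranges in °C out of sentences. This is a deliberately simplified stand-in, not the authors' NLP pipeline; the pattern, the function name `extract_melting_points`, and the example sentences are assumptions, and the pattern will miss many phrasings.

```python
import re
from typing import List, Tuple

# Assumed pattern: a melting-point cue ("m.p.", "mp", "melting point"), an optional
# linking word, then a value or a range in degrees Celsius. The leading \b keeps
# words such as "temp" or "complex" from matching the "mp" cue.
MP_PATTERN = re.compile(
    r"\b(?:m\.?\s?p\.?|melting\s+point)\s*(?:of|is|:|=)?\s*"
    r"(?P<low>[-−]?\d+(?:\.\d+)?)"
    r"(?:\s*[-–]\s*(?P<high>[-−]?\d+(?:\.\d+)?))?"
    r"\s*°\s*C",
    re.IGNORECASE,
)


def _to_float(text: str) -> float:
    """Parse a number, normalising the Unicode minus sign to ASCII."""
    return float(text.replace("−", "-"))


def extract_melting_points(text: str) -> List[Tuple[float, float]]:
    """Return (low, high) melting-point ranges in °C; a single value gives low == high."""
    results = []
    for m in MP_PATTERN.finditer(text):
        low = _to_float(m.group("low"))
        high = _to_float(m.group("high")) if m.group("high") else low
        results.append((low, high))
    return results


# Invented example sentences.
sentence = ("The complex was obtained as red crystals, m.p. 123–125 °C. "
            "Compound 2 has a melting point of 172.5 °C.")
print(extract_melting_points(sentence))
```

Other units (K, °F) and ranges written with "to" are not handled by this sketch.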
Metadata record for: A database for using machine learning and data mining...
springernature.figshare.com
txt
Updated Jun 1, 2023
Cite
Roohallah Alizadehsani; Mohamad Roshanzamir; Moloud Abdar; Adham Beykikhoshk; Abbas Khosravi; Maryam Panahiazar; Afsaneh Koohestani; Fahime Khozeimeh; Saeid Nahavandi; Nizal Sarrafzadegan (2023). Metadata record for: A database for using machine learning and data mining techniques for coronary artery disease diagnosis [Dataset]. http://doi.org/10.6084/m9.figshare.9825680.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.9825680.v2
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Roohallah Alizadehsani; Mohamad Roshanzamir; Moloud Abdar; Adham Beykikhoshk; Abbas Khosravi; Maryam Panahiazar; Afsaneh Koohestani; Fahime Khozeimeh; Saeid Nahavandi; Nizal Sarrafzadegan
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains key characteristics about the data described in the Data Descriptor A database for using machine learning and data mining techniques for coronary artery disease diagnosis. Contents:

1. Human-readable metadata summary table in CSV format
2. Machine-readable metadata file in JSON format

Versioning note: Version 2 was generated when the metadata format was updated from JSON to JSON-LD. This was an automatic process that changed only the format, not the contents, of the metadata.
Ocean Carbon States Database and Toolbox
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
Cite
Anastasia Romanou; Rebecca Latto (2020). Ocean Carbon States Database and Toolbox [Dataset]. http://doi.org/10.5281/zenodo.996892
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.996892
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anastasia Romanou; Rebecca Latto
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The "Ocean Carbon States Database and Toolbox" includes observational and climate model datasets and MATLAB scripts to compute regimes of the ocean carbon cycle.
A database of artisanal, small-scale, and large-scale mining in the...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). A database of artisanal, small-scale, and large-scale mining in the Copperbelt region of the Democratic Republic of Congo and Zambia [Dataset]. https://catalog.data.gov/dataset/a-database-of-artisanal-small-scale-and-large-scale-mining-in-the-copperbelt-region-of-the
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Area covered
Democratic Republic of the Congo, Zambia, Copperbelt Province
Description
Cobalt, designated a critical mineral by the European Union and the United States, is a crucial component of the lithium-ion batteries found in cell phones, electric vehicles, and personal computing devices. Over half of the world’s cobalt supply is produced in the Democratic Republic of the Congo (DRC), where cobalt is mined in both large-scale and artisanal or small-scale operations. This dataset focuses on Africa’s mineral-rich Copperbelt region, an area mined for both copper and cobalt, that extends south across the DRC boundary into neighboring Zambia. Existing geoscientific data and remote sensing analysis were investigated to build a comprehensive dataset describing cobalt mining extent and technique (large- or artisanal/small-scale). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
LScDC Word-Category RIG Matrix
figshare.le.ac.uk
pdf
Updated Apr 28, 2020
Cite
LScDC Word-Category RIG Matrix [Dataset]. https://figshare.le.ac.uk/articles/dataset/LScDC_Word-Category_RIG_Matrix/12133431
Available download formats: pdf
Unique identifier
https://doi.org/10.25392/leicester.data.12133431.v2
Dataset updated
Apr 28, 2020
Dataset provided by
University of Leicester
Authors
Neslihan Suzen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LScDC Word-Category RIG Matrix
April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com)
Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes

Getting Started

This file describes the Word-Category RIG Matrix for the Leicester Scientific Corpus (LSC) [1] and the procedure used to build it, and introduces the Leicester Scientific Thesaurus (LScT) together with its construction process. The Word-Category RIG Matrix is a 103,998 by 252 matrix, where rows correspond to words of the Leicester Scientific Dictionary-Core (LScDC) [2] and columns correspond to 252 Web of Science (WoS) categories [3, 4, 5]. Each entry in the matrix corresponds to a (category, word) pair, and its value is the Relative Information Gain (RIG) on the belonging of a text from the LSC to the category from observing the word in that text. The CSV file of the Word-Category RIG Matrix in the published archive has two additional columns: the sum of RIGs over categories and the maximum of RIGs over categories (the last two columns of the matrix). The file ‘Word-Category RIG Matrix.csv’ therefore contains a total of 254 columns.

This matrix was created for future research on quantifying meaning in scientific texts, under the assumption that words have scientifically specific meanings in subject categories and that this meaning can be estimated by the information gains from words to categories. LScT (Leicester Scientific Thesaurus) is a scientific thesaurus of English. The thesaurus includes a list of 5,000 words from the LScDC. We order the words of the LScDC by the sum of their RIGs over categories; that is, words are arranged by their informativeness in the scientific corpus LSC. The meaningfulness of words is therefore evaluated by their average informativeness in the categories. We decided to include the 5,000 most informative words in the scientific thesaurus.
Words as a Vector of Frequencies in WoS Categories
Each word of the LScDC is represented as a vector of frequencies in WoS categories. Given the collection of LSC texts, each entry of the vector is the number of texts in the corresponding category that contain the word. Note that texts in a corpus do not necessarily belong to a single category, as they are likely to correspond to multidisciplinary studies, particularly in a corpus of scientific texts; in other words, categories may not be mutually exclusive. There are 252 WoS categories, and each text in the LSC is assigned to at least 1 and at most 6 categories. Frequencies are calculated in binary form: only the presence of a word in a text is recorded, not the number of times it occurs. The result is a vector of frequencies for each word, whose dimensions are the categories of the corpus.

The collection of vectors for all words and categories in the corpus can be arranged in a table in which each entry corresponds to a (word, category) pair. This table is built for the LScDC with 252 WoS categories and is provided in the published archive. The value of each entry is the number of LSC texts in the category that contain the word.

Words as a Vector of Relative Information Gains Extracted for Categories
In this section, we represent a word as a vector of relative information gains for categories, under the assumption that the meaning of a word can be quantified by the information it provides about categories. For each category, a function is defined on texts that takes the value 1 if the text belongs to the category and 0 otherwise. For each word, a function is defined on texts that takes the value 1 if the word occurs in the text and 0 otherwise. The LSC is treated as a probabilistic sample space (the space of equally probable elementary outcomes).
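The indicator functions and the binary frequency counts described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes each text has already been tokenised into a set of LScDC words and paired with its set of WoS categories, and the function names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A text is represented as (set of its words, set of its WoS categories).
Text = Tuple[Set[str], Set[str]]


def category_indicator(text: Text, category: str) -> int:
    """1 if the text belongs to the category, 0 otherwise."""
    return int(category in text[1])


def word_indicator(text: Text, word: str) -> int:
    """1 if the word occurs in the text, 0 otherwise."""
    return int(word in text[0])


def word_category_frequencies(texts: Iterable[Text]) -> Dict[str, Dict[str, int]]:
    """Number of texts in each category that contain each word.

    Equivalent to summing word_indicator * category_indicator over all texts,
    but only visits the words and categories actually present in each text.
    Words are held in sets, so repeated occurrences in one text count once;
    a text with several categories contributes to each of them.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for words, categories in texts:
        for word in words:
            for category in categories:
                counts[word][category] += 1
    # Convert the nested defaultdicts to plain dicts for a clean return value.
    return {word: dict(per_cat) for word, per_cat in counts.items()}
```

With a toy corpus of three texts, where one text carries two categories, that text adds 1 to the count of each of its words in both categories.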
For these Boolean random variables, the joint probability distribution, the entropy and the information gains are defined. The information gain about the category from the word is the amount of information on whether a text from the LSC belongs to the category that is obtained from observing the word in the text [6]. We used the Relative Information Gain (RIG), a normalised measure of the information gain, which makes information gains for different categories comparable. The calculations of entropy, Information Gain and Relative Information Gain are given in the README file in the published archive.

Given a word, we created a vector in which each component corresponds to a category, so each word is represented as a vector of relative information gains whose dimension equals the number of categories (252). The set of vectors forms the Word-Category RIG Matrix, in which each column corresponds to a category, each row corresponds to a word, and each entry is the relative information gain from the word to the category. A row vector thus represents the corresponding word as a vector of RIGs in categories, while a column vector holds the RIGs of all words for an individual category. For any chosen category, words can be ordered by their RIGs from the most to the least informative for that category. Besides this per-category ordering, words can be ordered by two criteria over all categories: the sum and the maximum of their RIGs. The top n words in such a list can be considered the most informative words in the scientific texts. For a given word, the sum and maximum of RIGs are calculated from the Word-Category RIG Matrix. RIGs for each word of the LScDC in the 252 categories are calculated, the word vectors are formed, and together they make up the Word-Category RIG Matrix for the LSC.
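The exact entropy and RIG formulas are in the archive's README, which is not reproduced here. As an assumption, the sketch below uses the standard definitions for two Boolean variables over equally probable texts: IG = H(C) - H(C | W) and RIG = IG / H(C), where C is the category indicator and W the word indicator. The function name and argument names are hypothetical. Note that `n_word` is the number of texts containing the word at all; because texts can carry several categories, it is not simply the row sum of the frequency matrix.

```python
import math
from typing import Sequence


def _entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (bits) of a distribution given by non-negative counts."""
    total = sum(counts)
    h = 0.0
    for k in counts:
        if k > 0:
            p = k / total
            h -= p * math.log2(p)
    return h


def relative_information_gain(n_total: int, n_cat: int, n_word: int, n_both: int) -> float:
    """Assumed RIG = (H(C) - H(C | W)) / H(C) for one (word, category) pair.

    n_total: number of texts in the corpus
    n_cat:   texts belonging to the category
    n_word:  texts containing the word
    n_both:  texts containing the word and belonging to the category
    """
    h_cat = _entropy([n_cat, n_total - n_cat])
    if h_cat == 0.0:
        return 0.0  # category is empty or covers every text: nothing to gain
    n_no_word = n_total - n_word
    # Conditional entropy H(C | W), weighting each branch by P(W = 1) or P(W = 0).
    h_given_word = 0.0
    if n_word > 0:
        h_given_word += (n_word / n_total) * _entropy([n_both, n_word - n_both])
    if n_no_word > 0:
        n_cat_no_word = n_cat - n_both
        h_given_word += (n_no_word / n_total) * _entropy([n_cat_no_word, n_no_word - n_cat_no_word])
    return (h_cat - h_given_word) / h_cat
```

Because RIG is a ratio of entropies, the logarithm base cancels out; base 2 is used only for readability. A word that occurs in exactly the texts of a category gives RIG = 1, and a word statistically independent of the category gives RIG = 0.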
For each word, the sum (S) and maximum (M) of RIGs over categories are calculated and appended as the last two columns of the matrix. The Word-Category RIG Matrix for the LScDC with 252 categories, together with the sum and the maximum of RIGs over categories, is included in the published archive.

Leicester Scientific Thesaurus (LScT)
The Leicester Scientific Thesaurus (LScT) is a list of 5,000 words from the LScDC [2]. The words of the LScDC are sorted in descending order by the sum (S) of their RIGs over categories, and the top 5,000 words are selected for the LScT. We consider these 5,000 words the most meaningful words in the scientific corpus: the meaningfulness of a word is evaluated by its average informativeness across the categories, and the resulting list is regarded as a 'thesaurus' for science. The LScT, with the sum value for each word, is provided as a CSV file in the published archive.

The published archive contains the following files:
1) Word_Category_RIG_Matrix.csv: A 103,998 by 254 matrix whose columns are the 252 WoS categories followed by the sum (S) and the maximum (M) of RIGs over categories (the last two columns), and whose rows are the words of the LScDC. Each entry in the first 252 columns is the RIG from the word to the category. Words are ordered as in the LScDC.
2) Word_Category_Frequency_Matrix.csv: A 103,998 by 252 matrix whose columns are the 252 WoS categories and whose rows are the words of the LScDC. Each entry is the number of texts in the corresponding category that contain the word. Words are ordered as in the LScDC.
3) LScT.csv: The list of LScT words with their sum (S) values.
4) Text_No_in_Cat.csv: The number of texts in each category.
5) Categories_in_Documents.csv: The list of WoS categories for each document of the LSC.
6) README.txt: Description of the Word-Category RIG Matrix, the Word-Category Frequency Matrix and the LScT, and of the procedures used to form them.
7) README.pdf: The same as 6) in PDF format.

References
[1] Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
[2] Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v3
[3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[4] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[5] Suzen, N., Mirkes, E. M., & Gorban, A. N. (2019). LScDC-new large scientific dictionary. arXiv preprint arXiv:1912.06858.
[6] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
Replication Data for: "Unraveling spatial, structural, and social...
search.dataone.org
dataverse.harvard.edu
Updated Nov 9, 2023
PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo (2023). Replication Data for: "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS" [Dataset]. https://search.dataone.org/view/sha256%3A14ddbd4f4acade3b624c9d518cddfeb2bb845be0043a741211c5afe242ad0bd4
Dataset updated
Nov 9, 2023
Dataset provided by
Harvard Dataverse
Authors
PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo
Description
Data from the article "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS", by Agustin Pájaro, Ignacio J. Duran and Pablo Rodrigo, published in Revista DADOS, v. 65, n. 3, 2022.
Table_3_MeiosisOnline: A Manually Curated Database for Tracking and...
figshare.com
xlsx
Updated Jun 9, 2023
Xiaohua Jiang; Daren Zhao; Asim Ali; Bo Xu; Wei Liu; Jie Wen; Huan Zhang; Qinghua Shi; Yuanwei Zhang (2023). Table_3_MeiosisOnline: A Manually Curated Database for Tracking and Predicting Genes Associated With Meiosis.XLSX [Dataset]. http://doi.org/10.3389/fcell.2021.673073.s008
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fcell.2021.673073.s008
Dataset updated
Jun 9, 2023
Dataset provided by
Frontiers
Authors
Xiaohua Jiang; Daren Zhao; Asim Ali; Bo Xu; Wei Liu; Jie Wen; Huan Zhang; Qinghua Shi; Yuanwei Zhang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Meiosis, an essential step in gametogenesis, is the key event in sexually reproducing organisms. Thousands of genes have been reported to be involved in meiosis, so a specialist database is much needed to let scientists quickly learn the functions of these genes and search for genes with potential roles in meiosis. Here, we developed “MeiosisOnline,” a publicly accessible, comprehensive database of known functional genes and potential candidates in meiosis (https://mcg.ustc.edu.cn/bsc/meiosis/index.html). A total of 2,052 meiotic genes were manually curated from the literature and classified into different categories. Annotation information, including basic information, function, protein–protein interaction (PPI), and expression data, is provided for both meiotic genes and predicted candidates. In addition, 165 mouse genes were predicted as potential candidates in meiosis using the “Greed AUC Stepwise” algorithm. MeiosisOnline thus provides the most up-to-date and detailed information on experimentally verified and predicted genes in meiosis. Furthermore, its search tools and user-friendly interface will greatly help researchers study meiosis easily and efficiently.

Books on Advances in data mining and database management (ADMDM) book series...

T10I4D1000K transactional database

Data_Sheet_1_BioVDB: biological vector database for high-throughput gene...

Africa - PowerMining Projects Database

Time granularities in databases, data mining, and temporal reasoning

Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining

Database after processed itemset XEDC.

Data Analytics Market By Type (Descriptive Analytics, Predictive Analytics,...

Data from: DATA MINING THE GALAXY ZOO MERGERS

Data from: Database Management Systems (DBMS)

Global Wildfire Database for GWIS (2021)

A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...

More than 120,520 Verified Emails and Phone numbers of Dentists From USA |...

Data from: Integrating Data Mining and Natural Language Processing to...

Metadata record for: A database for using machine learning and data mining...

Ocean Carbon States Database and Toolbox

A database of artisanal, small-scale, and large-scale mining in the...

LScDC Word-Category RIG Matrix

Replication Data for: "Unraveling spatial, structural, and social...

Table_3_MeiosisOnline: A Manually Curated Database for Tracking and...
