100+ datasets found

pinterest_dataset
data.mendeley.com
Updated Oct 27, 2017
Cite
pinterest_dataset [Dataset]. https://data.mendeley.com/datasets/fs4k2zc5j5/2
Explore at:
Unique identifier
https://doi.org/10.17632/fs4k2zc5j5.2
Dataset updated
Oct 27, 2017
Authors
Juan Carlos Gomez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset with 72000 pins from 117 users in Pinterest. Each pin contains a short raw text and an image. The images are processed using a pretrained Convolutional Neural Network and transformed into a vector of 4096 features.

This dataset was used in the paper "User Identification in Pinterest Through the Refinement of a Cascade Fusion of Text and Images" to identify specific users given their comments. The paper is published in the Research in Computing Science Journal, as part of the LKE 2017 conference. The dataset includes the splits used in the paper.

There are nine files. text_test, text_train and text_val contain the raw text of each pin in the corresponding split of the data. imag_test, imag_train and imag_val contain the image features of each pin in the corresponding split of the data. train_user and val_test_users contain the index of the user of each pin (between 0 and 116). There is a one-to-one correspondence among the test, train and validation files for images, text and users. There are 400 pins per user in the train set, and 100 pins per user in each of the validation and test sets.
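The split layout described above can be sanity-checked with a short sketch. It assumes the files have already been loaded into plain Python lists; the function names and the loading step are hypothetical, since the description does not state the on-disk format. Note that 117 users at 400 + 100 + 100 pins each gives 70,200 pins, slightly fewer than the 72,000 quoted in the description; the sketch does not try to explain that difference.

```python
# Figures taken from the description: 117 users, 4096 image features,
# 400 train pins and 100 validation/test pins per user.
N_USERS = 117
N_FEATURES = 4096
PINS_PER_USER = {"train": 400, "val": 100, "test": 100}


def expected_split_sizes(n_users=N_USERS, pins_per_user=PINS_PER_USER):
    """Return the number of pins each split should contain."""
    return {split: n_users * count for split, count in pins_per_user.items()}


def check_alignment(texts, image_features, user_ids,
                    n_users=N_USERS, n_features=N_FEATURES):
    """Check that row i of the text, image and user data can describe the same pin.

    The arguments are hypothetical in-memory lists; how they are read from
    the dataset files is not covered here.
    """
    if not (len(texts) == len(image_features) == len(user_ids)):
        raise ValueError("text, image and user data must have the same length")
    if any(len(vector) != n_features for vector in image_features):
        raise ValueError("every image vector must have n_features entries")
    if any(not 0 <= user < n_users for user in user_ids):
        raise ValueError("user indices must lie between 0 and n_users - 1")
    return True


sizes = expected_split_sizes()
print(sizes)                # {'train': 46800, 'val': 11700, 'test': 11700}
print(sum(sizes.values()))  # 70200
```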

If you have questions regarding the data, write to: jc dot gomez at ugto dot mx
Data from: PADMINI: A PEER-TO-PEER DISTRIBUTED ASTRONOMY DATA MINING SYSTEM...
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
application/rdfxml +5
Updated Jun 26, 2018
Cite
(2018). PADMINI: A PEER-TO-PEER DISTRIBUTED ASTRONOMY DATA MINING SYSTEM AND A CASE STUDY [Dataset]. https://data.nasa.gov/dataset/PADMINI-A-PEER-TO-PEER-DISTRIBUTED-ASTRONOMY-DATA-/r38j-jwis
Explore at:
Available download formats: csv, xml, application/rdfxml, application/rssxml, tsv, json
Dataset updated
Jun 26, 2018
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
PADMINI: A PEER-TO-PEER DISTRIBUTED ASTRONOMY DATA MINING SYSTEM AND A CASE STUDY

TUSHAR MAHULE*, KIRK BORNE**, SANDIPAN DEY*, SUGANDHA ARORA*, AND HILLOL KARGUPTA***

Abstract. Peer-to-Peer (P2P) networks are appealing for astronomy data mining from virtual observatories because of the large volume of the data, compute-intensive tasks, potentially large number of users, and distributed nature of the data analysis process. This paper offers a brief overview of PADMINI, a Peer-to-Peer Astronomy Data MINIng system. It also presents a case study on PADMINI for distributed outlier detection using astronomy data. PADMINI is a web-based system powered by Google Sky and distributed data mining algorithms that run on a collection of computing nodes. This paper offers a case study of PADMINI, evaluating the architecture and the performance of the overall system. Detailed experimental results are presented in order to document the utility and scalability of the system.
Telemarketer & Regular user CDR phone records
zenodo.org
data.niaid.nih.gov
zip
Updated Jun 13, 2022
Cite
Ladislav Beháň (2022). Telemarketer & Regular user CDR phone records [Dataset]. http://doi.org/10.5281/zenodo.6637796
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6637796
Dataset updated
Jun 13, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ladislav Beháň
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Real-world CDR records gathered from Telemarketer PBX and mobile phone users.
Distributed Data Mining in Peer-to-Peer Networks
catalog.data.gov
s.cnmilf.com
Updated Dec 7, 2023
Cite
Dashlink (2023). Distributed Data Mining in Peer-to-Peer Networks [Dataset]. https://catalog.data.gov/dataset/distributed-data-mining-in-peer-to-peer-networks
Explore at:
Dataset updated
Dec 7, 2023
Dataset provided by
Dashlink
Description
Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data, computing nodes, and users. This article offers an overview of DDM applications and algorithms for P2P environments, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.
Confusion matrix.
figshare.com
xls
Updated Jul 7, 2023
Cite
Shaoxia Mou; Heming Zhang (2023). Confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0288140.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0288140.t002
Dataset updated
Jul 7, 2023
Dataset provided by
PLOS ONE
Authors
Shaoxia Mou; Heming Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Due to the inherent characteristics of accumulation sequence of unbalanced data, the mining results of this kind of data are often affected by a large number of categories, resulting in the decline of mining performance. To solve the above problems, the performance of data cumulative sequence mining is optimized. The algorithm for mining cumulative sequence of unbalanced data based on probability matrix decomposition is studied. The natural nearest neighbor of a few samples in the unbalanced data cumulative sequence is determined, and the few samples in the unbalanced data cumulative sequence are clustered according to the natural nearest neighbor relationship. In the same cluster, new samples are generated from the core points of dense regions and non-core points of sparse regions, and then new samples are added to the original data accumulation sequence to balance the data accumulation sequence. The probability matrix decomposition method is used to generate two random number matrices with Gaussian distribution in the cumulative sequence of balanced data, and the linear combination of low dimensional eigenvectors is used to explain the preference of specific users for the data sequence; at the same time, from a global perspective, the AdaBoost idea is used to adaptively adjust the sample weight and optimize the probability matrix decomposition algorithm. Experimental results show that the algorithm can effectively generate new samples, reduce the imbalance of the data accumulation sequence, and obtain more accurate mining results, optimizing global errors as well as single-sample errors more efficiently. When the decomposition dimension is 5, the minimum RMSE is obtained. The proposed algorithm has good classification performance for the cumulative sequence of balanced data, and the average ranking of index F value, G mean and AUC is the best.
fdata-02-00005-g0001_Location Prediction for Tweets.tif
figshare.com
tiff
Updated Jun 2, 2023
Cite
Chieh-Yang Huang; Hanghang Tong; Jingrui He; Ross Maciejewski (2023). fdata-02-00005-g0001_Location Prediction for Tweets.tif [Dataset]. http://doi.org/10.3389/fdata.2019.00005.s003
Explore at:
Available download formats: tiff
Unique identifier
https://doi.org/10.3389/fdata.2019.00005.s003
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Chieh-Yang Huang; Hanghang Tong; Jingrui He; Ross Maciejewski
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Geographic information provides an important insight into many data mining and social media systems. However, users are reluctant to provide such information due to various concerns, such as inconvenience, privacy, etc. In this paper, we aim to develop a deep learning based solution to predict geographic information for tweets. The current approaches bear two major limitations, including (a) hard to model the long term information and (b) hard to explain to the end users what the model learns. To address these issues, our proposed model embraces three key ideas. First, we introduce a multi-head self-attention model for text representation. Second, to further improve the result on informal language, we treat subword as a feature in our model. Lastly, the model is trained jointly with the city and country to incorporate the information coming from different labels. The experiment performed on W-NUT 2016 Geo-tagging shared task shows our proposed model is competitive with the state-of-the-art systems when using accuracy measurement, and in the meanwhile, leading to a better distance measure over the existing approaches.
Appendix - Mining User Behaviour from Smartphone data: a literature review
data.dtu.dk
xlsx
Updated Jul 12, 2023
Cite
Valentino Servizi; Francisco Camara Pereira; Marie Karen Anderson; Otto Anker Nielsen (2023). Appendix - Mining User Behaviour from Smartphone data: a literature review [Dataset]. http://doi.org/10.11583/DTU.11989455
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.11583/DTU.11989455
Dataset updated
Jul 12, 2023
Dataset provided by
Technical University of Denmark
Authors
Valentino Servizi; Francisco Camara Pereira; Marie Karen Anderson; Otto Anker Nielsen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Each study reviewed is here catalogued as follows.
· Level of difficulty: Classification Task, Number and List of Classes.
· Approach: Method and Main Features.
· Performance: Score, Metric, Validation Method.
· Realism of dataset: Ground Truth, Person-day, Respondents, Observations, Collection Time, Area, Smartphone App.
· Sensors involved: AGPS, Inertial Navigation Systems (INS), Geographic Information Systems (GIS), Data Fusion.
Review results of the manuscript "A Systematic Review on Privacy-Preserving...
figshare.com
txt
Updated Aug 26, 2021
Cite
Chang Sun (2021). Review results of the manuscript "A Systematic Review on Privacy-Preserving Distributed Data Mining" [Dataset]. http://doi.org/10.6084/m9.figshare.14239937.v4
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.14239937.v4
Dataset updated
Aug 26, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Chang Sun
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the review results of the manuscript "A Systematic Review on Privacy-Preserving Distributed Data Mining" authored by Chang Sun, Lianne Ippel, Andre Dekker, Michel Dumontier, Johan van Soest. The dataset covers 231 published articles about privacy-preserving distributed data mining. Variables include article DOI, title, authors, keywords, user scenarios, distributed data scenarios, privacy/security definition/proof/analysis, privacy statement, privacy-preserving methods category, privacy-preserving methods (specific), data mining problem, data mining/machine learning methods, experiment data information, accuracy of the methods, efficiency (computation and communication cost), and scalability. The search method and evaluation criteria are described in the paper "A Systematic Review on Privacy-Preserving Distributed Data Mining". The DOI and link to the paper will be provided when the paper gets published.
Business Analytics Market Size By Component (Software, Services), By...
verifiedmarketresearch.com
Updated Apr 29, 2024
Cite
VERIFIED MARKET RESEARCH (2024). Business Analytics Market Size By Component (Software, Services), By Organization Size (Large Enterprises, Small-Medium Enterprises (SMEs)), By Deployment Mode (On-Premises, Cloud), By Application (Finance Analytics, Marketing Analytics, Supply Chain Analytics, Data Mining), By End-User Industry (Banking, Financial Services and Insurance (BFSI), Retail and eCommerce, Media and Entertainment, Manufacturing, Energy and Utilities, Telecom and IT, Healthcare, Government, Education), By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/global-business-analytics-market-size-and-forecast/
Explore at:
Dataset updated
Apr 29, 2024
Dataset provided by
Verified Market Research (https://www.verifiedmarketresearch.com/)
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2031
Area covered
Global
Description
Business Analytics Market was valued at USD 84.42 Billion in 2024 and is projected to reach USD 176.14 Billion by 2031, growing at a CAGR of 9.63% from 2024 to 2031.
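The growth rate quoted above can be cross-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / n) - 1. The sketch below is only an illustration: the function name is hypothetical, and the choice of n = 8 compounding periods is an assumption (the description does not say how the periods were counted). With that assumption the result lands close to the quoted 9.63%.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate between two values over `periods` years."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Values in USD billion, taken from the description above.
# periods=8 is an assumption, not something the source states.
rate = cagr(84.42, 176.14, periods=8)
print(f"{rate:.2%}")  # roughly 9.6%
```

Counting only 7 periods between 2024 and 2031 would give a noticeably higher rate, so the period convention matters when comparing figures across reports.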

Global Business Analytics Market Drivers

The market drivers for the Business Analytics Market can be influenced by various factors. These may include:

Growing Adoption of Big Data Analytics: In order to extract meaningful insights from their data, organizations are progressively using big data analytics in response to the exponential expansion of data. Making educated decisions through data analysis is facilitated by business analytics.
Growing Need for Data-driven Decision Making: In order to obtain a competitive edge, businesses are realizing the significance of data-driven decision making. The methods and instruments for data analysis and significant insights extraction for improved decision-making are offered by business analytics.
Growing Need for Predictive and Prescriptive Analytics: Predictive and prescriptive analytics are becoming more and more in demand as a means of projecting future trends and results. Businesses can use business analytics to prescribe activities to achieve desired outcomes and forecast future outcomes based on previous data.
Growing Emphasis on Customer Analytics: As e-commerce and digital marketing gain traction, companies are putting more of an emphasis on comprehending the behavior and preferences of their customers. In order to increase consumer engagement and personalize marketing efforts, business analytics is used to analyze customer data.
Emergence of Advanced Technologies: The use of advanced analytics solutions is being propelled by developments in fields like artificial intelligence (AI), machine learning (ML), and natural language processing (NLP). Businesses may now analyze data more effectively and gain deeper insights thanks to these technologies.
Operational Efficiency and Cost Optimization Are Necessary: Companies are always under pressure to increase operational efficiency and reduce costs. Business analytics promotes market expansion by assisting in the identification of opportunities for process and cost-cutting enhancements.
Compliance and Regulatory Requirements: The use of business analytics solutions for risk management and compliance reporting is being fueled by the growing regulatory requirements in a number of industries, including healthcare, banking, and retail.
OceanXtremes: Oceanographic Data-Intensive Anomaly Detection and Analysis...
data.wu.ac.at
data.amerigeoss.org
xml
Updated Jan 25, 2018
Cite
National Aeronautics and Space Administration (2018). OceanXtremes: Oceanographic Data-Intensive Anomaly Detection and Analysis Portal [Dataset]. https://data.wu.ac.at/schema/data_gov/N2M1NjFmOGYtMGVkMi00OTQ4LWE3ZDUtMDc0N2NhOTA4YmNi
Explore at:
Available download formats: xml
Dataset updated
Jan 25, 2018
Dataset provided by
National Aeronautics and Space Administration
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
Anomaly detection is a process of identifying items, events or observations that do not conform to an expected pattern in a dataset or time series. Current and future missions and our research communities challenge us to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we propose to develop an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across the entire archive of ocean science datasets. A parallel analytics engine will be developed as the key computational and data-mining core of OceanXtremes' backend processing. This analytic engine will demonstrate three new technology ideas to provide rapid turnaround on climatology computation and anomaly detection: 1. An adaptation of the Hadoop/MapReduce framework for parallel data mining of science datasets, typically large 3- or 4-dimensional arrays packaged in NetCDF and HDF. 2. An algorithm profiling service to efficiently and cost-effectively scale up hybrid Cloud computing resources based on the needs of scheduled jobs (CPU, memory, network, and bursting from a private Cloud computing cluster to a public cloud provider like Amazon Cloud services). 3. An extension to industry-standard search solutions (OpenSearch and Faceted search) to provide support for shared discovery and exploration of ocean phenomena and anomalies, along with unexpected correlations between key measured variables. We will use a hybrid Cloud compute cluster (private Eucalyptus on-premise at JPL with bursting to Amazon Web Services) as the operational backend.
The key idea is that the parallel data-mining operations will be run 'near' the ocean data archives (a local 'network' hop) so that we can efficiently access the thousands of (say, daily) files making up a three-decade time series, and then cache key variables and pre-computed climatologies in a high-performance parallel database. OceanXtremes will be equipped with both web portal and web service interfaces for users and applications/systems to register and retrieve oceanographic anomaly data. By leveraging technology such as Datacasting (Bingham et al., 2007), users can also subscribe to anomaly or 'event' types of their interest and have newly computed anomaly metrics and other information delivered to them by metadata feeds packaged in standard Rich Site Summary (RSS) format. Upon receiving new feed entries, users can examine the metrics and download relevant variables, by simply clicking on a link, to begin further analyzing the event. The OceanXtremes web portal will allow users to define their own anomaly or feature types where continuous backend processing will be scheduled to populate the new user-defined anomaly type by executing the chosen data mining algorithm (i.e. differences from climatology or gradients above a specified threshold). Metadata on the identified anomalies will be cataloged including temporal and geospatial profiles, key physical metrics, related observational artifacts and other relevant metadata to facilitate discovery, extraction, and visualization. Products created by the anomaly detection algorithm will be made explorable and subsettable using Webification (Huang et al., 2014) and OPeNDAP (http://opendap.org) technologies. Using this platform scientists can efficiently search for anomalies or ocean phenomena, compute data metrics for events or over time-series of ocean variables, and efficiently find and access all of the data relevant to their study (and then download only that data).
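The "differences from climatology above a specified threshold" idea mentioned above can be illustrated with a small sketch. This is not the OceanXtremes implementation: the function names, the (day_of_year, value) input layout and the single-location simplification are all assumptions made for illustration, and a real system would work on gridded NetCDF/HDF arrays.

```python
from collections import defaultdict
from statistics import mean


def climatology(samples):
    """Mean value per day of year from historical (day_of_year, value) pairs."""
    by_day = defaultdict(list)
    for day, value in samples:
        by_day[day].append(value)
    return {day: mean(values) for day, values in by_day.items()}


def find_anomalies(samples, clim, threshold):
    """Return (day, value, difference) for samples far from the climatology."""
    anomalies = []
    for day, value in samples:
        difference = value - clim[day]
        if abs(difference) > threshold:
            anomalies.append((day, value, difference))
    return anomalies


# Hypothetical sea-surface temperatures for two days of the year.
history = [(1, 10.0), (1, 12.0), (2, 20.0), (2, 22.0)]
clim = climatology(history)          # {1: 11.0, 2: 21.0}
new_observations = [(1, 11.5), (2, 25.0)]
print(find_anomalies(new_observations, clim, threshold=2.0))
# [(2, 25.0, 4.0)]
```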
Predictive Analytics Market - by Software Solutions (Data Mining &...
zionmarketresearch.com
pdf
Updated Mar 17, 2025
Cite
Zion Market Research (2025). Predictive Analytics Market - by Software Solutions (Data Mining & Management, Decision Support Systems, Fraud & Security Intelligence, Financial Intelligence, Customer Intelligence, and Others), By Delivery Mode (Cloud-Based Technology and On-Premise Deployment), By End-User (BFSI, Telecom & IT, Healthcare, Transport & Logistics, Government & Utilities, and Others) and by Application (Customer & Channel, Sales and Marketing, Finance & Risk, and Other Applications), and By Region - Global and Regional Industry Overview, Comprehensive Analysis, Historical Data, and Forecasts 2024-2032 [Dataset]. https://www.zionmarketresearch.com/report/predictive-analytic-market
Explore at:
Available download formats: pdf
Dataset updated
Mar 17, 2025
Dataset authored and provided by
Zion Market Research
License
https://www.zionmarketresearch.com/privacy-policy
Time period covered
2022 - 2030
Area covered
Global
Description
The global Predictive Analytics Market was worth USD 16.19 Billion in 2023 and is projected to reach USD 113.8 Billion by 2032, with a CAGR of around 24.19% between 2024 and 2032.
ShoppingAppReviews Dataset
data.mendeley.com
Updated Aug 20, 2024
Cite
Noor Mairukh Khan Arnob (2024). ShoppingAppReviews Dataset [Dataset]. http://doi.org/10.17632/chr5b94c6y.1
Explore at:
Unique identifier
https://doi.org/10.17632/chr5b94c6y.1
Dataset updated
Aug 20, 2024
Authors
Noor Mairukh Khan Arnob
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
A dataset consisting of 751,500 English app reviews of 12 online shopping apps. The dataset was scraped from the internet using a Python script. This ShoppingAppReviews dataset contains app reviews of the 12 most popular online shopping Android apps: Alibaba, Aliexpress, Amazon, Daraz, eBay, Flipkart, Lazada, Meesho, Myntra, Shein, Snapdeal and Walmart. Each review entry contains metadata such as review score, thumbsupcount, review posting time, and reply content. The dataset is organized as a zip file containing 12 JSON files, one per online shopping app. This dataset can be used to obtain valuable information about customers' feedback regarding their user experience of these financially important apps.
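A zip archive of per-app JSON files, as described above, can be read without unpacking it to disk. The sketch below is a minimal illustration under stated assumptions: it assumes each JSON file holds a list of review objects, and the file names and the `score` field used in the demo are hypothetical, since the description does not give the exact schema.

```python
import io
import json
import zipfile


def iter_reviews(zip_source):
    """Yield (json_file_name, review) for every review in every JSON file.

    `zip_source` may be a path or a file-like object. Each JSON file is
    assumed (not confirmed by the source) to contain a list of objects.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".json"):
                continue
            with archive.open(name) as handle:
                records = json.load(handle)
            for record in records:
                yield name, record


# Demo with a tiny in-memory archive standing in for the real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Amazon.json", json.dumps([{"score": 5}, {"score": 1}]))
    archive.writestr("eBay.json", json.dumps([{"score": 4}]))
buffer.seek(0)

counts = {}
for app_file, review in iter_reviews(buffer):
    counts[app_file] = counts.get(app_file, 0) + 1
print(counts)  # {'Amazon.json': 2, 'eBay.json': 1}
```

Loading one file at a time keeps memory use bounded by the largest single app file rather than by the whole archive.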
Data from: Twitter Big Data as a Resource for Exoskeleton Research: A...
ieee-dataport.org
Updated Oct 22, 2022
Cite
Nirmalya Thakur (2022). Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions [Dataset]. http://doi.org/10.21227/r5mv-ax79
Explore at:
Unique identifier
https://doi.org/10.21227/r5mv-ax79
Dataset updated
Oct 22, 2022
Dataset provided by
IEEE Dataport
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset: N. Thakur, "Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets from 2017–2022 and 100 Research Questions", Journal of Analytics, Volume 1, Issue 2, 2022, pp. 72-97, DOI: https://doi.org/10.3390/analytics1020007

Abstract

The exoskeleton technology has been rapidly advancing in the recent past due to its multitude of applications and diverse use cases in assisted living, military, healthcare, firefighting, and industry 4.0. The exoskeleton market is projected to increase by multiple times its current value within the next two years. Therefore, it is crucial to study the degree and trends of user interest, views, opinions, perspectives, attitudes, acceptance, feedback, engagement, buying behavior, and satisfaction, towards exoskeletons, for which the availability of Big Data of conversations about exoskeletons is necessary. The Internet of Everything style of today’s living, characterized by people spending more time on the internet than ever before, with a specific focus on social media platforms, holds the potential for the development of such a dataset by the mining of relevant social media conversations. Twitter, one such social media platform, is highly popular amongst all age groups, where the topics found in the conversation paradigms include emerging technologies such as exoskeletons. To address this research challenge, this work makes two scientific contributions to this field. First, it presents an open-access dataset of about 140,000 Tweets about exoskeletons that were posted in a 5-year period from 21 May 2017 to 21 May 2022.
Second, based on a comprehensive review of the recent works in the fields of Big Data, Natural Language Processing, Information Retrieval, Data Mining, Pattern Recognition, and Artificial Intelligence that may be applied to relevant Twitter data for advancing research, innovation, and discovery in the field of exoskeleton research, a total of 100 Research Questions are presented for researchers to study, analyze, evaluate, ideate, and investigate based on this dataset.
Additional data requirements Market Market Research Report By Product Type...
exactitudeconsultancy.com
Updated Mar 2025
Cite
Exactitude Consultancy (2025). Additional data requirements Market Market Research Report By Product Type (Data Analytics, Data Storage, Data Processing), By Application (Business Intelligence, Data Mining, Predictive Analytics), By End User (Healthcare, Retail, Finance, Education), By Technology (Cloud-based, On-premises), By Distribution Channel (Online, Offline) – Forecast to 2034. [Dataset]. https://exactitudeconsultancy.com/reports/48048/additional-data-requirements-market
Explore at:
Dataset updated
Mar 2025
Dataset authored and provided by
Exactitude Consultancy
License
https://exactitudeconsultancy.com/privacy-policy
Description
The market is projected to be valued at $X million in 2024, driven by factors such as increasing consumer awareness and the rising prevalence of industry-specific trends. The market is expected to grow at a CAGR of Y%, reaching approximately $Z million by 2034.
Data from: Mining the Technical Roles of GitHub Users
data.niaid.nih.gov
zenodo.org
Updated Feb 24, 2022
Cite
Mining the Technical Roles of GitHub Users [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2559483
Explore at:
Dataset updated
Feb 24, 2022
Dataset provided by
Marco Tulio
Luciana L.
João Eduardo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the scripts and dataset used in the study reported in the paper "Mining the Technical Roles of GitHub Users". The files are described in more detail below:

processed_ground_truth.csv: A CSV file with the information of the developers considered in the study. Due to privacy issues, we already preprocessed the dataset to remove identification clues. Please contact the authors in case you need the original one.

processed_ground_truth_fullstack.csv: Same CSV file but with fullstack developers.

script.ipynb, utils.py: Source code of the script used in our study.

Dockerfile, docker-compose.yml, requirements.txt: Files to replicate the code environment used in this study.

BoW-tuning.csv: List of classification results for different bag-of-words parameters.
fdata-02-00012_Identifying Travel Regions Using Location-Based Social...
figshare.com
frontiersin.figshare.com
pdf
Updated May 31, 2023
Cite
Avradip Sen; Linus W. Dietz (2023). fdata-02-00012_Identifying Travel Regions Using Location-Based Social Network Check-in Data.pdf [Dataset]. http://doi.org/10.3389/fdata.2019.00012.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fdata.2019.00012.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Avradip Sen; Linus W. Dietz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Travel regions are not necessarily defined by political or administrative boundaries. For example, in the Schengen region of Europe, tourists can travel freely across national borders. Identifying transboundary travel regions is an interesting problem which we aim to solve using mobility analysis of Twitter users. Our proposed solution comprises collecting geotagged tweets, combining them into trajectories and, thus, mining thousands of trips undertaken by Twitter users. After aggregating these trips into a mobility graph, we apply a community detection algorithm to find coherent regions throughout the world. The discovered regions provide insights into international travel and can reveal both domestic and transnational travel regions.
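The first two steps described above, turning geotagged posts into trips and aggregating trips into a mobility graph, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the (user, timestamp, place) input layout, the function names and the undirected edge weighting are all hypothetical, and the community detection step is not shown.

```python
from collections import Counter


def trips_from_checkins(checkins):
    """Turn (user, timestamp, place) records into (origin, destination) moves.

    Consecutive posts by the same user in different places count as one trip.
    """
    places_by_user = {}
    for user, _timestamp, place in sorted(checkins, key=lambda c: (c[0], c[1])):
        places_by_user.setdefault(user, []).append(place)

    trips = []
    for places in places_by_user.values():
        for origin, destination in zip(places, places[1:]):
            if origin != destination:
                trips.append((origin, destination))
    return trips


def mobility_graph(trips):
    """Undirected weighted graph: edge weight = number of trips between two places."""
    weights = Counter()
    for origin, destination in trips:
        weights[tuple(sorted((origin, destination)))] += 1
    return dict(weights)


checkins = [
    ("u1", 1, "Munich"), ("u1", 2, "Vienna"), ("u1", 3, "Vienna"),
    ("u2", 1, "Vienna"), ("u2", 2, "Munich"),
]
trips = trips_from_checkins(checkins)
print(trips)                  # [('Munich', 'Vienna'), ('Vienna', 'Munich')]
print(mobility_graph(trips))  # {('Munich', 'Vienna'): 2}
```

A community detection algorithm would then be run on the weighted edges to group places into travel regions.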
Data Warehouse as a Service (DWaaS) Market By End-User (Government & Public...
zionmarketresearch.com
pdf
Updated Mar 12, 2025
Cite
Zion Market Research (2025). Data Warehouse as a Service (DWaaS) Market By End-User (Government & Public Sector, Media & Entertainment, Manufacturing, Travel & Hospitality, Telecom & IT, Healthcare & Pharmaceutical, Retail, E-Commerce, BFSI, and Others), By Organization Size (Large Enterprises and Small & Medium Enterprises), By Deployment Model (Hybrid, Private, and Public Deployment Models), By Usage (Data Mining, Reporting, and Analytics), By Application (Fraud Detection & Threat Management, Supply Chain Management, Asset Management, Risk & Compliance Management, Customer Analytics, and Others), By Type (Operational Data Stores and Enterprise DWaaS), And By Region - Global And Regional Industry Overview, Market Intelligence, Comprehensive Analysis, Historical Data, And Forecasts 2024 - 2032 [Dataset]. https://www.zionmarketresearch.com/report/data-warehouse-as-a-service-market
Explore at:
Available download formats: pdf
Dataset updated
Mar 12, 2025
Dataset provided by
Authors
Zion Market Research
License
https://www.zionmarketresearch.com/privacy-policy
Time period covered
2022 - 2030
Area covered
Global
Description
The global Data Warehouse as a Service (DWaaS) Market was valued at USD 5.03 Billion in 2023 and is predicted to reach USD 30.37 Billion by 2032, with a CAGR of 22.1%.
Automotive Artificial Intelligence Market Size By Technology (Computer...
verifiedmarketresearch.com
Updated Sep 24, 2024
Cite
VERIFIED MARKET RESEARCH (2024). Automotive Artificial Intelligence Market Size By Technology (Computer Vision, Context Awareness), Process (Data Mining, Image Recognition), Application (Semi-Autonomous Driving, Human Machine Interface), & Region for 2024-2031 [Dataset]. https://www.verifiedmarketresearch.com/product/automotive-artificial-intelligence-market/
Explore at:
Dataset updated
Sep 24, 2024
Dataset provided by
Verified Market Research (https://www.verifiedmarketresearch.com/)
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2031
Area covered
Global
Description
Automotive Artificial Intelligence Market size was valued at USD 2.3 Billion in 2024 and is projected to reach USD 12.94 Billion by 2031, growing at a CAGR of 24.1% from 2024 to 2031.

Global Automotive Artificial Intelligence Market Drivers

Growing Need for Autonomous Vehicles (AVs): The growing need for autonomous vehicles is one of the main factors propelling the automotive artificial intelligence industry. Artificial Intelligence plays a major role in AVs’ ability to perceive, plan, and control. The need for automotive AI is anticipated to increase in tandem with the maturation of AV technology and the growing acceptance of AVs by consumers.
Increasing Adoption of Advanced Driver-Assistance Systems (ADAS): ADAS refers to a group of technologies that automate or support driving operations using sensors and software. These systems include features such as adaptive cruise control, lane departure warning, and automated emergency braking. The increasing use of ADAS is driving the need for automotive AI, because these systems rely on AI capabilities to work well.
Tight Government Restrictions for Safe Driving: Governments all over the world are implementing tight restrictions for safe driving. These rules are driving the development of ADAS and other safety technologies in cars, which in turn makes automotive AI increasingly necessary.
Focus on Convenience Features and Improved User Experience: Consumers increasingly demand cars with amenities that make driving more pleasurable and convenient. Voice recognition, in-car personalization, and gesture control are just a few of the AI-powered features that are gaining popularity. This trend is anticipated to keep the market for automotive AI growing.
Major OEM Investments: Major automakers are making significant investments in the advancement of artificial intelligence (AI) technologies for their automobiles. The creation of fresh, cutting-edge AI-powered automotive features is accelerating thanks to these investments.
Data from: Implicit aspect-based opinion mining and analysis of airline...
live.european-language-grid.eu
zenodo.org
csv
Updated May 29, 2024
Cite
(2024). Implicit aspect-based opinion mining and analysis of airline industry based on user generated reviews [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7665
Explore at:
Available download formats: csv
Dataset updated
May 29, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mining opinions from reviews has been a field of ever-growing research. This includes mining opinions at the document level, sentence level, and even aspect level of a review. While explicitly mentioned aspects in a review have been widely researched, very little work has been done on gathering opinions on aspects that are implied and not explicitly mentioned. For example, the sentence “the flight was spacious and there was plenty of legroom” gives an opinion on the cabin and seat entities of an airline. Words like “spacious” and phrases like “plenty of legroom” help identify these implied entities and the opinions attached to them. Not much research has been done on gathering such implicit aspects and opinions for airline reviews. The present dataset is a manually annotated, domain-specific, aspect-based corpus that supports studies extracting and analyzing opinions about such implied aspects and entities of airlines.
Global Fleet Management Tool For Mining Market Size By Deployment Type, By...
verifiedmarketresearch.com
Updated Sep 6, 2024
Cite
VERIFIED MARKET RESEARCH (2024). Global Fleet Management Tool For Mining Market Size By Deployment Type, By End-User Industry, By Application, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/fleet-management-tool-for-mining-market/
Explore at:
Dataset updated
Sep 6, 2024
Dataset provided by
Verified Market Research (https://www.verifiedmarketresearch.com/)
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2031
Area covered
Global
Description
Fleet Management Tool For Mining Market size was valued at USD 3.5 Billion in 2023 and is projected to reach USD 6.8 Billion by 2031, growing at a CAGR of 9.5% during the forecast period 2024 to 2031.
Global Fleet Management Tool For Mining Market Drivers
The market drivers for the Fleet Management Tool For Mining Market can be influenced by various factors. These may include:

• Increased Demand for Operational Efficiency: Mining companies are seeking to improve efficiency and productivity in their operations. Fleet management tools help optimize fleet performance, reduce downtime, and ensure timely maintenance, leading to cost savings and improved operational efficiency.
• Technological Advancements: The development of advanced technologies such as IoT, GPS, and real-time data analytics has significantly enhanced fleet management capabilities. These technologies enable better tracking, monitoring, and management of mining fleets, driving the adoption of fleet management tools.

Global Fleet Management Tool For Mining Market Restraints
Several factors can act as restraints or challenges for the Fleet Management Tool For Mining Market. These may include:

• High Initial Investment: The cost of implementing advanced fleet management tools can be significant, including expenses for software, hardware, and integration with existing systems. This high upfront investment may deter smaller mining companies from adopting these technologies.
• Complexity of Integration: Integrating fleet management tools with existing mining operations and equipment can be complex and time-consuming. This complexity may lead to resistance from companies accustomed to their current systems.
