Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include the course syllabus used to teach quantitative research design and analysis methods to graduate Linguistics students using a blended teaching and learning approach. The blended course took place over two weeks and built on a face-to-face course presented over two days in 2019. Students worked through the topics in preparation for a live interactive video session each Friday to go through the activities. Additional communication took place on Slack for two hours each week. A survey was conducted at the start and end of the course to ascertain participants' perceptions of its usefulness. The links to online elements and the evaluations have been removed from the uploaded course guide.

Participants who complete this workshop will be able to:
- outline the steps and decisions involved in quantitative data analysis of linguistic data
- explain common statistical terminology (sample, mean, standard deviation, correlation, nominal, ordinal and scale data)
- perform common statistical tests using jamovi (e.g. t-test, correlation, ANOVA, regression)
- interpret and report common statistical tests
- describe and choose from the various graphing options used to display data
- use jamovi to perform common statistical tests and graph results

Evaluation

Participants who complete the course will use these skills and knowledge to complete the following activities for evaluation:
- analyse the data for a project and/or assignment (in part or in whole)
- plan the results section of an Honours research project (where applicable)

Feedback and suggestions can be directed to M Schaefer: schaemn@unisa.ac.za
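The course teaches these tests through jamovi's point-and-click interface. As a rough scriptable parallel only (not part of the course materials), the same tests can be run in Python with scipy on synthetic data:

```python
# Sketch of the tests the course covers (t-test, correlation, ANOVA),
# run on invented data; jamovi itself requires no code.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)  # e.g. scores, condition A
group_b = rng.normal(loc=5.5, scale=1.0, size=30)  # condition B
group_c = rng.normal(loc=6.0, scale=1.0, size=30)  # condition C

# Independent-samples t-test between two groups
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Pearson correlation between two scale variables
r, r_p = stats.pearsonr(group_a, group_b)

# One-way ANOVA across three groups
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)

print(f"t = {t_stat:.2f} (p = {t_p:.3f}), r = {r:.2f}, F = {f_stat:.2f}")
```

Reporting conventions (effect sizes, confidence intervals) follow the same logic whether the test is run in jamovi or in code.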
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study addresses the challenge of quantifying shared knowledge in group discussions through text analysis. Topic modeling was applied to systematically evaluate how information sharing influences knowledge structures and decision-making. In an online group discussion setting, two mock jury experiments involving 204 participants were conducted to reach a consensus on a verdict for a fictional murder case. The first experiment investigated whether the bias in pre-shared information influenced the topic ratios of each participant. Topic ratios, derived from a Latent Dirichlet Allocation model, were assigned to each participant's chat lines. The presence or absence of shared information, as well as the type of information shared, systematically influenced the topic ratios that appeared in group discussions. In Experiment 2, false memories were assessed before and after the discussion to evaluate whether the topics identified in Experiment 1 measured shared knowledge. Mediation analysis indicated that a higher topic ratio related to evidence was statistically associated with an increased likelihood of false memory for evidence. These results suggested that topics yielded by LDA reflected the knowledge structure shared during group discussions.
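The measurement step described above (assigning LDA-derived topic ratios to each participant's chat lines) can be sketched as follows. The toy corpus and the two-topic setting are invented for illustration and do not reproduce the study's preprocessing or model choices:

```python
# Fit an LDA model and read off per-participant topic ratios.
# Each "document" stands in for one participant's combined chat lines.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

participants = [
    "the knife evidence points to the defendant",
    "the alibi witness saw him at the station",
    "fingerprints on the knife match the defendant",
    "the witness timeline supports the alibi",
]

counts = CountVectorizer().fit_transform(participants)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Rows are participants, columns are topics; each row sums to 1,
# giving the "topic ratio" per participant described in the abstract.
topic_ratios = lda.transform(counts)
print(topic_ratios.round(2))
```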
https://qdr.syr.edu/policies/qdr-standard-access-conditions
This is an Annotation for Transparent Inquiry (ATI) data project. The annotated article can be viewed on the Publisher's Website.

Data Generation

The research project engages a story about perceptions of fairness in criminal justice decisions. The specific focus is a debate between ProPublica, a news organization, and Northpointe, the owner of a popular risk tool called COMPAS. ProPublica wrote that COMPAS was racist against blacks, while Northpointe posted online a reply rejecting such a finding. These two documents were the obvious foci of the qualitative analysis because of the further media attention they attracted, the confusion their competing conclusions caused readers, and the power both companies wield in public circles. There were no barriers to retrieval, as both documents have been publicly available on their corporate websites. This public access was one of the motivators for choosing them, as it meant they were also easily attainable by the general public, extending the documents' reach and impact. Additional materials from ProPublica relating to the main debate were also freely downloadable from its website and a third-party, open source platform. Access to secondary source materials comprising additional writings from Northpointe representatives that could assist in understanding Northpointe's main document, though, was more limited. Because of a claim of trade secrets on its tool and the underlying algorithm, it was more difficult to reach Northpointe's other reports. Nonetheless, largely because its clients are governmental bodies with transparency and accountability obligations, some Northpointe-associated reports were retrievable from third parties who had obtained them, largely through Freedom of Information Act queries. Together, the primary and (retrievable) secondary sources allowed for a triangulation of themes, arguments, and conclusions.
The quantitative component uses a dataset of over 7,000 individuals with information that was collected and compiled by ProPublica and made available to the public on GitHub. Because ProPublica gathered the data directly from criminal justice officials via Freedom of Information Act requests, the dataset is in the public domain and no confidentiality issues are present. The dataset was loaded into SPSS v. 25 for data analysis.

Data Analysis

The qualitative enquiry used critical discourse analysis, which investigates ways in which parties in their communications attempt to create, legitimate, rationalize, and control mutual understandings of important issues. Each of the two main discourse documents was parsed on its own merit. Yet the project was also intertextual in studying how the discourses correspond with each other and with other relevant writings by the same authors. Several more specific types of discursive strategies attracted further critical examination:
- Testing claims and rationalizations that appear to serve the speaker's self-interest
- Examining conclusions and determining whether sufficient evidence supported them
- Revealing contradictions and/or inconsistencies within the same text and intertextually
- Assessing strategies underlying justifications and rationalizations used to promote a party's assertions and arguments
- Noticing strategic deployment of lexical phrasings, syntax, and rhetoric
- Judging sincerity of voice and the objective consideration of alternative perspectives

Of equal importance in a critical discourse analysis is consideration of what is not addressed, that is, uncovering facts and/or topics missing from the communication. For this project, this included parsing issues that were either briefly mentioned and then neglected, asserted with their significance left unstated, or not suggested at all. This task required understanding common practices in the algorithmic data science literature.
The paper could have been completed with the critical discourse analysis alone. However, because one of its salient findings was that the discourses overlooked numerous definitions of algorithmic fairness, the call to fill this gap seemed obvious. The availability of the same dataset used by the parties in conflict made this opportunity more appealing: calculating additional algorithmic equity equations would not be troubled by irregularities arising from diverse sample sets. New variables were created as relevant to calculate algorithmic fairness equations. In addition to various SPSS Analyze functions (e.g., regression, crosstabs, means), online statistical calculators were useful to compute z-test comparisons of proportions and t-test comparisons of means.

Logic of Annotation

Annotations were employed to fulfil a variety of functions, including supplementing the main text with context, observations, counter-points, analysis, and source attributions. These fall under a few categories. Space considerations. Critical discourse analysis offers a rich method...
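The z-test comparisons of proportions mentioned above were computed with online calculators. A minimal self-contained equivalent, using the standard pooled two-proportion formula with made-up counts (not the COMPAS data), looks like this:

```python
# Two-sided z-test for the difference of two independent proportions,
# the same computation an online two-proportion calculator performs.
import math
from scipy.stats import norm

def two_proportion_ztest(success1, n1, success2, n2):
    """Return (z, p-value) using the pooled-variance formula."""
    p1, p2 = success1 / n1, success2 / n2
    pooled = (success1 + success2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * norm.sf(abs(z))  # two-sided
    return z, p_value

# Illustrative counts only: 30% vs. 25% positive classifications.
z, p = two_proportion_ztest(300, 1000, 250, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```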
https://spdx.org/licenses/CC0-1.0.html
Data Collection

Dance practitioners are a group in urgent need of attention. The literature points to problems in the professional status of dance practitioners: low and unstable financial income, limited professional development, multiple pressures and role conflicts, social class and social acceptance, and the wider impact on the discipline of dance. As the core group driving the professional field of dance, dance practitioners find that their contributions and returns do not match, which largely frustrates their career expectations and sense of achievement, leading to job burnout. This not only drives talent out of the industry but also affects the current and future development of the professional field of dance. Using an inductive approach, this study conducted semi-structured interviews with 10 qualified dance practitioners in Chinese universities to compare the cultural capital they accumulated early in their careers with the economic and social capital they actually obtained later, and to investigate the status quo of cultural capital transformation among dance practitioners. It also asked what challenges or obstacles dance practitioners face in transforming cultural capital, and why they have been able to hold on to their dance careers in the face of difficulties. These stories show that transforming dance practitioners' accumulated cultural capital into economic capital is difficult, while transformation into social capital is more pronounced. Dance practitioners' love for dance and their spiritual values are the main motivators that help them overcome challenges and obstacles. They now face the dual challenges of physical and mental health and job burnout.
This study considers implications for future research and practical applications, and hopes to draw attention to the health and safety of dance practitioners and to provide relevant supporting materials.

Methods

Data Collection

The teacher interviews took place between June and July 2024. A semi-structured interview was conducted with each participant; although the interviews were initially planned for a library discussion room at the University of Edinburgh, UK, geographic and time-zone differences meant that all interviews were conducted online through Teams, scheduled around the participants' availability. Each interview lasted 45 to 60 minutes. To make the interviews run more smoothly, each participant had access to the interview questions in advance. Each interview was video- and audio-recorded, and the key points emphasized to the participant by the interviewer were marked and recorded. The semi-structured interviews covered four themes: (1) cultural capital accumulation; (2) the transformation of cultural capital into economic and social capital; (3) challenges faced and coping strategies; and (4) reasons and prospects for persistence. The purpose of the interviews was to understand whether the cultural capital of dance practitioners (college dance teachers), such as dance professional knowledge, academic background, and accumulated professional certificates and honors, can be transformed into actual economic and social capital; what this transformation looks like in practice and what challenges and obstacles arise in the process; and why, in the face of difficulties and pressure, participants chose to stick with their dance careers. During the interviews, I asked the participants 23 questions, including open questions, closed questions and leading questions.
After a participant described specific events and feelings during the interview, the interviewer would summarize what the participant had said in a general way, for example: "This is...? Is that what you are trying to say?" or "So you think..." to ensure data accuracy. In addition, interviewers focused primarily on open-ended questions. When an interviewee was unable to continue answering in depth, the interviewer would guide to a limited extent, reducing deliberate guidance and intervention. For example: "What you just shared... can you tell me more?" or "Based on your personal experience, what do you think is the cause of...?" While it is helpful to have a basic interview guide, it is also important for interviewers to "actively listen and move the interview forward as much as possible by building on what the participants have already begun to share" (Seidman, 2013). All participants' data are kept in a university OneDrive account and can only be shared between the author and the supervisor. All data will be destroyed within 30 days of the completion of the paper.

Data Analysis

In my data analysis, I used a thematic analysis approach. Thematic analysis is a method of identifying and recording relevant patterns in qualitative data that, despite its multiple variants, typically follows a process from coding the data to reporting and discussing analytical themes. By extracting statements from large amounts of qualitative data, thematic analysis makes the analysis coherent and transparent to the reader, and can thus strongly support it. Thematic analysis consists of six steps: familiarization with the data, preliminary coding, searching for themes, reviewing themes, defining and naming themes, and finally writing a report (Braun & Clarke, 2006; Miles & Huberman). Each participant's interview was audio-recorded and transcribed.
First, I read each participant's data, paying special attention during second-level coding to data unique to each participant. To facilitate preliminary coding, interview material irrelevant to the research question was excluded, and single sentences were collated into complete paragraphs matched to each interview question. Merriam (2009) describes coding as the process of the researcher reading the data; noting interesting, potentially relevant or important parts; and holding conversations, questions and comments with the data. In this study, I adopted open coding, which implies maintaining an open mind during coding. In first-level coding, I coded the participants' statements directly, marking the original paragraphs or sentences that fully fit the research question; in second-level coding, I read the complete data and summarized words and phrases next to the text (Merriam, 2009). Merriam (2009) notes that "data analysis is a complex process involving repeated switching between concrete data and abstract concepts, between inductive and deductive reasoning, and between description and interpretation" (p. 176). I therefore moved back and forth between data fragments, descriptions, and interpretations, looking for common threads among the themes (Fraser, 2004). Open coding of the data generated 53 second-level codes, which posed challenges for the subsequent definition and naming of themes. By analyzing the common threads in this content, I progressively summarized the second-level codes into seven third-level coding themes that directly answer, define and name the research questions. The fourth-level coding corresponds to the four research questions of this study, each mapped to the third-level coding themes that answer them.
Taking place at the Leeds Institute for Data Analytics on April 27th as part of the Leeds Digital Festival, the Vision Zero Innovation Lab aims to explore ways to reduce the number of road casualties in Leeds to zero. If you would like to get involved or find out more, check out the event on Eventbrite.

Student Data Labs runs data-driven Innovation Labs for university students to learn practical data skills whilst working on civic problems. In the past, we have held Labs that tackle Type 2 Diabetes and health inequalities in Leeds. Student Data Labs works with an interdisciplinary team of students, data scientists, designers, researchers and software developers. We also aim to connect our Data Lab Volunteers with local employers who may be interested in employing them upon graduation. Visit our website, Twitter or Facebook for more info.

The Vision Zero Innovation Lab is split into two sections: a Learning Lab and an Innovation Lab. The Learning Lab helps students learn real-world data skills, getting them up and running with tools like R as well as common data science problems as part of a team. The Innovation Lab is more experimental, where the aim is to develop ideas and data-driven tools to take on wicked problems.
https://www.datainsightsmarket.com/privacy-policy
The Workforce Analytics market was valued at USD XX Million in 2023 and is projected to reach USD XXX Million by 2032, with an expected CAGR of 15.64% during the forecast period.

Workforce analytics is the collection, analysis, and interpretation of data about an organization's workforce in order to make better decisions and optimize human capital. Advanced analytics techniques can give organizations valuable insights into employee performance, engagement, productivity, and other key metrics. Workforce analytics helps an organization make fact-based decisions when acquiring, retaining, developing, and compensating talent. Patterns drawn from historical data analysis can then be applied to predict future workforce needs, helping solve potential problems before they arise. Workforce analytics further allows an organization to find potential talent, measure the ROI of training programs, and assess the effectiveness of organizational change initiatives. Using the power of workforce analytics, organizations can make their workforce more connected, productive, and effective in conducting business successfully.

Recent developments include:
- September 2022: ActivTrak partnered with Google Workspace to provide personal work insights that enable employees to improve their digital work habits and wellness. Customers can embed individual work metrics into their Google Workspace applications with ActivTrak for Google Workspace, giving employees immediate visibility to help them redesign their workday, protect focus time, and improve well-being.
- August 2022: ADP launched Intelligent Self-Service, which assists employees with common issues before they need to contact their HR department. Based on an analysis of data from across ADP's ecosystem, the product employs predictive analytics and machine learning to predict which issues may arise.
Key drivers for this market are: Increasing Need to Make Smarter Decisions About Talent, and Increasing Data in HR Departments related to Payrolls and Recruitment. Potential restraints include: Lack of Awareness About Workforce Analytics. Notable trends are: Performance Monitoring Offers Potential Growth.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a collection of user reviews and ratings for dating applications, primarily sourced from the Google Play Store for the Indian region between 2017 and 2022. It offers valuable insights into user sentiment, evolving trends, and common feedback regarding dating apps. The data is particularly useful for practising Natural Language Processing (NLP) tasks such as sentiment analysis, topic modelling, and identifying user concerns.
The dataset is typically provided in a CSV file format. It contains a substantial number of records, estimated to be around 527,000 individual reviews. This makes it suitable for large-scale data analysis and machine learning projects. The dataset structure is tabular, with clearly defined columns for review content, metadata, and user feedback. Specific row/record counts are not exact but are indicated by the extensive range of index labels.
This dataset is ideally suited for a variety of analytical and machine learning applications: * Analysing trends in dating app usage and perception over the years. * Determining which dating applications receive more favourable responses and if this consistency has changed over time. * Identifying common issues reported by users who give low ratings (below 3/5). * Investigating the correlation between user enthusiasm and their app ratings. * Performing sentiment analysis on review texts to gauge overall user sentiment. * Developing Natural Language Processing (NLP) models for text classification, entity recognition, or summarisation. * Examining the perceived usefulness of top-rated reviews. * Understanding user behaviour and preferences across different dating apps.
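As a hedged starting point for the sentiment-analysis use case above, the sketch below scores review text with a tiny hand-made lexicon. The column names ("content", "score") are assumptions about the CSV, and the word lists are purely illustrative; a real analysis would use a trained model or an established lexicon rather than this toy list:

```python
# Baseline lexicon-based sentiment scoring over review texts.
import pandas as pd

POSITIVE = {"great", "love", "good", "helpful", "amazing"}
NEGATIVE = {"bad", "scam", "fake", "worst", "crash"}

def lexicon_score(text: str) -> int:
    """Count positive minus negative lexicon hits in a review."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Stand-in rows with the assumed columns from the dataset.
reviews = pd.DataFrame({
    "content": ["Great app, love the matches", "Fake profiles, worst scam"],
    "score": [5, 1],  # star rating out of 5
})
reviews["sentiment"] = reviews["content"].apply(lexicon_score)
print(reviews[["score", "sentiment"]])
```

A useful sanity check is whether the lexicon score correlates with the star rating, which is also the correlation question listed above.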
The dataset primarily covers user reviews from the Google Play Store, specifically for the Indian country region ('in'), despite being titled as "all regions" in some contexts. The data spans a time range from 2017 to 2022, offering a multi-year perspective on dating app trends and user feedback. There are no specific demographic details for the reviewers themselves beyond their reviews and ratings.
CC0
This dataset is suitable for: * Data Scientists and Analysts: For conducting deep dives into user sentiment, trend analysis, and predictive modelling. * NLP Practitioners and Researchers: As a practical dataset for training and evaluating natural language processing models, especially for text classification and sentiment analysis tasks. * App Developers and Product Managers: To understand user feedback, identify areas for improvement in their own or competing dating applications, and inform product development strategies. * Market Researchers: To gain insights into the consumer behaviour and preferences within the online dating market. * Students and Beginners: It is tagged as 'Beginner' friendly, making it a good resource for those new to data analysis or NLP projects.
Original Data Source: Dating Apps Reviews 2017-2022 (all regions)
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Spatial information about the seafloor is critical for decision-making by marine resource science, management and tribal organizations. Coordinating data needs can help organizations leverage collective resources to meet shared goals. To help enable this coordination, the National Oceanic and Atmospheric Administration (NOAA) National Centers for Coastal Ocean Science (NCCOS) developed a spatial framework, process and online application to identify common data collection priorities for seafloor mapping, sampling and visual surveys off the US Caribbean territories of Puerto Rico and the US Virgin Islands. Fifteen participants from local federal, state, and academic institutions entered their priorities in an online application, using virtual coins to denote their priorities in 2.5x2.5 kilometer (nearshore) and 10x10 kilometer (offshore) grid size. Grid cells with more coins were higher priorities than cells with fewer coins. Participants also reported why these locations were important and what data types were needed. Results were analyzed and mapped using statistical techniques to identify significant relationships between priorities, reasons for those priorities and data needs. Fifteen high priority locations were broadly identified for future mapping, sampling and visual surveys. These locations include: (1) a coastal location in northwest Puerto Rico (Punta Jacinto to Punta Agujereada), (2) a location approximately 11 km off Punta Agujereada, (3) coastal Rincon, (4) San Juan, (5) Punta Arenas (west of Vieques Island), (6) southwest Vieques, (7) Grappler Seamount, (8) southern Virgin Passage, (9) north St. Thomas, (10) east St. Thomas, (11) south St. John, (12) west offshore St. Croix, (13) west nearshore St. Croix, (14) east nearshore St. Croix, and (15) east offshore St. Croix. 
Participants consistently selected (1) Biota/Important Natural Area, (2) Commercial Fishing and (3) Coastal/Marine Hazards as their top reasons (i.e., justifications) for prioritizing locations, and (1) Benthic Habitat Map and (2) Sub-bottom Profiles as their top data or product needs. This ESRI shapefile summarizes the results from this spatial prioritization effort. This information will enable US Caribbean organizations to more efficiently leverage resources and coordinate their mapping of high priority locations in the region.
This effort was funded by NOAA's NCCOS and supported by CRCP. The overall goal of the project was to systematically gather and quantify suggestions for seafloor mapping, sampling and visual surveys in the US Caribbean territories of Puerto Rico and the US Virgin Islands. The results will help organizations in the US Caribbean identify locations where their interests overlap with other organizations, coordinate their data needs, and leverage collective resources to meet shared goals.
There were four main steps in the US Caribbean spatial prioritization process. The first step was to identify the technical advisory team, which included four CRCP members: two from the Puerto Rico region and two from the USVI region. This advisory team recommended 33 organizations to participate in the prioritization. Each organization was then asked to designate a single representative, or respondent, who would have access to the web tool. The respondent would be responsible for communicating with their team about their needs and inputting their collective priorities.

Step two was to develop the spatial framework and an online application. To do this, the US Caribbean was divided into four subregions: nearshore and offshore for both Puerto Rico and USVI. The nearshore regions together comprised 2,387 square grid cells approximately 2.5x2.5 km in size; the offshore regions comprised 438 square grid cells 10x10 km in size. Existing relevant spatial datasets (e.g., bathymetry, protected area boundaries) were compiled to help participants understand information and data gaps and to identify areas they wanted to prioritize for future data collections. These spatial datasets were housed in the online application, which was developed using Esri's Web AppBuilder.

In step three, this online application was used by 15 participants to enter their priorities in each subregion of interest. Respondents allocated virtual coins in the grid cells to denote their priorities for each region. Respondents were given access to all four regions, regardless of which territory they represented, but were not required to provide input for each region. Grid cells with more coins were higher priorities than cells with fewer coins. Participants also reported why these locations were important and what data types were needed. Coin values were standardized across the nearshore and offshore zones and used to identify spatial patterns across the US Caribbean region as a whole.
The number of coins was standardized because each subregion had a different number of grid cells and participants. Standardized coin values were analyzed and mapped using statistical techniques, including hierarchical cluster analysis, to identify significant relationships between priorities, reasons for those priorities and data needs. This ESRI shapefile contains the 2.5x2.5 km and 10x10 km grid cells used in this prioritization effort and the associated standardized coin values overall, as well as by organization, justification and product. For a complete description of the process and analysis, please see Kraus et al. 2020.
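The two analysis steps described (standardizing coin values, then hierarchical clustering of grid cells) can be sketched conceptually as follows. The coin counts are invented for a single toy subregion, and the study's actual standardization and clustering choices may differ:

```python
# Standardize coin allocations, then cluster grid cells by priority level.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Raw coins placed by participants in 6 grid cells of one subregion.
coins = np.array([0.0, 2.0, 10.0, 9.0, 1.0, 8.0])

# Standardize so subregions with different cell/participant counts are comparable.
standardized = (coins - coins.mean()) / coins.std()

# Agglomerative clustering (Ward linkage), cut into two clusters:
# high-priority cells vs. low-priority cells.
Z = linkage(standardized.reshape(-1, 1), method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```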
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
The Amazon Reviews Polarity Dataset discloses eighteen years of customers' ratings and reviews from Amazon.com, offering an unparalleled trove of insight and knowledge. Drawing from an immense pool of over 35 million customer reviews, this dataset presents a broad spectrum of customer opinions on products they have bought or used. It is a gold mine for improving products and services, as it contains comprehensive information on customers' experiences with a product, including ratings, titles, and plaintext content. The dataset contains both customer-specific data and product information, which encourages deep analytics that could lead to great advances in providing tailored solutions for customers. Has your product been favored by the majority? Are there any aspects that need extra care? Use Amazon Reviews Polarity to gain deeper insights into what your customers want - explore now!
1. Analyze customer ratings to identify trends: Look at how many customers have rated the same product or service with the same score (e.g., 4 stars). You can use this information to identify what customers like or dislike by examining common sentiment throughout the reviews. Identifying these patterns can help you decide which features of your products or services to emphasize in order to boost sales and satisfaction rates.
2. Review content analysis: Analyzing review content is one of the best ways to gauge customer sentiment toward specific features or aspects of a product or service. Natural language processing tools such as Word2Vec, Latent Dirichlet Allocation (LDA), or even simple keyword search algorithms can quickly reveal the general topics discussed in relation to your product or service across multiple reviews, allowing you to pinpoint areas that may need improvement for particular items within your lines of business.
3. Track associated scores over time: By tracking customer ratings over time, you may be able to spot when there has been an issue with something specific related to your product or service, such as a negative response to a feature that was introduced, proved unpopular with customers, and was removed shortly after introduction. This can save time and money by identifying issues before they become widespread concerns among larger sets of consumers.
4. Visualize sentiment data over time: Visualizations such as bar graphs can reveal trends across different categories more quickly than raw numbers alone; combining numeric values with color differences between scores makes anomalies easier to spot, allowing faster resolution when trying to figure out why certain spikes occurred while others stayed stable (or vice versa) in time-series visualizations.
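The score-tracking idea in point 3 can be sketched with pandas. The dates, ratings, and column names below are invented for illustration:

```python
# Aggregate ratings by month to surface dips that may flag an issue.
import pandas as pd

reviews = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-05", "2021-01-20", "2021-02-03",
        "2021-02-18", "2021-03-02", "2021-03-25",
    ]),
    "rating": [5, 4, 2, 1, 4, 5],  # the February dip flags a possible problem
})

# Monthly mean rating; "MS" buckets by month start.
monthly = reviews.set_index("date")["rating"].resample("MS").mean()
print(monthly)
```

A sustained drop in the monthly mean, rather than a single low review, is the signal worth investigating.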
- Developing a customer sentiment analysis system that can be used to quickly analyze the sentiment of reviews and identify any potential areas of improvement.
- Building a product recommendation service that takes into account the ratings and reviews of customers when recommending similar products they may be interested in purchasing.
- Training a machine learning model to accurately predict customers' ratings on new products they have not yet tried, and leveraging this for further product development optimization initiatives.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:------------|:------------|
| label | The sentiment of the review, either positive or negative. (String) |
| title | The title of the review. (String) |
...
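A minimal loading sketch based on the column table above. The header names ("label", "title") come from the table, but the review-text column name is an assumption, since the table shown here is truncated:

```python
# Load the documented columns of train.csv; a small inline stand-in
# replaces the real file so the sketch is self-contained.
import io
import pandas as pd

csv_text = """label,title,content
positive,Works great,Arrived quickly and does the job
negative,Disappointed,Broke after a week
"""

df = pd.read_csv(io.StringIO(csv_text))  # for the real file: pd.read_csv("train.csv")
print(df["label"].value_counts())
```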
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Due to the large amounts of text generated by government agencies and policymakers, computer-assisted text-as-data methods are starting to become more popular for scholars of public administration, public policy, and political science, as they allow for much faster processing of large amounts of textual data. Here, I review several of the more common text-as-data methods and provide an overview of their applicability to different data structures and substantive questions in public administration. Then, using thousands of documents issued by the Centers for Medicare & Medicaid Services and its predecessor agency—the Health Care Financing Administration—I showcase the utility of topic models by illustrating how they can be used in conjunction with other politically-relevant covariates to help explain changes in agency priorities. I then conclude by discussing other possible uses for computational text analysis methods in public administration.
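The topic-model workflow described above can be sketched with scikit-learn's LDA implementation; the documents and topic count below are toy stand-ins, not the CMS corpus or the author's actual pipeline:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for agency documents; a real corpus would have thousands.
docs = [
    "medicare payment rates for hospital services",
    "hospital payment schedule and reimbursement rates",
    "fraud enforcement and compliance audits",
    "compliance audits target billing fraud",
]

# Bag-of-words counts feed the topic model.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-document topic proportions

# Each row sums to 1: topic shares that can be regressed on covariates
# (e.g. year, administration) to trace shifts in agency priorities.
print(doc_topics.shape)
```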
https://crawlfeeds.com/privacy_policy
This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.
Dataset Specifications:
Last crawled: not specified.
Richness of Detail (11 Comprehensive Fields):
Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:
Review Content:
- review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.
- title: The title given to the review by the user, often summarizing their main point.
- isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.

Reviewer & Rating Information:
- username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).
- rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.

App & Origin Context:
- app_name: The name of the application being reviewed.
- app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.
- country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.

Metadata & Timestamps:
- _id: A unique identifier for the specific review record in the dataset.
- crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).
- date: The original date the review was posted by the user on the App Store.

Expanded Use Cases & Analytical Applications:
This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:
Product Development & Improvement:
- Mine the review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.
- Analyze the review text to inform future product roadmap decisions and develop features users actively desire.
- Surface requested improvements directly from the review field.
- Track rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.

Market Research & Competitive Intelligence:

Marketing & App Store Optimization (ASO):
- Analyze the review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.
- Monitor rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.

Academic & Data Science Research:
- The review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.
- Study the rating distribution, isEdited status, and date to understand user engagement and feedback cycles.
- Compare country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.

This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs and ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.
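As one illustration of NLP on the review field, here is a deliberately tiny lexicon-based sentiment scorer; the word lists and records are invented, and a production pipeline would use a trained model:

```python
# Minimal lexicon-based scorer for the `review` field. The word lists are
# toy assumptions, not a real sentiment lexicon.
POSITIVE = {"great", "love", "excellent", "smooth"}
NEGATIVE = {"crash", "bug", "slow", "terrible"}

def score_review(text: str) -> int:
    """Positive minus negative keyword hits; crude but transparent."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Hypothetical records shaped like the dataset's review/rating fields.
records = [
    {"review": "Great app, love the new design", "rating": 5},
    {"review": "Constant crash after the update, terrible", "rating": 1},
]

for r in records:
    r["sentiment"] = score_review(r["review"])
print([r["sentiment"] for r in records])
```

A real pipeline would also strip punctuation and handle negation; this sketch only shows how review text maps to a per-record score that can be tracked alongside rating.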
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Modern technologies such as the Internet of Things (IoT) play a key role in Smart Manufacturing and Business Process Management (BPM). In particular, process mining benefits from enriched event logs that incorporate physical sensor data. This dataset presents an IoT-enriched XES event log recorded in a physical smart factory environment. It builds upon the previously published dataset “An IoT-Enriched Event Log for Process Mining in Smart Factories” (available on Zenodo) and follows the DataStream XES extension. In this modified version, three types of common Data Quality Issues (DQIs) - missing sensor values, missing sensors, and time shifts - have been artificially injected into the sensor data. These issues reflect realistic challenges in industrial IoT data processing and are valuable for developing and testing robust data cleaning and analysis methods.
By comparing the original (clean) dataset with this modified version, researchers can systematically evaluate DQI detection, handling, and resolution techniques under controlled conditions. Further details for each of the three DQI types are provided in a CSV changelog within the corresponding subfolders.
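A sketch of how two of the injected DQI types could be detected once sensor readings are tabular; the column names and the 1-second cadence are assumptions for illustration, not the dataset's actual XES schema:

```python
import pandas as pd

# Toy sensor stream; real logs follow the DataStream XES extension, but the
# same checks apply once readings are in (timestamp, value) form.
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00:00", "2024-01-01 00:00:01",
        "2024-01-01 00:00:02", "2024-01-01 00:00:10",
    ]),
    "value": [1.0, None, 1.2, 1.3],
})

# DQI "missing sensor values": readings show up as NaNs.
n_missing = readings["value"].isna().sum()

# DQI "time shifts": gaps or jumps appear as outliers in the sampling interval
# (an assumed ~1 s cadence in this toy example).
gaps = readings["timestamp"].diff().dt.total_seconds()
suspicious = gaps[gaps > 2]

print(n_missing, len(suspicious))
```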
The MSP Data Study, undertaken on behalf of DG MARE between February and December 2016, presents an overview of what data and knowledge are needed by Member States for MSP decision making, taking into account different scales and different points in the MSP cycle. It examines current and future MSP data and knowledge issues from various perspectives (i.e. from Member States, Sea Basin(s) as well as projects and other relevant initiatives) in order to identify: - What data is available for MSP purposes and what data is actually used for MSP; - Commonalities in MSP projects and Member State experiences; - The potential for EMODnet sea basin portals to help coordination of MSP at a regional level and options for realising marine spatial data infrastructures to implement MSP; - Potential revisions to be made concerning INSPIRE specifications for MSP purposes. The study finds that across all European Sea Basins, countries are encountering similar issues with respect to MSP data needs. Differences are found in the scope of activities and sea uses between Member States and Sea Basins and the type of planning that is being carried out. Common data gaps include socio-economic data for different uses and socio-cultural information. By and large, data and information gaps are not so much about what data is missing but more about how to aggregate and interpret data in order to acquire the information needed by a planner. Challenges for Member States lie in developing second generation plans which require more analytical information and strategic evidence. Underlying this is the need for spatial evaluation tools for assessment, impact and conflict analysis purposes. Transnational MSP data needs are different to national MSP data needs. While the scope and level of detail of data needed is typically much simpler, ensuring its coherence and harmonisation across boundaries remains a challenge. 
Pan-European initiatives, such as the EMODnet data portals and Sea Basin Checkpoints, have the potential to support transboundary MSP data exchange needs by providing access to a range of harmonised data sets across European Sea Basins and testing the availability and adequacy of existing data sets to meet commercial and policy challenges.
WiserBrand's Comprehensive Customer Call Transcription Dataset: Tailored Insights
WiserBrand offers a customizable dataset comprising transcribed customer call records, meticulously tailored to your specific requirements. This extensive dataset includes:
- User ID and Firm Name: Identify and categorize calls by unique user IDs and company names.
- Call Duration: Analyze engagement levels through call lengths.
- Geographical Information: Detailed data on city, state, and country for regional analysis.
- Call Timing: Track peak interaction times with precise timestamps.
- Call Reason and Group: Categorised reasons for calls, helping to identify common customer issues.
- Device and OS Types: Information on the devices and operating systems used, for technical support analysis.
- Transcriptions: Full-text transcriptions of each call, enabling sentiment analysis, keyword extraction, and detailed interaction reviews.
Our dataset is designed for businesses aiming to enhance customer service strategies, develop targeted marketing campaigns, and improve product support systems. Gain actionable insights into customer needs and behavior patterns with this comprehensive collection, particularly useful for Consumer Data, Consumer Behavior Data, Consumer Sentiment Data, Consumer Review Data, AI Training Data, Textual Data, and Transcription Data applications.
WiserBrand's dataset is essential for companies looking to leverage Consumer Data and B2B Marketing Data to drive their strategic initiatives in the English-speaking markets of the USA, UK, and Australia. By accessing this rich dataset, businesses can uncover trends and insights critical for improving customer engagement and satisfaction.
Cases:
Enriching STT Models: The dataset includes a wide variety of real-world customer service calls with diverse accents, tones, and terminologies. This makes it highly valuable for training speech-to-text models to better recognize different dialects, regional speech patterns, and industry-specific jargon. It could help improve accuracy in transcribing conversations in customer service, sales, or technical support.
Contextualized Speech Recognition: Given the contextual information (e.g., reasons for calls, call categories, etc.), it can help models differentiate between various types of conversations (technical support vs. sales queries), which would improve the model’s ability to transcribe in a more contextually relevant manner.
Improving TTS Systems: The transcriptions, along with their associated metadata (such as call duration, timing, and call reason), can aid in training Text-to-Speech models that mimic natural conversation patterns, including pauses, tone variation, and proper intonation. This is especially beneficial for developing conversational agents that sound more natural and human-like in their responses.
Noise and Speech Quality Handling: Real-world customer service calls often contain background noise, overlapping speech, and interruptions, which are crucial elements for training speech models to handle real-life scenarios more effectively.
Customer Interaction Simulation: The transcriptions provide a comprehensive view of real customer interactions, including common queries, complaints, and support requests. By training AI models on this data, businesses can equip their virtual agents with the ability to understand customer concerns, follow up on issues, and provide meaningful solutions, all while mimicking human-like conversational flow.
Sentiment Analysis and Emotional Intelligence: The full-text transcriptions, along with associated call metadata (e.g., reason for the call, call duration, and geographical data), allow for sentiment analysis, enabling AI agents to gauge the emotional tone of customers. This helps the agents respond appropriately, whether it’s providing reassurance during frustrating technical issues or offering solutions in a polite, empathetic manner. Such capabilities are essential for improving customer satisfaction in automated systems.
Customizable Dialogue Systems: The dataset allows for categorizing and identifying recurring call patterns and issues. This means AI agents can be trained to recognize the types of queries that come up frequently, allowing them to automate routine tasks such as ...
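A minimal sketch of combining per-call sentiment with metadata, as in the sentiment-analysis case above; the field names and sentiment scores are assumptions (the scores would come from a model run over each transcription), not the vendor's actual schema:

```python
from collections import defaultdict

# Hypothetical call records shaped like the fields described above.
calls = [
    {"reason": "billing", "duration_s": 320, "sentiment": -0.6},
    {"reason": "billing", "duration_s": 210, "sentiment": -0.2},
    {"reason": "tech_support", "duration_s": 540, "sentiment": 0.1},
]

# Aggregate emotional tone per call reason to find the most frustrating flows.
totals = defaultdict(lambda: [0.0, 0])
for c in calls:
    totals[c["reason"]][0] += c["sentiment"]
    totals[c["reason"]][1] += 1

avg_sentiment = {reason: s / n for reason, (s, n) in totals.items()}
print(avg_sentiment)
```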
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains a collection of real-world industrial screw driving datasets, designed to support research in manufacturing process monitoring, anomaly detection, and quality control. Each dataset represents different aspects and challenges of automated screw driving operations, with a focus on natural process variations and degradation patterns.
| Scenario name | Number of work pieces used in the experiments | Repetitions (screw cycles) per workpiece | Individual screws per workpiece | Total number of observations | Number of unique classes | Purpose |
|---|---|---|---|---|---|---|
| S01_thread-degradation | 100 | 25 | 2 | 5,000 | 1 | Investigation of thread degradation through repeated fastening |
| S02_surface-friction | 250 | 25 | 2 | 12,500 | 8 | Surface friction effects on screw driving operations |
| S03_error-collection-1 | 1 | 2 | >20 | | | |
| S04_error-collection-2 | 2,500 | 1 | 2 | 5,000 | 25 | |
The datasets were collected from operational industrial environments, specifically from automated screw driving stations used in manufacturing. Each scenario investigates specific mechanical phenomena that can occur during industrial screw driving operations:
1. S01_thread-degradation
2. S02_surface-friction
3. S03_screw-error-collection-1 (recorded but unpublished)
4. S04_screw-error-collection-2 (recorded but unpublished)
5. S05_upper-workpiece-manipulations (recorded but unpublished)
6. S06_lower-workpiece-manipulations (recorded but unpublished)
Additional scenarios may be added to this collection as they become available.
Each dataset follows a standardized structure:
These datasets are suitable for various research purposes:
These datasets are provided under an open-access license to support research and development in manufacturing analytics. When using any of these datasets, please cite the corresponding publication as detailed in each dataset's README file.
We recommend using our library PyScrew to load and prepare the data. However, the datasets can also be processed using standard JSON and CSV processing libraries, and common data analysis and machine learning frameworks may be used for the analysis. The .tar file provides all information required for each scenario.
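For example, a single observation could be parsed with the standard json module; the record shape below is an assumption for illustration, so consult each scenario's README (or PyScrew) for the actual schema:

```python
import json

# Assumed shape of one screw-driving observation (not the published schema):
# a workpiece id, a cycle number, a torque curve, and a class label.
raw = '{"workpiece_id": 17, "cycle": 3, "torque": [0.1, 0.4, 1.2], "label": "ok"}'

obs = json.loads(raw)
peak_torque = max(obs["torque"])  # a simple feature for anomaly detection
print(obs["workpiece_id"], peak_torque)
```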
Each dataset includes:
For questions, issues, or collaboration interests regarding these datasets, please:
These datasets were collected and prepared from:
The research was supported by:
The Metabolomics workshop on experimental and data analysis training for untargeted metabolomics was hosted by the Proteomics Society of India in December 2019. The workshop included six tutorial lectures and hands-on data analysis training sessions presented by seven speakers. The tutorials and hands-on sessions focused on workflows for liquid chromatography-mass spectrometry (LC-MS) based untargeted metabolomics. We review here three main topics from the workshop that were identified as bottlenecks for new researchers: a) experimental design, b) quality controls during sample preparation and instrumental analysis, and c) data quality evaluation. Our objective is to present common challenges faced by novice researchers and possible guidelines and resources to address them. We provide resources and good practices for researchers who are at the initial stage of setting up metabolomics workflows in their labs. Complete detailed metabolomics/lipidomics protocols, including video tutorials, are available online in the EMBL-MCF protocol collection.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides customer reviews for Apple iPhones, sourced from Amazon. It is designed to facilitate in-depth analysis of user feedback, enabling insights into product sentiment, feature performance, and underlying discussion themes. The dataset is ideal for understanding customer satisfaction and market trends related to iPhone products.
The dataset is typically provided in a CSV file format. While specific record counts are not available, data points related to verified purchasers indicate over 3,000 entries. The dataset's quality is rated as 5 out of 5.
This dataset is well-suited for various analytical projects, including: * Sentiment analysis: To determine overall sentiment and identify trends in customer opinions. * Feature analysis: To analyse user satisfaction with specific iPhone features. * Topic modelling: To discover underlying themes and common discussion points within customer reviews. * Exploratory Data Analysis (EDA): For initial investigations and pattern discovery. * Natural Language Processing (NLP) tasks: For text analysis and understanding.
The dataset has a global regional coverage. While a specific time range for the reviews is not detailed, the dataset itself was listed on 08/06/2025.
CC0
Original Data Source: Apple IPhone Customer Reviews
THIS RESOURCE IS NO LONGER IN SERVICE, documented May 10, 2017. A pilot effort that developed a centralized, web-based biospecimen locator presenting biospecimens collected and stored at participating Arizona hospitals and biospecimen banks, which are available for acquisition and use by researchers. Researchers may use this site to browse, search, and request biospecimens for use in qualified studies. The development of the ABL was guided by the Arizona Biospecimen Consortium (ABC), a consortium of hospitals and medical centers in the Phoenix area, and is now being piloted by this Consortium under the direction of ABRC. You may browse by type (cells, fluid, molecular, tissue) or disease. Common data elements decided by the ABC Standards Committee, based on data elements in the National Cancer Institute's (NCI's) Common Biorepository Model (CBM), are displayed. These describe the minimum set of data elements that the NCI determined were most important for a researcher to see about a biospecimen. The ABL currently does not display information on whether clinical data are available to accompany the biospecimens; however, a requester can solicit clinical data in the request. Once a request is approved, the biospecimen provider will contact the requester to discuss the request (and the requester's questions) before finalizing the invoice and shipment. The ABL is available for the public to browse. To request biospecimens from the ABL, the researcher must submit the required information. Upon submission, shipment of the requested biospecimen(s) depends on scientific and institutional review approval. Account required. Registration is open to everyone. Documented on August 26, 2019.

Database of published microarray gene expression data, and a software tool for comparing that published data to a user's own microarray results.
It is very simple to use: all you need is a web browser and a list of the probes that went up or down in your experiment. If you find L2L useful, please consider contributing your published data to the L2L Microarray Database in the form of list files. L2L finds true biological patterns in gene expression data by systematically comparing your own list of genes to lists of genes that have been experimentally determined to be co-expressed in response to a particular stimulus: that is, published lists of microarray results. The patterns it finds can point to the underlying disease process or affected molecular function that actually generated the observed changes in gene expression. Its insights are far more systematic than critical-gene analyses, and more biologically relevant than pure Gene Ontology-based analyses. The publications included in the L2L MDB initially reflected topics thought to be related to Cockayne syndrome: aging, cancer, and DNA damage. Since then, the scope of the publications included has expanded considerably to include chromatin structure, immune and inflammatory mediators, the hypoxic response, adipogenesis, growth factors, hormones, cell cycle regulators, and others. Despite the parochial origins of the database, the wide range of topics covered will make L2L of general interest to any investigator using microarrays to study human biology. In addition to the L2L Microarray Database, L2L contains three sets of lists derived from Gene Ontology categories: Biological Process, Cellular Component, and Molecular Function. As with the L2L MDB, each GO sub-category is represented by a text file that contains annotation information and a list of the HUGO symbols of the genes assigned to that sub-category or any of its descendants. You don't need to download L2L to use it to analyze your microarray data.
There is an easy-to-use web-based analysis tool, and you have the option of downloading your results so you can view them at any time on your own computer, using any web browser. However, if you prefer, the entire L2L project, and all of its components, can be downloaded from the download page. Platform: Online tool, Windows compatible, Mac OS X compatible, Linux compatible, Unix compatible
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Objectives: In quantitative research, understanding basic parameters of the study population is key for interpretation of the results. As a result, it is typical for the first table ("Table 1") of a research paper to include summary statistics for the study data. Our objectives are 2-fold. First, we seek to provide a simple, reproducible method for providing summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.
Materials and Methods: The tableone package is developed following good practice guidelines for scientific computing and all code is made available under a permissive MIT License. A testing framework runs on a continuous integration server, helping to maintain code stability. Issues are tracked openly and public contributions are encouraged.
Results: The tableone software package automatically compiles summary statistics into publishable formats such as CSV, HTML, and LaTeX. An executable Jupyter Notebook demonstrates application of the package to a subset of data from the MIMIC-III database. Tests such as Tukey's rule for outlier detection and Hartigan's Dip Test for modality are computed to highlight potential issues in summarizing the data.
Discussion and Conclusion: We present open source software for researchers to facilitate carrying out reproducible studies in Python, an increasingly popular language in scientific research. The toolkit is intended to mature over time with community feedback and input. Development of a common tool for summarizing data may help to promote good practice when used as a supplement to existing guidelines and recommendations. We encourage use of tableone alongside other methods of descriptive statistics and, in particular, visualization to ensure appropriate data handling. We also suggest seeking guidance from a statistician when using tableone for a research study, especially prior to submitting the study for publication.
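tableone produces this kind of grouped "Table 1" in a single call; the pandas sketch below, with an invented toy cohort, merely mirrors the idea of per-group summary statistics that the package automates:

```python
import pandas as pd

# Invented toy cohort standing in for real study data.
df = pd.DataFrame({
    "age": [34, 51, 42, 60, 29, 47],
    "group": ["treated", "control", "treated", "control", "treated", "control"],
})

# Per-group summary statistics, the core of a "Table 1".
summary = df.groupby("group")["age"].agg(["count", "mean", "std"]).round(1)
print(summary)
```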
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset used for the research presented in the following paper: Takayuki Hiraoka, Takashi Kirimura, Naoya Fujiwara (2024) "Geospatial analysis of toponyms in geo-tagged social media posts".
We collected georeferenced Twitter posts tagged to coordinates inside the bounding box of Japan between 2012-2018. The present dataset represents the spatial distributions of all geotagged posts as well as posts containing in the text each of 24 domestic toponyms, 12 common nouns, and 6 foreign toponyms. The code used to analyze the data is available on GitHub.
- selected_geotagged_tweet_data/: Number of geotagged Twitter posts in each grid cell. Each CSV file under this directory associates each grid cell (spanning 30 seconds of latitude and 45 seconds of longitude, approximately a 1 km x 1 km square, specified by an 8-digit code m3code) with the number of geotagged tweets tagged to the coordinates inside that cell (tweetcount).
- file_names.json: relates each of the toponyms studied in this work to the corresponding data file (all denotes the full data).
- population/population_center_2020.xlsx: Center of population of each municipality based on the 2020 census. Derived from data published by the Statistics Bureau of Japan on their website (Japanese).
- population/census2015mesh3_totalpop_setai.csv: Resident population in each grid cell based on the 2015 census. Derived from data published by the Statistics Bureau of Japan on e-stat (Japanese).
- population/economiccensus2016mesh3_jigyosyo_jugyosya.csv: Employed population in each grid cell based on the 2016 Economic Census. Derived from data published by the Statistics Bureau of Japan on e-stat (Japanese).
- japan_MetropolitanEmploymentArea2015map/: Shape file for the boundaries of Metropolitan Employment Areas (MEA) in Japan. See this website for details of MEA.
- ward_shapefiles/: Shape files for the boundaries of wards in large cities, published by the Statistics Bureau of Japan on e-stat.

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include the course syllabus used to teach quantitative research design and analysis methods to graduate Linguistics students using a blended teaching and learning approach. The blended course took place over two weeks and builds on a face-to-face course presented over two days in 2019. Students worked through the topics in preparation for a live interactive video session each Friday to go through the activities. Additional communication took place on Slack for two hours each week. A survey was conducted at the start and end of the course to ascertain participants' perceptions of the usefulness of the course. The links to online elements and the evaluations have been removed from the uploaded course guide.

Participants who complete this workshop will be able to:
- outline the steps and decisions involved in quantitative data analysis of linguistic data
- explain common statistical terminology (sample, mean, standard deviation, correlation, nominal, ordinal and scale data)
- perform common statistical tests using jamovi (e.g. t-test, correlation, ANOVA, regression)
- interpret and report common statistical tests
- describe and choose from the various graphing options used to display data
- use jamovi to perform common statistical tests and graph results

Evaluation

Participants who complete the course will use these skills and knowledge to complete the following activities for evaluation:
- analyse the data for a project and/or assignment (in part or in whole)
- plan the results section of an Honours research project (where applicable)

Feedback and suggestions can be directed to M Schaefer schaemn@unisa.ac.za