License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract The aim of this paper is the acquisition of geographic data from the Foursquare application, using data mining to perform exploratory and spatial analyses of the distribution of tourist attractions and their density in the city of Rio de Janeiro. In accordance with the Extract, Transform, Load (ETL) methodology, three search algorithms were developed using a hierarchical tree structure to collect information from the Foursquare database for the categories Museums, Monuments and Landmarks, Historic Sites, Scenic Lookouts, and Trails. A quantitative analysis of check-ins per neighborhood of Rio de Janeiro was performed, and kernel density (hot spot) maps were generated. The results show the need for a data filtering process: less than 50% of the mined data were used. A large part of the density of the Museums, Historic Sites, and Monuments and Landmarks categories is concentrated in the city center, while the Scenic Lookouts and Trails categories predominate in the south zone. This kind of analysis was shown to be a tool to support the city's tourism management with respect to the spatial localization of these categories, tourists' evaluations of the places, and the frequency of the target public.
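The kernel density (hot spot) mapping described above can be sketched as follows. This is a minimal illustration only: the coordinates, bandwidth, and grid size are assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical check-in coordinates (lon, lat) near Rio de Janeiro;
# the real points would come from the mined Foursquare data.
rng = np.random.default_rng(0)
points = rng.normal(loc=[-43.2, -22.9], scale=0.05, size=(200, 2))

def kernel_density(points, grid_x, grid_y, bandwidth=0.02):
    """Gaussian kernel density estimate evaluated on a regular grid."""
    xx, yy = np.meshgrid(grid_x, grid_y)
    grid = np.stack([xx.ravel(), yy.ravel()], axis=1)              # (G, 2)
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(-1)    # (G, N) squared distances
    dens = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)         # average kernel weight
    return dens.reshape(len(grid_y), len(grid_x))

gx = np.linspace(points[:, 0].min(), points[:, 0].max(), 40)
gy = np.linspace(points[:, 1].min(), points[:, 1].max(), 40)
density = kernel_density(points, gx, gy)
# grid cells with the highest density values correspond to hot spots
```

Plotting `density` as a heatmap over the city map yields the kind of hot spot visualization the paper describes per category.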
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The top 10 clusters of innovativeness research, each named after its dominant theme, together with the most important quantitative data (number of articles, average publication year, top three journals, and number of articles in each journal), covering the literature up to 2021.
License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
During my senior year at Shandong University, my tutor gave me the research direction for my thesis: Bitcoin transaction data analysis. I therefore crawled all Bitcoin transaction data from January 2009 to February 2018 and performed statistical and quantitative analyses. I hope this data will be of some help to you; data mining is interesting and helpful, not only as a technical skill but also in everyday life.
I crawled these data from https://www.blockchain.com/explorer. Each file contains many blocks, and the range of blocks is reflected in the file name; for example, the file 0-68732.csv runs from block zero (also called the genesis block) up to block 68732. Blocks that have no input are not included in the file. The data have five columns: the Height column is the block height; the Input column is the input address of the block's transactions; the Output column is the output address; the Sum column is the Bitcoin transaction amount corresponding to the Output; and the Time column is the generation time of the block. A block contains many transactions.
This page holds only part two of the data; the other parts can be found at https://www.kaggle.com/shiheyingzhe/datasets
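The five-column schema described above can be loaded and aggregated as follows. The sample rows and addresses are illustrative assumptions; real files such as 0-68732.csv contain many transactions per block.

```python
import pandas as pd

# Hypothetical rows following the described schema:
# Height (block height), Input/Output (addresses), Sum (BTC amount), Time.
rows = [
    (170, "addr_a", "addr_b", 10.0, "2009-01-12 03:30:25"),
    (170, "addr_a", "addr_c", 40.0, "2009-01-12 03:30:25"),
    (171, "addr_d", "addr_e", 50.0, "2009-01-12 06:02:13"),
]
df = pd.DataFrame(rows, columns=["Height", "Input", "Output", "Sum", "Time"])

# Total amount transferred per block, a typical first statistic.
per_block = df.groupby("Height")["Sum"].sum()
print(per_block.to_dict())  # {170: 50.0, 171: 50.0}
```

With the real CSVs, the same `groupby` call (after `pd.read_csv("0-68732.csv")`) yields per-block transaction volumes across the whole range of the file.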
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Interactive network files. Interactive network files with all statistical and topological analyses. This is a Cytoscape session file (.cys). To open, view, or modify this file, please use the freely available Cytoscape software platform, available at http://www.cytoscape.org/download.php. (SIF 3413 kb)
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Discover the 5 distinct types of data scientists your startup needs, from quantitative PhDs to operational analysts. Learn which role best fits your company's growth stage.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
SPHERE is the Students' Performance in Physics Education Research dataset. It is presented as a multi-domain learning dataset of students' performance in physics, collected through several research-based assessments (RBAs) established by the physics education research (PER) community. A total of 497 eleventh-grade students were involved, from three large public high schools and one small public high school located in a suburban district of a highly populated province in Indonesia. Variables related to demographics, accessibility to literature resources, and students' physics identity were also investigated. The RBAs used in this dataset were selected based on concepts the students learn in the Indonesian physics curriculum. We surveyed students' understanding of Newtonian mechanics at the end of the first semester using the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed the students' scientific abilities and learning attitudes through the Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS), respectively. The conceptual assessments continued in the second semester with the Rotational and Rolling Motion Conceptual Survey (RRMCS), the Fluid Mechanics Concept Inventory (FMCI), the Mechanical Waves Conceptual Survey (MWCS), the Thermal Concept Evaluation (TCE), and the Survey of Thermodynamic Processes and First and Second Laws (STPFaSL). We expect SPHERE to be a valuable dataset for supporting the advancement of the PER field, particularly in quantitative studies. For example, it can help advance research on machine learning and data mining techniques in PER, which often faces challenges due to the lack of datasets collected specifically for PER studies.
SPHERE can be reused as a students' performance dataset in physics, specifically dedicated to PER scholars who wish to apply machine learning techniques in physics education.
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Questionable research practices (QRPs) have been the focus of the scientific community amid greater scrutiny and evidence highlighting issues with replicability across many fields of science. To capture the most impactful publications and the main thematic domains in the literature on QRPs, this study uses a document co-citation analysis. The analysis was conducted on a sample of 341 documents covering the past 50 years of research on QRPs. Nine major thematic clusters emerged. Statistical reporting and statistical power emerged as key areas of research, where systemic-level factors in how research is conducted are consistently raised as the precipitating factors for QRPs. There is also an encouraging shift in the focus of research toward open science practices designed to address engagement in QRPs. Such a shift is indicative of the growing momentum of the open science movement, and more research can be conducted on how these practices are employed on the ground and how their uptake by researchers can be further promoted. However, the results suggest that, while pre-registration and registered reports receive the most research interest, less attention has been paid to other open science practices (e.g., data and methods sharing). Methods All data were downloaded from the Scopus platform on 6 February 2023. Data were retrieved using the search string TITLE-ABS-KEY("questionable research practice*"). The files contain information for 341 documents published between 1974 and 2023.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Qualitative data gathered from interviews conducted with case organisations. The data were analysed using a qualitative data analysis tool (Atlas.ti) to code responses and generate network diagrams; software such as Atlas.ti 8 for Windows is advantageous for viewing these results. Interviews were conducted with four case organisations, and the details of the respondents' answers are captured in tabular form, with graphs created to identify trends. The study also includes a desktop review of the case organisations, based on published annual reports spanning more than seven years. The analysis was done within the scope of the project and its constructs.
The importance of data science skills for modern scientific research cannot be overstated. Although policy documents increasingly recommend what skills should be included in undergraduate statistics and data science curricula, little is known about how students actually develop and apply these skills. This paper addresses this gap through an in-depth case study tracing one student's learning progressions throughout her master's program. Using a qualitative method to analyze student code, which has seen little use in statistics education research, I examined how Alicia transferred the data science skills from her applied statistics course into authentic research settings. The analysis shows that, while Alicia successfully navigated new challenges, she encountered persistent hurdles when extending bivariate techniques into multivariate contexts, particularly with visualizations and summary statistics. These findings highlight the obs..., R Script files submitted by Alicia (pseudonym) over the course of the study. The files are named according to when they were submitted:
December 2018
R Script #1
April 2019
R Script #1 (revised) R Script #2
September 2019
R Script #1 (revised) R Script #2 (revised)
Qualitative Data Analysis Files (Rich text files)
December 2018 Script #1 April 2019 Script #1 April 2019 Script #2 September 2019 Script #1 September 2019 Script #2
Quantitative Data Analysis Files
r-code-themes.csv
Comma-separated values file with a separate sheet for each R script. Each sheet contains the qualitative code assigned to each line of code and whether the line contained errors.
# Code from: Beyond the classroom: Alicia’s multivariate journey
https://doi.org/10.5061/dryad.c59zw3rg6
This repository contains the R script files submitted by Alicia (pseudonym) throughout this study, files associated with the qualitative analysis of the code, and files associated with visualizations of the qualitative themes included in Alicia's code.
As this is a qualitative analysis, the usage of these "data" files differs from a typical quantitative analysis.
.R files contain the scripts generated by Alicia at each time point (December 2018, April 2019, September 2019).
-codes.rtf files contain the (qualitative) process codes for each R script.
r-code-themes.xlsx contains information on every script and the qualitative code assigned to each line of code.
While the "data" for this analysis are R scripts, these scripts cannot be execu...,
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The objective of this paper is to explore the potential of Large Language Models (LLMs) for assisting with quantitative data analysis in social science research. Specifically, it aims to introduce key concepts to help researchers effectively integrate LLMs into their workflows. For this purpose, we replicate a research paper in educational leadership on the relationship between school program coherence and student achievement. By leveraging LLMs to generate code for statistical tools like Mplus and R, researchers can streamline their data analysis, potentially saving time and effort. The quality of analytical code generated by LLMs can be influenced by the researcher’s understanding and application of concepts like context windows, LLM training data and training cut-off, model parameter settings like temperature, zero- and few-shot learning, and Retrieval-Augmented Generation. By describing and demonstrating the applications of these concepts, we aim to equip researchers with a basic toolset to leverage LLMs effectively to assist with coding for quantitative analysis.
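The zero- and few-shot prompting concepts mentioned above can be illustrated with a minimal sketch. The task wording and the worked R examples below are hypothetical, not drawn from the paper.

```python
# Zero-shot: the model receives only the task description.
task = "Write R code that regresses student achievement on program coherence."
zero_shot = f"You are a statistics coding assistant.\n\nTask: {task}\nCode:"

# Few-shot: prepend worked examples so the model can imitate the format.
examples = [
    ("Compute the mean of x, ignoring missing values, in R.",
     "mean(x, na.rm = TRUE)"),
    ("Fit a simple linear regression of y on x in R.",
     "lm(y ~ x)"),
]
shots = "\n\n".join(f"Task: {q}\nCode: {a}" for q, a in examples)
few_shot = f"You are a statistics coding assistant.\n\n{shots}\n\nTask: {task}\nCode:"

print(few_shot)
```

Either string would then be sent to an LLM; as the paper notes, parameters such as temperature and the model's training cut-off also shape the quality of the generated analysis code.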
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
Systematic reviews are the method of choice to synthesize research evidence. To identify main topics (so-called hot spots) relevant to large corpora of original publications in need of a synthesis, one must address the “three Vs” of big data (volume, velocity, and variety), especially in loosely defined or fragmented disciplines. For this purpose, text mining and predictive modeling are very helpful. Thus, we applied these methods to a compilation of documents related to digitalization in aesthetic, arts, and cultural education, as a prototypical, loosely defined, fragmented discipline, and particularly to quantitative research within it (QRD-ACE). By broadly querying the abstract and citation database Scopus with terms indicative of QRD-ACE, we identified a corpus of N = 55,553 publications for the years 2013–2017. As the result of an iterative approach of text mining, priority screening, and predictive modeling, we identified n = 8,304 potentially relevant publications of which n = 1,666 were included after priority screening. Analysis of the subject distribution of the included publications revealed video games as a first hot spot of QRD-ACE. Topic modeling resulted in aesthetics and cultural activities on social media as a second hot spot, related to 4 of k = 8 identified topics. This way, we were able to identify current hot spots of QRD-ACE by screening less than 15% of the corpus. We discuss implications for harnessing text mining, predictive modeling, and priority screening in future research syntheses and avenues for future original research on QRD-ACE. Dataset for: Christ, A., Penthin, M., & Kröner, S. (2019). Big Data and Digital Aesthetic, Arts, and Cultural Education: Hot Spots of Current Quantitative Research. Social Science Computer Review, 089443931988845. https://doi.org/10.1177/0894439319888455:
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains data about all Nobel Prizes and their respective recipients from 1901 to 2023 as well as the code to regenerate the dataset to include future years. This dataset can be used to conduct quantitative analysis and was created to fulfill an assignment for COSC426: Introduction to Data Mining.
Terms: https://www.lseg.com/en/policies/website-disclaimer
Assess risk in publicly traded companies with LSEG's StarMine Text Mining Credit Risk Model (TMCR), scoring over 38,000 companies.
This data was used to analyze the transformative role of machine learning (ML) techniques in refining Management Control Systems (MCS) to bolster predictive analytics capabilities within varied organizational contexts. Utilizing a mixed-methods approach, this research synthesizes a comprehensive quantitative analysis of Bloomberg's extensive dataset, encompassing 4,500 companies across multiple industries from 2015 to 2023.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
We present a quantitative analysis of small RNA dynamics during the transition from hPSCs to the three germ layer lineages to identify spatiotemporal-specific small RNAs that may be involved in hPSC differentiation. To determine the degree of spatiotemporal specificity, we utilized two algorithms, namely normalized maximum timepoint specificity index (NMTSI) and across-tissue specificity index (ASI). NMTSI could identify spatiotemporal-specific small RNAs that go up or down at just one timepoint in a specific lineage. ASI could identify spatiotemporal-specific small RNAs that maintain high expression from intermediate timepoints to the terminal timepoint in a specific lineage. Beyond analyzing single small RNAs, we also quantified the spatiotemporal-specificity of microRNA families and observed their differential expression patterns in certain lineages. To clarify the regulatory effects of group miRNAs on cellular events during lineage differentiation, we performed a gene ontology (GO) analysis on the downstream targets of synergistically up- and downregulated microRNAs. To provide an integrated interface for researchers to access and browse our analysis results, we designed a web-based tool at https://keyminer.pythonanywhere.com/km/.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains Low Energy Electron Microscopy (LEEM) raw data for both a dark-field and a bright-field spectroscopic image series of a region of few-layer graphene on silicon carbide. Additionally, it contains calibration data: a dark-count dataset, an HDR calibration dataset, and two curves showcasing the difference between HDR and non-HDR imaging.
License: Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
This portfolio provides a detailed analysis of the EUR/USD currency pair on a 15-minute timeframe, aiming to explore market patterns, volatility, and potential opportunities for developing algorithmic trading strategies.
Included in this work:
Data cleaning and preprocessing of historical records.
Exploratory analysis of prices, volumes, and movement ranges.
Pattern detection such as consecutive candles, trends, and reversals.
Quantitative metrics to assess risk and performance.
Dataset preparation for backtesting and predictive modeling.
This project is designed for traders, quantitative analysts, and data science enthusiasts interested in applying analytical methods to Forex markets, with a practical and replicable approach to generating financial insights.
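The pattern-detection step listed above (e.g., runs of consecutive candles) can be sketched as follows. The column names and sample prices are illustrative assumptions, not the portfolio's actual data.

```python
import pandas as pd

# Hypothetical 15-minute EUR/USD candles (open/close only, for brevity).
data = pd.DataFrame({
    "open":  [1.0850, 1.0855, 1.0860, 1.0858, 1.0852],
    "close": [1.0855, 1.0860, 1.0863, 1.0852, 1.0851],
})

# A candle is bullish when it closes above its open.
data["bullish"] = data["close"] > data["open"]

# Label runs of consecutive same-direction candles, then count position within each run.
runs = data["bullish"].ne(data["bullish"].shift()).cumsum()
data["run_length"] = data.groupby(runs).cumcount() + 1

print(data["run_length"].tolist())  # [1, 2, 3, 1, 2]
```

Here the first three candles form a bullish streak of length 3, followed by a two-candle bearish reversal; thresholds on `run_length` are one simple way to flag trend and reversal patterns for backtesting.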
Terms: https://www.gesis.org/en/institute/data-usage-terms
These interview data are part of the project "Looking for data: information seeking behaviour of survey data users", a study of secondary data users’ information-seeking behaviour. The overall goal of this study was to create evidence of actual information practices of users of one particular retrieval system for social science data in order to inform the development of research data infrastructures that facilitate data sharing. In the project, data were collected based on a mixed methods design. The research design included a qualitative study in the form of expert interviews and – building on the results found therein – a quantitative web survey of secondary survey data users. For the qualitative study, expert interviews with six reference persons of a large social science data archive have been conducted. They were interviewed in their role as intermediaries who provide guidance for secondary users of survey data. The knowledge from their reference work was expected to provide a condensed view of goals, practices, and problems of people who are looking for survey data. The anonymized transcripts of these interviews are provided here. They can be reviewed or reused upon request. The survey dataset from the quantitative study of secondary survey data users is downloadable through this data archive after registration. The core result of the Looking for data study is that community involvement plays a pivotal role in survey data seeking. The analyses show that survey data communities are an important determinant in survey data users' information seeking behaviour and that community involvement facilitates data seeking and has the capacity of reducing problems or barriers. The qualitative part of the study was designed and conducted using constructivist grounded theory methodology as introduced by Kathy Charmaz (2014). 
In line with grounded theory methodology, the interviews did not follow a fixed set of questions, but were conducted based on a guide that included areas of exploration with tentative questions. This interview guide can be obtained together with the transcript. For the Looking for data project, the data were coded and scrutinized by constant comparison, as proposed by grounded theory methodology. This analysis resulted in core categories that make up the "theory of problem-solving by community involvement". This theory was exemplified in the quantitative part of the study. For this exemplification, the following hypotheses were drawn from the qualitative study:
(1) The data seeking hypotheses: (1a) When looking for data, information seeking through personal contact is used more often than impersonal ways of information seeking. (1b) Ways of information seeking (personal or impersonal) differ with experience.
(2) The experience hypotheses: (2a) Experience is positively correlated with having ambitious goals. (2b) Experience is positively correlated with having more advanced requirements for data. (2c) Experience is positively correlated with having more specific problems with data.
(3) The community involvement hypothesis: Experience is positively correlated with community involvement.
(4) The problem solving hypothesis: Community involvement is positively correlated with problem solving strategies that require personal interactions.