100+ datasets found
  1. Data from: STRATEGY FOR EXTRACTION OF FOURSQUARE’S SOCIAL MEDIA GEOGRAPHIC...

    • scielo.figshare.com
    jpeg
    Updated May 31, 2023
    Cite
    Paula Fernandez Costa; Irving da Silva Badolato; Rogério Luís Ribeiro Borba; Julia Celia Mercedes Strauch (2023). STRATEGY FOR EXTRACTION OF FOURSQUARE’S SOCIAL MEDIA GEOGRAPHIC INFORMATION THROUGH DATA MINING [Dataset]. http://doi.org/10.6084/m9.figshare.8031641.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Paula Fernandez Costa; Irving da Silva Badolato; Rogério Luís Ribeiro Borba; Julia Celia Mercedes Strauch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: The aim of this paper is the acquisition of geographic data from the Foursquare application, using data mining to perform exploratory and spatial analyses of the distribution of tourist attractions and their density distribution in Rio de Janeiro city. Thus, in accordance with the Extraction, Transformation, and Load methodology, three research algorithms were developed using a hierarchical tree structure to collect information for the categories of Museums, Monuments and Landmarks, Historic Sites, Scenic Lookouts, and Trails in the Foursquare database. Quantitative analysis was performed of check-ins per neighborhood of Rio de Janeiro city, and kernel density (hot spot) maps were generated. The results presented in this paper show the need for the data filtering process: less than 50% of the mined data were used. A large part of the density of the Museums, Historic Sites, and Monuments and Landmarks categories is in the center of the city, while the Scenic Lookouts and Trails categories predominate in the south zone. This kind of analysis was shown to be a tool to support the city's tourist management in relation to the spatial localization of these categories, the tourists’ evaluations of the places, and the frequency of the target public.
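
The two quantitative steps named in this abstract, check-ins per neighborhood and a kernel density (hot spot) surface, can be illustrated with a minimal sketch. This is not the authors' code: the venue records, neighborhood labels, coordinates, and the bandwidth value are all invented for illustration, and a plain Gaussian kernel over planar coordinates is assumed.

```python
import math
from collections import Counter

# Hypothetical venue records: (neighborhood, x_km, y_km, check_ins).
# Labels, coordinates, and counts are invented for illustration only.
venues = [
    ("Neighborhood A", 0.0, 0.0, 120),
    ("Neighborhood A", 0.4, 0.1, 80),
    ("Neighborhood A", 0.2, -0.3, 60),
    ("Neighborhood B", 5.0, -4.0, 200),
    ("Neighborhood B", 5.5, -4.2, 40),
]

def check_ins_per_neighborhood(records):
    """Sum check-ins for each neighborhood."""
    totals = Counter()
    for neighborhood, _x, _y, check_ins in records:
        totals[neighborhood] += check_ins
    return dict(totals)

def kernel_density(records, x, y, bandwidth_km=1.0):
    """Unweighted Gaussian kernel density of venues at point (x, y)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth_km ** 2 * len(records))
    total = 0.0
    for _name, vx, vy, _check_ins in records:
        sq_dist = (x - vx) ** 2 + (y - vy) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth_km ** 2))
    return norm * total

totals = check_ins_per_neighborhood(venues)
density_near = kernel_density(venues, 0.2, 0.0)    # inside the first cluster
density_far = kernel_density(venues, 10.0, 10.0)   # far from every venue
print(totals)
print(density_near > density_far)
```

Evaluating `kernel_density` on a regular grid of points would give the kind of surface a hot spot map is drawn from; the bandwidth controls how smooth that surface is.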

  2. The top 10 clusters of innovativeness research named using the dominant...

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Yousif Elsamani; Cristian Mejia; Yuya Kajikawa (2023). The top 10 clusters of innovativeness research named using the dominant theme with the most important quantitative data (number of articles, average publication year, top three journals, and number of articles in each journal) until 2021. [Dataset]. http://doi.org/10.1371/journal.pone.0280005.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yousif Elsamani; Cristian Mejia; Yuya Kajikawa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The top 10 clusters of innovativeness research named using the dominant theme with the most important quantitative data (number of articles, average publication year, top three journals, and number of articles in each journal) until 2021.

  3. Bitcoin data part two from Jan 2009 to Feb 2018

    • kaggle.com
    zip
    Updated Apr 17, 2020
    + more versions
    Cite
    ZouJiu (2020). Bitcoin data part two from Jan 2009 to Feb 2018 [Dataset]. https://www.kaggle.com/shiheyingzhe/bitcoin-data-part-two-from-jan-2009-to-feb-2018
    Explore at:
    Available download formats: zip (10311105755 bytes)
    Dataset updated
    Apr 17, 2020
    Authors
    ZouJiu
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    During my senior year at Shandong University, my tutor gave me a research direction for my undergraduate thesis: Bitcoin transaction data analysis. I therefore crawled all Bitcoin transaction data from January 2009 to February 2018 and carried out statistical and quantitative analysis on it. I hope this data is of some help to you; data mining is interesting and useful not only as a skill but also in everyday life.

    I crawled this data from https://www.blockchain.com/explorer. Each file contains many blocks, and the range of blocks is reflected in the file name; e.g., the file 0-68732.csv covers block 0 (also called the genesis block) through block 68732. A block that has no input is not included in the file. Each file has five columns: the Height column is the block height, the Input column is the input address of the block, the Output column is the output address of the block, the Sum column is the Bitcoin transaction amount corresponding to the Output, and the Time column is the generation time of the block. A block contains many transactions.

    This page is just part two of the full data; the other parts can be found at https://www.kaggle.com/shiheyingzhe/datasets
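
As a rough illustration of the file layout described for this dataset, the sketch below parses the block range from a file name such as 0-68732.csv and sums the Sum column per block height. It assumes each CSV has a header row with exactly the column names Height, Input, Output, Sum, and Time; the sample rows are invented, and the real files may differ in delimiter, header, or value formats.

```python
import csv
import io
from collections import defaultdict

def block_range(file_name):
    """Return (first_block, last_block) parsed from a name like '0-68732.csv'."""
    stem = file_name.rsplit(".", 1)[0]
    first, last = stem.split("-")
    return int(first), int(last)

def total_per_block(csv_file):
    """Sum the 'Sum' column for each block height."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[int(row["Height"])] += float(row["Sum"])
    return dict(totals)

# Invented sample rows standing in for a real file (header names are assumed).
sample = io.StringIO(
    "Height,Input,Output,Sum,Time\n"
    "100,addr_in_1,addr_out_1,10.0,2010-01-01 00:00:00\n"
    "100,addr_in_1,addr_out_2,40.0,2010-01-01 00:00:00\n"
    "101,addr_in_2,addr_out_3,30.0,2010-01-01 00:10:00\n"
)

first, last = block_range("0-68732.csv")
totals = total_per_block(sample)
print(first, last)
print(totals)
```

For a real file, `open(path, newline="")` would replace the in-memory sample; given the archive size, reading row by row as above avoids loading a whole file into memory.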

  4. Additional file 1 of Novel methods of qualitative analysis for health policy...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Mireya Martínez-García; Maite Vallejo; Enrique Hernández-Lemus; Jorge Alberto Álvarez-Díaz (2023). Additional file 1 of Novel methods of qualitative analysis for health policy research [Dataset]. http://doi.org/10.6084/m9.figshare.7587416.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mireya Martínez-García; Maite Vallejo; Enrique Hernández-Lemus; Jorge Alberto Álvarez-Díaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Interactive network files. Interactive network files with all statistical and topological analyses. This is a Cytoscape .cys session. To open, view, or modify this file, please use the freely available Cytoscape software platform, available at http://www.cytoscape.org/download.php. (SIF 3413 kb)

  5. Which of the Five Types of Data Science Does Your Startup Need? - Data...

    • tomtunguz.com
    Updated Oct 2, 2013
    Cite
    Tomasz Tunguz (2013). Which of the Five Types of Data Science Does Your Startup Need? - Data Analysis [Dataset]. https://tomtunguz.com/data-science-types/
    Explore at:
    Dataset updated
    Oct 2, 2013
    Dataset provided by
    Theory Ventures
    Authors
    Tomasz Tunguz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Discover the 5 distinct types of data scientists your startup needs, from quantitative PhDs to operational analysts. Learn which role best fits your company's growth stage.

  6. SPHERE: Students' performance dataset of conceptual understanding,...

    • data.mendeley.com
    Updated Jan 15, 2025
    + more versions
    Cite
    Purwoko Haryadi Santoso (2025). SPHERE: Students' performance dataset of conceptual understanding, scientific ability, and learning attitude in physics education research (PER) [Dataset]. http://doi.org/10.17632/88d7m2fv7p.2
    Explore at:
    Dataset updated
    Jan 15, 2025
    Authors
    Purwoko Haryadi Santoso
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPHERE is a dataset of students' performance in physics education research. It is presented as a multi-domain learning dataset of students’ performance in physics, collected through several research-based assessments (RBAs) established by the physics education research (PER) community. A total of 497 eleventh-grade students were involved, from three large public high schools and one small public high school located in a suburban district of a highly populated province in Indonesia. Some variables related to demographics, accessibility of literature resources, and students’ physics identity are also investigated. The RBAs used in this data were selected based on the concepts learned by the students in the Indonesian physics curriculum. We began by surveying students’ understanding of Newtonian mechanics at the end of the first semester using the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed the students’ scientific abilities and learning attitude through the Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS), respectively. The conceptual assessments continued in the second semester with the Rotational and Rolling Motion Conceptual Survey (RRMCS), the Fluid Mechanics Concept Inventory (FMCI), the Mechanical Waves Conceptual Survey (MWCS), the Thermal Concept Evaluation (TCE), and the Survey of Thermodynamic Processes and First and Second Laws (STPFaSL). We expect SPHERE to be a valuable dataset for supporting the advancement of the PER field, particularly in quantitative studies. For example, research on using machine learning and data mining techniques in PER may face challenges because no dataset is available for the specific purpose of PER studies. SPHERE can be reused as a students’ performance dataset on physics specifically dedicated to PER scholars who may wish to implement machine learning techniques in physics education.

  7. Dataset for: Fifty years of research on questionable research practices in...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Sep 26, 2023
    Cite
    Michelle Jin Yee Neoh; Alessandro Carollo; Albert Lee; Gianluca Esposito (2023). Dataset for: Fifty years of research on questionable research practices in science: Quantitative analysis of co-citation patterns [Dataset]. http://doi.org/10.5061/dryad.2fqz612tx
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    Nanyang Technological University
    University of Trento
    Authors
    Michelle Jin Yee Neoh; Alessandro Carollo; Albert Lee; Gianluca Esposito
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Questionable research practices (QRPs) have been the focus of the scientific community amid greater scrutiny and evidence highlighting issues with replicability across many fields of science. To capture the most impactful publications and the main thematic domains in the literature on QRPs, this study uses a document co-citation analysis. The analysis was conducted on a sample of 341 documents that covered the past 50 years of research in QRPs. Nine major thematic clusters emerged. Statistical reporting and statistical power emerged as key areas of research, where systemic-level factors in how research is conducted are consistently raised as the precipitating factors for QRPs. There is also an encouraging shift in the focus of research into open science practices designed to address engagement in QRPs. Such a shift is indicative of the growing momentum of the open science movement, and more research can be conducted on how these practices are employed on the ground and how their uptake by researchers can be further promoted. However, the results suggest that, while pre-registration and registered reports receive the most research interest, less attention has been paid to other open science practices (e.g., data and methods sharing).

    Methods: All data were downloaded from the Scopus platform on 6 February 2023. Data were retrieved using the string TITLE-ABS-KEY("questionable research practice*"). The files contain information for 341 documents published between 1974 and 2023.

  8. Data from: The use of project portfolios in effective strategy execution to...

    • researchdata.up.ac.za
    zip
    Updated May 31, 2023
    + more versions
    Cite
    Palesa Agnes Ramashala (2023). The use of project portfolios in effective strategy execution to improve business value [Dataset]. http://doi.org/10.25403/UPresearchdata.13280141.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    University of Pretoria
    Authors
    Palesa Agnes Ramashala
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Qualitative data gathered from interviews conducted with four case organisations. The data was analysed using a qualitative data analysis tool (Atlas.ti) to code the responses and generate network diagrams; software such as Atlas.ti 8 for Windows is recommended for viewing these results. The details of the responses from the respondents at the case organisations are captured in tabular form, and graphs were also created to identify trends. The study also includes a desktop review of the case organisations, based on published annual reports covering a period of more than seven years. The analysis was done given the scope of the project and its constructs.

  9. Code from: Beyond the classroom: Alicia’s multivariate journey

    • search.dataone.org
    Updated Nov 27, 2025
    Cite
    Allison Theobold (2025). Code from: Beyond the classroom: Alicia’s multivariate journey [Dataset]. http://doi.org/10.5061/dryad.c59zw3rg6
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Allison Theobold
    Description

    The importance of data science skills for modern scientific research cannot be overstated. Although policy documents increasingly recommend what skills should be included in undergraduate statistics and data science curricula, little is known about how students actually develop and apply these skills. This paper addresses this gap through an in-depth case study tracing one student’s learning progressions throughout her master’s program. Using a qualitative method to analyze student code, which has seen little use in statistics education research, I examined how Alicia transferred the data science skills from her applied statistics course into authentic research settings. The analysis shows that, while Alicia successfully navigated new challenges, she encountered persistent hurdles when extending bivariate techniques into multivariate contexts, particularly with visualizations and summary statistics. These findings highlight the obs...

    R Script files submitted by Alicia (pseudonym) over the course of the study. The files are named according to when they were submitted:

    December 2018

    R Script #1

    April 2019

    R Script #1 (revised) R Script #2

    September 2019

    R Script #1 (revised) R Script #2 (revised)

    Qualitative Data Analysis Files (Rich text files)

    December 2018 Script #1 April 2019 Script #1 April 2019 Script #2 September 2019 Script #1 September 2019 Script #2

    Quantitative Data Analysis Files

    r-code-themes.csv

    Comma-separated values file with separate sheets for each R script. Each sheet contains the qualitative code assigned to each line of code and whether the code contained errors.

    # Code from: Beyond the classroom: Alicia’s multivariate journey

    https://doi.org/10.5061/dryad.c59zw3rg6

    This repository contains the R script files submitted by Alicia (pseudonym) throughout this study, files associated with the qualitative analysis of the code, and files associated with visualizations of the qualitative themes included in Alicia's code.

    Description of the data and file structure

    As this is a qualitative analysis, the usage of these "data" files differs from a typical quantitative analysis.

    • The .R Files contain the scripts generated by Alicia at each time point (December 2018, April 2019, September 2019)
    • The -codes.rft Files contain the (qualitative) process codes for each R script
    • The r-code-themes.xlsx file contains information on every script and the qualitative code assigned to each line of code.

    Code/Software

    While the "data" for this analysis are R scripts, these scripts cannot be execu...,

  10. Enhancing Quantitative Analysis in Social Sciences with Large Language...

    • openicpsr.org
    delimited
    Updated Sep 6, 2025
    Cite
    James Sebastian; Jeong-Mi Moon; Eric Camburn (2025). Enhancing Quantitative Analysis in Social Sciences with Large Language Models (LLMs): A Methodological Case Study in Educational Research [Dataset]. http://doi.org/10.3886/E237744V3
    Explore at:
    Available download formats: delimited
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    University of Missouri-Columbia
    University of Missouri-Kansas City
    Korea Institute of Energy Technology
    Authors
    James Sebastian; Jeong-Mi Moon; Eric Camburn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The objective of this paper is to explore the potential of Large Language Models (LLMs) for assisting with quantitative data analysis in social science research. Specifically, it aims to introduce key concepts to help researchers effectively integrate LLMs into their workflows. For this purpose, we replicate a research paper in educational leadership on the relationship between school program coherence and student achievement. By leveraging LLMs to generate code for statistical tools like Mplus and R, researchers can streamline their data analysis, potentially saving time and effort. The quality of analytical code generated by LLMs can be influenced by the researcher’s understanding and application of concepts like context windows, LLM training data and training cut-off, model parameter settings like temperature, zero- and few-shot learning, and Retrieval-Augmented Generation. By describing and demonstrating the applications of these concepts, we aim to equip researchers with a basic toolset to leverage LLMs effectively to assist with coding for quantitative analysis.

  11. Research General Stopwords.csv

    • psycharchives.org
    Updated Oct 8, 2019
    + more versions
    Cite
    (2019). Research General Stopwords.csv [Dataset]. https://www.psycharchives.org/en/item/cab36090-633c-473c-9b78-420010637fa4
    Explore at:
    Dataset updated
    Oct 8, 2019
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Systematic reviews are the method of choice to synthesize research evidence. To identify main topics (so-called hot spots) relevant to large corpora of original publications in need of a synthesis, one must address the “three Vs” of big data (volume, velocity, and variety), especially in loosely defined or fragmented disciplines. For this purpose, text mining and predictive modeling are very helpful. Thus, we applied these methods to a compilation of documents related to digitalization in aesthetic, arts, and cultural education, as a prototypical, loosely defined, fragmented discipline, and particularly to quantitative research within it (QRD-ACE). By broadly querying the abstract and citation database Scopus with terms indicative of QRD-ACE, we identified a corpus of N = 55,553 publications for the years 2013–2017. As the result of an iterative approach of text mining, priority screening, and predictive modeling, we identified n = 8,304 potentially relevant publications of which n = 1,666 were included after priority screening. Analysis of the subject distribution of the included publications revealed video games as a first hot spot of QRD-ACE. Topic modeling resulted in aesthetics and cultural activities on social media as a second hot spot, related to 4 of k = 8 identified topics. This way, we were able to identify current hot spots of QRD-ACE by screening less than 15% of the corpus. We discuss implications for harnessing text mining, predictive modeling, and priority screening in future research syntheses and avenues for future original research on QRD-ACE. Dataset for: Christ, A., Penthin, M., & Kröner, S. (2019). Big Data and Digital Aesthetic, Arts, and Cultural Education: Hot Spots of Current Quantitative Research. Social Science Computer Review, 089443931988845. https://doi.org/10.1177/0894439319888455
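
The screening figures quoted in this description can be checked with simple arithmetic. The sketch below assumes the "less than 15% of the corpus" statement refers to the 8,304 potentially relevant publications out of the 55,553 in the corpus; that reading is an interpretation and is not stated explicitly in the description.

```python
corpus_size = 55_553   # N, publications retrieved from Scopus (2013-2017)
screened = 8_304       # n, potentially relevant publications
included = 1_666       # n, publications included after priority screening

# Share of the corpus that had to be screened (assumed meaning of "< 15%").
screened_share = screened / corpus_size
# Share of the screened publications that were finally included.
included_share = included / screened

print(f"screened share of corpus: {screened_share:.1%}")
print(f"included share of screened: {included_share:.1%}")
```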

  12. Nobel Laureates, from 1901 to 2023

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Feb 4, 2024
    Cite
    Tyler J Duckworth (2024). Nobel Laureates, from 1901 to 2023 [Dataset]. http://doi.org/10.7910/DVN/DJQFDE
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 4, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Tyler J Duckworth
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains data about all Nobel Prizes and their respective recipients from 1901 to 2023 as well as the code to regenerate the dataset to include future years. This dataset can be used to conduct quantitative analysis and was created to fulfill an assignment for COSC426: Introduction to Data Mining.

  13. Dataframe of Significant Stems for: Big Data and Digital Aesthetic, Arts and...

    • demo-b2find.dkrz.de
    Updated Sep 21, 2025
    + more versions
    Cite
    (2025). Dataframe of Significant Stems for: Big Data and Digital Aesthetic, Arts and Cultural Education: Hot Spots of Current Quantitative Research Dataset for: Big Data and Digital Aesthetic, Arts and Cultural Education: Hot Spots of Current Quantitative Research - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/0bd97871-d19f-5b9b-bfcc-87f133bd9275
    Explore at:
    Dataset updated
    Sep 21, 2025
    Description

    Systematic reviews are the method of choice to synthesize research evidence. To identify main topics (so-called hot spots) relevant to large corpora of original publications in need of a synthesis, one must address the “three Vs” of big data (volume, velocity, and variety), especially in loosely defined or fragmented disciplines. For this purpose, text mining and predictive modeling are very helpful. Thus, we applied these methods to a compilation of documents related to digitalization in aesthetic, arts, and cultural education, as a prototypical, loosely defined, fragmented discipline, and particularly to quantitative research within it (QRD-ACE). By broadly querying the abstract and citation database Scopus with terms indicative of QRD-ACE, we identified a corpus of N = 55,553 publications for the years 2013–2017. As the result of an iterative approach of text mining, priority screening, and predictive modeling, we identified n = 8,304 potentially relevant publications of which n = 1,666 were included after priority screening. Analysis of the subject distribution of the included publications revealed video games as a first hot spot of QRD-ACE. Topic modeling resulted in aesthetics and cultural activities on social media as a second hot spot, related to 4 of k = 8 identified topics. This way, we were able to identify current hot spots of QRD-ACE by screening less than 15% of the corpus. We discuss implications for harnessing text mining, predictive modeling, and priority screening in future research syntheses and avenues for future original research on QRD-ACE. Dataset for: Christ, A., Penthin, M., & Kröner, S. (2019). Big Data and Digital Aesthetic, Arts, and Cultural Education: Hot Spots of Current Quantitative Research. Social Science Computer Review, 089443931988845. https://doi.org/10.1177/0894439319888455

  14. Dataframe of Significant Stems.csv

    • psycharchives.org
    Updated Oct 8, 2019
    Cite
    (2019). Dataframe of Significant Stems.csv [Dataset]. https://www.psycharchives.org/en/item/84d5c4b2-579d-48a0-8d4e-f02f2ae99192
    Explore at:
    Dataset updated
    Oct 8, 2019
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Systematic reviews are the method of choice to synthesize research evidence. To identify main topics (so-called hot spots) relevant to large corpora of original publications in need of a synthesis, one must address the “three Vs” of big data (volume, velocity, and variety), especially in loosely defined or fragmented disciplines. For this purpose, text mining and predictive modeling are very helpful. Thus, we applied these methods to a compilation of documents related to digitalization in aesthetic, arts, and cultural education, as a prototypical, loosely defined, fragmented discipline, and particularly to quantitative research within it (QRD-ACE). By broadly querying the abstract and citation database Scopus with terms indicative of QRD-ACE, we identified a corpus of N = 55,553 publications for the years 2013–2017. As the result of an iterative approach of text mining, priority screening, and predictive modeling, we identified n = 8,304 potentially relevant publications of which n = 1,666 were included after priority screening. Analysis of the subject distribution of the included publications revealed video games as a first hot spot of QRD-ACE. Topic modeling resulted in aesthetics and cultural activities on social media as a second hot spot, related to 4 of k = 8 identified topics. This way, we were able to identify current hot spots of QRD-ACE by screening less than 15% of the corpus. We discuss implications for harnessing text mining, predictive modeling, and priority screening in future research syntheses and avenues for future original research on QRD-ACE. Dataset for: Christ, A., Penthin, M., & Kröner, S. (2019). Big Data and Digital Aesthetic, Arts, and Cultural Education: Hot Spots of Current Quantitative Research. Social Science Computer Review, 089443931988845. https://doi.org/10.1177/0894439319888455

  15. StarMine Text Mining Credit Risk Model

    • lseg.com
    Updated Oct 14, 2025
    Cite
    LSEG (2025). StarMine Text Mining Credit Risk Model [Dataset]. https://www.lseg.com/en/data-analytics/financial-data/company-data/quantitative-models/credit-risk-models/starmine-text-mining
    Explore at:
    Available download formats: csv, delimited, gzip, json, python, sql, text, user interface, xml, zip archive
    Dataset updated
    Oct 14, 2025
    Dataset provided by
    London Stock Exchange Group (http://www.londonstockexchangegroup.com/)
    Authors
    LSEG
    License

    https://www.lseg.com/en/policies/website-disclaimer

    Description

    Assess risk in publicly traded companies with LSEG's StarMine Text Mining Credit Risk Model (TMCR), scoring over 38,000 companies.

  16. Integrating Machine Learning Techniques in the Evaluation of Management...

    • search.dataone.org
    Updated Mar 6, 2024
    Cite
    David, Lemuel (2024). Integrating Machine Learning Techniques in the Evaluation of Management Control Systems for Enhanced Predictive Analytics [Dataset]. http://doi.org/10.7910/DVN/7AYC1L
    Explore at:
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    David, Lemuel
    Description

    This data was used to analyze the transformative role of machine learning (ML) techniques in refining Management Control Systems (MCS) to bolster predictive analytics capabilities within varied organizational contexts. Utilizing a mixed-methods approach, this research synthesizes a comprehensive quantitative analysis of Bloomberg's extensive dataset, encompassing 4,500 companies across multiple industries from 2015 to 2023.

  17. Data from: Analysis of spatiotemporal specificity of small RNAs regulating...

    • figshare.com
    xlsx
    Updated Sep 29, 2019
    Cite
    Lu Li (2019). Analysis of spatiotemporal specificity of small RNAs regulating hPSC differentiation and beyond [Dataset]. http://doi.org/10.6084/m9.figshare.9911918.v2
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Sep 29, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lu Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a quantitative analysis of small RNA dynamics during the transition from hPSCs to the three germ layer lineages to identify spatiotemporal-specific small RNAs that may be involved in hPSC differentiation. To determine the degree of spatiotemporal specificity, we utilized two algorithms, namely normalized maximum timepoint specificity index (NMTSI) and across-tissue specificity index (ASI). NMTSI could identify spatiotemporal-specific small RNAs that go up or down at just one timepoint in a specific lineage. ASI could identify spatiotemporal-specific small RNAs that maintain high expression from intermediate timepoints to the terminal timepoint in a specific lineage. Beyond analyzing single small RNAs, we also quantified the spatiotemporal-specificity of microRNA families and observed their differential expression patterns in certain lineages. To clarify the regulatory effects of group miRNAs on cellular events during lineage differentiation, we performed a gene ontology (GO) analysis on the downstream targets of synergistically up- and downregulated microRNAs. To provide an integrated interface for researchers to access and browse our analysis results, we designed a web-based tool at https://keyminer.pythonanywhere.com/km/.

  18. Data underlying the paper: Quantitative analysis of spectroscopic Low Energy...

    • data.4tu.nl
    zip
    Cite
    Tobias A. de Jong; J. (Johannes) Jobst, Data underlying the paper: Quantitative analysis of spectroscopic Low Energy Electron Microscopy Data [Dataset]. http://doi.org/10.4121/uuid:7f672638-66f6-4ec3-a16c-34181cc45202
    Explore at:
    Available download formats: zip
    Dataset provided by
    4TU.Centre for Research Data
    Authors
    Tobias A. de Jong; J. (Johannes) Jobst
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains a Low Energy Electron Microscopy dataset consisting of raw data of both a dark field and a bright field spectroscopic image series of a region of few-layer graphene on silicon carbide. Additionally, it contains calibration data: a dark count dataset, an HDR calibration dataset, and two curves showcasing the difference between HDR and non-HDR imaging.

  19. EURUSD 15 minutes data

    • kaggle.com
    zip
    Updated Sep 16, 2025
    Cite
    DOCTOR DIEGO LEON (2025). EURUSD 15 minutes data [Dataset]. https://www.kaggle.com/datasets/doctordiegoleon/eurusd-15-minutes-data
    Explore at:
    Available download formats: zip (1511380 bytes)
    Dataset updated
    Sep 16, 2025
    Authors
    DOCTOR DIEGO LEON
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This portfolio provides a detailed analysis of the EUR/USD currency pair on a 15-minute timeframe, aiming to explore market patterns, volatility, and potential opportunities for developing algorithmic trading strategies.

    Included in this work:

    Data cleaning and preprocessing of historical records.

    Exploratory analysis of prices, volumes, and movement ranges.

    Pattern detection such as consecutive candles, trends, and reversals.

    Quantitative metrics to assess risk and performance.

    Dataset preparation for backtesting and predictive modeling.

    This project is designed for traders, quantitative analysts, and data science enthusiasts interested in applying analytical methods to Forex markets, with a practical and replicable approach to generating financial insights.

  20. Looking for data (Expert interviews)

    • search.gesis.org
    Updated Jul 2, 2025
    + more versions
    Cite
    Friedrich, Tanja (2025). Looking for data (Expert interviews) [Dataset]. https://search.gesis.org/research_data/SDN-10.7802-1.1943
    Explore at:
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    GESIS search
    GESIS, Köln
    Authors
    Friedrich, Tanja
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Description

    These interview data are part of the project "Looking for data: information seeking behaviour of survey data users", a study of secondary data users’ information-seeking behaviour. The overall goal of this study was to create evidence of actual information practices of users of one particular retrieval system for social science data in order to inform the development of research data infrastructures that facilitate data sharing. In the project, data were collected based on a mixed methods design. The research design included a qualitative study in the form of expert interviews and – building on the results found therein – a quantitative web survey of secondary survey data users.

    For the qualitative study, expert interviews were conducted with six reference persons of a large social science data archive. They were interviewed in their role as intermediaries who provide guidance for secondary users of survey data. The knowledge from their reference work was expected to provide a condensed view of goals, practices, and problems of people who are looking for survey data. The anonymized transcripts of these interviews are provided here. They can be reviewed or reused upon request. The survey dataset from the quantitative study of secondary survey data users is downloadable through this data archive after registration.

    The core result of the Looking for data study is that community involvement plays a pivotal role in survey data seeking. The analyses show that survey data communities are an important determinant in survey data users' information seeking behaviour and that community involvement facilitates data seeking and has the capacity to reduce problems or barriers.

    The qualitative part of the study was designed and conducted using constructivist grounded theory methodology as introduced by Kathy Charmaz (2014). In line with grounded theory methodology, the interviews did not follow a fixed set of questions, but were conducted based on a guide that included areas of exploration with tentative questions. This interview guide can be obtained together with the transcript. For the Looking for data project, the data were coded and scrutinized by constant comparison, as proposed by grounded theory methodology. This analysis resulted in core categories that make up the "theory of problem-solving by community involvement". This theory was exemplified in the quantitative part of the study. For this exemplification, the following hypotheses were drawn from the qualitative study:

    (1) The data seeking hypotheses: (1a) When looking for data, information seeking through personal contact is used more often than impersonal ways of information seeking. (1b) Ways of information seeking (personal or impersonal) differ with experience.
    (2) The experience hypotheses: (2a) Experience is positively correlated with having ambitious goals. (2b) Experience is positively correlated with having more advanced requirements for data. (2c) Experience is positively correlated with having more specific problems with data.
    (3) The community involvement hypothesis: Experience is positively correlated with community involvement.
    (4) The problem solving hypothesis: Community involvement is positively correlated with problem solving strategies that require personal interactions.

