100+ datasets found
  1. Political Analysis Using R: Example Code and Data, Plus Data for Practice...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 28, 2020
    Cite
    Jamie Monogan (2020). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
    Dataset provided by
    Harvard Dataverse
    Authors
    Jamie Monogan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.

  2. Data_Sheet_1_Advanced large language models and visualization tools for data...

    • frontiersin.figshare.com
    txt
    Updated Aug 8, 2024
    Cite
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001
    Dataset provided by
    Frontiers
    Authors
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study aims to analyze the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

    Methods: This article examines how learners perform, and what their perspectives are, when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, both students and professionals, without computational thinking skills. These participants developed a data analytics project in the context of a Data Analytics short session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

    Results: The results show the transformative potential of approaches based on integrating advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

    Discussion: It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure this connection via API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.

  3. Data Cleaning Sample

    • borealisdata.ca
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  4. Analysis Practice Data

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Arshad, Abdul Rehman (2023). Analysis Practice Data [Dataset]. http://doi.org/10.7910/DVN/R1VIPU
    Dataset provided by
    Harvard Dataverse
    Authors
    Arshad, Abdul Rehman
    Description

    This data set comes as a supplementary resource for my book on Biostatistics and SPSS. Readers are free to download this file and practice using SPSS as they go along reading the book.

  5. Easing into Excellent Excel Practices Learning Series / Série...

    • borealisdata.ca
    • search.dataone.org
    Updated Nov 15, 2023
    Cite
    Julie Marcoux (2023). Easing into Excellent Excel Practices Learning Series / Série d'apprentissages en route vers des excellentes pratiques Excel [Dataset]. http://doi.org/10.5683/SP3/WZYO1F
    Dataset provided by
    Borealis
    Authors
    Julie Marcoux
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    With a step-by-step approach, learn to prepare Excel files, data worksheets, and individual data columns for data analysis; practice conditional formatting and creating pivot tables/charts; go over basic principles of Research Data Management as they might apply to an Excel project.

  6. Sample data files for Python Course

    • figshare.com
    txt
    Updated Nov 4, 2022
    Cite
    Peter Verhaar (2022). Sample data files for Python Course [Dataset]. http://doi.org/10.6084/m9.figshare.21501549.v1
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Peter Verhaar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample data set used in an introductory course on Programming in Python

  7. Practice Analytics Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jan 16, 2025
    Cite
    Data Insights Market (2025). Practice Analytics Software Report [Dataset]. https://www.datainsightsmarket.com/reports/practice-analytics-software-1399248
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global practice analytics software market is anticipated to expand at a considerable CAGR during the forecast period from 2025 to 2033. In 2025, the market size stood at approximately million, and it is projected to reach a value of million by 2033. The increasing need for data-driven insights in healthcare, rising adoption of electronic health records (EHRs), and growing emphasis on patient engagement are primarily driving market growth. Key trends include the emergence of cloud-based solutions, integration of artificial intelligence (AI) and machine learning (ML), and the adoption of predictive analytics. The market is segmented across application, type, company, and region. In terms of application, large enterprises and small and medium-sized enterprises (SMEs) are the key end-users. Based on type, the market is bifurcated into cloud-based and on-premise solutions. Key players in the market include AdvancedMD, DrChrono, athenahealth, Kareo, NXGN Management LLC, Compulink, Bizmatics Software, Greenway Health LLC, Valant Inc, Medsphere Systems Corporation (ChartLogic), and Practice EHR. Regionally, North America, Europe, Asia Pacific, South America, and the Middle East & Africa are the primary markets for practice analytics software. The increasing adoption of EHRs and the growing awareness of the benefits of data analytics are driving growth in these regions.

  8. Data_Sheet_1_Raw Data Visualization for Common Factorial Designs Using SPSS:...

    • frontiersin.figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Florian Loffing (2023). Data_Sheet_1_Raw Data Visualization for Common Factorial Designs Using SPSS: A Syntax Collection and Tutorial.ZIP [Dataset]. http://doi.org/10.3389/fpsyg.2022.808469.s001
    Dataset provided by
    Frontiers
    Authors
    Florian Loffing
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available that allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed-designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match with individual needs. A variety of example applications of syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool to move towards more transparency in data visualization.

  9. Data for "Best Practices for Your Exploratory Factor Analysis: Factor...

    • data.mendeley.com
    Updated Jul 16, 2021
    Cite
    Pablo Rogers (2021). Data for "Best Practices for Your Exploratory Factor Analysis: Factor Tutorial" published by RAC-Revista de Administração Contemporânea [Dataset]. http://doi.org/10.17632/rdky78bk8r.1
    Authors
    Pablo Rogers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains material related to the analysis performed in the article "Best Practices for Your Exploratory Factor Analysis: Factor Tutorial". The material includes the data used in the analyses in .dat format, the labels (.txt) of the variables used in the Factor software, the outputs (.txt) evaluated in the article, and videos (.mp4 with English subtitles) recorded for the purpose of explaining the article. The videos can also be accessed in the following playlist: https://youtube.com/playlist?list=PLDfyRtHbxiZ3R-T3H1cY8dusz273aUFVe. Below is a summary of the article:

    "Exploratory Factor Analysis (EFA) is one of the statistical methods most widely used in Administration, however, its current practice coexists with rules of thumb and heuristics given half a century ago. The purpose of this article is to present the best practices and recent recommendations for a typical EFA in Administration through a practical solution accessible to researchers. In this sense, in addition to discussing current practices versus recommended practices, a tutorial with real data on Factor is illustrated, a software that is still little known in the Administration area, but freeware, easy to use (point and click) and powerful. The step-by-step illustrated in the article, in addition to the discussions raised and an additional example, is also available in the format of tutorial videos. Through the proposed didactic methodology (article-tutorial + video-tutorial), we encourage researchers/methodologists who have mastered a particular technique to do the same. Specifically, about EFA, we hope that the presentation of the Factor software, as a first solution, can transcend the current outdated rules of thumb and heuristics, by making best practices accessible to Administration researchers"

  10. Data from: US Airports

    • kaggle.com
    Updated Jul 21, 2022
    Cite
    Ms. Nancy Al Aswad (2022). US Airports [Dataset]. https://www.kaggle.com/datasets/nancyalaswad90/us-airports/data
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Ms. Nancy Al Aswad
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    What is Data Expo 2009 - Airline on-time performance?

    Have you ever been stuck in an airport because your flight was delayed or canceled and wondered if you could have predicted it if you'd had more data? This is your chance to find out.

    How to use this dataset

    We had a total of nine entries, and turnout at the poster session at the JSM was great, with plenty of people stopping by to find out why their flights were delayed.

    Acknowledgments

    When we use this dataset in our research, we credit the authors.

    The main reason for uploading this dataset is to practice data analysis with my students: I work at a college and want my students to apply what we study to a big dataset. It may not be up to date, and I mention the collection years, but it is a good resource for practice.

  11. Global Practice Analytics Software Market Size By Application, By End-User...

    • verifiedmarketresearch.com
    Updated Apr 5, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Practice Analytics Software Market Size By Application, By End-User Industry, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/practice-analytics-software-market/
    Dataset provided by
    Verified Market Research: https://www.verifiedmarketresearch.com/
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2030
    Area covered
    Global
    Description

    Practice Analytics Software Market size was valued at USD 4.2 Billion in 2023 and is projected to reach USD 12.8 Billion by 2030, growing at a CAGR of 17.5% during the forecast period 2024 to 2030.
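
    As a rough consistency check on these figures, the growth rate implied by the two quoted market sizes can be computed directly. This is a minimal sketch, assuming the 2023 valuation is the base year and growth compounds annually over the seven years to 2030; the function name `cagr` is illustrative and does not come from the report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Market sizes quoted in the description, in USD billions.
# Assumption: 2023 is the base year, so the span is 2030 - 2023 = 7 years.
implied = cagr(4.2, 12.8, 2030 - 2023)
print(f"Implied CAGR 2023-2030: {implied:.1%}")
```

    Under these assumptions the implied rate comes out a little above 17%, close to the quoted 17.5%; the small gap may reflect rounding or a different base year.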

    Global Practice Analytics Software Market Drivers

    Growing Need for Data-Driven Insights: Hospitals, clinics, and other healthcare institutions are among the businesses realizing more and more how crucial data analytics is to streamlining their operations and enhancing patient care. Healthcare practitioners can make well-informed decisions and improve operational efficiency by using practice analytics software, which offers useful insights into a variety of practice management topics, such as patient demographics, appointment scheduling, resource utilization, and revenue cycle management.

    Growing Emphasis on Value-Based Care: Practice analytics software adoption is being driven by the move to value-based care models, which pay healthcare providers based on the quality and results of treatment rather than the quantity of services rendered. In order to improve patient outcomes and cut costs, providers must manage key performance indicators (KPIs), monitor clinical quality metrics, and prove their worth to payers and regulatory bodies. To do this, they require comprehensive analytics tools.

    Financial Performance Optimization Is Necessary: Regulatory changes, diminishing reimbursements, and growing expenditures have put healthcare companies under increasing financial strain. With the use of practice analytics software, providers can monitor costs, detect revenue opportunities, evaluate financial data, and streamline the billing and coding procedures in order to increase profits.

    Emphasis on Patient Engagement and Satisfaction: For healthcare businesses looking to improve patient outcomes and experience, patient engagement and satisfaction are top priorities. By analyzing patient comments, preferences, and outcomes with the use of practice analytics software, healthcare professionals may better fulfill patient wants and expectations by customizing services, enhancing communication, and providing individualized care.

    Regulatory Compliance and Reporting Requirements: Healthcare providers are subject to strict reporting, data security, and privacy requirements under laws like the Medicare Access and CHIP Reauthorization Act (MACRA) and the Health Insurance Portability and Accountability Act (HIPAA). With its audit trails, data encryption, and reporting features, practice analytics software helps businesses manage risk, ensure regulatory compliance, and stay out of trouble.

    Technological Advancements in Data Analytics: The field is experiencing a rapid period of innovation in analytics software due to the rapid advances in artificial intelligence (AI), machine learning (ML), and predictive analytics. With the use of these sophisticated analytics tools, healthcare professionals can now extract knowledge from massive, intricate datasets, forecast trends, spot patterns, and streamline decision-making procedures—all of which contribute to more proactive and individualized patient care.

    Integration with Electronic Health Record (EHR) Systems: Interoperability and smooth data interchange between healthcare institutions depend on integration with EHR systems. Data-driven decision-making and improved care coordination are made possible by practice analytics software that connects with EHR platforms and gives practitioners real-time access to full patient information, clinical data, and operational indicators.

    Remote Patient Monitoring and Telehealth: Practice analytics software that can evaluate virtual care encounters, remotely monitor patient outcomes, and optimize virtual care workflows is becoming more and more necessary as a result of the COVID-19 pandemic's acceleration of the use of telehealth and remote patient monitoring solutions. With the use of analytics tools, healthcare professionals may determine which patients are at high risk, evaluate the success of telemedicine interventions, and schedule timely interventions to avoid unfavorable outcomes.

    Demand for Population Health Management: By addressing social determinants of health, managing chronic illnesses, and encouraging preventive care, population health management efforts seek to enhance the health outcomes of entire patient groups. In order to enhance population health and save healthcare costs, practice analytics software is essential for gathering and analyzing data from many sources, identifying populations that are at-risk, and putting targeted treatments into action.

    Competitive Pressures and Market Differentiation: Health systems, competing providers, and alternative approaches to care delivery are posing a growing threat to healthcare companies. With the help of practice analytics software, providers can stand out from the competition and draw in and keep patients, doctors, and payers by showcasing their superior clinical results, operational effectiveness, and patient happiness.

  12. Fictional Sales Data

    • kaggle.com
    Updated May 22, 2024
    Cite
    Anthony Kim (2024). Fictional Sales Data [Dataset]. https://www.kaggle.com/datasets/teluskiman/fictional-sales-data/discussion
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Anthony Kim
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The purpose of this fictional sales dataset is to provide data for Data Analysis practice. The 3 tables must be joined before one can analyze the data.

    This fictional data set consists of 3 tables:

    1. Customer dimension (history preserving)
    2. Product dimension (history preserving)
    3. Sales Transactions

    The Customer Dimension dataset includes unique customer IDs, addresses, ages, and indicators of current records, with effective start and end dates for each customer.

    The Product Dimension dataset details unique product IDs, names, prices, and their validity periods, along with indicators of current price records.

    The Sales Transactions dataset captures sales activities with unique order IDs, product IDs, customer IDs, quantities sold, and order dates. Together, these datasets offer a comprehensive view of customer demographics, product pricing history, and sales transactions.
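
    Because both dimensions preserve history, each sale has to be matched to the customer and product versions that were valid on the order date. The sketch below illustrates one way to do that join with the standard library only. All column names, sample rows, and the inclusive date bounds are assumptions made for illustration; the actual files may use different names and conventions.

```python
from datetime import date

# Hypothetical sample rows; real column names in the dataset may differ.
customers = [
    {"customer_id": 1, "address": "Old St 1",
     "start": date(2023, 1, 1), "end": date(2023, 6, 30)},
    {"customer_id": 1, "address": "New Ave 9",
     "start": date(2023, 7, 1), "end": date(9999, 12, 31)},
]
products = [
    {"product_id": 10, "name": "Widget", "price": 5.0,
     "start": date(2023, 1, 1), "end": date(2023, 3, 31)},
    {"product_id": 10, "name": "Widget", "price": 6.0,
     "start": date(2023, 4, 1), "end": date(9999, 12, 31)},
]
sales = [
    {"order_id": 100, "product_id": 10, "customer_id": 1,
     "quantity": 2, "order_date": date(2023, 2, 15)},
    {"order_id": 101, "product_id": 10, "customer_id": 1,
     "quantity": 3, "order_date": date(2023, 8, 1)},
]

def version_at(rows, key, key_value, when):
    """Return the dimension row valid on `when` (inclusive bounds assumed), or None."""
    for row in rows:
        if row[key] == key_value and row["start"] <= when <= row["end"]:
            return row
    return None

def join_sales(sales, customers, products):
    """Attach the customer and product versions valid on each order date."""
    joined = []
    for sale in sales:
        cust = version_at(customers, "customer_id", sale["customer_id"], sale["order_date"])
        prod = version_at(products, "product_id", sale["product_id"], sale["order_date"])
        if cust is None or prod is None:
            continue  # skip sales with no matching dimension version
        joined.append({
            "order_id": sale["order_id"],
            "address": cust["address"],
            "price": prod["price"],
            "revenue": sale["quantity"] * prod["price"],
        })
    return joined

joined = join_sales(sales, customers, products)
for row in joined:
    print(row)
```

    Matching on the validity period rather than on the current-record indicator means each order is priced and located as of the order date, which is the point of keeping history in the dimensions.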

  13. Data from: Untargeted metabolomics workshop report: quality control...

    • data.niaid.nih.gov
    xml
    Updated Dec 17, 2020
    Cite
    Prasad Phapale (2020). Untargeted metabolomics workshop report: quality control considerations from sample preparation to data analysis [Dataset]. https://data.niaid.nih.gov/resources?id=mtbls1301
    Dataset provided by
    EMBL
    Authors
    Prasad Phapale
    Variables measured
    tumor, Metabolomics
    Description

    The Metabolomics workshop on experimental and data analysis training for untargeted metabolomics was hosted by the Proteomics Society of India in December 2019. The Workshop included six tutorial lectures and hands-on data analysis training sessions presented by seven speakers. The tutorials and hands-on data analysis sessions focused on workflows for liquid chromatography-mass spectrometry (LC-MS) based on untargeted metabolomics. We review here three main topics from the workshop which were uniquely identified as bottlenecks for new researchers: a) experimental design, b) quality controls during sample preparation and instrumental analysis and c) data quality evaluation. Our objective here is to present common challenges faced by novice researchers and present possible guidelines and resources to address them. We provide resources and good practices for researchers who are at the initial stage of setting up metabolomics workflows in their labs.

    Complete detailed metabolomics/lipidomics protocols are available online at EMBL-MCF protocol including video tutorials.

  14. Dataset for article "Unveiling Openness in Energy Research: A Bibliometric...

    • zenodo.org
    • meta4ds.fokus.fraunhofer.de
    csv
    Updated Mar 14, 2025
    Cite
    Linna Lu; Amanda Wein (2025). Dataset for article "Unveiling Openness in Energy Research: A Bibliometric Analysis Focusing on Open Access and Data Sharing Practices" [Dataset]. http://doi.org/10.5281/zenodo.15023865
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Linna Lu; Amanda Wein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 6, 2024
    Description

    This dataset was used as a data corpus for a bibliometric analysis with the title "Unveiling Openness in Energy Research: A Bibliometric Analysis Focusing on Open Access and Data Sharing Practices".

    The CSV file (2024-12-06_OpenAlex_API_download_works_Energy_Germany_(2013-2023)) was collected on December 6th, 2024, by using the OpenAlex API and search criteria: OpenAlex field "Energy", continent “Europe”, country “Germany”, and publication years 2013 – 2023. Based on this file, two sample files were extracted - one by subfield (2024-12-06_OpenAlex_API_dwonload_works_Energy_Germany_(2013-2023)_sampled_by_subfield) and another by year group (2024-12-06_OpenAlex_API_download_works_Energy_Germany_(2013-2023)_sampled_by_year_group).

    This dataset was collected and used to answer the following research questions:

    - What percentage of energy research publications are OA? How do the types (gold, green, etc.) of these publications differ?

    - Are there notable differences in OA and data sharing practices in different subfields of energy research?

    - How commonly are datasets for energy studies shared? What are the primary repositories used?

    - What kind of data sharing or publication practices are widespread? How has this evolved over the last decade?

  15. UCI and OpenML Data Sets for Ordinal Quantification

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 25, 2023
    Cite
    Mirko Bunse; Alejandro Moreo; Fabrizio Sebastiani; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Mirko Bunse; Alejandro Moreo; Fabrizio Sebastiani; Martin Senz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

    With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.

    We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

    Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
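
    The two protocols can be illustrated with a short sketch. Drawing every label distribution with equal probability corresponds to sampling uniformly from the probability simplex, which the code does by normalising independent exponential draws. The smoothness score used for the APP-OQ filter (sum of squared second differences) is an assumption chosen for illustration; the description does not state which measure the authors use, and the function names are hypothetical.

```python
import random

def sample_prevalences(n_classes, rng):
    """Draw one label distribution uniformly from the probability simplex."""
    # Normalised i.i.d. Exp(1) draws follow a flat Dirichlet distribution,
    # which is the uniform distribution over the simplex.
    draws = [rng.expovariate(1.0) for _ in range(n_classes)]
    total = sum(draws)
    return [d / total for d in draws]

def jaggedness(p):
    """Assumed smoothness score: sum of squared second differences (lower is smoother)."""
    return sum((p[i - 1] - 2 * p[i] + p[i + 1]) ** 2 for i in range(1, len(p) - 1))

def keep_smoothest(prevalences, keep_fraction=0.2):
    """Keep the smoothest fraction of the sampled distributions (APP-OQ style filter)."""
    n_keep = max(1, round(len(prevalences) * keep_fraction))
    return sorted(prevalences, key=jaggedness)[:n_keep]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
app_samples = [sample_prevalences(5, rng) for _ in range(100)]
oq_samples = keep_smoothest(app_samples)
print(len(app_samples), len(oq_samples))
```

    To replicate the published evaluation exactly, the index files listed above should be used rather than freshly drawn samples.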

    Usage

    You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

    Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

    Data Extraction: In your terminal, you can call either

    make

    (recommended), or

    julia --project="." --eval "using Pkg; Pkg.instantiate()"
    julia --project="." extract-oq.jl

    Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.
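
    Given that layout, the quantity of interest in quantification (the distribution of labels) can be read off an extracted file with a few lines of Python. This is only a sketch: the column name "class_label" comes from the description above, while the demo content and the "feature_1" column are hypothetical stand-ins for a real extracted file.

```python
import csv
import io
from collections import Counter


def label_distribution(csv_file):
    """Return the relative frequency of each ordinal class in an open CSV file."""
    reader = csv.DictReader(csv_file)  # the first row is treated as the header
    counts = Counter(row["class_label"] for row in reader)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


# Hypothetical stand-in for an extracted file; only "class_label" is documented.
demo = io.StringIO("class_label,feature_1\n1,0.5\n2,0.1\n2,0.9\n3,0.4\n")
print(label_distribution(demo))  # {'1': 0.25, '2': 0.5, '3': 0.25}
```

    For a real file, pass an open file handle (for example, one opened with newline="") instead of the in-memory demo.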

    Further Reading

    Implementation of our experiments: https://github.com/mirkobunse/regularized-oq

  16. Data of Safety attitudes in General Practice Nurses

    • data.mendeley.com
    Updated Jan 27, 2025
    Cite
    Yeojin Kil (2025). Data of Safety attitudes in General Practice Nurses [Dataset]. http://doi.org/10.17632/2nc2m253pm.2
    Explore at:
    Dataset updated
    Jan 27, 2025
    Authors
    Yeojin Kil
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study aimed to investigate the safety attitudes of general practice nurses (GPNs). Data analysis was conducted to examine the effects of demographic factors—such as length of work experience and the number of general practitioners (GPs) or GPNs in the practice—on safety attitudes.

    The findings indicated a positive relationship between safety attitudes and length of work experience and a negative relationship between safety attitudes and the number of GPs or GPNs in the practice. These relationships were measured using ANOVA and the Kruskal-Wallis H test.

    Question 11 was administered only to participants who responded with 'Strongly Agree' or 'Agree a little' to Question 10.

    Responses to Question 10 were divided into 'Positive' and 'Negative' groups, and the impact of regular operational meetings on safety attitudes was analysed using a T-test.

    Additionally, open-ended questions were employed to identify the safety-related concerns faced by GPNs and their needs regarding current practices. Data collection was conducted via a questionnaire comprising 34 items on a 5-point Likert scale and 8 open-ended questions. Quantitative data were analysed using SPSS, whilst qualitative data were analysed using NVivo.

  17. Data from: Data accessibility in the chemical sciences: an analysis of...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Oct 14, 2024
    Cite
    Cerys Willoughby; Sally Bloodworth; Simon J. Coles (2024). Data accessibility in the chemical sciences: an analysis of recent practice in organic chemistry journals [Dataset]. http://doi.org/10.5281/zenodo.13928084
    Explore at:
    Dataset updated
    Oct 14, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Cerys Willoughby; Sally Bloodworth; Simon J. Coles
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data are an analysis of the data outputs of 240 randomly selected research papers from 12 top-ranked journals published in early 2023. We investigate author compliance with recommended (but not compulsory) data policies, whether there is evidence that authors apply FAIR data guidance in their data publishing, and whether the existence of specific recommendations for publishing NMR data by some journals encourages compliance. Files in the data package are provided in both human- and machine-readable forms. The main dataset is available in the Excel file Data worksheet.XLSX, the contents of which can also be found in Main_dataset.CSV, Data_types.CSV, and Article_selection.CSV, with explanations of the variable coding used in the studies in Variable_names.CSV, Codes.CSV, and FAIR_variable_coding.CSV. The R code used for the article selection can be found in Article_selection.R. Data about article types from the journals that contain original research data are in Article_types.CSV. Data collected for analysis in our sister paper [4] can be found in Extended_Adherence.CSV, Extended_Crystallography.CSV, Extended_DAS.CSV, Extended_File_Types.CSV, and Extended_Submission_Process.CSV. A full list of files in the data package and a short description of each is given in README.TXT.

  18. Cloud-based User Entity Behavior Analytics Log Data Set

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 30, 2023
    Cite
    Höld, Georg (2023). Cloud-based User Entity Behavior Analytics Log Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7119952
    Explore at:
    Dataset updated
    Oct 30, 2023
    Dataset provided by
    Wurzenberger, Markus
    Skopik, Florian
    Landauer, Max
    Höld, Georg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the CLUE-LDS (CLoud-based User Entity behavior analytics Log Data Set). The data set contains log events from real users utilizing a cloud storage suitable for User Entity Behavior Analytics (UEBA). Events include logins, file accesses, link shares, config changes, etc. The data set contains around 50 million events generated by more than 5000 distinct users in more than five years (2017-07-07 to 2022-09-29 or 1910 days). The data set is complete except for 109 events missing on 2021-04-22, 2021-08-20, and 2021-09-05 due to database failure. The unpacked file size is around 14.5 GB. A detailed analysis of the data set is provided in [1].

    The logs are provided in JSON format with the following attributes in the first level:

    • id: Unique log line identifier that starts at 1 and increases incrementally, e.g., 1.
    • time: Time stamp of the event in ISO format, e.g., 2021-01-01T00:00:02Z.
    • uid: Unique anonymized identifier for the user generating the event, e.g., old-pink-crane-sharedealer.
    • uidType: Specifier for uid, which is either the user name or IP address for logged out users.
    • type: The action carried out by the user, e.g., file_accessed.
    • params: Additional event parameters (e.g., paths, groups) stored in a nested dictionary.
    • isLocalIP: Optional flag for event origin, which is either internal (true) or external (false).
    • role: Optional user role: consulting, administration, management, sales, technical, or external.
    • location: Optional IP-based geolocation of event origin, including city, country, longitude, latitude, etc.

    In the following data sample, the first object depicts a successful user login (see type: login_successful) and the second object depicts a file access (see type: file_accessed) from a remote location:

    {"params": {"user": "intact-gray-marlin-trademarkagent"}, "type": "login_successful", "time": "2019-11-14T11:26:43Z", "uid": "intact-gray-marlin-trademarkagent", "id": 21567530, "uidType": "name"}

    {"isLocalIP": false, "params": {"path": "/proud-copper-orangutan-artexer/doubtful-plum-ptarmigan-merchant/insufficient-amaranth-earthworm-qualitycontroller/curious-silver-galliform-tradingstandards/incredible-indigo-octopus-printfinisher/wicked-bronze-sloth-claimsmanager/frantic-aquamarine-horse-cleric"}, "type": "file_accessed", "time": "2019-11-14T11:26:51Z", "uid": "graceful-olive-spoonbill-careersofficer", "id": 21567531, "location": {"countryCode": "AT", "countryName": "Austria", "region": "4", "city": "Gmunden", "latitude": 47.915, "longitude": 13.7959, "timezone": "Europe/Vienna", "postalCode": "4810", "metroCode": null, "regionName": "Upper Austria", "isInEuropeanUnion": true, "continent": "Europe", "accuracyRadius": 50}, "uidType": "ipaddress"}

    The data set was generated at the premises of Huemer Group, a midsize IT service provider located in Vienna, Austria. Huemer Group offers a range of Infrastructure-as-a-Service solutions for enterprises, including cloud computing and storage. In particular, their cloud storage solution called hBOX enables customers to upload their data, synchronize them with multiple devices, share files with others, create versions and backups of their documents, collaborate with team members in shared data spaces, and query the stored documents using search terms. The hBOX extends the open-source project Nextcloud with interfaces and functionalities tailored to the requirements of customers.

    The data set comprises only normal user behavior, but can be used to evaluate anomaly detection approaches by simulating account hijacking. We provide an implementation for identifying similar users, switching pairs of users to simulate changes of behavior patterns, and a sample detection approach in our GitHub repo.

    Acknowledgements: Partially funded by the FFG project DECEPT (873980). The authors thank Walter Huemer, Oskar Kruschitz, Kevin Truckenthanner, and Christian Aigner from Huemer Group for supporting the collection of the data set.

    If you use the dataset, please cite the following publication:

    [1] M. Landauer, F. Skopik, G. Höld, and M. Wurzenberger. "A User and Entity Behavior Analytics Log Data Set for Anomaly Detection in Cloud Computing". 2022 IEEE International Conference on Big Data - 6th International Workshop on Big Data Analytics for Cyber Intelligence and Defense (BDA4CID 2022), December 17-20, 2022, Osaka, Japan. IEEE.
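
    The first-level attributes listed in this description lend themselves to simple line-by-line processing. The sketch below is an illustration only: it assumes the logs are stored as one JSON object per line (the data sample suggests this, but the description does not state it), the function names are hypothetical, and the second demo event is a trimmed copy of the file-access sample with a shortened path and location.

```python
import json
from collections import Counter


def count_event_types(lines):
    """Count events per `type` attribute, skipping blank lines."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line)["type"]] += 1
    return counts


def external_events(lines):
    """Yield events explicitly flagged as external (isLocalIP is false)."""
    for line in lines:
        if line.strip():
            event = json.loads(line)
            # isLocalIP is optional, so events without the flag are skipped.
            if event.get("isLocalIP") is False:
                yield event


# First event copied from the data sample; second event trimmed for brevity.
SAMPLE_LINES = [
    '{"params": {"user": "intact-gray-marlin-trademarkagent"}, "type": "login_successful", '
    '"time": "2019-11-14T11:26:43Z", "uid": "intact-gray-marlin-trademarkagent", '
    '"id": 21567530, "uidType": "name"}',
    '{"isLocalIP": false, "params": {"path": "/proud-copper-orangutan-artexer"}, '
    '"type": "file_accessed", "time": "2019-11-14T11:26:51Z", '
    '"uid": "graceful-olive-spoonbill-careersofficer", "id": 21567531, '
    '"location": {"countryCode": "AT", "city": "Gmunden"}, "uidType": "ipaddress"}',
]

print(count_event_types(SAMPLE_LINES))
print([event["id"] for event in external_events(SAMPLE_LINES)])
```

    Because the unpacked data is around 14.5 GB, both functions accept any iterable of lines, so an open file handle can be streamed without loading everything into memory.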

  19. Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    • nada-demo.ihsn.org
    Updated Jul 7, 2023
    Cite
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
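
    The arithmetic of this two-stage design can be sketched as follows. The sketch is in Python and is not the R script mentioned above. The stratum sizes are hypothetical, the largest-remainder rounding is an assumption (the description only says the allocation is proportional), and the selection of enumeration areas within a stratum is left out because the description does not specify it.

```python
import random


def allocate_enumeration_areas(stratum_sizes, n_households=8000, per_ea=25):
    """Split the required number of enumeration areas across strata by size."""
    n_eas = n_households // per_ea  # 8,000 / 25 = 320 enumeration areas
    total = sum(stratum_sizes.values())
    quotas = {s: n_eas * size / total for s, size in stratum_sizes.items()}
    allocation = {s: int(q) for s, q in quotas.items()}
    leftover = n_eas - sum(allocation.values())
    # Assumption: largest-remainder rounding so the allocation sums to n_eas.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for s in by_remainder[:leftover]:
        allocation[s] += 1
    return allocation


def select_households(household_ids, per_ea=25, rng=None):
    """Second stage: simple random sample of households within one area."""
    rng = rng or random.Random()
    return rng.sample(household_ids, per_ea)


# Hypothetical strata (geo_1 crossed with urban/rural) and their sizes.
strata = {
    ("province_1", "urban"): 500,
    ("province_1", "rural"): 300,
    ("province_2", "urban"): 150,
    ("province_2", "rural"): 50,
}
print(allocate_enumeration_areas(strata))
print(select_households(list(range(1000)), rng=random.Random(1))[:5])
```

    With these hypothetical sizes the 320 enumeration areas split into 160, 96, 48, and 16, and 25 households from each area give the 8,000-household sample.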

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Some post-processing was also applied to the data to produce the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  20. VAPOR Sample Data

    • data.ucar.edu
    • rda.ucar.edu
    • +1more
    netcdf
    Updated Aug 4, 2024
    Cite
    Visualization and Enabling Technologies Section, Computational and Information Systems Laboratory, National Center for Atmospheric Research, UCAR (2024). VAPOR Sample Data [Dataset]. https://data.ucar.edu/dataset/vapor-sample-data
    Explore at:
    Available download formats: netcdf
    Dataset updated
    Aug 4, 2024
    Dataset provided by
    Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory
    Authors
    Visualization and Enabling Technologies Section, Computational and Information Systems Laboratory, National Center for Atmospheric Research, UCAR
    Description

    A collection of various sample data for the VAPOR (Visualization and Analysis Platform for Ocean, Atmosphere, and Solar Researchers) software.
