100+ datasets found
  1. Data from: Designing data science workshops for data-intensive environmental...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Dec 8, 2020
    Cite
    Allison Theobold; Stacey Hancock; Sara Mannheimer (2020). Designing data science workshops for data-intensive environmental science research [Dataset]. http://doi.org/10.5061/dryad.7wm37pvp7
    Available download formats: zip
    Dataset updated
    Dec 8, 2020
    Dataset provided by
    Montana State University
    California State Polytechnic University
    Authors
    Allison Theobold; Stacey Hancock; Sara Mannheimer
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum," the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data using reproducible tools. These workshops help to bridge the gap between the computing skills necessary for scientific research and the computing skills with which students leave their statistics service courses. Moreover, though targeted to environmental science graduate students, these workshops are open to the larger academic community. As such, they promote the continued learning of the computational tools necessary for working with data, and provide resources for incorporating data science into the classroom.

    Methods: Surveys from Carpentries-style workshops, the results of which are presented in the accompanying manuscript.

    Pre- and post-workshop surveys for each workshop (Introduction to R, Intermediate R, Data Wrangling in R, Data Visualization in R) were collected via Google Form.

    The surveys administered for the fall 2018, spring 2019 academic year are included as pre_workshop_survey and post_workshop_assessment PDF files.
    The raw versions of these data are included in the Excel files ending in survey_raw or assessment_raw.
    The data files whose names include survey contain raw data from the pre-workshop surveys, and the data files whose names include assessment contain raw data from the post-workshop assessment survey.
    The annotated RMarkdown files used to clean the pre-workshop surveys and post-workshop assessments are included as workshop_survey_cleaning and workshop_assessment_cleaning, respectively.
    The cleaned pre- and post-workshop survey data are included in the Excel files ending in clean.
    The summaries and visualizations presented in the manuscript are included in the analysis annotated RMarkdown file.
    
  2. Newark Sch Of Data Science And Information Technology

    • publicschoolreview.com
    json, xml
    Updated Aug 29, 2014
    Cite
    Public School Review (2014). Newark Sch Of Data Science And Information Technology [Dataset]. https://www.publicschoolreview.com/newark-sch-of-data-science-and-information-technology-profile
    Available download formats: json, xml
    Dataset updated
    Aug 29, 2014
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2022 - Dec 31, 2025
    Area covered
    Newark
    Description

    Historical Dataset of Newark Sch Of Data Science And Information Technology is provided by PublicSchoolReview and contains statistics on the following metrics: Total Students Trends Over Years (2022-2023), Total Classroom Teachers Trends Over Years (2022-2023), Distribution of Students By Grade Trends, Student-Teacher Ratio Comparison Over Years (2022-2023), Asian Student Percentage Comparison Over Years (2022-2023), Hispanic Student Percentage Comparison Over Years (2022-2023), Black Student Percentage Comparison Over Years (2022-2023), White Student Percentage Comparison Over Years (2022-2023), Diversity Score Comparison Over Years (2022-2023), Free Lunch Eligibility Comparison Over Years (2022-2023), Reduced-Price Lunch Eligibility Comparison Over Years (2022-2023).

  3. Data from: A Data Science Course for Undergraduates: Thinking With Data

    • tandf.figshare.com
    pdf
    Updated Jun 2, 2023
    Cite
    Ben Baumer (2023). A Data Science Course for Undergraduates: Thinking With Data [Dataset]. http://doi.org/10.6084/m9.figshare.1568372.v2
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Ben Baumer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be nontraditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques to analyze small, neat, and clean datasets. However, whether they pursue more formal training in statistics or not, many of these students will end up working with data that are considerably more complex, and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well as communicating findings in written, graphical, and oral forms. Supplementary materials for this article are available online. [Received June 2014. Revised July 2015.]

  4. Online Data Science Training Programs Market Analysis, Size, and Forecast...

    • technavio.com
    Updated Feb 15, 2025
    Cite
    Technavio (2025). Online Data Science Training Programs Market Analysis, Size, and Forecast 2025-2029: North America (Mexico), Europe (France, Germany, Italy, and UK), Middle East and Africa (UAE), APAC (Australia, China, India, Japan, and South Korea), South America (Brazil), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/online-data-science-training-programs-market-industry-analysis
    Dataset updated
    Feb 15, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Mexico, Germany, United Kingdom, Global
    Description


    Online Data Science Training Programs Market Size 2025-2029

    The online data science training programs market size is forecast to increase by USD 8.67 billion, at a CAGR of 35.8% between 2024 and 2029.
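The two figures in the forecast above (an absolute increase and a CAGR) together imply a starting and ending market size. The following is a minimal sketch of that arithmetic, assuming five annual compounding periods between 2024 and 2029 and assuming the USD 8.67 billion increase is measured over that same span; the function name is illustrative and the report itself does not state these derived values.

```python
def implied_base_size(increase, cagr, years):
    """Return the base-period size B such that B * (1 + cagr) ** years - B == increase."""
    growth_factor = (1 + cagr) ** years
    return increase / (growth_factor - 1)


# Figures quoted in the forecast; the 5-year span is an assumption.
INCREASE_USD_BN = 8.67
CAGR = 0.358
YEARS = 5

base = implied_base_size(INCREASE_USD_BN, CAGR, YEARS)
end = base * (1 + CAGR) ** YEARS

print(f"implied 2024 size: USD {base:.2f} billion")
print(f"implied 2029 size: USD {end:.2f} billion")
```

If the report compounds over a different number of periods, only `YEARS` needs to change.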

    The market is experiencing significant growth due to the increasing demand for data science professionals in various industries. The job market offers lucrative opportunities for individuals with data science skills, making online training programs an attractive option for those seeking to upskill or reskill. Another key driver in the market is the adoption of microlearning and gamification techniques in data science training. These approaches make learning more engaging and accessible, allowing individuals to acquire new skills at their own pace. Furthermore, the availability of open-source learning materials has democratized access to data science education, enabling a larger pool of learners to enter the field. However, the market also faces challenges, including the need for continuous updates to keep up with the rapidly evolving data science landscape and the lack of standardization in online training programs, which can make it difficult for employers to assess the quality of graduates. Companies seeking to capitalize on market opportunities should focus on offering up-to-date, high-quality training programs that incorporate microlearning and gamification techniques, while also addressing the challenges of continuous updates and standardization. By doing so, they can differentiate themselves in a competitive market and meet the evolving needs of learners and employers alike.

    What will be the Size of the Online Data Science Training Programs Market during the forecast period?

    The online data science training market continues to evolve, driven by the increasing demand for data-driven insights and innovations across various sectors. Data science applications, from computer vision and deep learning to natural language processing and predictive analytics, are revolutionizing industries and transforming business operations. Industry case studies showcase the impact of data science in action, with big data and machine learning driving advancements in healthcare, finance, and retail. Virtual labs enable learners to gain hands-on experience, while data scientist salaries remain competitive and attractive. Cloud computing and data science platforms facilitate interactive learning and collaborative research, fostering a vibrant data science community. Data privacy and security concerns are addressed through advanced data governance and ethical frameworks. Data science libraries, such as TensorFlow and Scikit-Learn, streamline the development process, while data storytelling tools help communicate complex insights effectively. Data mining and predictive analytics enable organizations to uncover hidden trends and patterns, driving innovation and growth. The future of data science is bright, with ongoing research and development in areas like data ethics, data governance, and artificial intelligence. Data science conferences and education programs provide opportunities for professionals to expand their knowledge and expertise, ensuring they remain at the forefront of this dynamic field.

    How is this Online Data Science Training Programs Industry segmented?

    The online data science training programs industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • Type: Professional degree courses; Certification courses
    • Application: Students; Working professionals
    • Language: R programming; Python; Big ML; SAS; Others
    • Method: Live streaming; Recorded
    • Program Type: Bootcamps; Certificates; Degree Programs
    • Geography: North America (US, Mexico); Europe (France, Germany, Italy, UK); Middle East and Africa (UAE); APAC (Australia, China, India, Japan, South Korea); South America (Brazil); Rest of World (ROW)

    By Type Insights

    The professional degree courses segment is estimated to witness significant growth during the forecast period. The market encompasses various segments catering to diverse learning needs. The professional degree course segment holds a significant position, offering comprehensive and in-depth training in data science. This segment's curriculum covers essential aspects such as statistical analysis, machine learning, data visualization, and data engineering. Delivered by industry professionals and academic experts, these courses ensure a high-quality education experience. Interactive learning environments, including live lectures, webinars, and group discussions, foster a collaborative and engaging experience. Data science applications, including deep learning, computer vision, and natural language processing, are integral to the market's growth. Data analysis, a crucial application, is gaining traction due to the increasing demand

  5. Data from: Introducing Variational Inference in Statistics and Data Science...

    • tandf.figshare.com
    Updated Jul 23, 2024
    Cite
    Vojtech Kejzlar; Jingchen Hu (2024). Introducing Variational Inference in Statistics and Data Science Curriculum [Dataset]. http://doi.org/10.6084/m9.figshare.23609578.v1
    Available download formats: application/x-dosexec
    Dataset updated
    Jul 23, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Vojtech Kejzlar; Jingchen Hu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Probabilistic models such as logistic regression, Bayesian classification, neural networks, and models for natural language processing are increasingly present in both undergraduate and graduate statistics and data science curricula due to their wide range of applications. In this article, we present a one-week course module for students in advanced undergraduate and applied graduate courses on variational inference, a popular optimization-based approach for approximate inference with probabilistic models. Our proposed module is guided by active learning principles: In addition to lecture materials on variational inference, we provide an accompanying class activity, an R shiny app, and guided labs based on real data applications of logistic regression and clustering documents using Latent Dirichlet Allocation with R code. The main goal of our module is to expose students to a method that facilitates statistical modeling and inference with large datasets. Using our proposed module as a foundation, instructors can adopt and adapt it to introduce more realistic case studies and applications in data science, Bayesian statistics, multivariate analysis, and statistical machine learning courses.

  6. 50 Years of Data Science

    • qubeshub.org
    Updated Oct 30, 2018
    Cite
    David Donoho (2018). 50 Years of Data Science [Dataset]. http://doi.org/10.25334/Q42B0D
    Dataset updated
    Oct 30, 2018
    Dataset provided by
    QUBES
    Authors
    David Donoho
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This paper reviews some ingredients of the current “Data Science moment”, including recent commentary about data science in the popular media, and about how/whether Data Science is really different from Statistics.

  7. Number of data scientists employed in companies worldwide 2020 and 2021

    • statista.com
    Updated Jul 10, 2025
    Cite
    Statista (2025). Number of data scientists employed in companies worldwide 2020 and 2021 [Dataset]. https://www.statista.com/statistics/1136560/data-scientists-company-employment/
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2020
    Area covered
    Worldwide
    Description

    Across industries, organizations are increasing their hiring efforts to build larger data science arsenals: from 2020 to 2021, the percentage of surveyed organizations that employed ** data scientists or more increased from ** percent to almost ** percent. On average, the number of data scientists employed in an organization grew from ** to **.

  8. [Coursera] Statistics: Making Sense of Data

    • academictorrents.com
    bittorrent
    Updated Sep 26, 2016
    Cite
    Alison Gibbs, Jeffrey Rosenthal (University of Toronto) (2016). [Coursera] Statistics: Making Sense of Data [Dataset]. https://academictorrents.com/details/a0cbaf3e03e0893085b6fbdc97cb6220896dddf2
    Available download formats: bittorrent (840871302)
    Dataset updated
    Sep 26, 2016
    Dataset authored and provided by
    Alison Gibbs, Jeffrey Rosenthal (University of Toronto)
    License

    https://academictorrents.com/nolicensespecified

    Description

    This course is an introduction to the key ideas and principles of the collection, display, and analysis of data to guide you in making valid and appropriate conclusions about the world. We live in a world where data are increasingly available, in ever larger quantities, and are increasingly expected to form the basis for decisions by governments, businesses, and other organizations, as well as by individuals in their daily lives. To cope effectively, every informed citizen must be statistically literate. This course will provide an intuitive introduction to applied statistical reasoning, introducing fundamental statistical skills and acquainting students with the full process of inquiry and evaluation used in investigations in a wide range of fields.

  9. Data science and machine learning adoption worldwide 2019, by function

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Data science and machine learning adoption worldwide 2019, by function [Dataset]. https://www.statista.com/statistics/1053561/data-science-machine-learning-deployment-by-function/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2019
    Area covered
    Worldwide
    Description

    Research and development's adoption of data science and machine learning technologies is the fastest among enterprise departments, with around ** percent of respondents from R&D saying that they already deployed data science and machine learning in their work, as of 2019. The finance department lags behind in this respect.

  10. Survey of Graduate Students and Postdoctorates in Science and Engineering...

    • catalog.data.gov
    Updated Mar 23, 2024
    + more versions
    Cite
    National Center for Science and Engineering Statistics (2024). Survey of Graduate Students and Postdoctorates in Science and Engineering 2022 [Dataset]. https://catalog.data.gov/dataset/survey-of-graduate-students-and-postdoctorates-in-science-and-engineering-2022
    Dataset updated
    Mar 23, 2024
    Dataset provided by
    National Center for Science and Engineering Statistics (http://ncses.nsf.gov/)
    Description

    The Graduate Students and Postdoctorates in Science and Engineering survey is an annual census of all U.S. academic institutions granting research-based master's degrees or doctorates in science, engineering, and selected health fields as of fall of the survey year. The survey, sponsored by the National Center for Science and Engineering Statistics within the National Science Foundation and by the National Institutes of Health, collects the total number of master's and doctoral students, postdoctoral appointees, and doctorate-level nonfaculty researchers by demographic and other characteristics such as source of financial support. Results are used to assess shifts in graduate enrollment and postdoc appointments and trends in financial support. This dataset includes GSS assets for 2022.

  11. Top data science skills in U.S. 2019

    • statista.com
    Updated May 23, 2022
    Cite
    Statista (2022). Top data science skills in U.S. 2019 [Dataset]. https://www.statista.com/statistics/1016247/united-states-wanted-data-science-skills/
    Dataset updated
    May 23, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Apr 2019
    Area covered
    United States
    Description

    The statistic displays the most wanted data science skills in the United States as of April 2019. As of the measured period, 76.13 percent of data scientist job openings on LinkedIn required a knowledge of the programming language Python.

  12. Data from: Fostering Conceptual Understanding in Mathematical Statistics

    • tandf.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Jennifer L. Green; Erin E. Blankenship (2023). Fostering Conceptual Understanding in Mathematical Statistics [Dataset]. http://doi.org/10.6084/m9.figshare.1483474.v2
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Jennifer L. Green; Erin E. Blankenship
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In many undergraduate statistics programs, the two-semester calculus-based mathematical statistics sequence is the cornerstone of the curriculum. However, 10 years after the release of the Guidelines for the Assessment and Instruction in Statistics Education (GAISE) College Report, 2005, and the subsequent movement to stress conceptual understanding and foster active learning in statistics classrooms, the sequence still remains a traditional, lecture-intensive course. In this article, we discuss various instructional approaches, activities, and assessments that can be used to foster active learning and emphasize conceptual understanding while still covering the necessary theoretical content students need to be successful in subsequent statistics or actuarial science courses. In addition, we share student reflections on these course enhancements. The course revision we suggest doesn’t require substantial changes in content, so other mathematical statistics instructors can implement these strategies without sacrificing concepts in probability and inference that are fundamental to the needs of their students. Supplementary materials, including code used to generate class plots and activity handouts, are available online. Received December 2014. Revised June 2015.

  13. CourseKata Dataset Items (QuestionTypes)

    • kaggle.com
    Updated Apr 21, 2024
    Cite
    Gagan Karnati (2024). CourseKata Dataset Items (QuestionTypes) [Dataset]. https://www.kaggle.com/datasets/gagankarnati/coursekata-dataset-items-questiontypes
    Available formats: Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gagan Karnati
    Description

    CourseKata is a platform that creates and publishes a series of e-books for introductory statistics and data science classes that utilize demonstrated learning strategies to help students learn statistics and data science. The developers of CourseKata, Jim Stigler (UCLA) and Ji Son (Cal State Los Angeles) and their team, are cognitive psychologists interested in improving statistics learning by examining students' interactions with online interactive textbooks. Traditionally, much of the research in how students learn is done in a 1-hour lab or through small-scale interviews with students. CourseKata offers the opportunity to peek into the actions, responses, and choices of thousands of students as they are engaged in learning the interrelated concepts and skills of statistics and coding in R over many weeks or months in real classes.

    1. items.csv (1335 x 19): Each row contains information about a particular question (although it does not provide the prompt). The item to which a question belongs is included. All items/questions are represented. Use this file to go deeper into particular questions that students encounter in the course.

    Questions are grouped into items (item_id). An item can be one of three item_types: code, learnosity, or learnosity-activity (the distinction between learnosity and learnosity-activity is not important). Code items are a single question and ask for R code as a response. (Responses can be seen in responses.csv.) Learnosity-activities and learnosity items are collections of one or more questions that can be of a variety of lrn_types:

    • association
    • choicematrix
    • clozeassociation
    • formulaV2
    • imageclozeassociation
    • mcq
    • plaintext
    • shorttext
    • sortlist

    Examples of these question types are provided at the end of this document.

    The level of detail made available to you in the responses file depends on the lrn_type. For example, for multiple choice questions (mcq), you can find the options in the responses file in the columns labeled lrn_option_0 through lrn_option_11, and you can see the chosen option in the results variable.
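As a rough illustration of the structure described above, the sketch below tallies questions by item type and question type. It assumes that item_id, item_type, and lrn_type are literal column headers in items.csv and that code items leave lrn_type empty; neither is confirmed by the description. The embedded rows are hypothetical stand-ins, not real data.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt for illustration; the real items.csv has 1335 rows and 19 columns.
SAMPLE = """item_id,item_type,lrn_type
item_a,code,
item_b,learnosity,mcq
item_b,learnosity,shorttext
item_c,learnosity-activity,mcq
"""


def count_question_types(csv_file):
    """Count rows per (item_type, lrn_type) pair; an empty lrn_type is recorded as 'n/a'."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        lrn_type = row["lrn_type"] or "n/a"
        counts[(row["item_type"], lrn_type)] += 1
    return counts


counts = count_question_types(io.StringIO(SAMPLE))
for (item_type, lrn_type), n in sorted(counts.items()):
    print(item_type, lrn_type, n)
```

To run the same tally on the downloaded file, pass an open file handle instead, for example `open("items.csv", newline="")`.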

    Assessment Types

    In general, assessments, such as the items and questions included in CourseKata, can be used for two purposes. Formative assessments are meant to provide feedback to the student (and instructor), or to serve as a learning aid to help prompt students to improve memory and deepen their understanding. Summative assessments are meant to provide a summary of a student's understanding, often for use in assigning a grade. For example, most midterms and final exams that you've taken are summative assessments.

    The vast majority of items in CourseKata should be treated as formative assessments. The exceptions are the end-of-chapter Review questions, which can be thought of as summative. The mean number of correct answers for end-of-chapter review questions is provided within the checkpoints file. You might see that some pages have the word "Quiz" or "Exam" or "Midterm" in them. Results from these items and responses to them are not provided to us in this data set.

  14. Data from: Average salary

    • froghire.ai
    Updated Apr 3, 2025
    + more versions
    Cite
    FrogHire.ai (2025). Average salary [Dataset]. https://www.froghire.ai/major/Statistics%20%20Data%20Science%20Concentration
    Dataset updated
    Apr 3, 2025
    Dataset provided by
    FrogHire.ai
    Description

    Explore the progression of average salaries for graduates in Statistics Data Science Concentration from 2020 to 2023 through this detailed chart. It compares these figures against the national average for all graduates, offering a comprehensive look at the earning potential of Statistics Data Science Concentration relative to other fields. This data is essential for students assessing the return on investment of their education in Statistics Data Science Concentration, providing a clear picture of financial prospects post-graduation.

  15. Data for: Integrating open education practices with data analysis of open...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Jul 26, 2024
    Cite
    Marja Bakermans (2024). Data for: Integrating open education practices with data analysis of open science in an undergraduate course [Dataset]. http://doi.org/10.5061/dryad.37pvmcvst
    Available download formats: zip
    Dataset updated
    Jul 26, 2024
    Dataset provided by
    Worcester Polytechnic Institute
    Authors
    Marja Bakermans
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed fairness practices, but this increased with the publication date. There was a marginal difference in how assessment categories were weighted by students, with reflections highlighting appreciation for student agency. In course content, students reported the greatest learning gains in describing variables, while collaborative activities (e.g., interacting with peers and instructor) were the most effective support. The most effective tasks to facilitate these learning gains included coding exercises and team-led assignments. Autocoding of student reflections identified 16 themes, and positive sentiments were written nearly 4x more often than negative sentiments. Students positively reflected on their growth in statistical analyses, and negative sentiments focused on how limited prior experience with statistics and coding made them feel nervous. As a group, we encountered several challenges and opportunities in using open science materials. 
I present key recommendations, based on student experiences, for scientists to consider when publishing open data to provide additional educational benefits to the open science community. Methods Article and dataset fairness To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each pair to a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), where each category was scored ‘1’ or ‘0’ based on whether it met that criteria, with a total possible score of ten. Open grading policies Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined if assessment categories (independent variable) were weighted (dependent variable) differently by students using an analysis of variance (ANOVA) and examined pairwise differences with Tukey HSD. Assessment of perceived learning gains I used a student assessment of learning gains (SALG) survey to measure students’ perceptions of learning gains related to course objectives (Seymour et al. 2000). This Likert-scale survey provided five response categories ranging from ‘no gains’ to ‘great gains’ in learning and the option of open responses in each category. A summary report that converted Likert responses to numbers and calculated descriptive statistics was produced from the SALG instrument website. Student reflections In student reflections, I examined the frequency of the 100 most frequent words, with stop words excluded and a minimum length of four (letters), both “with synonyms” and “with generalizations”. 
Due to this paper's explorative nature, I used autocoding to identify students' broad themes and sentiments in their reflections. Autocoding examines the sentiment of each word and scores it as positive, neutral, mixed, or negative. In this process, I compared how students felt about each theme, focusing on positive (i.e., satisfaction) and negative (i.e., dissatisfaction) sentiments. The relationship of how sentiment was coded to themes was visualized in a treemap, where the size of a block is relative to the number of references for that code. All reflection processing and analyses were performed in NVivo 14 (Windows). All data were collected with institutional IRB approval (IRB-24–0314). All statistical analyses were performed in R (ver. 4.3.1; R Core Team 2023).
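
The ANOVA step described under "Open grading policies" can be illustrated with a minimal sketch. The study ran its analyses in R; the Python below is only an illustration of the F-statistic calculation, and the category weights are made-up numbers, not data from the study. The function name `one_way_anova_f` is hypothetical. The Tukey HSD follow-up and the p-value are omitted because they need the studentized range and F distributions, which a statistics package would normally supply.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists; each inner list holds the observations
    (here, percentage weights) for one assessment category.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical weights (percent) chosen by three students for three categories.
# These values are invented for illustration only.
weights = {
    "Exercises": [20, 25, 30],
    "Final Project": [30, 35, 40],
    "Participation": [10, 15, 20],
}

f_stat, df_b, df_w = one_way_anova_f(list(weights.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value relative to the F distribution with the reported degrees of freedom would suggest that at least one category was weighted differently, which is the point at which pairwise comparisons such as Tukey HSD become informative.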

  16. Data from: Developing Students’ Statistical Expertise Through Writing in the...

    • tandf.figshare.com
    pdf
    Updated Jun 6, 2025
    Laura S. DeLuca; Alex Reinhart; Gordon Weinberg; Michael Laudenbach; Sydney Miller; David West Brown (2025). Developing Students’ Statistical Expertise Through Writing in the Age of AI [Dataset]. http://doi.org/10.6084/m9.figshare.28883205.v2
    Available download formats: pdf
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Laura S. DeLuca; Alex Reinhart; Gordon Weinberg; Michael Laudenbach; Sydney Miller; David West Brown
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As large language models (LLMs) such as GPT have become more accessible, concerns about their potential effects on students’ learning have grown. In data science education, the specter of students’ turning to LLMs raises multiple issues, as writing is a means not just of conveying information but of developing their statistical reasoning. In our study, we engage with questions surrounding LLMs and their pedagogical impact by: (a) quantitatively and qualitatively describing how select LLMs write report introductions and complete data analysis reports; and (b) comparing patterns in texts authored by LLMs to those authored by students and by published researchers. Our results show distinct differences between machine-generated and human-generated writing, as well as between novice and expert writing. Those differences are evident in how writers manage information, modulate confidence, signal importance, and report statistics. The findings can help inform classroom instruction, whether that instruction is aimed at dissuading the use of LLMs or at guiding their use as a productivity tool. It also has implications for students’ development as statistical thinkers and writers. What happens when they offload the work of data science to a model that doesn’t write quite like a data scientist? Supplementary materials for this article are available online.

  17. Replication Data for: 'Bringing the World to the Classroom: Teaching...

    • dataverse.harvard.edu
    bin, docx, html, pdf +1
    Updated Jul 26, 2021
    Harvard Dataverse (2021). Replication Data for: 'Bringing the World to the Classroom: Teaching Statistics and Programming in a Project-Based Setting' [Dataset]. http://doi.org/10.7910/DVN/JQLNCT
    Available download formats: bin(1218), bin(1304), html(740142), tex(5493), tex(4647), docx(10976), tex(21873), bin(333), bin(2343), pdf(54986)
    Dataset updated
    Jul 26, 2021
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This article introduces how to teach an interactive one-semester-long statistics and programming class. The setting can also be applied to shorter and longer classes as well as for beginner and advanced courses. I propose a project-based seminar that also inherits elements of an inverted classroom. Thanks to this character, the seminar supports the students' learning progress and can also create engaging virtual classes. To showcase how to apply a project-based seminar setting to teaching statistics and programming classes, I use an introductory class to data wrangling and management with the statistical software R. Students are guided through a typical data science workflow that requires data management, data wrangling, and ends with visualizing and presenting first research results during a mini-conference.

  18. Number of available data science jobs India 2019-2022, by sector

    • statista.com
    Updated Mar 13, 2024
    Statista (2024). Number of available data science jobs India 2019-2022, by sector [Dataset]. https://www.statista.com/statistics/1320179/india-number-of-available-data-science-jobs-by-sector/
    Dataset updated
    Mar 13, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    In 2022, over 18 thousand data science job positions were available in the BFSI sector in India. The availability of data science jobs has increased each year since 2019. E-commerce and internet followed suit with roughly 13 thousand jobs during the same time period.

  19. Global advanced analytics and data science software market share 2025

    • statista.com
    Updated Jun 30, 2025
    Statista (2025). Global advanced analytics and data science software market share 2025 [Dataset]. https://www.statista.com/statistics/1258535/advanced-analytics-data-science-market-share-technology-worldwide/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    Worldwide
    Description

    MATLAB led the global advanced analytics and data science software industry in 2025 with a market share of ***** percent. First launched in 1984, MATLAB is developed by the U.S. firm MathWorks.

  20. Python frameworks used in data science 2021

    • statista.com
    Updated Jul 11, 2025
    Statista (2025). Python frameworks used in data science 2021 [Dataset]. https://www.statista.com/statistics/1338424/python-use-frameworks-data-science/
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2021 - Dec 2021
    Area covered
    Worldwide
    Description

    Python is one of the most popular programming languages among data scientists, partly due to its varied packages and capabilities. In 2021, Numpy and Pandas were the most used Python frameworks for data science, with a ** percent and ** percent share respectively.

Allison Theobold; Stacey Hancock; Sara Mannheimer (2020). Designing data science workshops for data-intensive environmental science research [Dataset]. http://doi.org/10.5061/dryad.7wm37pvp7

Data from: Designing data science workshops for data-intensive environmental science research

Available download formats: zip
Dataset updated
Dec 8, 2020
Dataset provided by
Montana State University
California State Polytechnic University
Authors
Allison Theobold; Stacey Hancock; Sara Mannheimer
License

CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

Description

Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum," the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data using reproducible tools. These workshops help to bridge the gap between the computing skills necessary for scientific research and the computing skills with which students leave their statistics service courses. Moreover, though targeted to environmental science graduate students, these workshops are open to the larger academic community. As such, they promote the continued learning of the computational tools necessary for working with data, and provide resources for incorporating data science into the classroom.

Methods

Surveys from Carpentries-style workshops, the results of which are presented in the accompanying manuscript.

Pre- and post-workshop surveys for each workshop (Introduction to R, Intermediate R, Data Wrangling in R, Data Visualization in R) were collected via Google Form.

The surveys administered during the 2018-2019 academic year (fall 2018 and spring 2019) are included as the pre_workshop_survey and post_workshop_assessment PDF files.
The raw versions of these data are included in the Excel files ending in survey_raw or assessment_raw.

The data files whose names include survey contain raw data from the pre-workshop surveys, and the data files whose names include assessment contain raw data from the post-workshop assessment surveys.


The annotated RMarkdown files used to clean the pre-workshop surveys and post-workshop assessments are included as workshop_survey_cleaning and workshop_assessment_cleaning, respectively. 
The cleaned pre- and post-workshop survey data are included in the Excel files ending in clean. 
The summaries and visualizations presented in the manuscript are included in the analysis annotated RMarkdown file.
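
The naming convention described above can be sketched as a small helper for sorting the downloaded files. This is a minimal illustration, not part of the dataset: the function name `classify_data_file` and the example file names are hypothetical, and checking for "assessment" before "survey" is an assumption made in case a name contains both words.

```python
from pathlib import PurePath


def classify_data_file(filename):
    """Infer (instrument, stage) from a file name; None marks an unknown part.

    Follows the convention in the dataset description: names containing
    "survey" or "assessment" identify the instrument, and names ending in
    "raw" or "clean" identify the processing stage.
    """
    stem = PurePath(filename).stem.lower()

    # Assumption: test "assessment" first in case a name contains both words.
    if "assessment" in stem:
        instrument = "post-workshop assessment"
    elif "survey" in stem:
        instrument = "pre-workshop survey"
    else:
        instrument = None

    if stem.endswith("raw"):
        stage = "raw"
    elif stem.endswith("clean"):
        stage = "clean"
    else:
        stage = None  # e.g. the survey PDFs or the cleaning scripts

    return instrument, stage


# Hypothetical file names, invented to show the convention.
for name in ["intro_r_survey_raw.xlsx", "intro_r_assessment_clean.xlsx"]:
    print(name, "->", classify_data_file(name))
```

Files that match neither suffix, such as workshop_survey_cleaning, come back with a stage of None, so they can be handled separately as scripts or survey instruments rather than data.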