Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The “Students Performance Data” dataset provides academic and demographic information of students. It includes their marks in Maths, Science, and English along with attendance and city details. This dataset is ideal for beginners learning data entry, analysis, and visualization using tools like Excel or Kaggle Notebooks.
A small excerpt from a real dataset, with a few minor changes to protect students' private information. Permission has been given.
Your task is to help teachers using only the data:
1. Prediction: identify what makes a strong student who can successfully apply to a graduate school, whether abroad or not.
2. Application: help those who fail to get into a graduate school with advice on job searching.
Some of the original fields have been deleted or censored. Those that remain include basic data:
- ID
- class: categorical; students were initially divided into 2 classes, and teachers suspect that students in different classes may perform significantly differently
- gender
- race: categorical and censored
- GPA: real number (float)
Some teachers assume that scores in mathematics courses can represent one's likelihood of success perfectly:
- Algebra: real number, Advanced Algebra
- ......
Others assume that students' backgrounds can significantly affect their choices and likelihood of success; these fields are all censored:
- from1: students' home locations
- from2: a probably poor indicator of preference for mathematics
- from3: how students applied to this university (undergraduate)
- from4: a probably poor indicator of family background, from 0 (more wealth) to 4 (more poverty)
The final indicator y:
- 0: failed to get into a graduate school; may apply again or search for a job in the future
- 1: success, domestic
- 2: success, abroad
https://research.csiro.au/dap/licences/csiro-data-licence/
A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.
The OECD Programme for International Student Assessment (PISA) surveys collected data on students’ performances in reading, mathematics and science, as well as contextual information on students’ background, home characteristics and school factors which could influence performance. This publication includes detailed information on how to analyse the PISA data, enabling researchers to both reproduce the initial results and to undertake further analyses. In addition to the inclusion of the necessary techniques, the manual also includes a detailed account of the PISA 2006 database and worked examples providing full syntax in SPSS.
The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed fairness practices, but this increased with the publication date. There was a...
Article and dataset fairness: To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each pair to a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), where each category was scored ‘1’ or ‘0’ based on whether it met that criterion, with a total possible score of ten.
Open grading policies: Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined if assessment categories (independent variable) were weighted (dependent variable) differently by students using an analysis of ...
Data for: Integrating open education practices with data analysis of open science in an undergraduate course
Author: Marja H Bakermans
Affiliation: Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA 01609 USA
ORCID: https://orcid.org/0000-0002-4879-7771
Institutional IRB approval: IRB-24–0314
The full dataset file called OEPandOSdata (.xlsx extension) contains 8 files. Below are descriptions of the name and contents of each file. NA = not applicable or no data available
This publication provides all the information required to understand the PISA 2003 educational performance database and perform analyses in accordance with the complex methodologies used to collect and process the data. It enables researchers to both reproduce the initial results and to undertake further analyses. The publication includes introductory chapters explaining the statistical theories and concepts required to analyse the PISA data, including full chapters on how to apply replicate weights and undertake analyses using plausible values; worked examples providing full syntax in SPSS®; and a comprehensive description of the OECD PISA 2003 international database. The PISA 2003 database includes micro-level data on student educational performance for 41 countries collected in 2003, together with students’ responses to the PISA 2003 questionnaires and the test questions. A similar manual is available for SAS users.
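The manual's worked examples are written in SPSS. As a rough illustration of the plausible-values step in another language, the Python sketch below combines one estimate per plausible value in the way such analyses are commonly described: the point estimate is the mean across plausible values, and the total error variance adds the average sampling variance to the between-value (imputation) variance inflated by 1 + 1/M. The function name and all numbers are hypothetical, and the sampling variances are assumed to have been computed beforehand from the replicate weights.

```python
import statistics


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results into one estimate and standard error.

    estimates: one statistic (e.g. a mean score) per plausible value.
    sampling_variances: the sampling variance of each of those statistics,
        assumed here to come from a separate replicate-weight calculation.
    """
    m = len(estimates)
    point_estimate = statistics.fmean(estimates)
    sampling_var = statistics.fmean(sampling_variances)
    # Between-plausible-value variance (divides by m - 1).
    imputation_var = statistics.variance(estimates)
    total_var = sampling_var + (1 + 1 / m) * imputation_var
    return point_estimate, total_var ** 0.5


# Hypothetical results for five plausible values.
pv_means = [500.0, 502.0, 498.0, 501.0, 499.0]
pv_sampling_vars = [4.0, 4.0, 4.0, 4.0, 4.0]

mean_score, standard_error = combine_plausible_values(pv_means, pv_sampling_vars)
print(f"Mean score {mean_score:.1f}, standard error {standard_error:.2f}")
```

Running the analysis once per plausible value and combining afterwards, rather than averaging the plausible values first, keeps the measurement uncertainty in the final standard error.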
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Initial data analysis (IDA) is the part of the data pipeline that takes place between the end of data retrieval and the beginning of data analysis that addresses the research question. Systematic IDA and clear reporting of the IDA findings is an important step towards reproducible research. A general framework of IDA for observational studies includes data cleaning, data screening, and possible updates of pre-planned statistical analyses. Longitudinal studies, where participants are observed repeatedly over time, pose additional challenges, as they have special features that should be taken into account in the IDA steps before addressing the research question. We propose a systematic approach in longitudinal studies to examine data properties prior to conducting planned statistical analyses. In this paper we focus on the data screening element of IDA, assuming that the research aims are accompanied by an analysis plan, meta-data are well documented, and data cleaning has already been performed. IDA data screening comprises five types of explorations, covering the analysis of participation profiles over time, evaluation of missing data, presentation of univariate and multivariate descriptions, and the depiction of longitudinal aspects. Executing the IDA plan will result in an IDA report to inform data analysts about data properties and possible implications for the analysis plan—another element of the IDA framework. Our framework is illustrated focusing on hand grip strength outcome data from a data collection across several waves in a complex survey. We provide reproducible R code on a public repository, presenting a detailed data screening plan for the investigation of the average rate of age-associated decline of grip strength. With our checklist and reproducible R code we provide data analysts a framework to work with longitudinal data in an informed way, enhancing the reproducibility and validity of their work.
https://crawlfeeds.com/privacy_policy
Get access to the Walmart Basic Product Details Dataset, which includes essential information on a wide range of products available at Walmart.
This comprehensive dataset features product names, categories, descriptions, prices, and more. Ideal for market analysis, competitive research, and e-commerce applications.
Download now to enhance your data-driven strategies and insights with detailed Walmart product information.
The dataset contains basic product details such as title, id, image, price, and description.
Record count: 2.5 million+
https://www.archivemarketresearch.com/privacy-policy
The Big Data Basic Platform market is experiencing robust growth, projected to reach a market size of $150 billion by 2025, exhibiting a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033. This expansion is fueled by several key drivers, including the escalating volume and velocity of data generated across various industries, the increasing demand for real-time data analytics, and the growing adoption of cloud-based solutions for data storage and processing. Furthermore, advancements in technologies like artificial intelligence (AI) and machine learning (ML) are creating new opportunities for businesses to leverage big data for improved decision-making and enhanced operational efficiency. The market is segmented across various deployment models (cloud, on-premise, hybrid), industry verticals (finance, healthcare, retail, etc.), and functionalities (data ingestion, storage, processing, analytics). Key players in this competitive landscape include established technology giants like IBM, Microsoft, and AWS, alongside specialized big data solution providers such as Splunk and Cloudera. The market's growth trajectory is expected to remain strong throughout the forecast period, driven by ongoing digital transformation initiatives across enterprises globally. The significant market expansion reflects a confluence of factors. Businesses are increasingly recognizing the strategic value of big data for competitive advantage, leading to significant investments in platform infrastructure and skilled talent. Geographic expansion is also a notable driver, with developing economies witnessing accelerated adoption. However, challenges remain, including the complexities of data integration, security concerns related to sensitive data, and the need for skilled professionals capable of managing and interpreting large datasets. 
The market is witnessing increasing consolidation through mergers and acquisitions, as companies strive to broaden their service offerings and strengthen their market positions. The emergence of open-source technologies and the ongoing evolution of cloud computing architectures are further shaping the market's competitive dynamics, driving innovation and lowering the barrier to entry for new entrants. Future growth will likely depend on continued technological advancements, increasing data literacy, and the development of robust data governance frameworks.
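The growth figures in this summary follow the usual compound-growth relationship, end value = start value × (1 + CAGR)^years. A minimal Python sketch, assuming the $150 billion figure is the 2025 base and the 18% rate applies to each of the eight years up to 2033 (the function names are illustrative and not taken from the report):

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return base * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Recover the constant annual rate that links a start and an end value."""
    return (end / start) ** (1.0 / years) - 1.0


# Assumed inputs read from the summary: USD 150 billion in 2025, 18% CAGR to 2033.
base_2025 = 150.0
projected_2033 = project_value(base_2025, 0.18, 2033 - 2025)
print(f"Projected 2033 market size: {projected_2033:.1f} billion USD")
```

The second helper runs the same relationship in reverse, which is useful for checking whether a quoted start value, end value, and rate are mutually consistent.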
Background: Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way. Methods and Findings: We included data from 24 randomised controlled trials, evaluating antiplatelet agents, for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using anti-platelets (Relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model. Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials.
Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
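The "two-stage" idea described above can be sketched in a few lines of Python: stage one reduces each trial to a summary measure (here a log relative risk and its variance), and stage two combines those summaries with a standard inverse-variance weighted average. This is a simplified fixed-effect illustration, not the models used in the study; the trial counts are hypothetical and the function names are made up for the example.

```python
import math


def log_relative_risk(events_t, n_t, events_c, n_c):
    """Stage 1: per-trial log relative risk and its approximate variance."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    variance = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, variance


def pool_fixed_effect(estimates):
    """Stage 2: inverse-variance weighted average of the trial estimates."""
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled, standard_error


# Hypothetical trials: (events treated, n treated, events control, n control).
trials = [(30, 500, 36, 500), (45, 800, 50, 800), (12, 200, 15, 200)]

stage_one = [log_relative_risk(*trial) for trial in trials]
pooled_log_rr, se = pool_fixed_effect(stage_one)

rr = math.exp(pooled_log_rr)
ci_low = math.exp(pooled_log_rr - 1.96 * se)
ci_high = math.exp(pooled_log_rr + 1.96 * se)
print(f"Pooled RR {rr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

A one-stage analysis would instead fit a single model to all participants at once, with trial included as a factor, which is where the extra statistical support mentioned in the abstract comes in.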
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource collects teaching materials that were originally created for the in-person course 'GEOSC/GEOG 497 – Data Mining in Environmental Sciences' at Penn State University (co-taught by Tao Wen, Susan Brantley, and Alan Taylor) and then refined/revised by Tao Wen to be used in the online teaching module 'Data Science in Earth and Environmental Sciences' hosted on the NSF-sponsored HydroLearn platform.
This resource includes both R Notebooks and Python Jupyter Notebooks to teach the basics of R and Python coding, data analysis and data visualization, as well as building machine learning models in both programming languages by using authentic research data and questions. All of these R/Python scripts can be executed either on the CUAHSI JupyterHub or on your local machine.
This resource is shared under the CC-BY license. Please contact the creator Tao Wen at Syracuse University (twen08@syr.edu) for any questions you have about this resource. If you identify any errors in the files, please contact the creator.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource contains a Jupyter Notebook that is used to introduce hydrologic data analysis and conservation laws. This resource is part of a HydroLearn Physical Hydrology learning module available at https://edx.hydrolearn.org/courses/course-v1:Utah_State_University+CEE6400+2019_Fall/about
In this activity, the student learns how to (1) calculate the residence time of water in land and rivers for the global hydrologic cycle; (2) quantify the relative and absolute uncertainties in components of the water balance; (3) navigate public websites and databases, extract key watershed attributes, and perform basic hydrologic data analysis for a watershed of interest; (4) assess, compare, and interpret hydrologic trends in the context of a specific watershed.
Please note that in problems 3-8, the user is asked to use an R package (i.e., dataRetrieval) and select a U.S. Geological Survey (USGS) streamflow gage to retrieve streamflow data and then apply the hydrological data analysis to the watershed of interest. We acknowledge that the material relies on USGS data that are only available within the U.S. If running for other watersheds of interest outside the U.S. or wishing to work with other datasets, the user must take some further steps and develop codes to prepare the streamflow dataset. Once a streamflow time series dataset is obtained for an international catchment of interest, the user would need to read that file into the workspace before working through subsequent analyses.
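The first two learning goals of this activity can be illustrated with a short calculation: residence time is stored volume divided by throughflow, and its uncertainty follows from the uncertainties of those two terms. The Python sketch below assumes independent errors combined in quadrature, which may differ from the approach taken in the notebook, and all numbers are illustrative placeholders rather than values from the resource.

```python
import math


def residence_time(storage: float, flux: float) -> float:
    """Mean residence time = stored volume / throughflow rate."""
    return storage / flux


def quotient_relative_uncertainty(rel_a: float, rel_b: float) -> float:
    """Relative uncertainty of a / b for independent errors (quadrature sum)."""
    return math.sqrt(rel_a ** 2 + rel_b ** 2)


# Illustrative placeholder values, not taken from the notebook.
storage_km3 = 2000.0        # water stored in rivers
flux_km3_per_yr = 40000.0   # annual river discharge
rel_storage = 0.10          # 10% relative uncertainty in storage
rel_flux = 0.05             # 5% relative uncertainty in discharge

t_years = residence_time(storage_km3, flux_km3_per_yr)
rel_t = quotient_relative_uncertainty(rel_storage, rel_flux)
abs_t = rel_t * t_years     # absolute uncertainty, in years

print(f"Residence time: {t_years * 365:.1f} days +/- {abs_t * 365:.1f} days")
```

The relative uncertainty is dimensionless, while the absolute uncertainty carries the units of the residence time itself.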
Building strong quantitative skills prepares undergraduate biology students for successful careers in science and medicine. While math and statistics anxiety can negatively impact student learning within biology classrooms, instructors may reduce this anxiety by steadily building student competency in quantitative reasoning through instructional scaffolding, application-based approaches, and simple computer program interfaces. However, few statistical programs exist that meet all needs of an inclusive, inquiry-based laboratory course. These needs include an open-source program, a simple interface, little required background knowledge in statistics for student users, and customizability to minimize cognitive load, align with course learning outcomes, and create desirable difficulty. To address these needs, we used the Shiny package in R to develop a custom statistical analysis application. Our “BioStats” app provides students with scaffolded learning experiences in applied statistics that promotes student agency and is customizable by the instructor. It introduces students to the strengths of the R interface, while eliminating the need for complex coding in the R programming language. It also prioritizes practical implementation of statistical analyses over learning statistical theory. To our knowledge, this is the first statistics teaching tool where students are presented basic statistics initially, more complex analyses as they advance, and includes an option to learn R statistical coding. The BioStats app interface yields a simplified introduction to applied statistics that is adaptable to many biology laboratory courses.
Primary Image: Singing Junco. A sketch of a junco singing on a pine tree branch, created by the lead author of this paper.
This data package contains claims-based data about beneficiaries of Medicare program services, including Inpatient, Outpatient, Chronic Conditions, Skilled Nursing Facility, Home Health Agency, Hospice, Carrier, and Durable Medical Equipment (DME) data, as well as data related to Prescription Drug Events. Note that the values are estimated and counted using a random sample of fee-for-service Medicare claims.
This dataset was created by Lebelo Hailesilassie
One of the first steps in a reference interview is determining what it is the user really wants or needs. In many cases, the question comes down to the unit of analysis: what is it that is being investigated or researched? This presentation will take us through the concept of the unit of analysis so that we can improve our reference service, and make our lives easier as a result! Note: This presentation precedes Working with Complex Surveys: Canadian Travel Survey by Chuck Humphrey (14-Mar-2002).
This publication provides all the information required to understand the PISA 2003 educational performance database and perform analyses in accordance with the complex methodologies used to collect and process the data. It enables researchers to both reproduce the initial results and to undertake further analyses. The publication includes introductory chapters explaining the statistical theories and concepts required to analyse the PISA data, including full chapters on how to apply replicate weights and undertake analyses using plausible values; worked examples providing full syntax in SAS®; and a comprehensive description of the OECD PISA 2003 international database. The PISA 2003 database includes micro-level data on student educational performance for 41 countries collected in 2003, together with students’ responses to the PISA 2003 questionnaires and the test questions. A similar manual is available for SPSS users.
https://www.datainsightsmarket.com/privacy-policy
The Big Data Basic Platform market is booming, projected to reach $150 billion by 2033 at a 15% CAGR. Discover key trends, drivers, restraints, and leading companies shaping this rapidly evolving sector. Learn more about cloud-based solutions, regional market shares, and future growth potential.
This resource contains a Jupyter notebook that demonstrates how the CUAHSI JupyterHub platform can be used to perform basic hydrologic data analysis. Temperature data are collected via the CUAHSI Hydrologic Information System (HIS) using web services. These data are interrogated, organized using Python classes, and plotted in various ways to demonstrate common data analysis steps. To get started, click the Open with dropdown on the top right of the resource and select CUAHSI JupyterHub. To use CUAHSI JupyterHub, you will need a HydroShare account.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
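The case study itself works in R. Purely as an illustration of the kind of warm-up analysis it describes (summary statistics and a correlation between two measurements), the Python sketch below uses a handful of hypothetical sepal and petal lengths; these are not the actual Iris values, and the helper name is invented for the example.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(ss_x * ss_y)


# Hypothetical measurements in cm (not the real Iris data).
sepal_length = [5.0, 4.8, 6.4, 6.9, 5.5, 6.3]
petal_length = [1.4, 1.5, 4.5, 4.9, 4.0, 6.0]

for name, values in (("sepal", sepal_length), ("petal", petal_length)):
    print(f"{name}: mean={statistics.fmean(values):.2f}, "
          f"sd={statistics.stdev(values):.2f}, "
          f"range=({min(values)}, {max(values)})")

r = pearson_r(sepal_length, petal_length)
print(f"correlation between sepal and petal length: {r:.2f}")
```

The same three steps (summarise each variable, then relate pairs of variables) carry over to a larger dataset, where they would be applied column by column.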