Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data analysis can be accurate and reliable only if the underlying assumptions of the chosen statistical method are satisfied; violations of these assumptions can change the outcomes and conclusions of the analysis. In this study, we developed Smart Data Analysis V2 (SDA-V2), an interactive and user-friendly web application that assists users with limited statistical knowledge in data analysis; it can be freely accessed at https://jularatchumnaul.shinyapps.io/SDA-V2/. SDA-V2 automatically explores and visualizes data, examines the underlying assumptions associated with parametric tests, and selects an appropriate statistical method for the given data. Furthermore, SDA-V2 can assess the quality of research instruments and determine the minimum sample size required for a meaningful study. However, while SDA-V2 is a valuable tool for simplifying statistical analysis, it does not replace a fundamental understanding of statistical principles. Researchers are encouraged to combine their expertise with the software's capabilities to achieve the most accurate and credible results.
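The kind of decision logic SDA-V2 automates can be illustrated with a short sketch. This is a hedged example, not SDA-V2's actual rules: the chosen tests (Shapiro-Wilk, Levene) and the alpha threshold are illustrative assumptions.

```python
# Illustrative assumption checking for a two-sample comparison:
# test normality and variance homogeneity, then pick a test.
from scipy import stats

def choose_two_sample_test(x, y, alpha=0.05):
    # Fall back to a rank-based test if either sample looks non-normal
    normal = (stats.shapiro(x).pvalue > alpha and
              stats.shapiro(y).pvalue > alpha)
    if not normal:
        return "Mann-Whitney U", stats.mannwhitneyu(x, y)
    # Student's t if variances look equal, Welch's t otherwise
    equal_var = stats.levene(x, y).pvalue > alpha
    return ("Student t" if equal_var else "Welch t",
            stats.ttest_ind(x, y, equal_var=equal_var))
```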
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Abstract: We present annotated MATLAB scripts (and specific guidelines for their use) for Q-mode factor analysis, a constrained least squares multiple linear regression technique, and a total inversion protocol, based on the well-known approaches of Dymond (1981), Leinen and Pisias (1984), Kyte et al. (1993), and their predecessors. Although these techniques have been used by investigators for decades, their application has been neither consistent nor transparent, as the code has remained in-house or in formats not commonly used by many of today's researchers (e.g., FORTRAN). In addition to the annotated scripts and instructions for use, we include a sample data set so users can test their own manipulation of the scripts. Other Description: Pisias, N. G., R. W. Murray, and R. P. Scudder (2013), Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts, Geochem. Geophys. Geosyst., 14, 4015–4020, doi:10.1002/ggge.20247.
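The published scripts are MATLAB; as a language-neutral illustration of the constrained least squares partitioning step they implement, the following Python sketch solves for non-negative mixing fractions of hypothetical end-members. It is a conceptual stand-in, not a port of the published code.

```python
# Partition a sample's composition among end-members by non-negative
# least squares, then normalize the fractions to sum to one.
import numpy as np
from scipy.optimize import nnls

def partition_sample(sample, end_members):
    """sample: (n_elements,) composition; end_members: (n_sources, n_elements)."""
    fractions, residual = nnls(end_members.T, sample)
    return fractions / fractions.sum(), residual
```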
Natural history museums often contain large collections of the same species and are therefore a resource for studying intraspecific variation. This module uses 172 images of rock pocket mouse skulls from the UTEP Biodiversity Collections to introduce students to collecting data from images and to principles of basic statistics. The module focuses on immersing students in the development of study design, analysis, discussion, and communication without overwhelming them. Students enter their data into a Google Sheets app that combines data entry, statistical analysis, and presentation in one place. The collaborative framework asks students to work together, share resources, and develop their own questions while learning the principles behind taking measurements from images of museum specimens.
When evaluating real-world treatment effects, analyses based on randomized clinical trials (RCTs) often suffer from generalizability bias due to differences in risk factors between trial participants and the real-world patient population. This lack of generalizability in RCT-only analyses can be addressed by leveraging observational studies with large sample sizes that are representative of the real-world population. A set of novel statistical methods, termed “genRCT”, for improving the generalizability of trial findings has been developed using calibration weighting, which enforces covariate balance between the RCT and the observational study. This paper reviews statistical methods for generalizing RCT findings by harnessing information from large observational studies that represent real-world patients. Specifically, we discuss the choice of data sources and variables needed to meet key theoretical assumptions and principles. We introduce and compare estimation methods for continuous, binary, and survival endpoints. We showcase the use of the R package genRCT through a case study that estimates the average treatment effect of adjuvant chemotherapy for stage IB non-small cell lung cancer patients represented by a large cancer registry.
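genRCT itself is an R package; the core idea of calibration weighting can be sketched in Python as entropy balancing, where RCT weights are chosen so that weighted covariate means match the observational sample. This is a conceptual illustration under that assumption, not genRCT's implementation.

```python
# Entropy-balancing calibration weights: w_i proportional to exp(X_i @ lam),
# with lam chosen so weighted RCT covariate means equal the target means.
import numpy as np
from scipy.optimize import minimize

def calibration_weights(X_rct, X_obs):
    target = X_obs.mean(axis=0)
    def dual(lam):  # convex dual of the entropy-balancing problem
        return np.log(np.exp(X_rct @ lam).sum()) - target @ lam
    lam = minimize(dual, np.zeros(X_rct.shape[1]), method="BFGS").x
    w = np.exp(X_rct @ lam)
    return w / w.sum()  # weights sum to one; weighted means match target
```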
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparative overview of the statistical packages available in moreThanANOVA and SDA-V2.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
These are condensed notes covering selected key points in data analysis and statistics. They were developed by James Kirchner for the course "Analysis of Environmental Data" at Berkeley in the 1990s and 2000s. They are not intended to be comprehensive, and thus are not a substitute for a good textbook or a good education! License: These notes are released by James Kirchner under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Background: Ethics is fundamental to all human interactions, yet understanding of its precise elements and their hierarchy remains superficial. Prior attempts to establish a ranking of ethical principles have yielded varied results, indicating the need for further exploration. This study aims to contribute to the understanding of ethics by exploring the relative importance of its elements through a cross-sectional analysis. Objective: The aim of this study was to determine the relative ranking of the elements of ethics, including justice, nonmaleficence, autonomy, beneficence, fidelity, veracity, public good, and loyalty, in order to establish a working hierarchy of the principles. Methods: Participants were tasked with evaluating ethical conflicts depicted in scenarios. Using a scale of 1 to 10, participants rated their preferred responses to each scenario, allowing for the comparison of different ethical principles. Statistical analysis, including independent samples t-tests, was employed to determine significant differences in preferences. Results: Analysis of participant responses revealed discernible trends in the hierarchy of ethical principles. Notably, justice, nonmaleficence, lawfulness, and autonomy emerged as top-tier principles, while beneficence and fidelity constituted second-tier elements. Public good, veracity, and loyalty comprised the third tier. These findings align with and extend existing literature, providing valuable insights into the relative importance of ethical principles. Conclusion: Our results indicate that justice, nonmaleficence, lawfulness, and autonomy were most important as first-tier principles. Following them, beneficence and fidelity were recognized as second-tier principles, with public good, veracity, and loyalty falling into the third tier.
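The comparison described in Methods reduces to independent samples t-tests on 1-10 ratings. A minimal sketch, with hypothetical ratings standing in for the study's data:

```python
# Compare mean preference ratings for two principles; Welch's t-test
# avoids assuming equal variances. The numbers are placeholders.
from scipy import stats

justice = [9, 8, 10, 7, 9, 8, 10, 9]   # hypothetical ratings
loyalty = [6, 7, 5, 8, 6, 7, 6, 5]
t, p = stats.ttest_ind(justice, loyalty, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")
```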
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of features in SDA-V2 and well-known statistical analysis software packages (Minitab and SPSS).
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The Iris dataset is a classic dataset in the field of machine learning and statistics. It's often used for demonstrating various data analysis, machine learning, and statistical techniques. Here are some key details about it:
Background
- Origin: The dataset was introduced by the British statistician and biologist Ronald Fisher in his 1936 paper titled "The use of multiple measurements in taxonomic problems."
- Purpose: Fisher developed the dataset as an example of linear discriminant analysis.

Data Composition
- Data Points: The dataset consists of 150 samples from three species of Iris flowers: Iris Setosa, Iris Versicolour, and Iris Virginica.
- Features: There are four features measured in centimeters for each sample:
  1. Sepal Length
  2. Sepal Width
  3. Petal Length
  4. Petal Width
- Classes: The dataset contains three classes, corresponding to the three species of Iris. Each class has 50 samples.

Usage
- Classification: The Iris dataset is widely used for classification tasks, especially to illustrate the principles of supervised machine learning algorithms.
- Testing Algorithms: It's often used to test out algorithms for linear regression, classification, and clustering due to its simplicity and small size.
- Educational Purpose: Because of its clarity and simplicity, it's frequently used in teaching data science and machine learning.

Characteristics
- Simple and Clean: The dataset is straightforward, with minimal preprocessing required, making it ideal for beginners.
- Well-Behaved Classes: The species are relatively well separated, though there's some overlap between Versicolour and Virginica.
- Multivariate Data: It involves understanding the relationship between multiple variables (the four features).

Applications
- Benchmarking: The Iris dataset serves as a benchmark for evaluating the performance of different algorithms.
- Visualization: It's great for practicing data visualization, especially for exploring techniques like scatter plots, box plots, and pair plots to understand feature relationships.
Despite its simplicity, the Iris dataset remains one of the most famous datasets in the world of data science and machine learning. It serves as an excellent starting point for anyone new to the field and remains a baseline for testing algorithms and teaching concepts.
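As a quick illustration of the classification use described above, the following scikit-learn snippet trains a simple classifier on the dataset (the 0.3 test split and logistic regression model are arbitrary choices):

```python
# Train and test a basic classifier on Iris with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=200).fit(X_tr, y_tr)
print(f"Test accuracy: {clf.score(X_te, y_te):.2f}")  # well-separated classes give high accuracy
```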
Concentrations of particulate organic carbon (POC) and dissolved organic carbon (DOC), which together comprise total organic carbon, were measured in a reconnaissance study at sampling sites in the Upper Klamath River, Lost River, and Klamath Straits Drain in 2013–16. In addition, data for total nitrogen and chlorophyll a were collected. Optical absorbance and fluorescence properties of dissolved organic matter (DOM), which contains DOC, were also analyzed. Excitation-emission matrices (EEMs) and full absorbance spectra were produced for each sample. The EEMs were compiled, and key data points and regions of the spectra were extracted for each site. Parallel factor analysis was used to decompose the optical fluorescence data into five key components across all samples.
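The parallel factor analysis (PARAFAC) step can be sketched with the tensorly library; the tensor dimensions and placeholder data below are illustrative, with the rank of 5 matching the five components reported:

```python
# Decompose a stack of EEMs (samples x excitation x emission) into
# rank-5 trilinear components, as PARAFAC does for fluorescence data.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

eems = np.random.rand(40, 60, 80)  # placeholder for the measured EEM stack
weights, factors = parafac(tl.tensor(eems), rank=5, normalize_factors=True)
sample_scores, excitation_loadings, emission_loadings = factors
```

In practice, EEM studies usually constrain the loadings to be non-negative (tensorly's non_negative_parafac); the unconstrained call above keeps the sketch minimal.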
Reference Id: SFR18/2011
Publication Type: Statistical First Release
Local Authority data: LA data
Region: England
Release Date: 02 August 2011
Coverage status: Provisional
Publication Status: Published
National curriculum tests provide a snapshot of attainment at the end of key stage 2. Teacher assessment is the teachers’ judgement of pupils’ performance in the whole subject over the whole key stage programme of study.
The SFR contains statistics on KS2 attainment of pupils in science that were previously published separately in the 2009 to 2010 academic year. Science tests are administered only to a nationally representative sample of pupils at the end of key stage 2. The sample is used to monitor national standards in science; it is not designed to produce regional or local level statistics.
The Qualifications and Curriculum Development Agency (QCDA) undertakes the delivery of statutory tests at the end of Key Stage 2 and the national collection of Key Stage 2 and 3 teacher assessment and test results.
These statistics will be revised in late 2011.
The percentages of pupils achieving Level 4 or above in the 2011 Key Stage 2 tests by subject are as follows:
The percentages of pupils achieving Level 4 or above in the 2011 Key Stage 2 science sampling tests are as follows:
The percentages of pupils achieving Level 5 in the 2011 Key Stage 2 tests by subject are as follows:
The percentages of pupils achieving Level 5 in the 2011 Key Stage 2 science sampling tests are as follows:
The percentages of pupils achieving Level 4 or above in the 2011 Key Stage 2 teacher assessments by subject are as follows:
The percentages of pupils achieving Level 5 or above in the 2011 Key Stage 3 teacher assessments by subject are as follows:
On 3 August 2011 a small issue was identified with some Key Stage 2 and 3 local authority figures, arising from a small number of schools converting to academy status on 1 July 2011. Amended tables and underlying data were published on 5 August 2011. At Key Stage 2, no local authority figure was affected by more than one percentage point.
Adam Hatton - Attainment Statistics Team
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Seven Points by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Seven Points. The dataset can be used to understand the population distribution of Seven Points by gender and age; for example, it can identify the largest age group for both men and women, and show how the male-to-female ratio changes from birth to the most senior age group.
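A hypothetical starter for the use case just described, computing the male-to-female ratio per age group with pandas; the file and column names are assumptions based on the description, not the dataset's verified schema:

```python
# Rank age groups by male-to-female ratio (hypothetical column names).
import pandas as pd

df = pd.read_csv("seven-points-population-by-age-and-gender.csv")
df["male_to_female"] = df["male_population"] / df["female_population"]
print(df.sort_values("male_to_female", ascending=False)[["age_group", "male_to_female"]])
```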
Key observations
Largest age group (population): Male: 0-4 years (106); Female: 75-79 years (66). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender, and respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Seven Points Population by Gender. You can refer to it here.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
The "Segmentation and Key Points of Human Body Dataset" is designed for the apparel and visual entertainment sectors, featuring a collection of internet-collected images with resolutions ranging from 1280 x 960 to 5184 x 3456 pixels. The dataset is comprehensive, including instance and semantic segmentation of 27 categories of body parts along with 24 key-point annotations, providing detailed data for human body analysis and applications.
If you are interested in the full version of the dataset, featuring 6.6k annotated images, please visit our website maadaa.ai and leave a request.
| Dataset ID | MD-Image-053 |
|---|---|
| Dataset Name | Segmentation and Key Points of Human Body Dataset |
| Data Type | Image |
| Volume | About 6.6k |
| Data Collection | Internet-collected images; resolution ranges from 1280 x 960 to 5184 x 3456 |
| Annotation | Semantic Segmentation, Instance Segmentation |
| Annotation Notes | The dataset includes 27 categories of body parts and 24 key points. |
| Application Scenarios | Apparel, Visual Entertainment |
Since 2015, maadaa.ai has been dedicated to delivering specialized AI data services. Our key offerings include:
Data Collection: Comprehensive data gathering tailored to your needs.
Data Annotation: High-quality annotation services for precise data labeling.
Off-the-Shelf Datasets: Ready-to-use datasets to accelerate your projects.
Annotation Platform: Maid-X, our platform built for efficient data annotation.
We cater to various sectors, including automotive, healthcare, retail, and more, ensuring our clients receive the best data solutions for their AI initiatives.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This research aims to develop a principle-based framework for audit analytics (AA) implementation that addresses the challenges of AA implementation and acknowledges its socio-technical complexities and the interdependencies among challenges. The research relies on mixed methods to capture the phenomena from the research participants through various approaches, i.e., MICMAC-ISM, a case study, and interviews with practitioners, with literature exploration as the starting point. The raw data collected consist of multimedia data (audio and video recordings of interviews and a focus group discussion), which were then transformed into text transcripts, complemented with soft copies of documents from the case study object.
The published data in this dataset consist of summarized or analyzed data, as the raw data (including transcripts) are not allowed to be published according to the decision of the Human Research Ethics Committee pertinent to this research (Approval #1979, 14 February 2022). The published files are text files representing the summarized/analyzed raw data, serving as online appendices to the thesis.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Instructions
- Examine the data: Start by thoroughly examining the dataset within the Claims Data resource. Focus on key variables such as claim dates, types of claims, amounts claimed, and additional details about the incidents.
- Manipulate the data: Derive the missing values in columns F, O, P, and Q. Use hints if needed. This step emphasizes data manipulation, a key component of account pricing analysis.
- Identify patterns and anomalies: Conduct EDA using the data in the Claims Data resource. Identify patterns, trends, and anomalies. Utilize visual tools such as histograms, scatter plots, and bar charts within Excel to help you visualize and interpret the data.

2. Apply actuarial principles to the data

- Risk assessment: Use the actuarial principles you learned in Task 1 to assess the risks associated with the claims data. Calculate key metrics such as claim frequency, severity, and loss ratios based on the data provided (see the sketch after this list).
- Calculate premiums: Develop a pricing model using experience-based rating. This involves adjusting historical data from the Claims Data resource to project future claims costs, considering factors such as inflation and changes in exposure.

3. Develop comprehensive reports in Excel

- Analysis report: Compile your findings. Organize your EDA into a well-structured section within the Excel workbook. This section should include a detailed evaluation of the Marine Liability insurance claims data, visualizations of key findings, and a commentary on observed trends and anomalies.
- Commentary on risks and uncertainties: Provide a clear commentary on the risks and uncertainties associated with your assessment. Discuss how different scenarios could impact the pricing model and the potential financial implications for Oceanic Shipping Co.
- Pricing calculation: Perform a numbers-based premium calculation. Use the Claims Data resource to calculate the appropriate premiums for the Marine Liability insurance policy. Apply actuarial principles such as loss frequency, loss severity, and pure premium calculation, and adjust for expenses and profit margins.
- Sensitivity analysis: Include a sensitivity analysis within the Excel workbook to assess how changes in key assumptions (e.g., an increase in loss severity) could impact the final premium.
- Document your calculations: Ensure your premium calculation section in Excel clearly documents your methodology, assumptions, and final premium recommendations. Discuss the potential risks and uncertainties in your pricing model, including any external factors that could impact future claims.
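The key metrics named in the risk-assessment step reduce to a few ratios. A minimal sketch with hypothetical figures (the brief's actual data lives in the Excel Claims Data resource):

```python
# Claim frequency, severity, loss ratio, and pure premium from
# hypothetical aggregates; replace with values from the Claims Data.
n_claims, n_exposures = 120, 4_000
total_losses, total_premium = 1_800_000, 3_000_000

frequency = n_claims / n_exposures          # claims per unit of exposure
severity = total_losses / n_claims          # average cost per claim
loss_ratio = total_losses / total_premium   # losses as a share of premium
pure_premium = frequency * severity         # expected loss per exposure unit

print(f"frequency={frequency:.3f}, severity={severity:,.0f}, "
      f"loss ratio={loss_ratio:.2f}, pure premium={pure_premium:,.0f}")
```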
Reference Id: SFR24/2010
Publication Type: Statistical First Release
Publication data: Underlying Statistical data
Region: England
Release Date: 10 August 2010
Coverage status: Final
Publication Status: Published
The figures in this SFR are produced from data provided to the department by the Qualifications and Curriculum Development Agency (QCDA) on 13 July 2010.
National curriculum assessment provides a measurement of achievement against the precise attainment targets of the national curriculum rather than any generalised concept of ability in any of the subject areas. The national curriculum standards have been designed so that most pupils progress by approximately one level every two years, which means that by the end of KS2 pupils are expected to achieve Level 4.
Details about the methodology and design of the sample can be found on the National Archives QCDA website: http://webarchive.nationalarchives.gov.uk/20110810144333/http://qcda.gov.uk/assessment/85.aspx
The estimated percentages of children achieving Level 4 or above based on the 2010 Key Stage 2 science sample tests are as follows:
Based on the confidence intervals given, it is not possible to conclude that girls perform significantly better than boys.
When the whole Key Stage 2 cohort took tests in 2009, the overall percentage of pupils achieving Level 4 or above was 88%.
Comparisons with previous years are difficult, as earlier tests were taken under a regime in which results fed into the school accountability framework. These sample tests play no part in school accountability.
The estimated percentages of children achieving Level 5 based on the 2010 Key Stage 2 science sample tests are as follows:
Based on the confidence intervals given, it is not possible to conclude that boys perform significantly better than girls at Level 5.
When the whole Key Stage 2 cohort took tests in 2009 the overall percentage of pupils achieving Level 5 was 43%.
The underlying data for this publication was made available on 29 September 2010.
Adam Hatton - Attainment Statistics Team
CC0 1.0 Universal: https://spdx.org/licenses/CC0-1.0.html
This study presents a framework to automatically analyze head motion in birds from videos of natural behaviors. The process involves detecting birds, identifying key points on their heads, and tracking changes in their positions over time. Bird detection and key point extraction were trained on publicly available datasets featuring videos and images of diverse bird species in uncontrolled settings. Initial challenges with complex video backgrounds causing misidentifications and inaccurate key points were addressed through validation, refinement, filtering, and smoothing. Head angular velocities and rotation frequencies were computed from the refined key points. The algorithm performed well at moderate speeds but was limited by the 30 Hz frame rate of most videos, which constrained measurable angular velocities and frequencies and caused motion blur, affecting key point detection. Our findings suggest that the framework may provide plausible estimates of head motion but also emphasize the importance of high frame rate videos in future research, including extensive comparisons against ground truth data, to fully characterize bird head movements. Importantly, this work is a foundational effort to understand the evolutionary drivers of the semicircular canals, the biosensor that monitors head rotations, for both extinct and extant tetrapods.
Methods: For the development of the bird head pose estimation (BHPE) module, a new 2D BHPE annotated dataset is proposed, entitled BirdGaze, which includes images from four prominent sources: the Animal Kingdom, NABirds, Birdsnap, and eBird. These datasets represent the largest publicly available collections and are widely recognized in the literature for their significant role in avian research. Their extensive morphological diversity is crucial for this study. Besides the bird images, the proposed BirdGaze dataset includes a set of annotations, notably:
- Center of the bounding box containing the bird body, described by its 2D coordinates;
- Scale factor, defining a multiplying factor to apply to the bird bounding box so it fits a fixed rectangle size, which is used as input to the adopted key point extraction model;
- Coordinates of the four selected 2D key points: top of head, tip of beak, left eye, right eye.
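The angular-velocity computation described above can be sketched as follows; the use of the head-to-beak vector as the head direction and the variable names are illustrative assumptions, with the 30 Hz rate taken from the text:

```python
# Estimate head angular velocity (rad/s) from two head key points
# tracked across frames: angle of the head-to-beak vector, unwrapped
# over time and differentiated at the video frame rate.
import numpy as np

def head_angular_velocity(head_xy, beak_xy, fps=30.0):
    """head_xy, beak_xy: (n_frames, 2) arrays of key point coordinates."""
    direction = beak_xy - head_xy
    angle = np.unwrap(np.arctan2(direction[:, 1], direction[:, 0]))
    return np.gradient(angle) * fps
```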
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Explore the dynamic world of Adidas US Sales with this comprehensive dataset. The dataset encapsulates detailed information on sales transactions, retailer details, product categories, and more. Each entry includes critical metrics such as total sales, operating profit, units sold, and various operational aspects.
Key Points:
- Rich sales data spanning 2020 to 2021.
- Granular details on product types, retailers, and sales methods.
- Insights into regional performance, pricing strategies, and operating margins.
- Ideal for exploratory data analysis, predictive modeling, and business strategy formulation.

Dataset columns I am using for analysis:
- Retailer
- Retailer ID
- Invoice Date
- Region
- State
- City
- Product
- Price per Unit
- Units Sold
- Total Sales
- Operating Profit
- Operating Margin
- Sales Method
- Year

Use this dataset to derive actionable insights, refine business strategies, and elevate your data analysis skills. Dive into the world of Adidas US Sales and uncover the stories hidden in the numbers.
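A hypothetical starting point for the exploratory analysis suggested above; the file name is an assumption, while the column labels follow the list just given:

```python
# Summarize total sales and operating profit by region with pandas.
import pandas as pd

df = pd.read_csv("adidas_us_sales.csv")  # hypothetical file name
by_region = (df.groupby("Region")[["Total Sales", "Operating Profit"]]
               .sum()
               .sort_values("Total Sales", ascending=False))
print(by_region)
```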
This publication presents information on the number and pass rates of driving and riding tests conducted in Great Britain up to 31 March 2012 (covering the whole of the 2011/12 financial year).
The statistics are derived from data held by the Driver and Vehicle Standards Agency (DVSA), which administers the driving test and training schemes in Great Britain.
A supplementary bulletin will be released in July. This will contain more detailed tables providing breakdowns of test passes by age of candidate and number of test attempts.
Information on Driver and Rider Test and Instructor statistics, including the pre-release access list, and related technical documentation can be found here.
The first part of this report discusses the overall statistical planning, coordination, and design for several tar sand wastewater treatment projects contracted by the Laramie Energy Technology Center (LETC) of the Department of Energy. A general discussion of the benefits of consistent statistical design and analysis for data-oriented projects is included, with recommendations for implementation. A detailed outline of the principles of general linear models design is followed by an introduction to recent developments in general linear models by ranks (GLMR) analysis and a comparison to standard analysis using Gaussian or normal theory (GLMN). A listing of routines contained in the VPI Nonparametric Statistics Package (NPSP), installed on the Cyber computer system at the University of Wyoming, is included. Part 2 describes in detail the design and analysis for treatments by Gas Flotation, Foam Separation, Coagulation, and Ozonation, with comparisons among the first three methods. Rank methods are used for most analyses, and several detailed examples are included. For optimization studies, the powerful tools of response surface analysis (RSA) are employed, and several sections discuss the benefits of RSA. All four treatment methods proved effective for removal of TOC and suspended solids from the wastewater. Because the processes and equipment designs were new, optimum removals were not achieved in these initial studies, and the reasons for that are discussed. Pollutant levels were nevertheless reduced to levels appropriate for recycling within the process and for such reuses as steam generation, according to the DOE/LETC project officer. 12 refs., 8 figs., 21 tabs.