This graph presents the results of a survey, conducted by BARC in 2014/15, into the current and planned distribution of big data projects within companies. At the beginning of 2015, ** percent of respondents indicated that their company's marketing department had already begun using big data analysis.
Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, both to avoid overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee-owned lands loaded before overlapping management designations and easements).

The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip"), an associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ), was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ).

Note, the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed more to improvements in the completeness and accuracy of the spatial data than to actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
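The prioritization of overlapping designations described above reduces to a lexicographic sort over (GAP status, access rank, load order). The following is a minimal pandas sketch of that logic with invented overlap groups, records, and column names; it is an illustration of the idea, not the USGS script itself:

```python
import pandas as pd

# Rank public access in the order given above: Closed < Restricted < Open < Unknown.
ACCESS_RANK = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}

# Hypothetical overlapping records; "overlap_id" groups records covering the same area.
records = pd.DataFrame({
    "overlap_id": ["A", "A", "B", "B"],
    "unit_name":  ["National Forest", "Wilderness", "State Park", "Easement"],
    "gap_status": [2, 1, 2, 3],          # GAP Status Code 1 is most protective
    "access":     ["Open", "Closed", "Open", "Unknown"],
    "load_order": [10, 25, 7, 40],       # fee lands load before designations/easements
})

records["access_rank"] = records["access"].map(ACCESS_RANK)
# Keep, for each overlap group, the record with the best (lowest) sort key.
flattened = (records
             .sort_values(["gap_status", "access_rank", "load_order"])
             .drop_duplicates("overlap_id"))
print(flattened[["overlap_id", "unit_name", "gap_status", "access"]])
```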
The Project for Statistics on Living Standards and Development was a countrywide World Bank Living Standards Measurement Survey. It covered approximately 9,000 households, drawn from a representative sample of South African households. The fieldwork was undertaken during the nine months leading up to the country's first democratic elections at the end of April 1994. The purpose of the survey was to collect statistical information about the conditions under which South Africans live, in order to provide policymakers with the data necessary for planning strategies. This data would aid the implementation of goals such as those outlined in the Government of National Unity's Reconstruction and Development Programme.
National
Households
All household members. Individuals in hospitals, old age homes, hotels and hostels of educational institutions were not included in the sample. Migrant labour hostels were included. In addition to those that fell within the selected ESDs, a sample of three hostels was chosen from a national list provided by the Human Sciences Research Council, and within each of these hostels a representative sample was drawn on a similar basis as described above for the households in ESDs.
Sample survey data [ssd]
(a) SAMPLING DESIGN
Sample size is 9,000 households. The sample design adopted for the study was a two-stage self-weighting design in which the first stage units were Census Enumerator Subdistricts (ESDs, or their equivalent) and the second stage units were households. The advantage of using such a design is that it provides a representative sample that need not be based on an accurate census population distribution. In the case of South Africa, the sample will automatically include many poor people, without the need to go beyond this and oversample the poor. Proportionate sampling in such a self-weighting sample design offers the simplest possible data files for further analysis, as weights do not have to be added. However, in the end this advantage could not be retained, and weights had to be added.
(b) SAMPLE FRAME
The sampling frame was drawn up on the basis of small, clearly demarcated area units, each with a population estimate. However, the nature of the self-weighting procedure adopted ensured that this population estimate was not important for determining the final sample. For most of the country, census ESDs were used. Where some ESDs comprised relatively large populations, as for instance in some black townships such as Soweto, aerial photographs were used to divide the areas into blocks of approximately equal population size. In other instances, particularly in some of the former homelands, the area units were not ESDs but villages or village groups.

In the sample design chosen, the area stage units (generally ESDs) were selected with probability proportional to size, based on the census population. Systematic sampling was used throughout, that is, sampling at a fixed interval in a list of ESDs, starting at a randomly selected starting point. Given that sampling was self-weighting, the impact of stratification was expected to be modest. The main objective was to ensure that the racial and geographic breakdown approximated the national population distribution. This was done by listing the area stage units (ESDs) by statistical region, and within each statistical region by urban or rural status. Within these sub-statistical regions, the ESDs were then listed in order of percentage African. The sampling interval for the selection of the ESDs was obtained by dividing the 1991 census population of 38,120,853 by the 300 clusters to be selected. This yielded 105,800. Starting at a randomly selected point, every 105,800th person down the cluster list was selected. This ensured both geographic and racial diversity (ESDs were ordered by statistical sub-region and proportion of the population African). In three or four instances, the ESD chosen was judged inaccessible and replaced with a similar one.

In the second sampling stage the unit of analysis was the household. In each selected ESD a listing or enumeration of households was carried out by means of a field operation. From the households listed in an ESD, a sample of households was selected by systematic sampling. Even though the ultimate enumeration unit was the household, in most cases "stands" were used as enumeration units. However, when a stand was chosen as the enumeration unit, all households on that stand had to be interviewed.
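To make the first-stage mechanics concrete, here is a hedged Python sketch of systematic probability-proportional-to-size selection over a cumulative population list, as described above. The ESD names and populations are invented for illustration:

```python
import random

def systematic_pps(units, n_clusters, rng):
    """Select area units with probability proportional to population size by
    walking a cumulative population list at a fixed interval from a random start."""
    total = sum(pop for _, pop in units)
    interval = total / n_clusters            # cf. 38,120,853 persons / 300 clusters
    start = rng.uniform(0, interval)         # randomly selected starting point
    targets = [start + k * interval for k in range(n_clusters)]
    chosen, cum, i = [], 0.0, 0
    for name, pop in units:                  # list pre-ordered by region, urban/rural, % African
        cum += pop
        while i < len(targets) and targets[i] <= cum:
            chosen.append(name)              # this unit contains the k-th selection point
            i += 1
    return chosen

rng = random.Random(42)
esds = [(f"ESD-{i:04d}", rng.randint(500, 5000)) for i in range(2000)]  # hypothetical frame
print(len(systematic_pps(esds, 300, rng)))   # 300 clusters selected
```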
Face-to-face [f2f]
All the questionnaires were checked when received. Where information was incomplete or appeared contradictory, the questionnaire was sent back to the relevant survey organization. As soon as the data were available, they were captured using the local development platform ADE. This was completed in February 1994. Following this, a series of exploratory programs was written to highlight inconsistencies and outliers. For example, all person-level files were linked together to ensure that the same person code reported in different sections of the questionnaire corresponded to the same person. The error reports from these programs were compared to the questionnaires and the necessary alterations made. This was a lengthy process, as several files were checked more than once, and it was completed at the beginning of August 1994. In some cases, questionnaires would contain missing values, or comments that the respondent did not know or refused to answer a question.
These responses are coded in the data files with the following values:

-1 : The data was not available on the questionnaire or form
-2 : The field is not applicable
-3 : Respondent refused to answer
-4 : Respondent did not know the answer to the question
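A minimal pandas sketch of how an analyst might recode these four values to missing before analysis; the file and column names here are hypothetical:

```python
import numpy as np
import pandas as pd

# The four PSLSD missing-data codes listed above.
MISSING_CODES = {-1: "not available on the form", -2: "not applicable",
                 -3: "refused to answer",        -4: "did not know"}

# Hypothetical column from one of the person-level files.
df = pd.DataFrame({"monthly_income": [1200, -3, 450, -1, -4, 980]})
df["monthly_income_clean"] = df["monthly_income"].replace(list(MISSING_CODES), np.nan)
print(df)
print("valid responses:", df["monthly_income_clean"].notna().sum())
```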
The data collected in clusters 217 and 218 should be viewed as highly unreliable and have therefore been removed from the data set. The data currently available on the web site have been revised to remove the data from these clusters. Researchers who have downloaded the data in the past should revise their data sets. For information on the data in those clusters, contact SALDRU: http://www.saldru.uct.ac.za/.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates Gate household income by age. It can be utilized to understand the age-based distribution of household income in Gate.
The dataset includes the following datasets, when applicable.
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of Gate income distribution by age. You can refer to the same here.
The statistic shows the share of developers worldwide that are, will be, or have been involved in a big data or advanced analytics project, in and around 2016. When surveyed, 29 percent of developers said they were currently involved in a big data or advanced analytics project.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include the course syllabus used to teach quantitative research design and analysis methods to graduate Linguistics students using a blended teaching and learning approach. The blended course took place over two weeks and builds on a face-to-face course presented over two days in 2019. Students worked through the topics in preparation for a live interactive video session each Friday to go through the activities. Additional communication took place on Slack for two hours each week. A survey was conducted at the start and end of the course to ascertain participants' perceptions of the usefulness of the course. The links to online elements and the evaluations have been removed from the uploaded course guide.

Participants who complete this workshop will be able to:
- outline the steps and decisions involved in quantitative data analysis of linguistic data
- explain common statistical terminology (sample, mean, standard deviation, correlation, nominal, ordinal and scale data)
- perform common statistical tests using jamovi (e.g. t-test, correlation, anova, regression)
- interpret and report common statistical tests
- describe and choose from the various graphing options used to display data
- use jamovi to perform common statistical tests and graph results

Evaluation
Participants who complete the course will use these skills and knowledge to complete the following activities for evaluation:
- analyse the data for a project and/or assignment (in part or in whole)
- plan the results section of an Honours research project (where applicable)

Feedback and suggestions can be directed to M Schaefer schaemn@unisa.ac.za
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data is becoming increasingly ubiquitous today, and data literacy has emerged as an essential skill in the workplace. Therefore, it is necessary to equip high school students with data literacy skills in order to prepare them for further learning and future employment. In Indonesia, there is a growing shift towards integrating data literacy into the high school curriculum. As part of a pilot intervention project, academics from two leading universities organised data literacy boot camps for high school students across various cities in Indonesia. The boot camps aimed at increasing participants’ awareness of the power of analytical and exploration skills, which, in turn, would contribute to creating independent and data-literate students. This paper explores student participants’ self-perception of their data literacy as a result of the skills acquired from the boot camps. Qualitative and quantitative data were collected through student surveys and a focus group discussion, and were used to analyse student perception post-intervention. The findings indicate that students became more aware of the usefulness of data literacy and its application in future studies and work after participating in the boot camp. Of the materials delivered at the boot camps, students found the greatest benefit in learning basic statistical concepts and applying them through the use of Microsoft Excel as a tool for basic data analysis. These findings provide valuable policy recommendations that educators and policymakers can use as guidelines for effective data literacy teaching in high schools.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data for meta-analysis of replications project.
Data Science Platform Market Size 2025-2029
The data science platform market size is forecast to increase by USD 763.9 million, at a CAGR of 40.2% between 2024 and 2029.
The market is experiencing significant growth, driven by the increasing integration of Artificial Intelligence (AI) and Machine Learning (ML) technologies. This fusion enables organizations to derive deeper insights from their data, fueling business innovation and decision-making. Another trend shaping the market is the emergence of containerization and microservices in data science platforms. This approach offers enhanced flexibility, scalability, and efficiency, making it an attractive choice for businesses seeking to streamline their data science operations. However, the market also faces challenges. Data privacy and security remain critical concerns, with the increasing volume and complexity of data posing significant risks. Ensuring robust data security and privacy measures is essential for companies to maintain customer trust and comply with regulatory requirements. Additionally, managing the complexity of data science platforms and ensuring seamless integration with existing systems can be a daunting task, requiring significant investment in resources and expertise. Companies must navigate these challenges effectively to capitalize on the market's opportunities and stay competitive in the rapidly evolving data landscape.
What will be the Size of the Data Science Platform Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
The market continues to evolve, driven by the increasing demand for advanced analytics and artificial intelligence solutions across various sectors. Real-time analytics and classification models are at the forefront of this evolution, with API integrations enabling seamless implementation. Deep learning and model deployment are crucial components, powering applications such as fraud detection and customer segmentation. Data science platforms provide essential tools for data cleaning and data transformation, ensuring data integrity for big data analytics. Feature engineering and data visualization facilitate model training and evaluation, while data security and data governance ensure data privacy and compliance. Machine learning algorithms, including regression models and clustering models, are integral to predictive modeling and anomaly detection.
Statistical analysis and time series analysis provide valuable insights, while ETL processes streamline data integration. Cloud computing enables scalability and cost savings, while risk management and algorithm selection optimize model performance. Natural language processing and sentiment analysis offer new opportunities for data storytelling and computer vision. Supply chain optimization and recommendation engines are among the latest applications of data science platforms, demonstrating their versatility and continuous value proposition. Data mining and data warehousing provide the foundation for these advanced analytics capabilities.
How is this Data Science Platform Industry segmented?
The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023, for the following segments.
Deployment: On-premises, Cloud
Component: Platform, Services
End-user: BFSI, Retail and e-commerce, Manufacturing, Media and entertainment, Others
Sector: Large enterprises, SMEs
Application: Data Preparation, Data Visualization, Machine Learning, Predictive Analytics, Data Governance, Others
Geography: North America (US, Canada), Europe (France, Germany, UK), Middle East and Africa (UAE), APAC (China, India, Japan), South America (Brazil), Rest of World (ROW)
By Deployment Insights
The on-premises segment is estimated to witness significant growth during the forecast period. In this dynamic market, businesses increasingly adopt solutions to gain real-time insights from their data, enabling them to make informed decisions. Classification models and deep learning algorithms are integral parts of these platforms, providing capabilities for fraud detection, customer segmentation, and predictive modeling. API integrations facilitate seamless data exchange between systems, while data security measures ensure the protection of valuable business information. Big data analytics and feature engineering are essential for deriving meaningful insights from vast datasets. Data transformation, data mining, and statistical analysis are crucial processes in data preparation and discovery. Machine learning models, including regression and clustering, are employed for model training and evaluation. Time series analysis and natural language processing are valuable tools for understanding trends and customer sentiment.
Code and associated data for the following preprint:
AP Browning, JA Sharp, RJ Murphy, G Gunasingh, B Lawson, K Burrage, NK Haass, MJ Simpson. 2021. Quantitative analysis of tumour spheroid structure. eLife. https://doi.org/10.7554/eLife.73020
Data comprises measurements relating to the size and inner structure of spheroids grown from WM793b and WM983b melanoma cells over up to 24 days.
Code, data, and interactive figures are available as a Julia module on GitHub:
Browning AP (2021) GitHub ID v.0.6.2. Quantitative analysis of tumour spheroid structure. https://github.com/ap-browning/Spheroids
Code used to process the experimental images is available on Zenodo:
Browning AP, Murphy RJ (2021) Zenodo Image processing algorithm to identify structure of tumour spheroids with cell cycle labelling. https://doi.org/10.5281/zenodo.5121093
Overview
GMAT is a feature-rich system containing high-fidelity space system models, optimization and targeting, built-in scripting and programming infrastructure, and customizable plots, reports and data products, to enable flexible analysis and solutions for custom and unique applications. GMAT can be driven from a fully featured, interactive GUI or from a custom script language. Here are some of GMAT’s key features, broken down by feature group.
Dynamics and Environment Modelling
Plotting, Reporting and Product Generation
Optimization and Targeting
Programming Infrastructure
Interfaces
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Sale City by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Sale City across both sexes and to determine which sex constitutes the majority.
Key observations
There is a female majority, with 58.09% of the total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Sale City Population by Race & Ethnicity. You can refer to the same here.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and outdoor recreation access across the nation. This data release presents results from statistical summaries of the PAD-US 4.0 protection status (by GAP Status Code) and public access status for various land unit boundaries (PAD-US 4.0 Vector Analysis and Summary Statistics). Summary statistics are also available to explore and download from the PAD-US Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). The vector GIS analysis file, source data used to summarize statistics for areas of interest to stakeholders (National, State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative), and complete Summary ...
The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently, datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method. Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking, the level of difficulty of a dataset depends on the algorithm; these levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software. The Statistical Reference Datasets project is also supported by the Standard Reference Data Program.
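To illustrate how generated datasets of this kind expose numerical weaknesses, here is a hedged NumPy sketch in the spirit of the Wampler problems: an exact degree-5 polynomial whose coefficients are all 1 by construction, so the "certified" solution is known exactly. Consult the StRD web pages for the official datasets and certified values; this is a demonstration of the testing pattern, not the official data.

```python
import numpy as np

# A Wampler-style generated problem: y is an exact degree-5 polynomial in x,
# so the reference coefficients are all 1 by construction. The Vandermonde
# design matrix is severely ill-conditioned, which is what stresses solvers.
x = np.arange(21, dtype=float)               # x = 0, 1, ..., 20
X = np.vander(x, 6, increasing=True)         # columns: 1, x, x^2, ..., x^5
beta_true = np.ones(6)
y = X @ beta_true

# Numerically fragile route: normal equations (squares the condition number).
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)
# More stable route: SVD-based least squares.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, b in [("normal equations", beta_normal), ("lstsq", beta_lstsq)]:
    print(f"{name:16s} max |coef error| = {np.max(np.abs(b - beta_true)):.2e}")
```

Comparing the maximum coefficient error of each routine against the known solution is exactly the kind of objective check the reference datasets enable.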
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
UNIDO publication. Draft report on a meeting on management information systems for project implementation: (1) discusses the need for systematic management of industrial projects; (2) covers methods of control, including data collection and electronic data processing for statistical analysis, decision making, resource allocation, and manpower and cost estimates, etc.; (3) appends a description of the PERT management control system. Recommendations, statistics, flow charts, diagrams, bibliography.
The consumer price surveys primarily provide the following:
- Data on the CPI in Palestine, covering the West Bank, Gaza Strip and Jerusalem J1, for major and sub-groups of expenditure.
- Statistics needed for decision-makers, planners and those who are interested in the national economy.
- Contribution to the preparation of quarterly and annual national accounts data.
Consumer prices and indices are used for a wide range of purposes, the most important of which are as follows:
- Adjustment of wages, government subsidies and social security benefits to compensate in part or in full for changes in living costs.
- To provide an index to measure the price inflation of the entire household sector, which is used to eliminate the inflation impact of the components of the final consumption expenditure of households in national accounts and to remove the impact of price changes from income and national groups.
- Price index numbers are widely used to measure inflation rates and economic recession.
- Price indices are used by the public as a guide for the family with regard to its budget and its constituent items.
- Price indices are used to monitor changes in the prices of the goods traded in the market and the consequent position of price trends, market conditions and living costs. However, the price index does not reflect other factors affecting the cost of living, e.g. the quality and quantity of purchased goods. Therefore, it is only one of many indicators used to assess living costs.
- It is used as a direct method to identify the purchasing power of money, where the purchasing power of money is inversely proportional to the price index; a short numeric illustration follows below.
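A minimal numeric illustration of that inverse relation (the index values here are invented):

```python
# If the CPI rises from 100 (base period) to 125, the purchasing power of
# one unit of currency falls to 100/125 = 0.80 of its base-period value.
cpi_base, cpi_now = 100.0, 125.0
purchasing_power = cpi_base / cpi_now
print(f"1 unit of currency now buys {purchasing_power:.0%} of its base-period basket")
```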
Palestine: West Bank, Gaza Strip, Jerusalem
The target population for the CPI survey is the shops and retail markets such as grocery stores, supermarkets, clothing shops, restaurants, public service institutions, private schools and doctors.
Sample survey data [ssd]
A non-probability purposive sample of sources, from which the prices of different goods and services are collected, was updated based on the 2017 establishment census, in a manner that achieves full coverage of all goods and services that fall within the Palestinian consumer system. These sources were selected based on the availability of the goods within them. It is worth mentioning that the sample of sources was selected from the main cities inside Palestine: Jenin, Tulkarm, Nablus, Qalqiliya, Ramallah, Al-Bireh, Jericho, Jerusalem, Bethlehem, Hebron, Gaza, Jabalia, Deir Al-Balah, Nusseirat, Khan Yunis and Rafah. The selection of these sources was considered to be representative of the variation that can occur in the prices collected from the various sources. The number of goods and services included in the CPI is approximately 730 commodities, whose prices were collected from 3,200 sources. The COICOP classification is used for consumer data, as recommended by the United Nations System of National Accounts (SNA 2008).
Not applicable
Computer Assisted Personal Interview [capi]
A tablet-supported electronic form was designed for the price surveys, to be used by the field teams in collecting data from the different governorates, with the exception of Jerusalem J1. The electronic form is supported with GIS and GPS mapping techniques that allow the field workers to locate the outlets exactly on the map and the administrative staff to manage the field remotely. The electronic questionnaire is divided into a number of screens, namely:
- First screen: shows the metadata for the data source: governorate name, governorate code, source code, source name, full source address, and phone number.
- Second screen: shows the source interview result, which is either completed, temporarily paused or permanently closed. It also shows the change activity as incomplete or rejected, with an explanation of the reason for rejection.
- Third screen: shows the item code, item name, item unit, item price, product availability, and reason for unavailability.
- Fourth screen: checks the price data of the related source and verifies their validity through the auditing rules designed specifically for the price programs.
- Fifth screen: saves and sends data through a VPN connection and Wi-Fi technology.
In case of the Jerusalem J1 Governorate, a paper form has been designed to collect the price data so that the form in the top part contains the metadata of the data source and in the lower section contains the price data for the source collected. After that, the data are entered into the price program database.
The price survey forms were encoded in advance by the project management according to the specific international statistical classification of each survey. After the researchers collected the price data and sent them electronically, the data were reviewed and audited by the project management. Achievement reports were reviewed on a daily and weekly basis, and the detailed price reports at data-source level were checked and reviewed daily by the project management. If there were any notes, the researcher was consulted in order to verify the data, and the source owner was called in order to correct or confirm the information.
At the end of the data collection process in all governorates, the data were edited using the following process:
- Logical revision of prices by comparing the prices of goods and services with others from different sources and other governorates. Whenever a mistake is detected, it is returned to the field for correction. (A sketch of this check follows below.)
- Mathematical revision of the average prices for items in governorates and the general average in all governorates.
- Field revision of prices through selecting a sample of the prices collected from the items.
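As an illustration only, the logical revision step could be sketched as a cross-governorate comparison like the following; the items, prices, and the 50% threshold are invented for demonstration and are not PCBS's actual rules:

```python
import pandas as pd

# Hypothetical price reports for the same items across governorates.
prices = pd.DataFrame({
    "item":        ["tomato"] * 4 + ["banana"] * 4,
    "governorate": ["Nablus", "Hebron", "Gaza", "Jenin"] * 2,
    "price":       [3.1, 3.0, 2.9, 6.5, 7.0, 6.8, 7.2, 6.9],
})

# Flag prices deviating more than 50% from the item's median across sources,
# so they can be sent back to the field for checking.
median = prices.groupby("item")["price"].transform("median")
prices["flag_for_field_check"] = (prices["price"] / median - 1).abs() > 0.5
print(prices[prices["flag_for_field_check"]])
```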
Not applicable
The findings of the survey may be affected by sampling errors due to the use of samples in conducting the survey rather than a total enumeration of the units of the target population, which increases the chance of variance between the actual values and those we would expect to obtain had the survey been conducted by total enumeration. The computation of differences between the most important key goods showed that the variation of these goods differs due to the specialty of each survey. For example, for the CPI, the variation between its goods was very low, except in some cases such as bananas, tomatoes, and cucumbers, which had a high coefficient of variation during 2019 due to the high oscillation in their prices. The variance of the key goods in the computed and disseminated CPI survey carried out at the Palestine level was for reasons related to sample design and the variance calculation of different indicators, since there was a difficulty in disseminating results by governorate due to a lack of weights.

Non-sampling errors are probable at all stages of data collection or data entry. Non-sampling errors include:
- Non-response errors: the selected sources demonstrated significant cooperation with interviewers, so no case of non-response was reported during 2019.
- Response errors (respondent), interviewing errors (interviewer), and data entry errors: to avoid these types of errors and reduce their effect to a minimum, project managers adopted a number of procedures, including making more than one visit to every source to explain the objectives of the survey and emphasize the confidentiality of the data. The visits to data sources contributed to strengthening relations, cooperation, and the verification of data accuracy.
- Interviewer errors: a number of procedures were taken to ensure data accuracy throughout the process of field data compilation. Interviewers were selected based on educational qualification, competence, and assessment. Interviewers were trained theoretically and practically on the questionnaire. Meetings were held to remind interviewers of instructions, and explanatory notes were supplied with the surveys.

A number of procedures were taken to verify data quality and consistency and to ensure the accuracy of the data collected by questionnaire throughout processing and data entry (noting that data collected through paper questionnaires did not exceed 5%):
- Data entry staff were selected from among specialists in computer programming and were fully trained on the entry programs.
- Data verification was carried out for 10% of the entered questionnaires to ensure that data entry staff had entered data correctly and in accordance with the provisions of the questionnaire. The result of the verification was consistent with the original data to a degree of 100%.
- The files of the entered data were received, examined, and reviewed by project managers before findings were extracted.
- Project managers carried out many checks on data logic and coherence, such as comparing the data of the current month with that of the previous month, and comparing the data across sources and governorates.
- Data collected by tablet devices were checked for consistency and accuracy by applying rules at the item level.
Other technical procedures to improve data quality: Seasonal adjustment processes
https://www.statsndata.org/how-to-order
The Project Management Software market has evolved significantly over the years, with the software becoming an essential tool for businesses seeking to enhance efficiency, collaboration, and productivity in their project management processes. This software enables organizations to plan, execute, and monitor their projects with ease,
https://spdx.org/licenses/CC0-1.0.html
The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed fairness practices, but this increased with the publication date. There was a marginal difference in how assessment categories were weighted by students, with reflections highlighting appreciation for student agency. In course content, students reported the greatest learning gains in describing variables, while collaborative activities (e.g., interacting with peers and the instructor) were the most effective support. The most effective tasks to facilitate these learning gains included coding exercises and team-led assignments. Autocoding of student reflections identified 16 themes, and positive sentiments were written nearly 4x more often than negative sentiments. Students positively reflected on their growth in statistical analyses, and negative sentiments focused on how limited prior experience with statistics and coding made them feel nervous. As a group, we encountered several challenges and opportunities in using open science materials. I present key recommendations, based on student experiences, for scientists to consider when publishing open data to provide additional educational benefits to the open science community.

Methods

Article and dataset fairness. To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each pair with a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), where each category was scored '1' or '0' based on whether it met that criterion, with a total possible score of ten.

Open grading policies. Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined whether assessment categories (independent variable) were weighted (dependent variable) differently by students using an analysis of variance (ANOVA) and examined pairwise differences with Tukey's HSD.

Assessment of perceived learning gains. I used a student assessment of learning gains (SALG) survey to measure students' perceptions of learning gains related to course objectives (Seymour et al. 2000). This Likert-scale survey provided five response categories ranging from 'no gains' to 'great gains' in learning, and the option of open responses in each category.
A summary report that converted Likert responses to numbers and calculated descriptive statistics was produced from the SALG instrument website.

Student reflections. In student reflections, I examined the frequency of the 100 most frequent words, with stop words excluded and a minimum length of four letters, both "with synonyms" and "with generalizations". Due to this paper's exploratory nature, I used autocoding to identify broad themes and sentiments in students' reflections. Autocoding examines the sentiment of each word and scores it as positive, neutral, mixed, or negative. In this process, I compared how students felt about each theme, focusing on positive (i.e., satisfaction) and negative (i.e., dissatisfaction) sentiments. The relationship of sentiment coding to themes was visualized in a treemap, where the size of a block is relative to the number of references for that code. All reflection processing and analyses were performed in NVivo 14 (Windows). All data were collected with institutional IRB approval (IRB-24–0314). All statistical analyses were performed in R (ver. 4.3.1; R Core Team 2023).
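A minimal Python sketch of that Likert-to-number conversion and descriptive summary; the category labels and responses shown are illustrative approximations, not the SALG instrument's exact wording or output:

```python
import pandas as pd

# Illustrative five-point scale; labels approximate the SALG categories
# ("no gains" ... "great gains").
SCALE = {"no gains": 1, "a little gain": 2, "moderate gain": 3,
         "good gain": 4, "great gains": 5}

# Hypothetical responses to one survey item.
responses = pd.Series(["great gains", "good gain", "moderate gain",
                       "great gains", "a little gain"])
scores = responses.map(SCALE)
print(scores.describe())   # count, mean, std, min, quartiles, max
```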
Data availability. Multivariate data analysis. Validation of an instrument for the evaluation of teaching digital competence.
- SPSS DATA (spss data.sav): the data in this file were imported with the software IBM SPSS Statistics, version 28.0.1.1(15).
- EXCEL DATA: Data of Project factorial.xlsx and Data Project reliability.xlsx contain the results of the statistical analysis carried out with the software Microsoft Excel.
- FIGURES: Figure 1.jpeg, Figure 2.jpeg, Figure 3.jpeg and Figure 4.jpeg.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Methods. The objective of this project was to determine the capability of a federated analysis approach using DataSHIELD to maintain the level of results of a classical centralized analysis in a real-world setting. This research was carried out on an anonymous synthetic longitudinal real-world oncology cohort randomly split into three local databases, mimicking three healthcare organizations, stored in a federated data platform integrating DataSHIELD. No individual data were transferred; statistics were calculated simultaneously but in parallel within each healthcare organization, and only summary statistics (aggregates) were provided back to the federated data analyst. Descriptive statistics, survival analysis, regression models and correlation were first performed with the centralized approach and then reproduced with the federated approach. The results were then compared between the two approaches.

Results. The cohort was split into three samples (N1 = 157 patients, N2 = 94 and N3 = 64), and 11 derived variables and four types of analyses were generated. All analyses were successfully reproduced using DataSHIELD, except for one descriptive variable due to a data disclosure limitation in the federated environment, showing the good capability of DataSHIELD. For descriptive statistics, exactly equivalent results were found for the federated and centralized approaches, except for some differences in position measures. Estimates of univariate regression models were similar, with a loss of accuracy observed for multivariate models due to source database variability.

Conclusion. Our project showed a practical implementation and use case of a real-world federated approach using DataSHIELD. The capability and accuracy of common data manipulation and analysis were satisfactory, and the flexibility of the tool enabled the production of a variety of analyses while preserving the privacy of individual data. The DataSHIELD forum was also a practical source of information and support. In order to find the right balance between privacy and accuracy of the analysis, privacy requirements should be established prior to the start of the analysis, along with a data quality review of the participating healthcare organizations.
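To make the aggregate-only idea concrete, here is a hedged Python sketch of pooling a mean and standard deviation from per-site aggregates, so the analyst never sees individual records. This illustrates the principle only; it is not DataSHIELD's actual API, and the site data are made up:

```python
import math

def site_aggregates(values):
    """Run locally at each healthcare organization; only these three
    numbers (n, sum, sum of squares) leave the site."""
    return len(values), sum(values), sum(v * v for v in values)

def pooled_mean_sd(aggregates):
    """Run by the federated analyst on the returned aggregates."""
    n = sum(a[0] for a in aggregates)
    s = sum(a[1] for a in aggregates)
    ss = sum(a[2] for a in aggregates)
    mean = s / n
    var = (ss - n * mean ** 2) / (n - 1)   # sample variance from sufficient statistics
    return mean, math.sqrt(var)

# Three sites mimic the N1/N2/N3 split; the values are invented.
site1, site2, site3 = [1.2, 3.4, 2.2], [2.8, 3.1], [1.9, 2.5, 2.7, 3.0]
print(pooled_mean_sd([site_aggregates(s) for s in (site1, site2, site3)]))
```

Because the mean and variance are recoverable exactly from these sufficient statistics, the pooled results match a centralized computation, which mirrors the equivalence reported above for descriptive statistics.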