100+ datasets found
  1. Departmental distribution of big data projects 2015

    • statista.com
    Updated Jul 29, 2015
    Cite
    Statista (2015). Departmental distribution of big data projects 2015 [Dataset]. https://www.statista.com/statistics/491254/big-data-departmental-use/
    Dataset updated
    Jul 29, 2015
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2014 - Feb 2015
    Area covered
    North America, Europe, Worldwide
    Description

    This graph presents the results of a survey, conducted by BARC in 2014/15, into the current and planned distribution of big data projects within companies. At the beginning of 2015, ** percent of respondents indicated that their company's marketing department had already begun using big data analysis.

  2. Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis...

    • catalog.data.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-vector-analysis-and-summary-stati
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. 
Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). 
Note, the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed more to improvements in the completeness and accuracy of the spatial data than to actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. 
While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
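    As a rough illustration of the prioritization the scripted process applies (GAP status first, then public access in the order Closed, Restricted, Open, Unknown, then geodatabase load order), here is a minimal Python sketch; the record layout and helper names are assumptions for illustration, not the USGS script itself.

```python
# Sketch of the overlap prioritization described above. Field names
# follow PAD-US conventions (GAP_Sts, Pub_Access), but this ranking
# helper is an illustrative assumption, not the USGS scripted process.

ACCESS_RANK = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}

def priority(record):
    """Lower tuples sort first: GAP status 1 beats 2, Closed beats Open,
    and earlier geodatabase load order wins remaining ties."""
    return (int(record["GAP_Sts"]),
            ACCESS_RANK[record["Pub_Access"]],
            record["load_order"])

def resolve_overlap(records):
    """Of the designations overlapping one area, keep the highest priority."""
    return min(records, key=priority)

# Example: a GAP 1 Wilderness overlapping a GAP 3 National Forest.
overlapping = [
    {"name": "National Forest", "GAP_Sts": "3", "Pub_Access": "Open", "load_order": 1},
    {"name": "Wilderness", "GAP_Sts": "1", "Pub_Access": "Restricted", "load_order": 2},
]
print(resolve_overlap(overlapping)["name"])  # Wilderness
```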

  3. Project for Statistics on Living Standards and Development 1993 - South...

    • microdata.fao.org
    • catalog.ihsn.org
    • +2more
    Updated Oct 20, 2020
    + more versions
    Cite
    Southern Africa Labour and Development Research Unit (2020). Project for Statistics on Living Standards and Development 1993 - South Africa [Dataset]. https://microdata.fao.org/index.php/catalog/1527
    Dataset updated
    Oct 20, 2020
    Dataset authored and provided by
    Southern Africa Labour and Development Research Unit
    Time period covered
    1993
    Area covered
    South Africa
    Description

    Abstract

    The Project for Statistics on Living standards and Development was a countrywide World Bank Living Standards Measurement Survey. It covered approximately 9000 households, drawn from a representative sample of South African households. The fieldwork was undertaken during the nine months leading up to the country's first democratic elections at the end of April 1994. The purpose of the survey was to collect statistical information about the conditions under which South Africans live in order to provide policymakers with the data necessary for planning strategies. This data would aid the implementation of goals such as those outlined in the Government of National Unity's Reconstruction and Development Programme.

    Geographic coverage

    National

    Analysis unit

    Households

    Universe

    All household members. Individuals in hospitals, old age homes, hotels, and hostels of educational institutions were not included in the sample. Migrant labour hostels were included. In addition to those that turned up in the selected ESDs, a sample of three hostels was chosen from a national list provided by the Human Sciences Research Council, and within each of these hostels a representative sample was drawn on a similar basis as described above for the households in ESDs.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    (a) SAMPLING DESIGN

    Sample size is 9,000 households. The sample design adopted for the study was a two-stage self-weighting design in which the first-stage units were Census Enumerator Subdistricts (ESDs, or their equivalent) and the second-stage units were households. The advantage of using such a design is that it provides a representative sample that need not be based on an accurate census population distribution. In the case of South Africa, the sample would automatically include many poor people, without the need to go beyond this and oversample the poor. Proportionate sampling in such a self-weighting sample design offers the simplest possible data files for further analysis, as weights do not have to be added. However, in the end this advantage could not be retained, and weights had to be added.

    (b) SAMPLE FRAME

    The sampling frame was drawn up on the basis of small, clearly demarcated area units, each with a population estimate. The nature of the self-weighting procedure adopted ensured, however, that this population estimate was not important for determining the final sample. For most of the country, census ESDs were used. Where some ESDs comprised relatively large populations, as for instance in some black townships such as Soweto, aerial photographs were used to divide the areas into blocks of approximately equal population size. In other instances, particularly in some of the former homelands, the area units were not ESDs but villages or village groups. In the sample design chosen, the area stage units (generally ESDs) were selected with probability proportional to size, based on the census population. Systematic sampling was used throughout; that is, sampling at a fixed interval in a list of ESDs, starting at a randomly selected starting point. Given that sampling was self-weighting, the impact of stratification was expected to be modest. The main objective was to ensure that the racial and geographic breakdown approximated the national population distribution. This was done by listing the area stage units (ESDs) by statistical region, and then within each statistical region by urban or rural. Within these sub-statistical regions, the ESDs were then listed in order of percentage African. The sampling interval for the selection of the ESDs was obtained by dividing the 1991 census population of 38,120,853 by the 300 clusters to be selected. This yielded 105,800. Starting at a randomly selected point, every 105,800th person down the cluster list was selected. This ensured both geographic and racial diversity (ESDs were ordered by statistical sub-region and proportion of the population African). In three or four instances, the ESD chosen was judged inaccessible and replaced with a similar one. In the second sampling stage the unit of analysis was the household. 
In each selected ESD a listing or enumeration of households was carried out by means of a field operation. From the households listed in an ESD, a sample of households was selected by systematic sampling. Even though the ultimate enumeration unit was the household, in most cases "stands" were used as enumeration units. However, when a stand was chosen as the enumeration unit, all households on that stand had to be interviewed.
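    The first-stage selection described above (a fixed sampling interval walked down a population-ordered cluster list, selecting ESDs with probability proportional to size) can be sketched in Python; the ESD populations below are toy values, not the 1991 census figures.

```python
import random

# Sketch of the first-stage selection described above: clusters (ESDs)
# chosen with probability proportional to size by walking a fixed
# interval down the cumulated population list. The populations below
# are toy values, not 1991 census figures.

def select_clusters(esd_populations, n_clusters, seed=0):
    total = sum(esd_populations)
    interval = total / n_clusters          # e.g. 38,120,853 / 300 in the survey
    point = random.Random(seed).uniform(0, interval)  # random starting point
    hits, cumulative = [], 0
    for idx, pop in enumerate(esd_populations):
        cumulative += pop
        # An ESD is selected once for every sample point that falls
        # inside its cumulative population range.
        while point < cumulative and len(hits) < n_clusters:
            hits.append(idx)
            point += interval
    return hits

chosen = select_clusters([500, 2000, 300, 1200, 4000], n_clusters=3)
```

    Because larger ESDs occupy a wider slice of the cumulative range, they are proportionally more likely to be hit, which is what makes the overall two-stage design self-weighting.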

    Mode of data collection

    Face-to-face [f2f]

    Cleaning operations

    All the questionnaires were checked when received. Where information was incomplete or appeared contradictory, the questionnaire was sent back to the relevant survey organization. As soon as the data was available, it was captured using the local development platform ADE. This was completed in February 1994. Following this, a series of exploratory programs was written to highlight inconsistencies and outliers. For example, all person-level files were linked together to ensure that the same person code reported in different sections of the questionnaire corresponded to the same person. The error reports from these programs were compared to the questionnaires and the necessary alterations made. This was a lengthy process, as several files were checked more than once, and it was completed at the beginning of August 1994. In some cases, questionnaires would contain missing values, or comments that the respondent did not know, or refused to answer, a question.

    These responses are coded in the data files with the following values:

    • -1 : The data was not available on the questionnaire or form
    • -2 : The field is not applicable
    • -3 : Respondent refused to answer
    • -4 : Respondent did not know answer to question
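    Before analysis, these sentinel codes are typically recoded to missing values; a minimal Python sketch (the raw values and the variable are invented for illustration):

```python
# Recode the PSLSD missing-value sentinels listed above to None before
# analysis; the raw values here are invented for illustration.

MISSING_CODES = {
    -1: "not available on questionnaire or form",
    -2: "not applicable",
    -3: "refused to answer",
    -4: "did not know",
}

def recode(value):
    """Return None for any PSLSD missing-value sentinel, else the value."""
    return None if value in MISSING_CODES else value

raw = [1500, -1, 230, -3, 980, -4]
clean = [recode(v) for v in raw]
print(clean)  # [1500, None, 230, None, 980, None]
```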

    Data appraisal

    The data collected in clusters 217 and 218 should be viewed as highly unreliable and therefore removed from the data set. The data currently available on the web site has been revised to remove the data from these clusters. Researchers who have downloaded the data in the past should revise their data sets. For information on the data in those clusters, contact SALDRU http://www.saldru.uct.ac.za/.

  4. Comprehensive Income by Age Group Dataset: Longitudinal Analysis of Gate, OK...

    • neilsberg.com
    Updated Aug 7, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Comprehensive Income by Age Group Dataset: Longitudinal Analysis of Gate, OK Household Incomes Across 4 Age Groups and 16 Income Brackets. Annual Editions Collection // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/2ecf2ac3-aeee-11ee-aaca-3860777c1fe6/
    Dataset updated
    Aug 7, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Gate
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates Gate household income by age. It can be used to understand the age-based income distribution of Gate.

    Content

    The dataset includes the following datasets, when applicable:

    Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    • Gate, OK annual median income by age groups dataset (in 2022 inflation-adjusted dollars)
    • Age-wise distribution of Gate, OK household incomes: Comparative analysis across 16 income brackets

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Gate income distribution by age.

  5. Big data/advanced analytics project involvement of software developers...

    • statista.com
    Updated Oct 16, 2016
    Cite
    Statista (2016). Big data/advanced analytics project involvement of software developers worldwide 2016 [Dataset]. https://www.statista.com/statistics/627199/worldwide-developer-survey-big-data-and-analytics-involvement/
    Dataset updated
    Oct 16, 2016
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2016
    Area covered
    Worldwide
    Description

    The statistic shows the share of developers worldwide that are, will be, or have been involved in a big data or advanced analytics project, in and around 2016. When surveyed, 29 percent of developers said they were currently involved in a big data or advanced analytics project.

  6. Quantitative Research Methods and Data Analysis Workshop 2020

    • unisa.figshare.com
    pdf
    Updated Jun 12, 2025
    Cite
    Tracy Probert; Maxine Schaefer; Anneke Carien Wilsenach (2025). Quantitative Research Methods and Data Analysis Workshop 2020 [Dataset]. http://doi.org/10.25399/UnisaData.12581483.v1
    Available download formats: pdf
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    University of South Africa
    Authors
    Tracy Probert; Maxine Schaefer; Anneke Carien Wilsenach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We include the course syllabus used to teach quantitative research design and analysis methods to graduate Linguistics students using a blended teaching and learning approach. The blended course took place over two weeks and builds on a face-to-face course presented over two days in 2019. Students worked through the topics in preparation for a live interactive video session each Friday to go through the activities. Additional communication took place on Slack for two hours each week. A survey was conducted at the start and end of the course to ascertain participants' perceptions of the usefulness of the course. The links to online elements and the evaluations have been removed from the uploaded course guide.

    Participants who complete this workshop will be able to:

    • outline the steps and decisions involved in quantitative data analysis of linguistic data
    • explain common statistical terminology (sample, mean, standard deviation, correlation, nominal, ordinal and scale data)
    • perform common statistical tests using jamovi (e.g. t-test, correlation, anova, regression)
    • interpret and report common statistical tests
    • describe and choose from the various graphing options used to display data
    • use jamovi to perform common statistical tests and graph results

    Evaluation

    Participants who complete the course will use these skills and knowledge to complete the following activities for evaluation:

    • analyse the data for a project and/or assignment (in part or in whole)
    • plan the results section of an Honours research project (where applicable)

    Feedback and suggestions can be directed to M Schaefer, schaemn@unisa.ac.za.

  7. Descriptive statistics and reliability tests.

    • plos.figshare.com
    xls
    Updated Jan 3, 2025
    Cite
    Charanjit Kaur; Pei P. Tan; Nurjannah Nurjannah; Ririn Yuniasih (2025). Descriptive statistics and reliability tests. [Dataset]. http://doi.org/10.1371/journal.pone.0312306.t002
    Available download formats: xls
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Charanjit Kaur; Pei P. Tan; Nurjannah Nurjannah; Ririn Yuniasih
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data is becoming increasingly ubiquitous today, and data literacy has emerged as an essential skill in the workplace. Therefore, it is necessary to equip high school students with data literacy skills in order to prepare them for further learning and future employment. In Indonesia, there is a growing shift towards integrating data literacy into the high school curriculum. As part of a pilot intervention project, academics from two leading universities organised data literacy boot camps for high school students across various cities in Indonesia. The boot camps aimed to increase participants’ awareness of the power of analytical and exploration skills, which in turn would contribute to creating independent and data-literate students. This paper explores student participants’ self-perception of their data literacy as a result of the skills acquired from the boot camps. Qualitative and quantitative data were collected through student surveys and a focus group discussion, and were used to analyse student perception post-intervention. The findings indicate that students became more aware of the usefulness of data literacy and its application in future studies and work after participating in the boot camp. Of the materials delivered at the boot camps, students found the greatest benefit in learning basic statistical concepts and applying them through the use of Microsoft Excel as a tool for basic data analysis. These findings provide valuable policy recommendations that educators and policymakers can use as guidelines for effective data literacy teaching in high schools.

  8. Raw data for meta-analysis of replications project

    • figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Sonia Lee (2023). Raw data for meta-analysis of replications project [Dataset]. http://doi.org/10.6084/m9.figshare.3081610.v1
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sonia Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data for meta-analysis of replications project.

  9. Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    Updated Feb 15, 2025
    Cite
    Technavio (2025). Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, UK), APAC (China, India, Japan), South America (Brazil), and Middle East and Africa (UAE) [Dataset]. https://www.technavio.com/report/data-science-platform-market-industry-analysis
    Dataset updated
    Feb 15, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Canada, United States, Global
    Description


    Data Science Platform Market Size 2025-2029

    The data science platform market size is forecast to increase by USD 763.9 million, at a CAGR of 40.2% between 2024 and 2029.
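    As a quick sanity check on what a 40.2% CAGR means, compound growth follows end = start * (1 + cagr)^n; the starting value in this sketch is a hypothetical placeholder, since the excerpt reports only the USD 763.9 million increment, not the base market size.

```python
# A CAGR links a starting and an ending value over n years:
#     end = start * (1 + cagr) ** n
# The 40.2% CAGR comes from the report; the starting value below is a
# hypothetical placeholder, since the excerpt gives only the USD 763.9
# million increment, not the base market size.

def grow(start, cagr, years):
    return start * (1 + cagr) ** years

def implied_cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

start = 100.0                 # hypothetical base, in USD million
end = grow(start, 0.402, 5)   # five years: 2024 -> 2029
print(round(end, 1))          # 541.7 -- a 40.2% CAGR more than quintuples the base
```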

    The market is experiencing significant growth, driven by the increasing integration of Artificial Intelligence (AI) and Machine Learning (ML) technologies. This fusion enables organizations to derive deeper insights from their data, fueling business innovation and decision-making. Another trend shaping the market is the emergence of containerization and microservices in data science platforms. This approach offers enhanced flexibility, scalability, and efficiency, making it an attractive choice for businesses seeking to streamline their data science operations. However, the market also faces challenges. Data privacy and security remain critical concerns, with the increasing volume and complexity of data posing significant risks. Ensuring robust data security and privacy measures is essential for companies to maintain customer trust and comply with regulatory requirements. Additionally, managing the complexity of data science platforms and ensuring seamless integration with existing systems can be a daunting task, requiring significant investment in resources and expertise. Companies must navigate these challenges effectively to capitalize on the market's opportunities and stay competitive in the rapidly evolving data landscape.

    What will be the Size of the Data Science Platform Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.

    The market continues to evolve, driven by the increasing demand for advanced analytics and artificial intelligence solutions across various sectors. Real-time analytics and classification models are at the forefront of this evolution, with API integrations enabling seamless implementation. Deep learning and model deployment are crucial components, powering applications such as fraud detection and customer segmentation. Data science platforms provide essential tools for data cleaning and data transformation, ensuring data integrity for big data analytics. Feature engineering and data visualization facilitate model training and evaluation, while data security and data governance ensure data privacy and compliance. Machine learning algorithms, including regression models and clustering models, are integral to predictive modeling and anomaly detection. Statistical analysis and time series analysis provide valuable insights, while ETL processes streamline data integration. Cloud computing enables scalability and cost savings, while risk management and algorithm selection optimize model performance. Natural language processing and sentiment analysis offer new opportunities for data storytelling and computer vision. Supply chain optimization and recommendation engines are among the latest applications of data science platforms, demonstrating their versatility and continuous value proposition. Data mining and data warehousing provide the foundation for these advanced analytics capabilities.

    How is this Data Science Platform Industry segmented?

    The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in USD million for the period 2025-2029, as well as historical data from 2019-2023, for the following segments.

    • Deployment: On-premises, Cloud
    • Component: Platform, Services
    • End-user: BFSI, Retail and e-commerce, Manufacturing, Media and entertainment, Others
    • Sector: Large enterprises, SMEs
    • Application: Data Preparation, Data Visualization, Machine Learning, Predictive Analytics, Data Governance, Others
    • Geography: North America (US, Canada), Europe (France, Germany, UK), Middle East and Africa (UAE), APAC (China, India, Japan), South America (Brazil), Rest of World (ROW)

    By Deployment Insights

    The on-premises segment is estimated to witness significant growth during the forecast period. In this dynamic market, businesses increasingly adopt solutions to gain real-time insights from their data, enabling them to make informed decisions. Classification models and deep learning algorithms are integral parts of these platforms, providing capabilities for fraud detection, customer segmentation, and predictive modeling. API integrations facilitate seamless data exchange between systems, while data security measures ensure the protection of valuable business information. Big data analytics and feature engineering are essential for deriving meaningful insights from vast datasets. Data transformation, data mining, and statistical analysis are crucial processes in data preparation and discovery. Machine learning models, including regression and clustering, are employed for model training and evaluation. Time series analysis and natural language processing are valuable tools for understanding trends and customer sen

  10. Data from: Quantitative analysis of tumour spheroid structure

    • researchdatafinder.qut.edu.au
    Updated Feb 2, 2022
    + more versions
    Cite
    Alexander Browning (2022). Quantitative analysis of tumour spheroid structure [Dataset]. https://researchdatafinder.qut.edu.au/display/n26538
    Dataset updated
    Feb 2, 2022
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Alexander Browning
    Description

    Code and associated data for the following preprint:

    AP Browning, JA Sharp, RJ Murphy, G Gunasingh, B Lawson, K Burrage, NK Haass, MJ Simpson. 2021. Quantitative analysis of tumour spheroid structure. eLife. https://doi.org/10.7554/eLife.73020

    Data comprises measurements relating to the size and inner structure of spheroids grown from WM793b and WM983b melanoma cells over up to 24 days.

    Code, data, and interactive figures are available as a Julia module on GitHub:

    Browning AP (2021) Github ID v.0.6.2. Quantitative analysis of tumour spheroid structure. https://github.com/ap-browning/Spheroids


    Code used to process the experimental images is available on Zenodo:

    Browning AP, Murphy RJ (2021) Zenodo Image processing algorithm to identify structure of tumour spheroids with cell cycle labelling. https://doi.org/10.5281/zenodo.5121093

  11. General Mission Analysis Tool Project

    • catalog.data.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • +2more
    Updated Apr 10, 2025
    Cite
    (2025). General Mission Analysis Tool Project [Dataset]. https://catalog.data.gov/dataset/general-mission-analysis-tool-project
    Dataset updated
    Apr 10, 2025
    Description

    Overview

    GMAT is a feature-rich system containing high fidelity space system models, optimization and targeting,
    built in scripting and programming infrastructure, and customizable plots, reports and data
    products, to enable flexible analysis and solutions for custom and unique applications. GMAT can
    be driven from a fully featured, interactive GUI or from a custom script language. Here are some
    of GMAT’s key features broken down by feature group.

    Dynamics and Environment Modelling

    • High fidelity dynamics models including harmonic gravity, drag, tides, and relativistic corrections
    • High fidelity spacecraft modeling
    • Formations and constellations
    • Impulsive and finite maneuver modeling and optimization
    • Propulsion system modeling including tanks and thrusters
    • Solar System modeling including high fidelity ephemerides, custom celestial bodies, libration points, and barycenters
    • Rich set of coordinate systems including J2000, ICRF, fixed, rotating, topocentric, and many others
    • SPICE kernel propagation
    • Propagators that naturally synchronize epochs of multiple vehicles and avoid fixed-step integration and interpolation

    Plotting, Reporting and Product Generation

    • Interactive 3-D graphics
    • Customizable data plots and reports
    • Post computation animation
    • CCSDS, SPK, and Code-500 ephemeris generation

    Optimization and Targeting

    • Boundary value targeters
    • Nonlinear, constrained optimization
    • Custom, scriptable cost functions
    • Custom, scriptable nonlinear equality and inequality constraint functions
    • Custom targeter controls and constraints

    Programming Infrastructure

    • User defined variables, arrays, and strings
    • User defined equations using MATLAB syntax (i.e., overloaded array operations)
    • Control flow such as If, For, and While loops for custom applications
    • Matlab interface
    • Built in parameters and calculations in multiple coordinate systems

    Interfaces

    • Fully featured, interactive GUI that makes simple analysis quick and easy
    • Custom scripting language that makes complex, custom analysis possible
    • Matlab interface for custom external simulations and calculations
    • File interface for the TCOPS Vector Hold

  12. Sale City, GA Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Sale City, GA Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b25146a5-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Georgia, Sale City
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Sale City by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Sale City across both sexes and to determine which sex constitutes the majority.

    Key observations

    The population is majority female, with 58.09% of the total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the gender (Male / Female).
    • Population: The population of that gender in Sale City.
    • % of Total Population: The percentage of Sale City's total population accounted for by each gender. Please note that the percentages may not sum to exactly 100 due to rounding.
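The relationship between the Population and % of Total Population columns can be sketched in a few lines. This is a minimal illustration; the counts below are invented, not the actual Sale City figures.

```python
# Minimal sketch of how the "Population" and "% of Total Population"
# columns relate. The counts are invented, not the actual Sale City figures.
rows = [("Male", 155), ("Female", 215)]

total = sum(pop for _, pop in rows)
for gender, pop in rows:
    share = round(100 * pop / total, 2)
    print(f"{gender}: {pop} ({share}% of total)")

# Each share is rounded independently, so the printed percentages may not
# sum to exactly 100 -- the rounding caveat noted above.
```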

    Good to know

    Margin of Error

    Data in the dataset are estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Sale City Population by Race & Ethnicity.

  13. Protected Areas Database of the United States (PAD-US) 4.0 Spatial Analysis...

    • data.usgs.gov
    • catalog.data.gov
    Updated Aug 30, 2024
    Cite
    Protected Areas Database of the United States (PAD-US) 4.0 Spatial Analysis and Statistics [Dataset]. https://data.usgs.gov/datacatalog/data/USGS:652d4c46d34e44db0e2ee447
    Explore at:
    Dataset updated
    Aug 30, 2024
    Dataset authored and provided by
    United States Geological Survey (http://www.usgs.gov/)
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Dec 31, 2023
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and outdoor recreation access across the nation. This data release presents results from statistical summaries of the PAD-US 4.0 protection status (by GAP Status Code) and public access status for various land unit boundaries (PAD-US 4.0 Vector Analysis and Summary Statistics). Summary statistics are also available to explore and download from the PAD-US Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). The vector GIS analysis file, source data used to summarize statistics for areas of interest to stakeholders (National, State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative), and complete Summary ...
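The statistical summaries described above can be sketched as a simple group-by over the analysis file. The column names and acreage figures below are assumptions for illustration, not the real PAD-US schema or statistics.

```python
import pandas as pd

# Hypothetical sketch of the kind of summary this release reports: total
# area by GAP Status Code and by public access category. Column names and
# figures are assumptions, not the real PAD-US schema or statistics.
units = pd.DataFrame({
    "GAP_Sts": ["1", "2", "2", "3", "4"],
    "Pub_Access": ["Open", "Open", "Restricted", "Open", "Closed"],
    "acres": [1200.0, 800.0, 450.0, 3000.0, 950.0],
})

acres_by_gap = units.groupby("GAP_Sts")["acres"].sum()
acres_by_access = units.groupby("Pub_Access")["acres"].sum()
print(acres_by_gap)
print(acres_by_access)
```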

  14. NIST Statistical Reference Datasets - SRD 140

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). NIST Statistical Reference Datasets - SRD 140 [Dataset]. https://catalog.data.gov/dataset/nist-statistical-reference-datasets-srd-140-df30c
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently, datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and real-world data of varying levels of difficulty. Generated datasets are designed to challenge specific computations; these include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions, and the certification procedure is described in the web pages for each statistical method.

    Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking, the level of difficulty of a dataset depends on the algorithm; these levels are provided merely as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software. The Statistical Reference Datasets project is also supported by the Standard Reference Data Program.
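The StRD workflow described above can be sketched as: fit a dataset whose generating parameters are known, then compare the fitted values against the certified ones. The toy linear dataset below is an assumption made for illustration, not one of the actual NIST reference datasets.

```python
import numpy as np

# Minimal sketch of the StRD idea: fit a dataset with known ("certified")
# parameters and measure how closely the software recovers them. The toy
# noise-free linear data below is an illustrative assumption, not an
# actual NIST reference dataset.
x = np.arange(0.0, 21.0)             # 21 equally spaced design points
certified = np.array([3.0, 2.0])     # slope and intercept of the model
y = certified[0] * x + certified[1]  # exact data generated from the model

fitted = np.polyfit(x, y, deg=1)     # returns [slope, intercept]
max_abs_err = np.abs(fitted - certified).max()
print(fitted, max_abs_err)
```

A software package under evaluation would be judged on how many digits of the certified values it reproduces, with harder (more ill-conditioned) datasets exposing weaker algorithms.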

  15. DRAFT REPORT. (MEETING ON INFORMATION SYSTEMS FOR PROJECT PLANNING AND...

    • unido.org
    Updated Jul 10, 2025
    Cite
    UNIDO (2025). DRAFT REPORT. (MEETING ON INFORMATION SYSTEMS FOR PROJECT PLANNING AND IMPLEMENTATION, VIENNA) (04602.en) [Dataset]. https://www.unido.org/publications/ot/9640754
    Explore at:
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    UNIDO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1972
    Description

    UNIDO PUB. Draft report on a meeting on management information systems for project implementation: (1) discusses the need for systematic management of industrial projects; (2) covers methods of control, including data collection and electronic data processing for statistical analysis, decision making, resource allocation, manpower and cost estimates, etc.; (3) appends a description of the PERT management control system. Recommendations, statistics, flow charts, diagrams, bibliography.

  16. Consumer Price Index 2020 - West Bank and Gaza

    • pcbs.gov.ps
    Updated Jan 2, 2022
    Cite
    Palestinian Central Bureau of Statistics (2022). Consumer Price Index 2020 - West Bank and Gaza [Dataset]. https://www.pcbs.gov.ps/PCBS-Metadata-en-v5.2/index.php/catalog/706
    Explore at:
    Dataset updated
    Jan 2, 2022
    Dataset authored and provided by
    Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)
    Time period covered
    2020
    Area covered
    West Bank, Palestine
    Description

    Abstract

    The consumer price surveys primarily provide the following:

    • Data on the CPI in Palestine covering the West Bank, Gaza Strip, and Jerusalem J1 for major and sub-groups of expenditure.
    • Statistics needed by decision-makers, planners, and those interested in the national economy.
    • A contribution to the preparation of quarterly and annual national accounts data.

    Consumer prices and indices are used for a wide range of purposes, the most important of which are as follows:

    • Adjustment of wages, government subsidies, and social security benefits to compensate in part or in full for changes in living costs.
    • To provide an index that measures price inflation for the entire household sector, which is used to remove the inflation impact from the components of final consumption expenditure of households in the national accounts and to remove the impact of price changes from income and national groups.
    • Price index numbers are widely used to measure inflation rates and economic recession.
    • Price indices are used by the public as a guide for the family with regard to its budget and its constituent items.
    • Price indices are used to monitor changes in the prices of goods traded in the market and, consequently, price trends, market conditions, and living costs. However, the price index does not reflect other factors affecting the cost of living, e.g. the quality and quantity of purchased goods, so it is only one of many indicators used to assess living costs.
    • It is used as a direct method to identify the purchasing power of money, where the purchasing power of money is inversely proportional to the price index.
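The core mechanism behind a CPI of this kind can be sketched as a fixed-basket (Laspeyres-type) index: the cost of a base-period basket at current prices divided by its cost at base-period prices. The goods, prices, and quantities below are invented for illustration, not PCBS data.

```python
# Illustrative sketch of a fixed-basket (Laspeyres-type) price index, the
# general mechanism behind a CPI. Prices and quantities are invented, not
# PCBS data.
base_prices = {"bread": 2.0, "tomato": 1.5, "fuel": 6.0}     # period 0
current_prices = {"bread": 2.2, "tomato": 1.8, "fuel": 6.0}  # period t
base_quantities = {"bread": 10, "tomato": 8, "fuel": 3}      # fixed basket

base_cost = sum(base_prices[g] * base_quantities[g] for g in base_prices)
current_cost = sum(current_prices[g] * base_quantities[g] for g in base_prices)
cpi = 100 * current_cost / base_cost   # index = 100 in the base period
print(f"CPI = {cpi:.1f}")              # values above 100 indicate inflation
```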

    Geographic coverage

    Palestine, West Bank, Gaza Strip, Jerusalem

    Analysis unit

    The target population for the CPI survey is the shops and retail markets such as grocery stores, supermarkets, clothing shops, restaurants, public service institutions, private schools and doctors.

    Universe

    The target population for the CPI survey is the shops and retail markets such as grocery stores, supermarkets, clothing shops, restaurants, public service institutions, private schools and doctors.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A non-probability purposive sample of sources from which the prices of different goods and services are collected was updated based on the establishment census 2017, in a manner that achieves full coverage of all goods and services that fall within the Palestinian consumer system. These sources were selected based on the availability of the goods within them. It is worth mentioning that the sample of sources was selected from the main cities inside Palestine: Jenin, Tulkarm, Nablus, Qalqiliya, Ramallah, Al-Bireh, Jericho, Jerusalem, Bethlehem, Hebron, Gaza, Jabalia, Dier Al-Balah, Nusseirat, Khan Yunis and Rafah. The selection of these sources was considered to be representative of the variation that can occur in the prices collected from the various sources. The number of goods and services included in the CPI is approximately 730 commodities, whose prices were collected from 3,200 sources. (COICOP) classification is used for consumer data as recommended by the United Nations System of National Accounts (SNA-2008).

    Sampling deviation

    Not applicable

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    A tablet-supported electronic form was designed for price surveys to be used by the field teams in collecting data from the different governorates, with the exception of Jerusalem J1. The electronic form is supported with GIS and GPS mapping techniques that allow the field workers to locate the outlets exactly on the map and allow the administrative staff to manage the field remotely. The electronic questionnaire is divided into a number of screens:

    • First screen: shows the metadata for the data source: governorate name, governorate code, source code, source name, full source address, and phone number.
    • Second screen: shows the source interview result, which is either completed, temporarily paused, or permanently closed. It also shows the change activity as incomplete or rejected, with an explanation of the reason for rejection.
    • Third screen: shows the item code, item name, item unit, item price, product availability, and reason for unavailability.
    • Fourth screen: checks the price data of the related source and verifies their validity through auditing rules designed specifically for the price programs.
    • Fifth screen: saves and sends data through a VPN connection and Wi-Fi.

    In case of the Jerusalem J1 Governorate, a paper form has been designed to collect the price data so that the form in the top part contains the metadata of the data source and in the lower section contains the price data for the source collected. After that, the data are entered into the price program database.

    Cleaning operations

    The price survey forms were already encoded by the project management depending on the specific international statistical classification of each survey. After the researcher collected the price data and sent them electronically, the data was reviewed and audited by the project management. Achievement reports were reviewed on a daily and weekly basis. Also, the detailed price reports at data source levels were checked and reviewed on a daily basis by the project management. If there were any notes, the researcher was consulted in order to verify the data and call the owner in order to correct or confirm the information.

    At the end of the data collection process in all governorates, the data are edited using the following process:

    • Logical revision of prices by comparing the prices of goods and services with others from different sources and other governorates. Whenever a mistake is detected, it is returned to the field for correction.
    • Mathematical revision of the average prices for items in governorates and the general average across all governorates.
    • Field revision of prices through selecting a sample of the prices collected from the items.
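The logical-revision step described above amounts to an outlier check across sources. A minimal sketch, with invented figures and an assumed 50% deviation threshold (not a PCBS rule):

```python
# Hedged sketch of the "logical revision" step: flag prices that deviate
# strongly from the median price of the same item across sources, so they
# can be returned to the field for correction. All figures are invented,
# and the 50% threshold is an assumption, not a PCBS rule.
prices = {"tomato (1 kg)": [1.5, 1.6, 1.4, 4.9, 1.55]}

for item, observations in prices.items():
    median = sorted(observations)[len(observations) // 2]
    flagged = [p for p in observations if abs(p - median) / median > 0.5]
    print(f"{item}: median {median}, flagged for re-check: {flagged}")
```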

    Response rate

    Not applicable

    Sampling error estimates

    The findings of the survey may be affected by sampling errors due to the use of a sample rather than a total enumeration of the units of the target population, which increases the chance of variance between the actual values and those we would expect to obtain had the survey been conducted by total enumeration. The computation of differences between the most important key goods showed that their variation differs due to the specialty of each survey. For the CPI, the variation between goods was very low, except in some cases such as bananas, tomatoes, and cucumbers, which had a high coefficient of variation during 2019 due to the high oscillation in their prices. The variance of the key goods in the computed and disseminated CPI survey carried out at the Palestine level was due to reasons related to sample design and the variance calculation of different indicators, since results could not be disseminated by governorate due to a lack of weights.

    Non-sampling errors are possible at all stages of data collection and data entry:

    • Non-response errors: the selected sources demonstrated significant cooperation with interviewers, so no case of non-response was reported during 2019.
    • Response errors (respondent), interviewing errors (interviewer), and data entry errors: to avoid these types of errors and reduce their effect to a minimum, project managers adopted a number of procedures, including making more than one visit to every source to explain the objectives of the survey and emphasize the confidentiality of the data. The visits to data sources contributed to strengthening relations and cooperation and to the verification of data accuracy.
    • Interviewer errors: to ensure data accuracy throughout field data compilation, interviewers were selected based on educational qualification, competence, and assessment; trained theoretically and practically on the questionnaire; and reminded of instructions in meetings. In addition, explanatory notes were supplied with the surveys.
    • Processing and data entry errors (noting that data collected through paper questionnaires did not exceed 5%): data entry staff were selected from among specialists in computer programming and were fully trained on the entry programs. Data verification was carried out for 10% of the entered questionnaires to ensure that data had been entered correctly and in accordance with the provisions of the questionnaire; the result of the verification was consistent with the original data to a degree of 100%. The files of entered data were received, examined, and reviewed by project managers before findings were extracted. Project managers carried out many checks on data logic and coherence, such as comparing the data of the current month with that of the previous month and comparing data between sources and between governorates. Data collected by tablet devices were checked for consistency and accuracy by applying rules at the item level.

    Data appraisal

    Other technical procedures to improve data quality: Seasonal adjustment processes

  17. Global Project Management Software Market Research and Development Focus...

    • statsndata.org
    excel, pdf
    Updated May 2025
    Cite
    Stats N Data (2025). Global Project Management Software Market Research and Development Focus 2025-2032 [Dataset]. https://www.statsndata.org/report/project-management-software-market-9665
    Explore at:
    Available download formats: pdf, excel
    Dataset updated
    May 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Project Management Software market has evolved significantly over the years, becoming an essential tool for businesses seeking to enhance efficiency, collaboration, and productivity in their project management processes. This software enables organizations to plan, execute, and monitor their projects with ease,

  18. Data for: Integrating open education practices with data analysis of open...

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Jul 26, 2024
    Cite
    Marja Bakermans (2024). Data for: Integrating open education practices with data analysis of open science in an undergraduate course [Dataset]. http://doi.org/10.5061/dryad.37pvmcvst
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 26, 2024
    Dataset provided by
    Worcester Polytechnic Institute
    Authors
    Marja Bakermans
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed fairness practices, but this increased with the publication date. There was a marginal difference in how assessment categories were weighted by students, with reflections highlighting appreciation for student agency. In course content, students reported the greatest learning gains in describing variables, while collaborative activities (e.g., interacting with peers and instructor) were the most effective support. The most effective tasks to facilitate these learning gains included coding exercises and team-led assignments. Autocoding of student reflections identified 16 themes, and positive sentiments were written nearly 4x more often than negative sentiments. Students positively reflected on their growth in statistical analyses, while negative sentiments focused on how limited prior experience with statistics and coding made them feel nervous. As a group, we encountered several challenges and opportunities in using open science materials. I present key recommendations, based on student experiences, for scientists to consider when publishing open data to provide additional educational benefits to the open science community.

    Methods

    Article and dataset fairness: To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each pair to a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), where each category was scored '1' or '0' based on whether it met that criterion, with a total possible score of ten.

    Open grading policies: Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined whether assessment categories (independent variable) were weighted (dependent variable) differently by students using an analysis of variance (ANOVA) and examined pairwise differences with Tukey HSD.

    Assessment of perceived learning gains: I used a student assessment of learning gains (SALG) survey to measure students' perceptions of learning gains related to course objectives (Seymour et al. 2000). This Likert-scale survey provided five response categories ranging from 'no gains' to 'great gains' in learning and the option of open responses in each category. A summary report that converted Likert responses to numbers and calculated descriptive statistics was produced from the SALG instrument website.

    Student reflections: In student reflections, I examined the frequency of the 100 most frequent words, with stop words excluded and a minimum length of four letters, both 'with synonyms' and 'with generalizations'. Due to this paper's explorative nature, I used autocoding to identify broad themes and sentiments in students' reflections. Autocoding examines the sentiment of each word and scores it as positive, neutral, mixed, or negative. In this process, I compared how students felt about each theme, focusing on positive (i.e., satisfaction) and negative (i.e., dissatisfaction) sentiments. The relationship of sentiment coding to themes was visualized in a treemap, where the size of a block is relative to the number of references for that code. All reflection processing and analyses were performed in NVivo 14 (Windows). All data were collected with institutional IRB approval (IRB-24–0314). All statistical analyses were performed in R (ver. 4.3.1; R Core Team 2023).
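The ANOVA-plus-Tukey-HSD step described above can be sketched as follows. The study ran its analyses in R; this is a hedged Python equivalent, and the category weights below are invented for illustration, not the study's actual data (those are in the Dryad deposit).

```python
from scipy.stats import f_oneway, tukey_hsd

# Hedged sketch of the analysis described above: a one-way ANOVA on how
# students weighted assessment categories, followed by pairwise Tukey HSD.
# The weights below are invented; the study's data are in the Dryad deposit.
exercises = [20, 25, 22, 18, 24]      # % weight chosen for Exercises
final_project = [30, 28, 35, 32, 27]  # % weight chosen for Final Project
discussion = [10, 12, 8, 11, 9]       # % weight chosen for Discussion

f_stat, p_value = f_oneway(exercises, final_project, discussion)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Pairwise follow-up, analogous to the Tukey HSD step in the paper
print(tukey_hsd(exercises, final_project, discussion))
```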

  19. Data from: Data availability. Multivariate data analysis. Validation of an...

    • explore.openaire.eu
    • zenodo.org
    Updated May 25, 2023
    Cite
    Andrés Cisneros Barahona; Luis Márques Molías; NIcolay Samaniego Erazo; Catalina Mejía Granizo; Gabriela de la Cruz Fernández (2023). Data availability. Multivariate data analysis. Validation of an instrument for the evaluation of teaching digital competence. [Dataset]. http://doi.org/10.5281/zenodo.10055380
    Explore at:
    Dataset updated
    May 25, 2023
    Authors
    Andrés Cisneros Barahona; Luis Márques Molías; NIcolay Samaniego Erazo; Catalina Mejía Granizo; Gabriela de la Cruz Fernández
    Description

    Data availability. Multivariate data analysis. Validation of an instrument for the evaluation of teaching digital competence.

    • SPSS DATA (spss data.sav): the data imported with the software IBM SPSS Statistics, version 28.0.1.1(15).
    • EXCEL DATA: Data of Project factorial.xlsx and Data Project reliability.xlsx contain the results of the statistical analysis carried out with Microsoft Excel.
    • FIGURES: Figure 1.jpeg, Figure 2.jpeg, Figure 3 and Figure 4.jpeg.
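Instrument validation of this kind typically includes a reliability statistic such as Cronbach's alpha. A minimal sketch of that computation follows; the response matrix is invented for illustration, and this is not the study's actual analysis pipeline (its data are in the Zenodo deposit).

```python
import numpy as np

# Hedged sketch of a reliability statistic (Cronbach's alpha) of the kind
# computed when validating an instrument. The response matrix is invented.
# rows = respondents, columns = questionnaire items (e.g., Likert 1-5)
X = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
], dtype=float)

k = X.shape[1]
item_vars = X.var(axis=0, ddof=1)        # per-item variances
total_var = X.sum(axis=1).var(ddof=1)    # variance of the total score
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))
```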

  20. Rmd code data import federated.

    • plos.figshare.com
    txt
    Updated Nov 14, 2024
    Cite
    Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau (2024). Rmd code data import federated. [Dataset]. http://doi.org/10.1371/journal.pone.0312697.s005
    Explore at:
    Available download formats: txt
    Dataset updated
    Nov 14, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Methods

    The objective of this project was to determine whether a federated analysis approach using DataSHIELD can match the results of a classical centralized analysis in a real-world setting. This research was carried out on an anonymous synthetic longitudinal real-world oncology cohort randomly split across three local databases, mimicking three healthcare organizations, stored in a federated data platform integrating DataSHIELD. No individual data were transferred; statistics were calculated simultaneously but in parallel within each healthcare organization, and only summary statistics (aggregates) were returned to the federated data analyst. Descriptive statistics, survival analysis, regression models, and correlation were first performed with the centralized approach and then reproduced with the federated approach, and the results were compared between the two.

    Results

    The cohort was split into three samples (N1 = 157 patients, N2 = 94, and N3 = 64), and 11 derived variables and four types of analyses were generated. All analyses were successfully reproduced using DataSHIELD, except for one descriptive variable due to a data disclosure limitation in the federated environment, showing the good capability of DataSHIELD. For descriptive statistics, exactly equivalent results were found for the federated and centralized approaches, except for some differences in position measures. Estimates of univariate regression models were similar, with a loss of accuracy observed for multivariate models due to source database variability.

    Conclusion

    Our project demonstrated a practical implementation and use case of a real-world federated approach using DataSHIELD. The capability and accuracy of common data manipulation and analysis were satisfactory, and the flexibility of the tool enabled the production of a variety of analyses while preserving the privacy of individual data. The DataSHIELD forum was also a practical source of information and support. To find the right balance between privacy and accuracy of the analysis, privacy requirements should be established prior to the start of the analysis, along with a data quality review of the participating healthcare organizations.
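The aggregate-only principle behind this kind of federated analysis can be sketched in a few lines: each site returns only summary statistics, a disclosure rule withholds aggregates computed from too few records, and the analyst combines what remains. All site data and the minimum-count threshold below are invented; this is not DataSHIELD's actual API.

```python
# Minimal sketch of the aggregate-only idea behind federated analysis:
# each "site" returns only (sum, count), never individual records, and a
# disclosure rule withholds aggregates from too few records. All site
# data and the min_count threshold are invented, not DataSHIELD's rules.
site_a = [61.0, 72.5, 58.0]
site_b = [69.0, 75.5]                 # too few records: aggregate withheld
site_c = [64.0, 66.0, 70.0, 71.0]

def local_aggregate(values, min_count=3):
    """Computed inside each organization; blocks disclosive aggregates."""
    if len(values) < min_count:
        raise ValueError("disclosure risk: too few records")
    return sum(values), len(values)

aggregates = []
for site in (site_a, site_b, site_c):
    try:
        aggregates.append(local_aggregate(site))
    except ValueError:
        pass  # the site withholds its aggregate, as with the blocked variable

total = sum(s for s, _ in aggregates)
n = sum(c for _, c in aggregates)
print(f"federated mean over {n} records: {total / n:.2f}")
```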
