53 datasets found
  1. Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic (2023). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm [Dataset]. http://doi.org/10.1371/journal.pbio.1002128
    Explore at:
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
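    The core problem the authors describe, that very different distributions can produce the same bar graph, is easy to demonstrate (a minimal Python sketch with invented numbers, not data from the review):

```python
from statistics import mean, stdev

# Two invented small samples with identical means: a bar graph of the group
# means would look the same for both, hiding the difference in spread.
symmetric = [4.0, 4.5, 5.0, 5.5, 6.0]   # values clustered around 5
bimodal   = [2.0, 2.5, 5.0, 7.5, 8.0]   # two clusters, same mean

print(mean(symmetric), mean(bimodal))    # both 5.0
print(stdev(symmetric), stdev(bimodal))  # spreads differ sharply
```

    A univariate scatterplot (e.g., via the authors' Excel templates) makes this difference visible at a glance where a bar graph cannot.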

  2. Making the Most of Statistical Analyses: Improving Interpretation and...

    • icpsr.umich.edu
    Updated Jun 27, 2002
    Cite
    King, Gary; Tomz, Michael; Wittenberg, Jason (2002). Making the Most of Statistical Analyses: Improving Interpretation and Presentation [Dataset]. http://doi.org/10.3886/ICPSR01255.v1
    Explore at:
    Dataset updated
    Jun 27, 2002
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    King, Gary; Tomz, Michael; Wittenberg, Jason
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/1255/terms

    Description

    Social scientists rarely take full advantage of the information available in their statistical results. As a consequence, they miss opportunities to present quantities that are of greatest substantive interest for their research, and to express their degree of certainty about these quantities. In this article, the authors offer an approach, built on the technique of statistical simulation, to extract the currently overlooked information from any statistical method, no matter how complicated, and to interpret and present it in a reader-friendly manner. Using this technique requires some sophistication, but its application should make the results of quantitative articles more informative and transparent to all. To illustrate their recommendations, the authors replicate the results of several published works, showing in each case how the authors' own conclusions could be expressed more sharply and informatively, and how this approach reveals important new information about the research questions at hand.
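    The simulation technique the authors advocate can be sketched in a few lines (hypothetical logit coefficients for illustration; the article's replications use real published models, and the coefficient draws here are simplified to independent normals):

```python
import math
import random

random.seed(42)

# Hypothetical logit estimates (coefficient, standard error).
b0, se0 = -1.20, 0.40   # intercept
b1, se1 =  0.80, 0.25   # slope

def simulate_prob(x, draws=10_000):
    """Simulate the sampling distribution of Pr(y = 1 | x)."""
    sims = []
    for _ in range(draws):
        # Draw coefficients from their sampling distributions, then
        # transform each draw into the quantity of interest.
        a = random.gauss(b0, se0)
        b = random.gauss(b1, se1)
        sims.append(1.0 / (1.0 + math.exp(-(a + b * x))))
    sims.sort()
    return sims[draws // 2], sims[int(0.025 * draws)], sims[int(0.975 * draws)]

point, lo, hi = simulate_prob(x=1.0)
print(f"Pr(y=1 | x=1): {point:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

    Reporting a predicted probability with its uncertainty interval, rather than raw coefficients, is exactly the kind of reader-friendly quantity the article argues for.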

  3. Classification & presentation of Data

    • paper.erudition.co.in
    html
    Updated Nov 23, 2025
    Cite
    Einetic (2025). Classification & presentation of Data [Dataset]. https://paper.erudition.co.in/makaut/bachelor-in-business-administration-2020-2021/3/business-research-methods
    Explore at:
    Available download formats: html
    Dataset updated
    Nov 23, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question paper solutions for the chapter "Classification & Presentation of Data" of Business Research Methods, 3rd semester, Bachelor in Business Administration, 2020-2021.

  4. Ten quick tips for getting the most scientific value out of numerical data

    • plos.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Lars Ole Schwen; Sabrina Rueschenbaum (2023). Ten quick tips for getting the most scientific value out of numerical data [Dataset]. http://doi.org/10.1371/journal.pcbi.1006141
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lars Ole Schwen; Sabrina Rueschenbaum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Most studies in the life sciences and other disciplines involve generating and analyzing numerical data of some type as the foundation for scientific findings. Working with numerical data involves multiple challenges. These include reproducible data acquisition, appropriate data storage, computationally correct data analysis, appropriate reporting and presentation of the results, and suitable data interpretation.Finding and correcting mistakes when analyzing and interpreting data can be frustrating and time-consuming. Presenting or publishing incorrect results is embarrassing but not uncommon. Particular sources of errors are inappropriate use of statistical methods and incorrect interpretation of data by software. To detect mistakes as early as possible, one should frequently check intermediate and final results for plausibility. Clearly documenting how quantities and results were obtained facilitates correcting mistakes. Properly understanding data is indispensable for reaching well-founded conclusions from experimental results. Units are needed to make sense of numbers, and uncertainty should be estimated to know how meaningful results are. Descriptive statistics and significance testing are useful tools for interpreting numerical results if applied correctly. However, blindly trusting in computed numbers can also be misleading, so it is worth thinking about how data should be summarized quantitatively to properly answer the question at hand. Finally, a suitable form of presentation is needed so that the data can properly support the interpretation and findings. By additionally sharing the relevant data, others can access, understand, and ultimately make use of the results.These quick tips are intended to provide guidelines for correctly interpreting, efficiently analyzing, and presenting numerical data in a useful way.
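    Several of these tips (carry units, estimate uncertainty, check plausibility) fit in a few lines of code (a Python sketch with invented measurements):

```python
from math import sqrt
from statistics import mean, stdev

# Replicate measurements of a single quantity, with an explicit unit (mg/L).
measurements_mg_per_L = [4.8, 5.1, 5.0, 4.9, 5.3, 5.1]

m = mean(measurements_mg_per_L)
sem = stdev(measurements_mg_per_L) / sqrt(len(measurements_mg_per_L))

# Plausibility check: fail loudly if the summary is physically unreasonable.
assert 0 < m < 100, "implausible concentration - check the raw data"

# Report the value together with its uncertainty and unit.
print(f"concentration = {m:.2f} +/- {sem:.2f} mg/L")
```

    Checking intermediate results against plausible bounds, as here, catches many mistakes before they reach a figure or table.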

  5. Summary descriptive statistics of TIMSS dataset.

    • plos.figshare.com
    xls
    Updated Feb 2, 2024
    Cite
    Jonathan Fries; Sandra Oberleiter; Jakob Pietschnig (2024). Summary descriptive statistics of TIMSS dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0297033.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 2, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jonathan Fries; Sandra Oberleiter; Jakob Pietschnig
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Regression ranks among the most popular statistical analysis methods across many research areas, including psychology. Typically, regression coefficients are displayed in tables. While this mode of presentation is information-dense, extensive tables can be cumbersome to read and difficult to interpret. Here, we introduce three novel visualizations for reporting regression results. Our methods allow researchers to arrange large numbers of regression models in a single plot. Using regression results from real-world as well as simulated data, we demonstrate the transformations which are necessary to produce the required data structure and how to subsequently plot the results. The proposed methods provide visually appealing ways to report regression results efficiently and intuitively. Potential applications range from visual screening in the model selection stage to formal reporting in research papers. The procedure is fully reproducible using the provided code and can be executed via free-of-charge, open-source software routines in R.
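    The reshaping step the authors describe, from a wide results table to one row per model and term, can be sketched without any plotting library (hypothetical coefficients; the paper's own routines are in R):

```python
# Coefficients from several hypothetical regression models, laid out the way
# a wide results table would present them: {model: {term: (estimate, se)}}.
models = {
    "model_1": {"age": (0.42, 0.10), "income": (0.15, 0.07)},
    "model_2": {"age": (0.38, 0.09), "income": (0.22, 0.08)},
}

# Reshape to the long format a coefficient plot needs: one row per
# (model, term), carrying the point estimate and a 95% interval
# (normal approximation, +/- 1.96 standard errors).
long_rows = [
    {"model": m, "term": t, "estimate": b, "lo": b - 1.96 * se, "hi": b + 1.96 * se}
    for m, terms in models.items()
    for t, (b, se) in terms.items()
]

for row in long_rows:
    print(row)
```

    Once in this long format, many models can be arranged along one axis of a single plot, which is the visual arrangement the article proposes.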

  6. Python Script for Simulating, Analyzing, and Evaluating Statistical...

    • data.mendeley.com
    Updated Jun 5, 2025
    Cite
    Kabir Bindawa Abdullahi (2025). Python Script for Simulating, Analyzing, and Evaluating Statistical Mirroring-Based Ordinalysis and Other Estimators [Dataset]. http://doi.org/10.17632/zdhy83cv4p.3
    Explore at:
    Dataset updated
    Jun 5, 2025
    Authors
    Kabir Bindawa Abdullahi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This presentation involves simulation and data generation processes, data analysis, and evaluation of classical and proposed methods of ordinal data analysis. All the parameters and metrics used are based on the methodology presented in the article titled "Statistical Mirroring-Based Ordinalysis: A Sensitive, Robust, Efficient, and Ordinality-Preserving Descriptive Method for Analyzing Ordinal Assessment Data," authored by Kabir Bindawa Abdullahi in 2024. For further details, see the paper submitted to MethodsX (Elsevier).

    The validation process of ordinal data analysis methods (estimators) has the following specifications: 
    

    • Simulation process: Monte Carlo simulation.
    • Data generation distributions: categorical, normal, and multivariate model distributions.
    • Data analysis:
      - Classical estimators: sum, average, and median ordinal score.
      - Proposed estimators: Kabirian coefficient of proximity, probability of proximity, probability of deviation.
    • Evaluation metrics:
      - Overall estimates average.
      - Overall estimates median.
      - Efficiency (by statistical absolute meanic deviation method).
      - Sensitivity (by entropy method).
      - Normality, Mann-Whitney U test, and others.
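    The classical side of this validation pipeline (Monte Carlo generation plus the sum, average, and median ordinal scores) can be sketched as follows; the proposed mirroring-based estimators are not reproduced here, and the distribution weights are invented:

```python
import random
from statistics import mean, median

random.seed(1)

LEVELS = [1, 2, 3, 4, 5]               # ordinal response scale
WEIGHTS = [0.1, 0.2, 0.4, 0.2, 0.1]    # invented categorical distribution

def simulate_ratings(n=30):
    """Generate one simulated sample of ordinal ratings."""
    return random.choices(LEVELS, weights=WEIGHTS, k=n)

# Monte Carlo loop: many replicates, each scored with the classical
# estimators (sum, average, and median ordinal score).
replicates = [simulate_ratings() for _ in range(1000)]
sums = [sum(r) for r in replicates]
averages = [mean(r) for r in replicates]
medians = [median(r) for r in replicates]

# Overall estimates average and median across replicates.
print(mean(averages), median(medians))
```

    Evaluation metrics such as efficiency and sensitivity would then be computed over these per-replicate estimates.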

  7. Python Script for Simulating, Analyzing, and Evaluating Statistical...

    • data.mendeley.com
    Updated May 9, 2025
    Cite
    Kabir Bindawa Abdullahi (2025). Python Script for Simulating, Analyzing, and Evaluating Statistical Mirroring-Based Ordinalysis and Other Estimators [Dataset]. http://doi.org/10.17632/zdhy83cv4p.2
    Explore at:
    Dataset updated
    May 9, 2025
    Authors
    Kabir Bindawa Abdullahi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This presentation involves simulation and data generation processes, data analysis, and evaluation of classical and proposed methods of ordinal data analysis. All the parameters and metrics used are based on the methodology presented in the article titled "Statistical Mirroring-Based Ordinalysis: A Sensitive, Robust, Efficient, and Ordinality-Preserving Method for Analyzing Ordinal Assessment Data," authored by Kabir Bindawa Abdullahi in 2024. For further details, see the paper submitted to MethodsX (Elsevier).

    The validation process of ordinal data analysis methods (estimators) has the following specifications:
    • Simulation process: Monte Carlo simulation.
    • Data generation distributions: categorical, normal, and multivariate model distributions.
    • Data analysis:
      - Classical estimators: sum, average, and median ordinal score.
      - Proposed estimators: Kabirian coefficient of proximity, probability of proximity, probability of deviation.
    • Evaluation metrics:
      - Overall estimates average.
      - Overall estimates median.
      - Efficiency (by statistical absolute meanic deviation method).
      - Sensitivity (by entropy method).
      - Normality, Mann-Whitney U test, and others.

  8. Data from: How to Cite Statistics Canada Products

    • search.dataone.org
    Updated Dec 28, 2023
    Cite
    Gaëtan Drolet (2023). How to Cite Statistics Canada Products [Dataset]. http://doi.org/10.5683/SP3/WFVNOW
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Gaëtan Drolet
    Description

    This presentation reviews the citation practices at Statistics Canada and in the data community. A demonstration of the web tool developed from the proposal presented to the DLI External Advisory Committee in May 2004 follows. Finally, the presentation examines the potential benefits of a citation guide for STC and users.

  9. Data from: Citation Standards for Statistics and Data

    • dataone.org
    Updated Dec 28, 2023
    Cite
    Gaëtan Drolet (2023). Citation Standards for Statistics and Data [Dataset]. http://doi.org/10.5683/SP3/GX91YV
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Gaëtan Drolet
    Description

    This presentation reviews the citation practices of Statistics Canada and others in the data community. This is followed by a discussion of the benefits of a citation guide for users.

  10. Global AI Presentation Tools Market Industry Best Practices 2025-2032

    • statsndata.org
    excel, pdf
    Updated Nov 2025
    Cite
    Stats N Data (2025). Global AI Presentation Tools Market Industry Best Practices 2025-2032 [Dataset]. https://www.statsndata.org/report/ai-presentation-tools-market-342574
    Explore at:
    Available download formats: excel, pdf
    Dataset updated
    Nov 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The AI Presentation Tools market is rapidly evolving, reflecting a significant transformation in how businesses and individuals create, deliver, and engage with presentations. As organizations strive for efficiency and creativity in their communication, AI-powered presentation tools have emerged as a vital solution.

  11. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the notation used: user stories or use cases
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
    P. the researchers' judgement of how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present

    Tagging scheme:
    Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
    Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
    System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
    Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
    Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between student and expert model.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
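    The correctness and completeness ratios defined here are simple to compute directly (a minimal Python sketch with hypothetical tag counts, not values from the dataset):

```python
def correctness(al, wr, so, om):
    """Aligned classes over all tagged student classes: AL / (AL + OM + SO + WR)."""
    return al / (al + om + so + wr)

def completeness(al, wr, om):
    """Expert classes represented at all: (AL + WR) / (AL + WR + OM)."""
    return (al + wr) / (al + wr + om)

# Hypothetical tag counts for one student model.
al, wr, so, om = 8, 2, 1, 3
print(f"correctness  = {correctness(al, wr, so, om):.3f}")   # 8/14
print(f"completeness = {completeness(al, wr, om):.3f}")      # 10/13
```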

    For sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation - UC, US.

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case - SIM, HOS, IFA.

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.

  12. Data for the paper "Nonoverlap proportion and the representation of...

    • figshare.com
    txt
    Updated Jan 23, 2022
    Cite
    Stanley Luck (2022). Data for the paper "Nonoverlap proportion and the representation of point-biserial variation" [Dataset]. http://doi.org/10.6084/m9.figshare.11591334.v3
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 23, 2022
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Stanley Luck
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data in these files were retrieved from the Nursing Home Compare (NHC) data repository, https://data.medicare.gov/data/nursing-home-compare, on April 26, 2019. The data were compiled from the NHC files ProviderInfo_Download.csv, QualityMsrMDS_Download.csv and QualityMsrClaims_Download.csv.

  13. BLS - Current Employment Statistics - CES (National)

    • datalumos.org
    Updated Oct 3, 2025
    Cite
    United States Department of Labor. Bureau of Labor Statistics (2025). BLS - Current Employment Statistics - CES (National) [Dataset]. http://doi.org/10.3886/E238628V1
    Explore at:
    Dataset updated
    Oct 3, 2025
    Dataset provided by
    United States Department of Labor (http://www.dol.gov/)
    Bureau of Labor Statistics (http://www.bls.gov/)
    Authors
    United States Department of Labor. Bureau of Labor Statistics
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1939 - 2025
    Description

    The Current Employment Statistics (CES) program produces detailed industry estimates of nonfarm employment, hours, and earnings of workers on payrolls. CES National Estimates produces data for the nation, and CES State and Metro Area produces estimates for all 50 states, the District of Columbia, Puerto Rico, the Virgin Islands, and about 450 metropolitan areas and divisions. See ces-schemas.txt for a listing of the columns and a few rows of the zipped tables. See ce.series for a summary of each of the tables. See the Handbook of Methods (https://www.bls.gov/opub/hom/ces/presentation.htm) for background on the data collection and presentation. See the pdf files for screenshots of the home pages.

  14. Data from: Access and Dissemination Tools for Aggregate and Microdata

    • dataone.org
    Updated Dec 28, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chuck Humphrey (2023). Access and Dissemination Tools for Aggregate and Microdata [Dataset]. http://doi.org/10.5683/SP3/OS8I0C
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Chuck Humphrey
    Description

    This session outlines many data access and dissemination issues. There is also a discussion of the differences between aggregate data and microdata and between statistics and data. This is followed by a presentation on the key features, delivery strategies, and best practices of data extractors.

  15. Population and Housing Census 2005 - Palau

    • microdata.pacificdata.org
    Updated Aug 18, 2013
    Cite
    Office of Planning and Statistics (2013). Population and Housing Census 2005 - Palau [Dataset]. https://microdata.pacificdata.org/index.php/catalog/27
    Explore at:
    Dataset updated
    Aug 18, 2013
    Dataset authored and provided by
    Office of Planning and Statistics
    Time period covered
    2005
    Area covered
    Palau
    Description

    Abstract

    The 2005 Republic of Palau Census of Population and Housing will be used to give a snapshot of Republic of Palau's population and housing at the mid-point of the decade. This Census is also important because it measures the population at the beginning of the implementation of the Compact of Free Association. The information collected in the census is needed to plan for the needs of the population. The government uses the census figures to allocate funds for public services in a wide variety of areas, such as education, housing, and job training. The figures also are used by private businesses, academic institutions, local organizations, and the public in general to understand who we are and what our situation is, in order to prepare better for our future needs.

    The fundamental purpose of a census is to provide information on the size, distribution and characteristics of a country's population. The census data are used for policymaking, planning and administration, as well as in management and evaluation of programmes in education, labour force, family planning, housing, health, transportation and rural development. A basic administrative use is in the demarcation of constituencies and allocation of representation to governing bodies. The census is also an invaluable resource for research, providing data for scientific analysis of the composition and distribution of the population and for statistical models to forecast its future growth. The census provides business and industry with the basic data they need to appraise the demand for housing, schools, furnishings, food, clothing, recreational facilities, medical supplies and other goods and services.

    Geographic coverage

    A hierarchical geographic presentation shows the geographic entities in a superior/subordinate structure in census products. This structure is derived from the legal, administrative, or areal relationships of the entities. The hierarchical structure is depicted in report tables by means of indentation. The following structure is used for the 2005 Census of the Republic of Palau:

    Republic of Palau
    State
    Hamlet/Village
    Enumeration District
    Block

    Analysis unit

    Individuals
    Families
    Households
    General Population

    Universe

    The Census covered all the households and respective residents in the entire country.

    Kind of data

    Census/enumeration data [cen]

    Sampling procedure

    Not applicable to a full enumeration census.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The 2005 Palau Census of Population and Housing comprises three parts:
    1. Housing - one form for each household
    2. Population - one form for each member of the household
    3. People who have left home - one form for each household

    Cleaning operations

    Full-scale processing and editing activities comprised eight separate sessions, conducted either together with U.S. Census Bureau experts or separately under their remote guidance, to finalize all datasets for the publishing stage.

    Processing operation was handled with care to produce a set of data that describes the population as clearly and accurately as possible. To meet this objective, questionnaires were reviewed and edited during field data collection operations by crew leaders for consistency, completeness, and acceptability. Questionnaires were also reviewed by census clerks in the census office for omissions, certain inconsistencies, and population coverage. For example, write-in entries such as "Don't know" or "NA" were considered unacceptable in certain quantities and/or in conjunction with other data omissions.

    As a result of this review operation, a telephone or personal visit follow-up was made to obtain missing information. Potential coverage errors were included in the follow-up, as well as questionnaires with omissions or inconsistencies beyond the completeness and quality tolerances specified in the review procedures.

    Subsequent to field operations, remaining incomplete or inconsistent information on the questionnaires was assigned using imputation procedures during the final automated edit of the collected data. Allocations, or computer assignments of acceptable data in place of unacceptable entries or blanks, were needed most often when an entry for a given item was lacking or when the information reported for a person or housing unit on that item was inconsistent with other information for that same person or housing unit. As in previous censuses, the general procedure for changing unacceptable entries was to assign an entry for a person or housing unit that was consistent with entries for persons or housing units with similar characteristics. The assignment of acceptable data in place of blanks or unacceptable entries enhanced the usefulness of the data.
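    The allocation procedure described here, assigning an entry consistent with similar persons or housing units, resembles hot-deck imputation. A toy Python sketch (invented records, not the census's actual edit system):

```python
import random

random.seed(0)

# Toy records: an age group and an answer that may be missing (None).
records = [
    {"age_group": "25-34", "employed": "yes"},
    {"age_group": "25-34", "employed": "yes"},
    {"age_group": "25-34", "employed": None},
    {"age_group": "65+",   "employed": "no"},
]

# Hot-deck-style allocation: fill each blank from a randomly chosen donor
# record that shares the same characteristics (here, the age group).
for rec in records:
    if rec["employed"] is None:
        donors = [r["employed"] for r in records
                  if r["employed"] is not None and r["age_group"] == rec["age_group"]]
        rec["employed"] = random.choice(donors)

print(records[2]["employed"])  # "yes" - all matching donors answered yes
```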

    Another way to make corrections during the computer editing process is substitution. Substitution is the assignment of a full set of characteristics for a person or housing unit. Because of the detailed field operations, substitution was not needed for the 2005 Census.

    Sampling error estimates

    Sampling Error is not applicable to full enumeration censuses.

    Data appraisal

    In any large-scale statistical operation, such as the 2005 Census of the Republic of Palau, human- and machine-related errors were anticipated. These errors are commonly referred to as nonsampling errors. Such errors include not enumerating every household or every person in the population, not obtaining all required information from the respondents, obtaining incorrect or inconsistent information, and recording information incorrectly. In addition, errors can occur during the field review of the enumerators' work, during clerical handling of the census questionnaires, or during the electronic processing of the questionnaires.

    To reduce various types of nonsampling errors, a number of techniques were implemented during the planning, data collection, and data processing activities. Quality assurance methods were used throughout the data collection and processing phases of the census to improve the quality of the data.

  16. Data from: Coverage Score: A Model Agnostic Method to Efficiently Explore...

    • acs.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Daniel J. Woodward; Anthony R. Bradley; Willem P. van Hoorn (2023). Coverage Score: A Model Agnostic Method to Efficiently Explore Chemical Space [Dataset]. http://doi.org/10.1021/acs.jcim.2c00258.s002
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Daniel J. Woodward; Anthony R. Bradley; Willem P. van Hoorn
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Selecting the most appropriate compounds to synthesize and test is a vital aspect of drug discovery. Methods such as clustering and diversity selection have weaknesses when the goal is to choose the sets that maximize information gain. Active learning techniques often rely on an initial model and computationally expensive semi-supervised batch selection. Herein, we describe a new subset-based selection method, Coverage Score, that combines Bayesian statistics and information entropy to balance representation and diversity and select a maximally informative subset. Coverage Score can be influenced by prior selections and desirable properties. In this paper, subsets selected through Coverage Score are compared against subsets selected through model-independent and model-dependent techniques for several datasets. In drug-like chemical space, Coverage Score consistently selects subsets that lead to more accurate predictions than other selection methods. Subsets selected through Coverage Score produced Random Forest models with a root-mean-square error up to 12.8% lower than subsets selected at random, while retaining up to 99% of the structural dissimilarity of a diversity selection.
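The Coverage Score itself is defined in the linked article. Purely to illustrate the general idea of coverage-driven subset selection (greedily adding the compound that contributes the most as-yet-unseen features), here is a toy sketch over binary fingerprints; it is not the authors' implementation, and the molecule names and bit sets are invented:

```python
def greedy_coverage_select(fingerprints, k):
    """Greedily pick k items that together cover the most feature bits.
    `fingerprints` maps item id -> set of on-bits (e.g. fingerprint bits)."""
    covered = set()
    chosen = []
    remaining = dict(fingerprints)
    for _ in range(min(k, len(remaining))):
        # Pick the item contributing the most bits not yet covered.
        best = max(remaining, key=lambda i: len(remaining[i] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered

fps = {
    "mol_a": {1, 2, 3},
    "mol_b": {3, 4},
    "mol_c": {5, 6, 7, 8},
    "mol_d": {1, 2},
}
picked, bits = greedy_coverage_select(fps, 2)
# picked == ["mol_c", "mol_a"]: mol_c covers four new bits, then mol_a adds three more.
```

The actual Coverage Score additionally weights features with Bayesian priors and entropy, which lets prior selections and desirable properties influence the ranking.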

  17. BALANCE OF PAYMENTS STATISTICS

    • kaggle.com
    zip
    Updated Feb 24, 2024
    willian oliveira (2024). BALANCE OF PAYMENTS STATISTICS [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/balance-of-payments-statistics/suggestions?status=pending
    Explore at:
    zip(103557597 bytes)Available download formats
    Dataset updated
    Feb 24, 2024
    Authors
    willian oliveira
    License

    Public Domain (CC0 1.0)https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Table of Contents
    1. Overview
    2. World and Regional Tables
    3. Country Tables
    4. Annex I. Analytic Presentation of the Balance of Payments
    5. Annex II. Standard Presentation of the Balance of Payments
    6. Annex III. Standard Components of the International Investment Position
    7. Annex IV. Reporting Currency
    8. Annex V. Conceptual Framework of the Balance of Payments and International Investment Position
    9. Annex VI. Classification and Standard Components of the Balance of Payments and International Investment Position

    Overview. The electronic release of the Balance of Payments Statistics Yearbook (BOPSY), produced by the International Monetary Fund (IMF), contains balance of payments and international investment position (IIP) data in accordance with the sixth edition of the Balance of Payments and International Investment Position Manual (BPM6), published in 2009. Individual country data for all available periods, along with world and regional aggregates for the period 2005-2021, are included in this release. The IMF is grateful for countries' cooperation in providing comprehensive, timely, and regular data to the Fund for re-dissemination. These data support the IMF's Statistics Department (STA) in its efforts to respond to the analytical and policy needs of the IMF, member countries, and the international community.

    The electronic release, available through the online database accessible at http://data.imf.org, contains a section on the World and Regional Tables, which presents 21 World and Regional Tables for major components of the balance of payments and IIP accounts. Individual country tables covering annual balance of payments and IIP data of individual countries, jurisdictions, and other reporting entities, as well as balance of payments and IIP metadata, are also published through the online database. The release of the Yearbook based on BPM6 [1] was endorsed by the IMF's Committee on Balance of Payments Statistics. The BPM6 provides updated international standards covering the methodologies for compiling, and the presentation of, balance of payments and IIP statistics. It incorporates clarifications and improvements reflecting significant developments and expansion in globalized international trade arrangements and financial markets that had been identified since the release of the fifth edition of the Balance of Payments Manual (BPM5) in 1993. Moreover, the linkages to and consistency with other macroeconomic statistics are maintained and enhanced through the parallel update of the OECD Benchmark Definition of Foreign Direct Investment and the System of National Accounts.

    For many decades, the IMF has published data on a basis that is consistent across countries and across time periods. Such data consistency is required to perform cross-country data comparisons, track growth rates across time, and produce regional or global data aggregates. Data conversion work undertaken by IMF staff, in close consultation with IMF member countries, has made possible the presentation in the BPM6 format of data for the few economies that have not yet implemented BPM6. To assist users in understanding the impact of conversion to BPM6, as well as the major methodological changes from BPM5 to BPM6, see the FAQs on Conversion from BPM5 to BPM6. The methodologies, compiling practices, and data sources available through data.imf.org are based on information provided to the IMF by reporting countries. The descriptions are intended to enhance user understanding of the coverage, as well as the limitations, of individual country data. At the same time, they are useful in informing compilers of data sources and practices used by their counterparts in other countries.

    [1] Volume 1 of the Yearbook, published in 1949, was based on the first edition of the IMF's Balance of Payments Manual, issued in 1948; Volumes 2–12 were compiled pursuant to the second edition of the Manual, issued in 1950; Volumes 13–23 were based on the third edition of the Manual, issued in 1961; and Volumes 24–29 were associated with that edition as well as the Balance of Payments Manual: Supplement to Third Edition, issued in 1973. Volumes 30–45 followed the guidance of the fourth edition of the Manual, published in 1977. Volumes 46–62 were presented in accordance with the standard components of the BPM5; however, the standard components changed with the publication of Financial Derivatives, a Supplement to the Fifth Edition (1993) of the Balance of Payments Manual, published in 2000 and amended in 2002. Volume 63 and subsequent volumes were presented based on BPM6. Beginning 2019, BOPSY is released in electronic format only.

    (Source: International Monetary Fund, Balance of Payments Statistics: Introductory Notes, as of November 2023.)

  18. Data from: Prefectural and municipal statistics on forestry management of Census of Agriculture and Forestry, 2005, 2010 and 2015

    • jstagedata.jst.go.jp
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jul 27, 2023
    Ichiro Fujikake; Kazuya Tamura (2023). Prefectural and municipal statistics on forestry management of Census of Agriculture and Forestry, 2005, 2010 and 2015 [Dataset]. http://doi.org/10.50933/data.rinrin.17139158.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jul 27, 2023
    Dataset provided by
    FOREST ECONOMIC RESEARCH INSTITUTE
    Authors
    Ichiro Fujikake; Kazuya Tamura
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset is provided for use in analyzing trends in forestry management in Japan, using individual data from the 2005, 2010, and 2015 Censuses of Agriculture and Forestry. We introduced a management categorization suited to analyzing forestry management, and the results were aggregated by prefecture and municipality. The method of data preparation and presentation is explained in detail in the paper associated with this dataset, so please refer to that paper before using the data.
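As an illustration of the kind of prefecture- and municipality-level aggregation this dataset provides (the column names and management categories below are hypothetical, not the dataset's actual schema):

```python
from collections import Counter

def aggregate_by_region(records):
    """Count forestry management entities per (prefecture, municipality),
    split by an illustrative management category."""
    counts = Counter()
    for rec in records:
        counts[(rec["prefecture"], rec["municipality"], rec["category"])] += 1
    return counts

# Toy individual records, one per management entity.
census = [
    {"prefecture": "Miyazaki", "municipality": "Nichinan", "category": "family"},
    {"prefecture": "Miyazaki", "municipality": "Nichinan", "category": "family"},
    {"prefecture": "Miyazaki", "municipality": "Nichinan", "category": "company"},
    {"prefecture": "Nagano", "municipality": "Ina", "category": "family"},
]
totals = aggregate_by_region(census)
# totals[("Miyazaki", "Nichinan", "family")] == 2
```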

  19. Global Video Presentation Software Market Research and Development Focus 2025-2032

    • statsndata.org
    excel, pdf
    Updated Oct 2025
    Stats N Data (2025). Global Video Presentation Software Market Research and Development Focus 2025-2032 [Dataset]. https://www.statsndata.org/report/video-presentation-software-market-228653
    Explore at:
    excel, pdfAvailable download formats
    Dataset updated
    Oct 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Video Presentation Software market has witnessed remarkable growth in recent years, fueled by the increasing demand for dynamic, engaging content across various industries. As organizations shift from traditional presentation methods to more innovative and visually appealing formats, video presentation software

  20. LScDC Word-Category RIG Matrix

    • figshare.le.ac.uk
    pdf
    Updated Apr 28, 2020
    Neslihan Suzen (2020). LScDC Word-Category RIG Matrix [Dataset]. http://doi.org/10.25392/leicester.data.12133431.v2
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    University of Leicester
    Authors
    Neslihan Suzen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LScDC Word-Category RIG Matrix. April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

    Getting Started. This file describes the Word-Category RIG Matrix for the Leicester Scientific Corpus (LSC) [1], the procedure used to build the matrix, and the Leicester Scientific Thesaurus (LScT) with its construction process. The Word-Category RIG Matrix is a 103,998 by 252 matrix, where rows correspond to words of the Leicester Scientific Dictionary-Core (LScDC) [2] and columns correspond to 252 Web of Science (WoS) categories [3, 4, 5]. Each entry in the matrix corresponds to a pair (category, word); its value is the Relative Information Gain (RIG) on the belonging of a text from the LSC to the category from observing the word in that text. The CSV file of the Word-Category RIG Matrix in the published archive includes two additional columns: the sum of RIGs over categories and the maximum of RIGs over categories (the last two columns of the matrix). The file 'Word-Category RIG Matrix.csv' therefore contains a total of 254 columns. This matrix was created for future research on quantifying meaning in scientific texts, under the assumption that words have scientifically specific meanings in subject categories and that this meaning can be estimated by the information gained from a word about the categories. The LScT (Leicester Scientific Thesaurus) is a scientific thesaurus of English comprising 5,000 words from the LScDC. We order the words of the LScDC by the sum of their RIGs over categories, so words are ranked by their informativeness in the scientific corpus LSC; the meaningfulness of a word is thus evaluated by its average informativeness over the categories, and the 5,000 most informative words are included in the thesaurus.

    Words as a Vector of Frequencies in WoS Categories. Each word of the LScDC is represented as a vector of frequencies in WoS categories. Given the collection of LSC texts, each entry of the vector is the number of texts in the corresponding category that contain the word. Note that texts in a corpus do not necessarily belong to a single category, as they are likely to correspond to multidisciplinary studies, particularly in a corpus of scientific texts; in other words, categories may not be exclusive. There are 252 WoS categories, and a text can be assigned to at least 1 and at most 6 categories in the LSC. Using a binary calculation of frequencies, we record the presence of a word in a category, creating a vector of frequencies for each word whose dimensions are the categories in the corpus. The collection of vectors, over all words and categories in the entire corpus, can be shown in a table in which each entry corresponds to a pair (word, category). This table is built for the LScDC with 252 WoS categories and is presented in the published archive with this file. The value of each entry shows how many times a word of the LScDC appears in a WoS category; the occurrence of a word in a category is determined by counting the number of LSC texts in the category that contain the word.

    Words as a Vector of Relative Information Gains Extracted for Categories. In this section, we introduce our approach to representing a word as a vector of relative information gains for categories, under the assumption that the meaning of a word can be quantified by the information it provides about categories. For each category, a function is defined on texts that takes the value 1 if the text belongs to the category, and 0 otherwise. For each word, a function is defined on texts that takes the value 1 if the word appears in the text, and 0 otherwise. Consider the LSC as a probabilistic sample space (the space of equally probable elementary outcomes). For these Boolean random variables, the joint probability distribution, the entropy, and information gains are defined. The information gain about the category from the word is the amount of information on the belonging of a text from the LSC to the category gained from observing the word in the text [6]. We use the Relative Information Gain (RIG), a normalised measure of information gain, which makes information gains comparable across categories. The calculations of entropy, information gains, and relative information gains can be found in the README file in the published archive. Given a word, we create a vector in which each component corresponds to a category; each word is therefore represented as a vector of relative information gains, whose dimension is the number of categories. The set of these vectors forms the Word-Category RIG Matrix, in which each column corresponds to a category, each row corresponds to a word, and each component is the relative information gain from the word to the category. In the matrix, a row vector represents the corresponding word as a vector of RIGs in categories, and a column vector represents the RIGs of all words in an individual category. For an arbitrary category, words can be ordered by their RIGs from the most informative to the least informative for that category. Words can also be ordered by two global criteria, the sum and the maximum of RIGs over categories; the top n words in such a list can be considered the most informative words in scientific texts. For a given word, the sum and maximum of RIGs are taken from the Word-Category RIG Matrix. RIGs for each word of the LScDC in the 252 categories are calculated and the vectors of words are formed; we then form the Word-Category RIG Matrix for the LSC. For each word, the sum (S) and maximum (M) of RIGs over categories are calculated and appended as the last two columns of the matrix. The Word-Category RIG Matrix for the LScDC with 252 categories, together with these two columns, can be found in the database.

    Leicester Scientific Thesaurus (LScT). The LScT is a list of 5,000 words from the LScDC [2]. Words of the LScDC are sorted in descending order by the sum (S) of RIGs over categories, and the top 5,000 words are selected for inclusion in the LScT. We consider these 5,000 words the most meaningful words in the scientific corpus: the meaningfulness of a word is evaluated by its average informativeness over the categories, and the resulting list is treated as a 'thesaurus' for science. The LScT, with the value of the sum for each word, is provided as a CSV file in the published archive.

    The published archive contains the following files:
    1) Word_Category_RIG_Matrix.csv: a 103,998 by 254 matrix whose columns are the 252 WoS categories plus the sum (S) and maximum (M) of RIGs over categories (the last two columns), and whose rows are the words of the LScDC. Each entry in the first 252 columns is the RIG from the word to the category. Words are ordered as in the LScDC.
    2) Word_Category_Frequency_Matrix.csv: a 103,998 by 252 matrix whose columns are the 252 WoS categories and whose rows are the words of the LScDC. Each entry is the number of texts in the corresponding category that contain the word. Words are ordered as in the LScDC.
    3) LScT.csv: the list of words of the LScT with their sum (S) values.
    4) Text_No_in_Cat.csv: the number of texts in each category.
    5) Categories_in_Documents.csv: the list of WoS categories for each document of the LSC.
    6) README.txt: a description of the Word-Category RIG Matrix, the Word-Category Frequency Matrix, and the LScT, and the procedures for forming them.
    7) README.pdf: the same as 6, in PDF format.

    References
    [1] Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
    [2] Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v3
    [3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
    [4] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
    [5] Suzen, N., Mirkes, E. M., & Gorban, A. N. (2019). LScDC-new large scientific dictionary. arXiv preprint arXiv:1912.06858.
    [6] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
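The RIG construction described above can be made concrete. Treating "text belongs to category c" and "text contains word w" as Boolean random variables over the corpus, the RIG is the information gain H(C) - H(C|W) normalised by H(C). A small illustrative sketch with toy indicator lists (this is not the published pipeline, just the definition in code):

```python
import math

def entropy(p):
    """Shannon entropy of a Bernoulli variable with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def relative_information_gain(in_category, has_word):
    """RIG of a category from a word over a corpus of texts.

    `in_category`, `has_word`: parallel lists of 0/1 indicators per text.
    RIG = (H(C) - H(C | W)) / H(C), i.e. the information gain normalised
    by the category entropy so gains are comparable across categories."""
    n = len(in_category)
    p_c = sum(in_category) / n
    h_c = entropy(p_c)
    if h_c == 0.0:
        return 0.0
    # Conditional entropy H(C | W), weighting by P(W = w).
    cond = 0.0
    for w in (0, 1):
        group = [c for c, x in zip(in_category, has_word) if x == w]
        if group:
            cond += (len(group) / n) * entropy(sum(group) / len(group))
    return (h_c - cond) / h_c

# A word that perfectly predicts category membership gives RIG = 1;
# a word independent of the category gives RIG = 0.
perfect = relative_information_gain([1, 1, 0, 0], [1, 1, 0, 0])
independent = relative_information_gain([1, 1, 0, 0], [1, 0, 1, 0])
```

Stacking such RIG values for every (word, category) pair, plus the row-wise sum and maximum, reproduces the shape of the 103,998 by 254 matrix described above.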
