Australian and New Zealand Journal of Statistics Impact Factor 2024-2025 - ResearchHelpDesk - The Australian & New Zealand Journal of Statistics is an international journal managed jointly by the Statistical Society of Australia and the New Zealand Statistical Association. Its purpose is to report significant and novel contributions in statistics, ranging across articles on statistical theory, methodology, applications and computing. The journal has a particular focus on statistical techniques that can be readily applied to real-world problems, and on application papers with an Australasian emphasis. Outstanding articles submitted to the journal may be selected as Discussion Papers, to be read at a meeting of either the Statistical Society of Australia or the New Zealand Statistical Association. The main body of the journal is divided into three sections. The Theory and Methods Section publishes papers containing original contributions to the theory and methodology of statistics, econometrics and probability, and seeks papers motivated by a real problem that demonstrate the proposed theory or methodology in that situation. There is a strong preference for papers motivated by, and illustrated with, real data. The Applications Section publishes papers demonstrating applications of statistical techniques to problems faced by users of statistics in the sciences, government and industry. A particular focus is the application of newly developed statistical methodology to real data and the demonstration of better use of established statistical methodology in an area of application. It seeks to aid teachers of statistics by placing statistical methods in context. The Statistical Computing Section publishes papers containing new algorithms, code snippets, or software descriptions (for open source software only) which enhance the field through the application of computing. Preference is given to papers featuring publicly available code and/or data, and to those motivated by statistical methods for practical problems. In addition, suitable review papers and articles of historical and general interest will be considered. The journal also publishes book reviews on a regular basis. Abstracting and Indexing Information: Academic Search (EBSCO Publishing), Academic Search Alumni Edition (EBSCO Publishing), Academic Search Elite (EBSCO Publishing), Academic Search Premier (EBSCO Publishing), CompuMath Citation Index (Clarivate Analytics), Current Index to Statistics (ASA/IMS), Journal Citation Reports/Science Edition (Clarivate Analytics), Mathematical Reviews/MathSciNet/Current Mathematical Publications (AMS), RePEc: Research Papers in Economics, Science Citation Index Expanded (Clarivate Analytics), SCOPUS (Elsevier), Statistical Theory & Method Abstracts (Zentralblatt MATH), ZBMATH (Zentralblatt MATH)
Australian and New Zealand Journal of Statistics Acceptance Rate - ResearchHelpDesk - The Australian & New Zealand Journal of Statistics is an international journal managed jointly by the Statistical Society of Australia and the New Zealand Statistical Association. Its purpose is to report significant and novel contributions in statistics, ranging across articles on statistical theory, methodology, applications and computing. The journal has a particular focus on statistical techniques that can be readily applied to real-world problems, and on application papers with an Australasian emphasis. Outstanding articles submitted to the journal may be selected as Discussion Papers, to be read at a meeting of either the Statistical Society of Australia or the New Zealand Statistical Association. The main body of the journal is divided into three sections. The Theory and Methods Section publishes papers containing original contributions to the theory and methodology of statistics, econometrics and probability, and seeks papers motivated by a real problem that demonstrate the proposed theory or methodology in that situation. There is a strong preference for papers motivated by, and illustrated with, real data. The Applications Section publishes papers demonstrating applications of statistical techniques to problems faced by users of statistics in the sciences, government and industry. A particular focus is the application of newly developed statistical methodology to real data and the demonstration of better use of established statistical methodology in an area of application. It seeks to aid teachers of statistics by placing statistical methods in context. The Statistical Computing Section publishes papers containing new algorithms, code snippets, or software descriptions (for open source software only) which enhance the field through the application of computing. Preference is given to papers featuring publicly available code and/or data, and to those motivated by statistical methods for practical problems. In addition, suitable review papers and articles of historical and general interest will be considered. The journal also publishes book reviews on a regular basis. Abstracting and Indexing Information: Academic Search (EBSCO Publishing), Academic Search Alumni Edition (EBSCO Publishing), Academic Search Elite (EBSCO Publishing), Academic Search Premier (EBSCO Publishing), CompuMath Citation Index (Clarivate Analytics), Current Index to Statistics (ASA/IMS), Journal Citation Reports/Science Edition (Clarivate Analytics), Mathematical Reviews/MathSciNet/Current Mathematical Publications (AMS), RePEc: Research Papers in Economics, Science Citation Index Expanded (Clarivate Analytics), SCOPUS (Elsevier), Statistical Theory & Method Abstracts (Zentralblatt MATH), ZBMATH (Zentralblatt MATH)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Although peer review is widely considered to be the most credible way of selecting manuscripts and improving the quality of accepted papers in scientific journals, there is little evidence to support its use. Our aim was to estimate the effects on manuscript quality of either adding a statistical peer reviewer or suggesting the use of checklists such as CONSORT or STARD to clinical reviewers, or both. Methodology and Principal Findings: Interventions were defined as 1) the addition of a statistical reviewer to the clinical peer review process, and 2) suggesting reporting guidelines to reviewers, with "no statistical expert" and "no checklist" as controls. The two interventions were crossed in a 2×2 balanced factorial design including original research articles consecutively selected, between May 2004 and March 2005, by the Medicina Clinica (Barc) editorial committee. We randomized manuscripts to minimize differences in terms of baseline quality and type of study (intervention, longitudinal, cross-sectional, others). Sample-size calculations indicated that 100 papers provide 80% power to test a 55% standardized difference. We specified the main outcome as the increment in quality of papers as measured on the Goodman Scale. Two blinded evaluators rated the quality of manuscripts at initial submission and in the final post-peer-review version. Of the 327 manuscripts submitted to the journal, 131 were accepted for further review, and 129 were randomized. Of those, the 14 lost to follow-up showed no differences in initial quality from the followed-up papers. Hence, 115 were included in the main analysis, with 16 rejected for publication after peer review. Of the 115 included papers, 21 (18.3%) were interventions, 46 (40.0%) longitudinal designs, 28 (24.3%) cross-sectional and 20 (17.4%) others. The 16 (13.9%) rejected papers had a significantly lower initial score on the overall Goodman scale than accepted papers (difference 15.0, 95% CI: 4.6–24.4). Suggesting a guideline to the reviewers had no effect on the change in overall quality as measured by the Goodman scale (0.9, 95% CI: −0.3 to +2.1). The estimated effect of adding a statistical reviewer was 5.5 (95% CI: 4.3–6.7), a significant improvement in quality. Conclusions and Significance: This prospective randomized study shows the positive effect of adding a statistical reviewer to the field-expert peers in improving manuscript quality. We did not find a statistically significant positive effect of suggesting that reviewers use reporting guidelines.
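For context, the sample-size calculation described above can be approximated with standard power-analysis tools. The following is an illustrative sketch using statsmodels, not the authors' original computation:

```python
# Illustrative reconstruction of the power calculation quoted above:
# the per-arm sample size giving 80% power to detect a 0.55 standardized
# difference at alpha = 0.05, under a two-sample t-test approximation.
from statsmodels.stats.power import TTestIndPower

n_per_arm = TTestIndPower().solve_power(effect_size=0.55, alpha=0.05, power=0.80)
print(round(n_per_arm))  # ~53 per arm, i.e. on the order of 100 papers overall
```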
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This submission includes all assessment data from the paper "Think-aloud interviews: A tool for exploring student statistical reasoning", as well as the code necessary to reproduce the figures and tables presented in the paper and the supplemental materials for the paper. See the file README.txt for a full description of all files. Each student interviewed has been given a fictitious name.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
From January 2014, Psychological Science introduced new submission guidelines that encouraged the use of effect sizes, estimation, and meta-analysis (the “new statistics”), required extra detail of methods, and offered badges for use of open science practices. We investigated the use of these practices in empirical articles published by Psychological Science and, for comparison, by the Journal of Experimental Psychology: General, during the period of January 2013 to December 2015. The use of null hypothesis significance testing (NHST) was extremely high at all times and in both journals. In Psychological Science, the use of confidence intervals increased markedly overall, from 28% of articles in 2013 to 70% in 2015, as did the availability of open data (3 to 39%) and open materials (7 to 31%). The other journal showed smaller or much smaller changes. Our findings suggest that journal-specific submission guidelines may encourage desirable changes in authors’ practices.
Statistical data on the number of income tax returns submitted on paper and through the Rapid Declaration and Electronic Declaration services.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Objectives: In quantitative research, understanding basic parameters of the study population is key for interpretation of the results. As a result, it is typical for the first table ("Table 1") of a research paper to include summary statistics for the study data. Our objectives are twofold. First, we seek to provide a simple, reproducible method for providing summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.
Materials and Methods: The tableone package is developed following good practice guidelines for scientific computing and all code is made available under a permissive MIT License. A testing framework runs on a continuous integration server, helping to maintain code stability. Issues are tracked openly and public contributions are encouraged.
Results: The tableone software package automatically compiles summary statistics into publishable formats such as CSV, HTML, and LaTeX. An executable Jupyter Notebook demonstrates application of the package to a subset of data from the MIMIC-III database. Tests such as Tukey's rule for outlier detection and Hartigan's Dip Test for modality are computed to highlight potential issues in summarizing the data.
Discussion and Conclusion: We present open source software for researchers to facilitate carrying out reproducible studies in Python, an increasingly popular language in scientific research. The toolkit is intended to mature over time with community feedback and input. Development of a common tool for summarizing data may help to promote good practice when used as a supplement to existing guidelines and recommendations. We encourage use of tableone alongside other methods of descriptive statistics and, in particular, visualization to ensure appropriate data handling. We also suggest seeking guidance from a statistician when using tableone for a research study, especially prior to submitting the study for publication.
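For illustration, here is a minimal sketch of how tableone is typically invoked; the file name and column names are hypothetical placeholders, not taken from the MIMIC-III demonstration:

```python
# Minimal tableone sketch; "cohort.csv" and all column names are hypothetical.
import pandas as pd
from tableone import TableOne

df = pd.read_csv("cohort.csv")
table1 = TableOne(
    df,
    columns=["age", "sex", "icu_los"],  # variables to summarize
    categorical=["sex"],                # summarized as counts and percentages
    groupby="outcome",                  # stratify the table by outcome group
    nonnormal=["icu_los"],              # report median [IQR] instead of mean (SD)
)
table1.to_csv("table1.csv")  # publishable formats also include HTML and LaTeX
print(table1)
```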
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This year, from January 1 to December 28, a total of 111 papers were submitted to the Journal of Educational Evaluation for Health Professions (JEEHP). Of these 111 papers, 88 were regarded as unsuitable because they did not follow the journal's instructions for manuscript preparation, and some of those papers were eventually rejected or resubmitted after revision. So far, 34 papers have been published this year, and 21 are in the processing stage. The acceptance rate is currently 27.4%, which is lower than the acceptance rate for 2016.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset "DBLP-SIGWEB.zip" is derived from the September 17, 2015 snapshot of the dblp bibliography database. It contains all publication and author records (available in dblp data and ACM metadata) of 7 ACM SIGWEB conferences (HT, DL, DocEng, WebSci, CIKM, WSDM, UMAP). The dblp-sigweb.sql file creates 15 tables in MySQL. The following lists and describes all tables and attributes used in the dataset; attributes shared by several tables are listed only once.
Table- papers
dblp_key- unique id of each publication in the dblp database
crossref- unique id of each conference in the dblp database
doi- DOI url to the publisher page
paper_id- unique id of each article in the ACM Digital Library (DL)
cite_count- number of citations of each article, calculated for papers published in the ACM DL
pages- number of pages of each article in the conference proceedings
conf_id- unique id of each conference in the ACM DL
funding- funding source information for the article; NULL if no funding source is available
Table- paper_authors
author_id- unique id of an author in the ACM DL
affiliation- affiliation information of the author for the associated article
Table- concepts
concept- concepts in an article, tagged by ACM
Table- author_tags
author_tag- keywords/tags provided by the authors
Table- cited_by
paper_id- ACM DL id of the cited article A
cite_id- unique id of the article that has cited article A
Table- paper_references
refer_id- unique id of the articles (published in SIGWEB conferences) cited in article A
Table- conferences
dblp_key- unique id of each conference in the dblp database
year- year of the conference
publisher- publisher name of each conference (ACM, Springer, IEEE, etc.)
title- full name of the conference proceedings
doi- DOI url to the conference publisher page
Tables- general_chairs, program_chairs, editors
author_id- unique id of the author
affiliation- affiliation of the author
Tables- authors_affiliation_history, colleagues
author_id- unique id of author A in the ACM DL
position- index of the affiliation, starting from 0
affiliation- lists all affiliations of an author
colleague_id- lists the ACM ids of all authors who have co-authored ACM papers with A
Table- authors_info
author_name- full name of the author, acquired from the ACM publisher page
year_first- year of the author's first article publication in ACM
year_last- year of the author's most recent article publication in ACM
pub_count- total number of publications in the ACM DL
cite_count- total number of citations mentioned in ACM publications
avg_cite- average number of citations across ACM publications
Table- affiliations_info
affiliation- name of the affiliation
affiliation_type- type of affiliation (Industry, Academic Institution)
city, state, country- geographical location of the affiliation
lat, lng- geocodes of the affiliation
Table- acceptance rate
conf_id- ACM id of the conference
dblp_key- dblp id of the conference
submitted- number of submissions received by conference X in year Y
accepted- number of accepted papers at conference X in year Y
rate- acceptance rate of conference X in year Y
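As an illustration of how these tables can be combined, the following hedged Python sketch queries the loaded MySQL dump; the connection credentials are placeholders, and the join uses papers.crossref = conferences.dblp_key as documented above:

```python
# Hypothetical query: paper counts and average citations per conference edition,
# using the papers and conferences tables documented above.
import mysql.connector  # assumes dblp-sigweb.sql was loaded into a local MySQL server

conn = mysql.connector.connect(user="user", password="pass", database="sigweb")
cur = conn.cursor()
cur.execute("""
    SELECT c.title, c.year, COUNT(*) AS n_papers, AVG(p.cite_count) AS avg_cites
    FROM papers p
    JOIN conferences c ON p.crossref = c.dblp_key
    GROUP BY c.title, c.year
    ORDER BY avg_cites DESC
""")
for title, year, n_papers, avg_cites in cur.fetchall():
    print(title, year, n_papers, avg_cites)
conn.close()
```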
Electronic archive and distribution server for research articles providing open access to more than 850,000 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics. Users can retrieve papers via the web interface. Registered authors may use the web interface to submit their articles to arXiv. Authors can also update their submissions if they choose, though previous versions remain available. Listings of newly submitted articles in areas of interest are available via the web interface, via RSS feeds, and by subscription to automatic email alerts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This simple dataset contains publication statistics of Swedish PhD and Licentiate theses in Software Engineering from 1999 to 2018. The contents of this dataset were discussed in a blog post on https://grischaliebel.de.
The data is offered in two formats, xlsx and csv, with the same content. Names and affiliations are anonymised in the data set to prevent identification of subjects. In the following, we describe the content of the different columns in the table; a short loading example follows the list.
Level: 'lic' for Licentiate theses or 'phd' for PhD theses
Year: The year of publication of the thesis
Included: The total number of papers included in the compilation-style thesis.
Listed: Number of papers listed in addition to the included papers (basically "I have also published these, but they are not relevant to the thesis"). Note that we cannot distinguish between cases where no papers are listed because none were published and cases where the author decided not to list them.
IncludedPublished: The number of included papers that are published or accepted for publication.
IncludedSubmitted: The number of included papers that are in submission/under review.
IncludedPublishedISI: The number of included, published papers that are in ISI-ranked journals.
IncludedPublishedNonISIJ: The number of included, published papers that are in non-ISI-ranked journals.
IncludedPublishedConf: The number of included, published papers that are in CORE-ranked conferences (any grade).
IncludedPublishedWS: The number of included, published papers that are in workshops. Non-CORE-ranked conferences are counted as workshops as well.
IncludedPublishedOther: The number of included, published papers that do not fit in any other category (e.g., book chapters, technical reports).
IncludedSubmitted*: Number of included, submitted papers broken down by category (journal, conference, workshop, and other).
ListedPublished*: Number of listed, published papers broken down by category (ISI/non-ISI journal, conference, workshop, and other).
ListedSubmitted*: Number of listed, submitted papers broken down by category (journal, conference, workshop, and other).
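As a quick illustration of how these columns can be used, here is a minimal pandas sketch; the csv file name is a placeholder, while the column names follow the documentation above:

```python
# Sketch: share of included papers already published, by thesis level and year.
import pandas as pd

df = pd.read_csv("se_theses.csv")  # placeholder name for the csv variant
df["PublishedShare"] = df["IncludedPublished"] / df["Included"]
print(df.groupby(["Level", "Year"])["PublishedShare"].mean().round(2))
```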
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a research compendium (RC) for the publication "Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data".
The code (including figures, appendices and the manuscript) is packed in pathogen-modeling-3.zip or can be found directly in the GitHub repository.
This RC represents a static snapshot at the time of submission; the GitHub repository may receive changes after the paper is published.
Data sources
Licenses
All files are shared via the given license with the exception of "soil.tif" which is shared via the ODbL license.
https://datacatalog.worldbank.org/public-licenses?fragment=cc
This dataset contains metadata (title, abstract, date of publication, field, etc.) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.
Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.
We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and a parsed PDF or LaTeX file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF or LaTeX file is important for extracting key information such as the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus. Around 30 million articles remain after keeping only articles with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles from the years 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries' national statistical systems: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. Fields that are included are: Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.
Due to the intensive computer resources required, a set of 1,037,748 articles were randomly selected from the 10 million articles in our restricted corpus as a convenience sample.
The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.
To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled non-standardly.
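A minimal sketch of this regular-expression approach; the country list is truncated here for illustration, whereas the project would use the full ISO 3166 list:

```python
# Sketch: flag an article as mentioning a country when an ISO 3166 country
# name occurs in its title or abstract. COUNTRIES is truncated for brevity.
import re

COUNTRIES = ["Kenya", "Brazil", "Viet Nam", "India"]

def match_countries(text: str) -> list[str]:
    # word boundaries (\b) avoid matching country names inside longer words
    return [name for name in COUNTRIES
            if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE)]

print(match_countries("We study maize yields in Kenya and Brazil."))
# -> ['Kenya', 'Brazil']
```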
The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify named entities in text, implemented with the spaCy Python library. The NER algorithm extracts named entities from the text, and it is used in this project to identify countries of study in the academic articles. SpaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.
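A corresponding sketch of the NER approach; spaCy's English models label geopolitical entities as "GPE", and the small English pipeline is assumed here:

```python
# Sketch: extract country-like entities with spaCy NER.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Household survey evidence on schooling from Vietnam and Peru.")
countries = {ent.text for ent in doc.ents if ent.label_ == "GPE"}
print(countries)  # e.g. {'Vietnam', 'Peru'}
```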
The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, where 3500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:
Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.
There are two classification tasks in this exercise:
1. identifying whether an academic article is using data from any country
2. Identifying from which country that data came.
For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated or created to produce research findings. As an example, a study that reports findings or analysis using survey data uses data. Some clues that a study does use data include whether a survey or census is described, a statistical model is estimated, or a table of means or summary statistics is reported.
After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]
For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable. For instance, if the research is theoretical and has no specific country application. In some cases, the research article may involve multiple countries. In these cases, select all countries that are discussed in the paper.
We expect between 10 and 35 percent of all articles to use data.
The median amount of time that a worker spent on an article, measured as the time between when the article was accepted for classification by the worker and when the classification was submitted, was 25.4 minutes. If human raters were used exclusively rather than machine learning tools, the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review, at a cost of $3,113,244 (assuming $3 per article, as was paid to the MTurk workers).
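The quoted cost and time figures follow from simple arithmetic; the 50-year figure corresponds to continuous, around-the-clock review time:

```python
# Arithmetic behind the figures quoted above.
n_articles = 1_037_748
minutes_per_article = 25.4   # median labeling time per article
cost_per_article = 3         # USD paid per article on MTurk

print(n_articles * cost_per_article)   # 3,113,244 USD
hours = n_articles * minutes_per_article / 60
print(hours / (24 * 365))              # ~50 years of nonstop review time
```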
A model is next trained on the 3,500 labelled articles. We use a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. 2018). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT, retains 97% of its language understanding capabilities, and is 60% faster (Sanh, Debut, Chaumond, and Wolf 2019). We use PyTorch to produce a model that classifies articles based on the labeled data. Of the 3,500 articles that were hand coded by the MTurk workers, 900 were fed to the machine learning model; this number was chosen because of computational limitations in training the NLP model. A classification of "uses data" was assigned if the model predicted an article used data with at least 90% confidence.
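The scoring step can be sketched with the Hugging Face transformers library as follows. This is a hedged reconstruction, not the authors' actual pipeline: the checkpoint name and input text are placeholders, and the fine-tuning loop on the labeled articles is omitted:

```python
# Hedged sketch: score abstracts with a DistilBERT sequence classifier and
# apply the 90%-confidence rule described above. In the study, the model
# would first be fine-tuned on the labeled articles; here it is untrained.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)  # labels: no-data vs. uses-data
model.eval()

abstracts = ["We analyze nationally representative household survey data..."]
inputs = tokenizer(abstracts, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)

uses_data = probs[:, 1] >= 0.90  # assign "uses data" only at >=90% confidence
print(uses_data)
```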
The performance of the models classifying articles to countries and as using data or not can be compared to the classification by the human raters. We treat the human raters as providing the ground truth. This may underestimate model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People's Republic of Korea. If both the humans and the model make the same kinds of errors, then the performance reported here will be overestimated.
The model was able to predict whether an article made use of data with 87% accuracy evaluated on the set of articles held out of the model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains a raw dataset and a processed dataset derived from it. There is a document containing the analytical code for statistical analysis of the processed dataset, in .Rmd and .html formats. The study examined some aspects of the mechanical performance of solid wood composites. We were interested in certain properties of solid wood composites made using different adhesives with different grain orientations at the bondline, then treated at different temperatures prior to testing. Performance was tested by assessing fracture energy and critical fracture energy, lap shear strength, and compression strength of the composites. This document concerns only the fracture properties, which are the focus of the related paper. Notes:
* The raw data is provided in this upload, but the processing is not addressed here.
* The authors of this document are a subset of the authors of the related paper.
* This document and the related data files were uploaded at the time of submission for review. An update providing the DOI of the related paper will be provided when it is available.
https://www.gnu.org/licenses/gpl-3.0-standalone.html
Replication Package for "A Study on the Pythonic Functional Constructs' Understandability" to appear at ICSE 2024
Authors: Cyrine Zid, Fiorella Zampetti, Giuliano Antoniol, Massimiliano Di Penta
Article Preprint: https://mdipenta.github.io/files/ICSE24_funcExperiment.pdf
Artifacts: https://doi.org/10.5281/zenodo.8191782
License: GPL V3.0
This package contains folders and files with code and data used in the study described in the paper. In the following, we first provide all fields required for the submission, and then report a detailed description of all repository folders.
Artifact Description
Purpose
The artifact is about a controlled experiment aimed at investigating the extent to which Pythonic functional constructs have an impact on source code understandability. The artifact archive contains:
The material to allow replicating the study (see the section Experiment-Material)
Raw quantitative results, working datasets, and scripts to replicate the statistical analyses reported in the paper. Specifically, the executable part of the replication package reproduces figures and tables of the quantitative analysis (RQ1 and RQ2) of the paper starting from the working datasets.
Spreadsheets used for the qualitative analysis (RQ3).
We apply for the following badges:
Available and reusable: because we provide all the material that can be used to replicate the experiment, but also to perform the statistical analyses and the qualitative analyses (spreadsheets, in this case)
Provenance
Paper preprint link: https://mdipenta.github.io/files/ICSE24_funcExperiment.pdf
Artifacts: https://doi.org/10.5281/zenodo.8191782
Data
Results have been obtained by conducting the controlled experiment involving Prolific workers as participants. Data collection and processing followed a protocol approved by the university's ethics board. Note that all data enclosed in the artifact is completely anonymized and does not contain sensitive information.
Further details about the provided dataset can be found in the section "Results' directory and files".
Setup and Usage (for executable artifacts):
See the section "Scripts to reproduce the results, and instructions for running them".
Experiment-Material/
Contains the material used for the experiment, and, specifically, the following subdirectories:
Google-Forms/
Contains (as PDF documents) the questionnaires submitted to the ten experimental groups.
Task-Sources/
Contains, for each experimental group (G-1...G-10), the sources used to produce the Google Forms, and, specifically:
- The cover letter (Letter.docx).
- A directory for each experimental task (Lambda 1, Lambda 2, Comp 1, Comp 2, MRF 1, MRF 2, Lambda Comparison, Comp Comparison, MRF Comparison). Each directory contains (i) the exercise text (in both Word and .txt format), (ii) the source code snippet, and (iii) its .png image to be used in the form.
Note: the "Comparison" tasks do not have any exercise, as their purpose is always the same: to compare the (perceived) understandability of the snippets and return the results of the comparison.
Code-Examples-Table1/
Contains the source code snippets used as objects of the study (the same you can find under "Task-Sources/"), named as reported in Table 1.
Results' directory and files
raw-responses/
Contains, as spreadsheets, the raw responses provided by the study participants through Google forms.
raw-results-RQ1/
Contains the raw results for RQ1. Specifically, the directory contains a subdirectory for each group (G1-G10). Each subdirectory contains:
- For each user (named using their Prolific ID), a directory containing, for each question (Q1-Q6), the produced Python code (Qn.py), its output (QnR.txt), and its StdErr output (QnErr.txt).
- "expected-outputs/": a directory containing the expected outputs for each task (Qn.txt).
working-results/RQ1-RQ2-files-for-statistical-analysis/
Contains three .csv files used as input for conducting the statistical analysis and drawing the graphs for addressing the first two research questions of the study. Specifically:
ConstructUsage.csv contains the declared usage frequency of the three functional constructs under study. This file is used to draw Figure 4. The file contains an entry for each participant, reporting the (text-coded) frequency of construct usage for Comprehension, Lambda, and MRF.
RQ1.csv contains the collected data used for the mixed-effect logistic regression relating the use of functional constructs to the correctness of the change task, as well as the logistic regression relating the use of map/reduce/filter functions to the correctness of the change task. The csv file contains an entry for each answer provided by each subject and features the following columns (a short loading sketch follows the column list):
Group: experimental group to which the participant is assigned
User: user ID
Time: task time in seconds
Approvals: number of approvals on previous tasks performed on Prolific
Student: whether the participant declared themselves as a student
Section: section of the questionnaire (lambda, comp, or mrf)
Construct: specific construct being presented (same as "Section" for lambda and comp, for mrf it says whether it is a map, reduce, or filter)
Question: question id, from Q1 to Q6, indicating the ordering of the questions
MainFactor: main factor treatment for the given question - "f" for functional, "p" for procedural counterpart
Outcome: TRUE if the task was correctly performed, FALSE otherwise
Complexity: cyclomatic complexity of the construct (empty for mrf)
UsageFrequency: usage frequency of the given construct
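A small, hedged example of loading this file (the path follows the directory name above); it reproduces only a descriptive cross-tabulation, not the paper's mixed-effect logistic regression:

```python
# Sketch: correctness of the change task by main factor ("f" = functional,
# "p" = procedural), using the columns documented above.
import pandas as pd

rq1 = pd.read_csv("working-results/RQ1-RQ2-files-for-statistical-analysis/RQ1.csv")
print(pd.crosstab(rq1["MainFactor"], rq1["Outcome"], normalize="index"))
```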
RQ1Paired-RQ2.csv contains the collected data used for the ordinal logistic regression of the relationship between the perceived ease of understanding of the functional constructs and (i) participants' usage frequency, and (ii) constructs' complexity (except for map/reduce/filter). The file features a row for each participant, and the columns are the following:
Group: experimental group to which the participant is assigned
User: user ID
Time: task time in seconds
Approvals: number of approvals on previous tasks performed on Prolific
Student: whether the participant declared themselves as a student
LambdaF: result for the change task related to a lambda construct
LambdaP: result for the change task related to the procedural counterpart of a lambda construct
CompF: result for the change task related to a comprehension construct
CompP: result for the change task related to the procedural counterpart of a comprehension construct
MrfF: result for the change task related to an MRF construct
MrfP: result for the change task related to the procedural counterpart of an MRF construct
LambdaComp: perceived understandability level for the comparison task (RQ2) between a lambda and its procedural counterpart
CompComp: perceived understandability level for the comparison task (RQ2) between a comprehension and its procedural counterpart
MrfComp: perceived understandability level for the comparison task (RQ2) between an MRF construct and its procedural counterpart
LambdaCompCplx: cyclomatic complexity of the lambda construct involved in the comparison task (RQ2)
CompCompCplx: cyclomatic complexity of the comprehension construct involved in the comparison task (RQ2)
MrfCompType: type of MRF construct (map, reduce, or filter) used in the comparison task (RQ2)
LambdaUsageFrequency: self-declared usage frequency on lambda constructs
CompUsageFrequency: self-declared usage frequency on comprehension constructs
MrfUsageFrequency: self-declared usage frequency on MRF constructs
LambdaComparisonAssessment: outcome of the manual assessment of the answer to the "check question" required for the lambda comparison ("yes" means valid, "no" means wrong, "moderatechatgpt" and "extremechatgpt" are the results of GPTZero)
CompComparisonAssessment: as above, but for comprehension
MrfComparisonAssessment: as above, but for MRF
working-results/inter-rater-RQ3-files/
This directory contains four .csv files used as input for computing the inter-rater agreement for the manual labeling used for addressing RQ3. Specifically, you will find one file for each functional construct, i.e., comprehension.csv, lambda.csv, and mrf.csv, and a different file used for highlighting the reasons why participants prefer to use the procedural paradigm, i.e., procedural.csv.
working-results/RQ2ManualValidation.csv
This file contains the results of the manual validation performed to sanitize the answers provided by our participants, used for addressing RQ2. Specifically, we coded the behaviour descriptions using four different levels: (i) correct ("yes"), (ii) somewhat correct ("partial"), (iii) wrong ("no"), and (iv) automatically generated. The file features a row for each participant, and the columns are the following:
ID: ID we used to refer to the participant in the paper's qualitative analysis
Group: experimental group to which the participant is assigned
ProlificID: user ID
Comparison for lambda construct description: answer provided by the user for the lambda comparison task
Final Classification: our assessment of the lambda comparison answer
Comparison for comprehension description: answer provided by the user for the comprehension comparison task
Final Classification: our assessment of the comprehension comparison answer
Comparison for MRF description: answer provided by the user for the MRF comparison task
Final Classification: our assessment of the MRF comparison answer
working-results/RQ3ManualValidation.xlsx
This file contains the results of the open coding applied to address our third research question. Specifically, you will find four sheets, one for each functional construct and one for the procedural paradigm. Each sheet reports the provided answers together with the categories assigned to them. Each sheet contains the following columns:
ID: ID we used to refer to the participant in the paper's qualitative analysis
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for the paper submitted to PLOS ONE
Journal of Experimental Biology and Agricultural Sciences FAQ - ResearchHelpDesk - Journal of Experimental Biology and Agricultural Sciences - JEBAS is an open-access journal that provides researchers a platform for rapid publication of articles in all aspects of biology (microbiology, botany, ethnobotany, parasitology, biochemistry, bioinformatics, molecular biology, physiology, pathology, health sciences, pharmacology, toxicology, biotechnology, environmental biology, food science, nutrition, zoology, enzymology, and endocrinology), agricultural sciences (breeding, plant pathology, genetics, agricultural economics, agricultural biotechnology, agricultural statistics, agricultural physiology, agricultural botany, forestry, agroforestry) and veterinary science (veterinary medicine, veterinary pathology, veterinary microbiology, veterinary genetics, and veterinary parasitology). The journal's major emphasis is on work in the agricultural and biological sciences, and it also welcomes interdisciplinary articles. The main criteria for acceptance of articles are novelty, clarity, and significance as relevant to a better understanding of the biological sciences. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published shortly after acceptance. All articles are peer-reviewed.
Topic Modeling for Research Articles. Researchers have access to large online archives of scientific articles. As a consequence, finding relevant articles has become more difficult. Tagging or topic modelling provides a way to assign identifying labels to research articles, which facilitates the recommendation and search process.
Given the abstract and title for a set of research articles, predict the topics for each article included in the test set; a minimal baseline sketch follows the column tables below.
Note that a research article can have more than one topic. The research article abstracts and titles are sourced from the following 6 topics:
Computer Science
Physics
Mathematics
Statistics
Quantitative Biology
Quantitative Finance
| Column | Description |
|---|---|
| ID | Unique ID for each article |
| TITLE | Title of the research article |
| ABSTRACT | Abstract of the research article |
| Computer Science | Whether article belongs to topic computer science (1/0) |
| Physics | Whether article belongs to topic physics (1/0) |
| Mathematics | Whether article belongs to topic Mathematics (1/0) |
| Statistics | Whether article belongs to topic Statistics (1/0) |
| Quantitative Biology | Whether article belongs to topic Quantitative Biology (1/0) |
| Quantitative Finance | Whether article belongs to topic Quantitative Finance (1/0) |
| Column | Description |
|---|---|
| ID | Unique ID for each article |
| TITLE | Title of the research article |
| ABSTRACT | Abstract of the research article |
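A minimal baseline sketch for the task described above, assuming conventional train.csv/test.csv file names (not specified in this description): TF-IDF features over title and abstract, with a one-vs-rest logistic regression that naturally allows an article to receive several topic labels.

```python
# Hedged multi-label baseline; the train.csv/test.csv names are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

TOPICS = ["Computer Science", "Physics", "Mathematics", "Statistics",
          "Quantitative Biology", "Quantitative Finance"]

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

clf = make_pipeline(
    TfidfVectorizer(max_features=50_000, ngram_range=(1, 2), stop_words="english"),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(train["TITLE"] + " " + train["ABSTRACT"], train[TOPICS].values)
pred = clf.predict(test["TITLE"] + " " + test["ABSTRACT"])  # one 0/1 column per topic
```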
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Some recent papers have concluded that authoritarian regimes have faster economic growth than democracies. These supposed growth benefits of autocracies are estimated using data sets in which growth rates rely heavily on data reported by each government. Governments have incentives to exaggerate their economic growth figures, however, and authoritarian regimes may have fewer limitations than democracies on their ability to do so. This paper argues that growth data submitted to international agencies are overstated by authoritarian regimes compared to democracies. If true, it calls into question the estimated relationship between government type and economic growth found in the literature. To measure the degree to which each government's official growth statistics are overstated, the economic growth rates reported in the World Bank's World Development Indicators are compared to a new measure of economic growth based on satellite imaging of nighttime lights. This comparison reveals whether or not dictators exaggerate their true growth rates and by how much. Annual GDP growth rates are estimated to be overstated by 0.5–1.5 percentage points in the statistics that dictatorships report to the World Bank.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
See Sunday et al. (2020), "Novel and familiar object recognition rely on the same ability", for details.