Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers who want to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source, object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and producing graphical displays through data manipulation, modelling, and calculation. R packages and supporting libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of letting users write more efficient code using command-line scripting and vectors. It has many built-in functions and libraries that are extensible, and it allows users to define their own (customized) functions specifying how the program should behave while handling the data; these functions can also be stored in R's simple object system. For all intents and purposes, this book serves as both a textbook and a manual for statistics with R, particularly in academic research, data analytics, and computer programming, intended to inform and guide the work of R users and statisticians. It describes the different types of statistical data analysis and methods and the scenarios in which each is best used in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures, including descriptions of the conditions or assumptions that must hold for the various statistical methods or tests and guidance on interpreting their results. The book also covers the different data formats and sources and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained throughout. It is the first book to provide a comprehensive description of, and a step-by-step practical hands-on guide to, the different types of statistical analysis in R for research purposes, with examples ranging from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating datasets or objects, factorization, and vectorization, to the reasoning, interpretation, and storage of results for future use, and their graphical visualization and representation. In short, it brings statistics and computer programming together for research.
This child page contains a zipped folder which contains all items necessary to run trend models and produce results published in U.S. Geological Survey Scientific Investigations Report 2021–XXXX [Tatge, W.S., Nustad, R.A., and Galloway, J.M., 2021, Evaluation of Salinity and Nutrient Conditions in the Heart River Basin, North Dakota, 1970-2020: U.S. Geological Survey Scientific Investigations Report 2021-XXXX, XX p.]. To run the R–QWTREND program in R, 6 files are required, and each is included in this child page: prepQWdataV4.txt, runQWmodelV4XXUEP.txt, plotQWtrendV4XXUEP.txt, qwtrend2018v4.exe, salflibc.dll, and StartQWTrendV4.R (Vecchia and Nustad, 2020). The folder contains: the six items required to run the R–QWTREND trend analysis tool; a readme.txt file; a flowtrendData.RData file; an allsiteinfo.table.csv file; a folder called "scripts"; and a folder called "waterqualitydata". The "scripts" folder contains the scripts that can be used to reproduce the results found in the USGS Scientific Investigations Report referenced above. The "waterqualitydata" folder contains machine-readable .csv files with the water-quality data used for the trend analysis at each site, with the naming convention site_ions or site_nuts for major-ion and nutrient constituents. R–QWTREND is a software package for analyzing trends in stream-water quality. The package is a collection of functions written in R (R Development Core Team, 2019), an open-source language and general environment for statistical computing and graphics. The following system requirements are necessary for using R–QWTREND: • Windows 10 operating system • R (version 3.4 or later; 64 bit recommended) • RStudio (version 1.1.456 or later). An accompanying report (Vecchia and Nustad, 2020) serves as the formal documentation for R–QWTREND. Vecchia, A.V., and Nustad, R.A., 2020, Time-series model, statistical methods, and software documentation for R–QWTREND—An R package for analyzing trends in stream-water quality: U.S. Geological Survey Open-File Report 2020–1014, 51 p., https://doi.org/10.3133/ofr20201014. R Development Core Team, 2019, R—A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, accessed December 7, 2020, at https://www.r-project.org.
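For orientation, a minimal R session sketch follows, assuming the six files above have been unzipped into the working directory. The entry points are defined by StartQWTrendV4.R itself and documented in Vecchia and Nustad (2020), so only the generic loading steps are shown; the path is illustrative.

# Minimal sketch, assuming the six R-QWTREND files sit in the working
# directory; the path below is illustrative, not part of the archive.
setwd("C:/path/to/unzipped/folder")
load("flowtrendData.RData")    # data objects distributed with the archive
ls()                           # inspect what the archive provides
source("StartQWTrendV4.R")     # loads the R-QWTREND functions into the session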
This R script can be used to analyze SELDM results. The script is specifically tailored for the SELDM simulations used in the publication: Stonewall, A.J., and Granato, G.E., 2018, Assessing potential effects of highway and urban runoff on receiving streams in total maximum daily load watersheds in Oregon using the Stochastic Empirical Loading and Dilution Model: U.S. Geological Survey Scientific Investigations Report 2019-5053, 116 p., https://doi.org/10.3133/sir20195053
This child page contains a zipped folder which contains all items necessary to run trend models and produce results published in U.S. Geological Survey Scientific Investigations Report 2022–XXXX [Nustad, R.A., and Tatge, W.S., 2023, Comprehensive Water-Quality Trend Analysis for Selected Sites and Constituents in the International Souris River Basin, Saskatchewan and Manitoba, Canada and North Dakota, United States, 1970-2020: U.S. Geological Survey Scientific Investigations Report 2023-XXXX, XX p.]. To run the R-QWTREND program in R, 6 files are required, and each is included in this child page: prepQWdataV4.txt, runQWmodelV4.txt, plotQWtrendV4.txt, qwtrend2018v4.exe, salflibc.dll, and StartQWTrendV4.R (Vecchia and Nustad, 2020). The folder contains: three items required to run the R–QWTREND trend analysis tool; a README.txt file; a folder called "dataout"; and a folder called "scripts". The "scripts" folder contains the scripts that can be used to reproduce the results found in the USGS Scientific Investigations Report referenced above. The "dataout" folder contains a folder for each site, which in turn contains .RData files with the naming convention site_flow for streamflow data and site_qw_XXX, where XXX is the constituent group (MI, NUT, or TM). R–QWTREND is a software package for analyzing trends in stream-water quality. The package is a collection of functions written in R (R Development Core Team, 2019), an open-source language and general environment for statistical computing and graphics. The following system requirements are necessary for using R–QWTREND: • Windows 10 operating system • R (version 3.4 or later; 64 bit recommended) • RStudio (version 1.1.456 or later). An accompanying report (Vecchia and Nustad, 2020) serves as the formal documentation for R–QWTREND. Vecchia, A.V., and Nustad, R.A., 2020, Time-series model, statistical methods, and software documentation for R–QWTREND—An R package for analyzing trends in stream-water quality: U.S. Geological Survey Open-File Report 2020–1014, 51 p., https://doi.org/10.3133/ofr20201014. R Development Core Team, 2019, R—A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, accessed December 7, 2020, at https://www.r-project.org.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information released in the Department of Education and Training's Annual Report 2021-22 on performance against output performance measures: strategy, review and regulation; early childhood development; school education - primary and secondary; training, higher education; support services delivery; and support for students with disability.

Annual Report published: https://www.vic.gov.au/department-education-annual-reports
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here you can find the model results of the report:
De Felice, M., Busch, S., Kanellopoulos, K., Kavvadias, K. and Hidalgo Gonzalez, I., Power system flexibility in a variable climate, EUR 30184 EN, Publications Office of the European Union, Luxembourg, 2020, ISBN 978-92-76-18183-5 (online), doi:10.2760/75312 (online), JRC120338.
This dataset contains the raw GDX files generated by the GAMS (www.gams.com) optimiser for the Dispa-SET model. Details on the output format and the names of the variables can be found in the Dispa-SET documentation. A markdown notebook in R (and the rendered PDF) contains an example of how to read the GDX files in R.
We also include in this dataset a data frame saved in the Apache Parquet format that can be read both in R and Python.
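As a quick orientation, a minimal sketch of both reading paths is shown below; it is not the notebook shipped with the dataset, the file names are placeholders, and reading GDX files with the gdxrrw package requires a local GAMS installation.

# Minimal sketch; file and symbol names are placeholders, see the notebook
library(gdxrrw)   # GDX interface shipped with GAMS; needs a local install
library(arrow)    # Apache Parquet reader

igdx("/path/to/gams")                            # illustrative GAMS system directory
gen <- rgdx.param("results.gdx", "OutputPower")  # read one parameter as a data frame

df <- read_parquet("results.parquet")            # the Parquet data frame, in plain R
head(df)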
A description of the methodology and the data sources, with references, can be found in the report.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Untargeted mass spectrometry is a robust tool for biology, but it usually requires a large amount of time for data analysis, especially in systems biology. A framework called Multiple-Chemical nebula (MCnebula) was developed herein to facilitate the LC–MS data analysis process by focusing on critical chemical classes and on visualization in multiple dimensions. This framework consists of three vital steps: (1) an abundance-based classes (ABC) selection algorithm, (2) selection of critical chemical classes to classify "features" (corresponding to compounds), and (3) visualization as multiple Child-Nebulae (network graphs) with annotation, chemical classification, and structure. Notably, MCnebula can be used to explore the classification and structural characteristics of unknown compounds beyond the limits of the spectral library. Moreover, it is intuitive and convenient for pathway analysis and biomarker discovery because of its ABC selection and visualization functions. MCnebula is implemented in the R language. A series of tools in R packages are provided to facilitate downstream analysis in an MCnebula-featured way, including feature selection, homology tracing of top features, pathway enrichment analysis, heat-map clustering analysis, spectral visualization analysis, chemical information query, and output of analysis reports. The broad utility of MCnebula was illustrated with a human-derived serum dataset for metabolomics analysis. The results indicated that "acyl carnitines" were screened out by tracing the structural classes of biomarkers, consistent with the reference. A plant-derived dataset was investigated to achieve rapid annotation and discovery of compounds in E. ulmoides.
This child page contains a zipped folder which contains all files necessary to run trend models and produce results published in U.S. Geological Survey Scientific Investigations Report 2020–5079 [Nustad, R.A., and Vecchia, A.V., 2020, Water-quality trends for selected sites and constituents in the international Red River of the North Basin, Minnesota and North Dakota, United States, and Manitoba, Canada, 1970–2017: U.S. Geological Survey Scientific Investigations Report 2020–5079, 75 p., https://doi.org/10.3133/sir20205079]. The folder contains: six files required to run the R–QWTREND trend analysis tool; a readme.txt file; an alldata.RData file; a siteinfo_appendix.txt file; and a folder called "scripts". R–QWTREND is a software package for analyzing trends in stream-water quality. The package is a collection of functions written in R (R Development Core Team, 2019), an open-source language and general environment for statistical computing and graphics. The following system requirements are necessary for using R–QWTREND: • Windows 10 operating system • R (version 3.4 or later; 64 bit recommended) • RStudio (version 1.1.456 or later). An accompanying report (Vecchia and Nustad, 2020) serves as the formal documentation for R–QWTREND. Vecchia, A.V., and Nustad, R.A., 2020, Time-series model, statistical methods, and software documentation for R–QWTREND—An R package for analyzing trends in stream-water quality: U.S. Geological Survey Open-File Report 2020–1014, 51 p., https://doi.org/10.3133/ofr20201014. R Development Core Team, 2019, R—A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, accessed June 12, 2019, at https://www.r-project.org.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regression ranks among the most popular statistical analysis methods across many research areas, including psychology. Typically, regression coefficients are displayed in tables. While this mode of presentation is information-dense, extensive tables can be cumbersome to read and difficult to interpret. Here, we introduce three novel visualizations for reporting regression results. Our methods allow researchers to arrange large numbers of regression models in a single plot. Using regression results from real-world as well as simulated data, we demonstrate the transformations which are necessary to produce the required data structure and how to subsequently plot the results. The proposed methods provide visually appealing ways to report regression results efficiently and intuitively. Potential applications range from visual screening in the model selection stage to formal reporting in research papers. The procedure is fully reproducible using the provided code and can be executed via free-of-charge, open-source software routines in R.
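As one generic illustration of the idea (not the authors' published routine; all names below are ours), several fitted models can be tidied into a single data frame and drawn as a dot-and-interval plot so that many models share one panel:

# Generic sketch of a multi-model coefficient plot (not the paper's code)
library(ggplot2)
library(broom)   # tidy() converts fitted models into data frames

m1 <- lm(mpg ~ wt + hp, data = mtcars)
m2 <- lm(mpg ~ wt + hp + qsec, data = mtcars)
coefs <- rbind(
  transform(tidy(m1, conf.int = TRUE), model = "Model 1"),
  transform(tidy(m2, conf.int = TRUE), model = "Model 2")
)
coefs <- subset(coefs, term != "(Intercept)")

# One panel, many models: estimates as points, 95% CIs as horizontal ranges
ggplot(coefs, aes(x = estimate, y = term,
                  xmin = conf.low, xmax = conf.high, colour = model)) +
  geom_pointrange(position = position_dodge(width = 0.5)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(x = "Coefficient estimate (95% CI)", y = NULL)

The same stacking step scales to dozens of models, which is what makes this layout useful for visual screening during model selection.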
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reference
Studies that use the data (in any form) are required to add the following reference to their report/paper:
@inproceedings{Ye:2014,
  author    = {Ye, Xin and Bunescu, Razvan and Liu, Chang},
  title     = {Learning to Rank Relevant Files for Bug Reports Using Domain Knowledge},
  booktitle = {Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering},
  series    = {FSE 2014},
  year      = {2014},
  location  = {Hong Kong, China},
  pages     = {689--699},
  numpages  = {11},
}
About the Data
Overview of Data
This dataset contains bug reports, commit history, and API descriptions of six open-source Java projects: Eclipse Platform UI, SWT, JDT, AspectJ, Birt, and Tomcat. The dataset was used to evaluate a learning-to-rank approach that recommends relevant files for bug reports.
Dataset structure
File list:
Attribute Information
How to obtain the source code
A before-fix version of the source code package needs to be checked out for each bug report. Taking Eclipse Bug 420972 as an example, this bug was fixed at commit 657bd90. To check out the before-fix version 2143203 of the source code package, use the command git checkout 657bd90~1.
Efficient indexing of the code
If bug 420972 is the first bug processed by the system, we check out its before-fix version 2143203 and index all the corresponding source files. To process another bug report 423588, we need to check out its before-fix version 602d549 of the source code package. For efficiency reasons, we do not need to index all the source files again. Instead, we index only the changed files, i.e., files that were “Added”, “Modified”, or “Deleted” between the two bug reports. The changed files can be obtained as follows:
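A standard git invocation for this step (a reconstruction, since the snippet is not included here; the two revisions are the before-fix versions named above) is:

git diff --name-status 2143203 602d549

The A/M/D status letters in the output correspond to the "Added", "Modified", and "Deleted" categories above, so only those files need re-indexing.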
Paper abstract
When a new bug report is received, developers usually need to reproduce the bug and perform code reviews to find the cause, a process that can be tedious and time consuming. A tool for ranking all the source files of a project with respect to how likely they are to contain the cause of the bug would enable developers to narrow down their search and potentially could lead to a substantial increase in productivity. This paper introduces an adaptive ranking approach that leverages domain knowledge through functional decompositions of source code files into methods, API descriptions of library components used in the code, the bug-fixing history, and the code change history. Given a bug report, the ranking score of each source file is computed as a weighted combination of an array of features encoding domain knowledge, where the weights are trained automatically on previously solved bug reports using a learning-to-rank technique. We evaluated our system on six large scale open source Java projects, using the before-fix version of the project for every bug report. The experimental results show that the newly introduced learning-to-rank approach significantly outperforms two recent state-of-the-art methods in recommending relevant files for bug reports. In particular, our method makes correct recommendations within the top 10 ranked source files for over 70% of the bug reports in the Eclipse Platform and Tomcat projects.
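Stated compactly (notation ours, not the paper's): for a bug report r and a source file s with feature vector \phi(r, s), the ranking score is f(r, s) = w^{\top} \phi(r, s), where the weight vector w is learned from previously solved bug reports.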
See the package documentation website at dataset.dataobservatory.eu. Report bugs and suggestions on GitHub: https://github.com/dataobservatory-eu/dataset/issues
The primary aim of dataset is to build well-documented data.frames, tibbles or data.tables that follow the W3C Data Cube Vocabulary, based on the statistical SDMX data cube model. Such standard R objects (data.frame, data.table, tibble, or well-structured lists like JSON) become highly interoperable and can be placed into relational databases, semantic web applications, archives and repositories. They follow the FAIR principles: they are findable, accessible, interoperable and reusable. Our datasets:
- contain Dublin Core or DataCite (or both) metadata that makes them findable and more easily accessible via online libraries (see the vignette article Datasets With FAIR Metadata);
- have dimensions that can be easily and unambiguously reduced to triples for RDF applications, and can be easily serialized to, or synchronized with, semantic web applications (see the vignette article From dataset To RDF);
- contain processing metadata that greatly enhance the reproducibility of the results and the reviewability of the contents of the dataset, including metadata defined by the DDI Alliance, which is particularly helpful for not-yet-processed data;
- follow the datacube model of the Statistical Data and Metadata eXchange, therefore allowing easy refreshing with new data from the source of the analytical work, which is particularly useful for datasets containing results of statistical operations in R;
- export correctly with FAIR metadata to the most-used file formats and publish straightforwardly to open science repositories with correct bibliographical and use metadata (see Export And Publish a dataset);
- are relatively lightweight in dependencies and work easily with data.frame, tibble or data.table R objects.
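To make the underlying concept concrete without reproducing the package's own API (which its documentation covers), here is a generic base-R illustration of carrying bibliographic metadata along with a data.frame; this is NOT the dataset package's interface:

# Generic illustration only -- NOT the dataset package's API.
# Bibliographic metadata can ride along with a data.frame as an attribute.
df <- data.frame(geo = c("AT", "DE"), year = c(2022, 2022), value = c(1.1, 2.2))
attr(df, "DublinCore") <- list(
  title   = "Example indicator",   # dct:title (hypothetical values)
  creator = "Jane Doe",            # dct:creator
  issued  = "2023-01-01"           # dct:issued
)
attr(df, "DublinCore")$title       # the metadata survives with the object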
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the 2013-2014 annual report of the Department of Human Services, which details how the department met its objectives and highlights key achievements for the reporting period. This particular dataset provides additional information summarising social housing data, including public rental housing, public housing client profiles, rental stock, stock management program activities, social housing dwellings and changes to Director-owned dwellings during 2013-14.

Social housing assistance focuses on providing adequate, affordable and accessible housing targeted to those in greatest need, delivered cost-effectively and in coordination with support services where required. Social housing assistance is provided on a long or short-term basis.

Long-term social housing assistance includes public rental accommodation, community-managed housing in Director-owned properties and community-owned stock for designated client groups, and rental accommodation for low-income Victorians with identified support needs. Long-term public rental housing also includes movable units.

In recent years, housing assistance has been increasingly targeted to people in greatest need. Targeting to high-need groups has impacts in terms of stock turnover and costs.

Short-term social housing is provided to Victoria's homeless individuals and families. Clients are assisted under the Crisis Supported Accommodation and Transitional Housing Management programs.
This child page contains a zipped folder which contains all of the items necessary to run load estimation using R-LOADEST to produce results that are published in U.S. Geological Survey Scientific Investigations Report 2021-XXXX [Tatge, W.S., Nustad, R.A., and Galloway, J.M., 2021, Evaluation of Salinity and Nutrient Conditions in the Heart River Basin, North Dakota, 1970-2020: U.S. Geological Survey Scientific Investigations Report 2021-XXXX, XX p.]. The folder contains an allsiteinfo.table.csv file, a "datain" folder, and a "scripts" folder. The allsiteinfo.table.csv file can be used to cross-reference the sites with the main report (Tatge and others, 2021). The "datain" folder contains all the input data necessary to reproduce the load estimation results; its naming convention is site_MI_rloadest or site_NUT_rloadest for the major-ion and nutrient loads, respectively. The .Rdata files are used in the scripts to run the estimations, and the .csv files can be used to inspect the data. The "scripts" folder contains the R scripts that produce the load estimation results in the main report. R-LOADEST is a software package for analyzing loads in streams, and an accompanying report (Runkel and others, 2004) serves as the formal documentation for R-LOADEST. The package is a collection of functions written in R (R Development Core Team, 2019), an open-source language and general environment for statistical computing and graphics. The following system requirements are necessary for producing results: Windows 10 operating system; R (version 3.4 or later; 64-bit recommended); RStudio (version 1.1.456 or later); and the R-LOADEST program (available at https://github.com/USGS-R/rloadest). Runkel, R.L., Crawford, C.G., and Cohn, T.A., 2004, Load Estimator (LOADEST): A FORTRAN Program for Estimating Constituent Loads in Streams and Rivers: U.S. Geological Survey Techniques and Methods Book 4, Chapter A5, 69 p. [Also available at https://pubs.usgs.gov/tm/2005/tm4A5/pdf/508final.pdf.] R Development Core Team, 2019, R—A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, accessed December 7, 2020, at https://www.r-project.org.
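For orientation, a minimal rloadest calibration sketch follows. It is not the report's actual script: the data frame and column names (calib, DATES, FLOW, NO3) are illustrative assumptions, although loadReg() and its predefined model numbers are documented rloadest functionality.

# Minimal sketch, NOT the report's script; object and column names are
# illustrative assumptions about the archive contents.
library(rloadest)

load("site_NUT_rloadest.Rdata")   # archive naming convention described above
# Suppose the archive supplies a calibration data frame `calib` with
# date (DATES), streamflow (FLOW) and concentration (NO3) columns:
fit <- loadReg(NO3 ~ model(9), data = calib,
               flow = "FLOW", dates = "DATES", conc.units = "mg/L")
print(fit)                        # calibration summary and diagnostics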
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for Figure Atlas.16 from Atlas of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).
Figure Atlas.16 shows changes in annual mean surface air temperature and precipitation from reference regions in Africa for different lines of evidence (CMIP5, CORDEX and CMIP6).
How to cite this dataset
When citing this dataset, please include both the data citation below (under 'Citable as') and the following citations: For the report component from which the figure originates: Gutiérrez, J.M., R.G. Jones, G.T. Narisma, L.M. Alves, M. Amjad, I.V. Gorodetskaya, M. Grose, N.A.B. Klutse, S. Krakovska, J. Li, D. Martínez-Castro, L.O. Mearns, S.H. Mernild, T. Ngo-Duc, B. van den Hurk, and J.-H. Yoon, 2021: Atlas. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 1927–2058, doi:10.1017/9781009157896.021
Iturbide, M. et al., 2021: Repository supporting the implementation of FAIR principles in the IPCC-WG1 Interactive Atlas. Zenodo. Retrieved from: http://doi.org/10.5281/zenodo.5171760
Figure subpanels
The figure has twenty-eight panels, with data provided for all panels in the master GitHub repository linked in the documentation.
List of data provided
This dataset contains global monthly precipitation and near-surface temperature aggregated by reference region for the following model output datasets:
- CMIP5, CMIP6 (1850-2100)
- CORDEX (1970-2100)
These are presented separately for land, sea, and land-sea gridboxes (a single run per model). Regional averages are weighted by the cosine of latitude in all cases. An observation-based product (1979-2016) is also provided in the same format for reference: W5E5 (Lange, 2019).
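Cosine-of-latitude weighting compensates for grid cells shrinking toward the poles on a regular latitude-longitude grid; a generic one-liner in R (variable names ours, not the Atlas code) looks like this:

# Generic cosine-of-latitude area weighting (illustrative, not the Atlas code)
# x: gridbox values for one region, lat: their latitudes in degrees
w <- cos(lat * pi / 180)
regional_mean <- weighted.mean(x, w, na.rm = TRUE)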
Data provided in relation to figure
All datasets of monthly precipitation and near-surface temperature aggregated by region for CMIP5, CMIP6 and CORDEX models are provided in the labelled directories; the regions over Africa are used for the production of this figure.
CMIP5 and CMIP6 are the fifth and sixth phases of the Coupled Model Intercomparison Project. CORDEX is the Coordinated Regional Downscaling Experiment from the WCRP. SSP1-2.6 is based on SSP1, with low climate change mitigation and adaptation challenges, and RCP2.6, a future pathway with a radiative forcing of 2.6 W/m2 in the year 2100. SSP2-4.5 is based on SSP2, with medium challenges to climate change mitigation and adaptation, and RCP4.5, a future pathway with a radiative forcing of 4.5 W/m2 in the year 2100. SSP5-8.5 is based on SSP5, where climate change mitigation challenges dominate, and RCP8.5, a future pathway with a radiative forcing of 8.5 W/m2 in the year 2100. RCP2.6, RCP4.5 and RCP8.5 are the Representative Concentration Pathways reaching radiative forcings of 2.6, 4.5 and 8.5 W/m2, respectively, by 2100. GWL stands for global warming level. JJAS and DJFM stand for June, July, August, September and December, January, February, March, respectively.
Notes on reproducing the figure from the provided data
Data and figures are produced by the Jupyter notebooks that live inside the notebooks directory. To reproduce each panel of this figure using the 'regional-scatter-plots_R.ipynb' notebook, set regions: each of the 9 regions over Africa in the top right panel of the figure; area: 'land'; cordex.domain: 'AFR'; and scatter.seasons: a list of months by number, e.g. list(c(12, 1, 2), 6:9) for DJF and JJAS.
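In R syntax, those parameter settings would look roughly like this (a sketch of the values described above, not the notebook's full preamble):

# Sketch of the notebook parameters described above (names follow the prose)
area            <- "land"
cordex.domain   <- "AFR"
scatter.seasons <- list(c(12, 1, 2), 6:9)   # DJF and JJAS month numbers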
The notebooks describe step by step the basic process followed to generate some key figures of the AR6 WGI Atlas and some products underpinning the Interactive Atlas, such as reference regions, global warming levels, aggregated datasets. They include comments and hints to extend the analysis, thus promoting reusability of the r... For full abstract see: https://catalogue.ceda.ac.uk/uuid/b140e520e22e45daa8525d18c1c8cced.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the dataset of 88 OSS projects mined from GitHub used for SRGM analysis of 9 models (Goel-Okumoto, Goel-Okumoto S-Shaped, Hossain-Dahiya, Musa-Okumoto, Duane, Weibull, Yamada Exponential, Yamada Rayleigh, and Log-Logistic). Results of the analysis, scripts and generated outputs from the STRAIT tool are also included. Folder structure:
- Experiment-Summary: boxplots and confidence intervals for all projects, categories, domains and releases;
- GitHubDatasets: datasets mined from GitHub;
- STRAIT-Reports: output from the STRAIT tool (published at IEEE/ACM MSR'19), with all SRGM outputs from the tool applied to the dataset;
- RepositoriesResults.ods: list of the repositories included and some summary results.
analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active-duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show. this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document. click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
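here's a minimal sketch of that import idea (not the repository's actual scripts - the file names below are placeholders; grab the real urls from the github repository):

# minimal sketch only -- file names below are placeholders
library(SAScii)    # read.SAScii() reads fixed-width files via nber's sas layouts
library(RSQLite)   # stores the big rectangular file in a sql database

# read the fixed-width person-level file using the nber sas importation code
asec <- read.SAScii("asec2012.dat", "cpsmar2012.sas")

# stash it in a sqlite database so analyses don't need everything in memory
con <- dbConnect(SQLite(), "asec.sqlite")
dbWriteTable(con, "asec12", asec)
dbDisconnect(con)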
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
From 20 October 2023, COVID-19 datasets will no longer be updated. Detailed information is available in the fortnightly NSW Respiratory Surveillance Report: https://www.health.nsw.gov.au/Infectious/covid-19/Pages/reports.aspx. Latest national COVID-19 spread, vaccination and treatment metrics are available on the Australian Government Health website: https://www.health.gov.au/topics/covid-19/reporting?language=und

COVID-19 tests by date and postcode, local health district, local government area and result.

The data is for people tested for COVID-19 and is based on location of residence reported at the time of the test. A surge in the total number of tests performed on a particular day may occur as test results are updated in batches and new laboratories gain testing capacity.

The underlying dataset was assessed to measure the risk of identifying an individual and the level of sensitivity of the information gained if it was known that an individual was in the dataset. The dataset was then treated to mitigate these risks, including suppressing and aggregating data.
This data set includes water quality data and microbial community abundance tables for periphyton samples from this project. The data set also includes extensive R markdown code used to process the data and generate the results included in the report. This dataset is associated with the following publication: Hagy, J., R. Devereux, K. Houghton, D. Beddick, T. Pierce, and S. Friedman. Developing Microbial Community Indicators of Nutrient Exposure in Southeast Coastal Plain Streams using a Molecular Approach. US EPA Office of Research and Development, Washington, DC, USA, 2018.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
** RD DATASET ** The RD dataset was created from images posted to the melanoma community on the internet (https://reddit.com/r/melanoma). Consecutive images from Jan 25, 2020, to July 30, 2021, were included using a Python library (https://github.com/aliparlakci/bulk-downloader-for-reddit). The ground truth was determined by the votes of four dermatologists and one plastic surgeon, who referred to the chief complaint and brief history. A total of 1,282 images (1,201 cases) were finally included. Because some cases were deleted by their users, the links of only 860 cases remained valid as of July 2021.
RD_RAW.xlsx The download links and ground truth of the RD dataset are included in this Excel file. In addition, the raw data of the AI (Model Dermatology Build2021 - https://modelderm.com) and of 32 laypersons are included.
v1_public.zip "v1_public.zip" includes the 1,282 lesional images (full-size). The 24 images that were excluded from the study are also available.
v1_private.zip is not available here, and wide-field images are not available here. If the archive is needed for research purposes, please email Dr. Han Seung Seog (whria78@gmail.com) or Dr. Cristian Navarrete-Dechent (ctnavarr@gmail.com).
References - The Degradation of Performance of a State-of-the-art Skin Image Classifier When Applied to Patient-driven Internet Search - Scientific Reports (in press)
** Background normal test with the ISIC images ** The ISIC dataset (https://www.isic-archive.com; Gallery -> 2018 JID Editorial images; 99 images; ISIC_0024262 and ISIC_0024261 are identical, so ISIC_0024262 was skipped) was used for the background normal test. We defined a rectangle crop of 10% of the image area as a "specialist-size crop" and a rectangle crop of 5% of the image area as a "layperson-size crop".
a) S-crops.zip: specialist-size crops. Format: CROPNO_AGE(0~99)_GENDER(1=male,0=female)[m]_FILENAME.png
b) L-crops.zip: layperson-size crops. Format: CROPNO_AGE(0~99)_GENDER(1=male,0=female)[m]_FILENAME.png
c) result_S.zip: background normal test result using the specialist-size crops
d) result_L.zip: background normal test result using the layperson-size crops
Reference - Automated Dermatological Diagnosis: Hype or Reality? - https://doi.org/10.1016/j.jid.2018.04.040 - Multiclass Artificial Intelligence in Dermatology: Progress but Still Room for Improvement - https://doi.org/10.1016/j.jid.2020.06.040