In this tutorial, we will explore the tidyverse data manipulation package dplyr.
Video on normalizing microbiome data from the Research Experiences in Microbiomes Network
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Publication
The two raw data files (will_INF.txt and go_INF.txt) represent the co-occurrence frequency of the top-200 infinitival collocates for will and be going to, respectively, across the twenty decades of the Corpus of Historical American English (from the 1810s to the 2000s).

1-script-create-input-data-raw.r preprocesses and combines the two files into a long-format data frame with the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (frequency of the collocates with be going to), and (iv) will (frequency of the collocates with will); the result is available in input_data_raw.txt.

2-script-create-motion-chart-input-data.R processes input_data_raw.txt to normalise the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output of this second script is input_data_futurate.txt.

input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

Use the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
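As a rough illustration of the normalisation step performed by the second script, the per-million-words calculation could look like the following sketch in R (the column names and the structure of coha_size.txt are assumptions, not the repository's actual code):

```r
library(tidyverse)
# Hypothetical column names; the actual files may differ
coha_size <- read_tsv("coha_size.txt")       # decade-level corpus sizes
input_raw <- read_tsv("input_data_raw.txt")  # long-format collocate frequencies
input_norm <- input_raw %>%
  left_join(coha_size, by = "decade") %>%
  mutate(will_pmw     = will / corpus_size * 1e6,
         going_to_pmw = `BE going to` / corpus_size * 1e6)
```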
Trend Detection and Forecasting
This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the second part of a two-part exercise focusing on time series analysis.
Introduction
Time series are a special class of dataset, where a response variable is tracked over time. Time series analysis is a powerful technique that can be used to understand the various temporal patterns in our data by decomposing data into different cyclic trends. Time series analysis can also be used to predict how levels of a variable will change in the future, taking into account what has happened in the past.
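For example, a classical decomposition in R separates a series into trend, seasonal, and remainder components (a minimal sketch; `my_data` and its monthly frequency are assumptions):

```r
# Build a monthly time series and decompose it with LOESS (stl)
my_ts <- ts(my_data$value, start = c(2010, 1), frequency = 12)
decomposition <- stl(my_ts, s.window = "periodic")
plot(decomposition)  # panels: data, seasonal, trend, remainder
```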
Learning Objectives
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 2. Model estimates table.
- Column 1: Taxa names.
- Column 2: Model coefficients.
- Column 3: Estimated rate ratios from exponentiated β estimates. For models with interaction terms, the appropriate β estimates are summed before being exponentiated.
- Column 4: Exponentiated 95% Wald confidence intervals. For models with interaction terms, the appropriate β estimates and covariance terms are summed for the Wald intervals.
- Column 5: Z-statistics from β estimates.
- Column 6: False discovery rate adjusted p-values.
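As context for how such columns are typically derived in R, a hedged sketch (the fitted model object `fit` is hypothetical; the authors' exact code is not shown here):

```r
beta <- coef(fit)                       # model coefficients (column 2)
se <- sqrt(diag(vcov(fit)))             # standard errors
rate_ratio <- exp(beta)                 # exponentiated estimates (column 3)
wald_ci <- exp(cbind(beta - 1.96 * se,  # exponentiated 95% Wald intervals
                     beta + 1.96 * se)) # (column 4)
z_stat <- beta / se                     # Z-statistics (column 5)
p_fdr <- p.adjust(2 * pnorm(-abs(z_stat)), method = "fdr")  # column 6
```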
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 11/15/2024
This R project contains the primary code and data (following pre-processing in python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably. Also please note that this R project has been updated multiple times as the analysis has been updated throughout the peer review process.
#Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.
#Code information
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a role:
"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).
"02_functions.R": This script contains custom functions. Load this using the `source()` function in the 01_start.R script.
"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
`source()` function in the 01_start.R script.
"04_figures_tables.R": This is the main workhouse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study that then get saved in the "manuscript_figures" folder. Note that all maps were produced using Python code found in the "supporting_code"" folder. Also note that within the "manuscript_figures" folder there is an "extended_data" folder, which contains tables of the summary statistics (e.g., quartiles and sample sizes) behind figures containing box plots or depicting regression coefficients.
"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.
"supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions on the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data records all the transactions that happened over a period of time. The retailer will use the results to grow its business: by suggesting itemsets to customers, it can increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.
Association rule mining is most useful when you want to discover associations between different objects in a set, or to find frequent patterns in a transaction database. It can tell you which items customers frequently buy together, allowing the retailer to identify relationships between the items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(computer mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
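The same arithmetic in R, just to verify the toy numbers above:

```r
n <- 100
p_mouse <- 10 / n                 # P(computer mouse) = 0.10
p_mat   <- 9 / n                  # P(mouse mat)      = 0.09
p_both  <- 8 / n                  # P(both)           = 0.08
support    <- p_both              # 0.08
confidence <- support / p_mouse   # 0.80
lift       <- confidence / p_mat  # ~8.9
```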
Number of Attributes: 7
First, we need to load the required libraries.
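A sketch of the library-loading step (the original showed a screenshot; this exact set of packages is an assumption reconstructed from the steps below):

```r
library(readxl)    # read the .xlsx source file
library(tidyverse) # data cleaning and manipulation
library(arules)    # transaction objects and the Apriori algorithm
library(arulesViz) # visualising association rules
```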
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
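For instance (a sketch; the sheet layout is an assumption):

```r
retail <- read_excel("Assignment-1_Data.xlsx")
head(retail)  # preview the first rows
str(retail)   # inspect column types
```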
Next, we clean the data frame by removing missing values.
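A minimal sketch of this cleaning step (column contents are assumptions):

```r
library(tidyverse)
retail_clean <- retail %>%
  drop_na()           # remove rows with missing values
summary(retail_clean) # confirm the result
```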
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together on one invoice will be in ...
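A hedged sketch of that conversion and a first Apriori run, assuming hypothetical BillNo and Itemname columns:

```r
library(arules)
# Group items by invoice and coerce to an arules transactions object
transactions <- as(split(retail_clean$Itemname, retail_clean$BillNo),
                   "transactions")
# Mine rules with minimum support and confidence thresholds
rules <- apriori(transactions,
                 parameter = list(supp = 0.01, conf = 0.8))
inspect(head(sort(rules, by = "lift"), 10))
```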
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Exploratory Data Visualization for the Physical Properties of Lakes
This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the second part of a two-part exercise focusing on the physical properties of lakes.
Introduction
The field of limnology, the study of inland waters, uses a unique graph format to display relationships of variables by depth in a lake (the field of oceanography uses the same convention). Depth is placed on the y-axis in reverse order and the other variable(s) are placed on the x-axis. In this manner, the graph appears as if a cross section were taken from that point in the lake, with the surface at the top of the graph. This lesson introduces physical properties of lakes, namely stratification, and its visualization using the package ggplot2.
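As a quick illustration of this convention, a minimal ggplot2 sketch (`lake_data` and its column names are assumptions):

```r
library(ggplot2)
ggplot(lake_data, aes(x = temperature_C, y = depth_m)) +
  geom_point() +
  scale_y_reverse() +  # put the lake surface at the top of the plot
  labs(x = "Temperature (°C)", y = "Depth (m)")
```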
Learning Objectives
After successfully completing this notebook, you will be able to:
https://creativecommons.org/publicdomain/zero/1.0/
This is my very first data analytics project. The data for it can be found at https://artscience.blog/home/divvy-dataviz-case-study. I followed the R script written by Kevin Hartman. His analysis is based on the Divvy case study "'Sophisticated, Clear, and Polished': Divvy and Data Visualization".
When working with data, you often spend most of your time cleaning it. Learn how to write more efficient code using the tidyverse in R.
https://spdx.org/licenses/CC0-1.0.html
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.
Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.
Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.
Conclusion: We demonstrate the feasibility of the facile eLAB workflow. EHR data are successfully transformed and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and support interoperability.
Methods eLAB Development and Source Code (R statistical software):
eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).
eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.
Functions were written to remap EHR bulk lab data pulls/queries from several sources including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.
The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).
Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
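To illustrate the key-value remapping idea (a hedged sketch; these object and column names are hypothetical, not eLAB's actual code):

```r
library(dplyr)
# Miniature version of the ~300-row lab subtype lookup table
lab_lookup <- tibble::tribble(
  ~raw_name,            ~dd_code,
  "Potassium",          "potassium",
  "Potassium-External", "potassium",
  "Potassium(POC)",     "potassium"
)
# Remap raw EHR lab names to Data Dictionary codes
labs_remapped <- left_join(raw_labs, lab_lookup,
                           by = c("lab_name" = "raw_name"))
```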
Data Dictionary (DD)
EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as strings or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
Study Cohort
This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.
Statistical Analysis
OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Physical Properties of Rivers: Querying Metadata and Discharge Data
This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the second part of a two-part exercise focusing on the physical properties of rivers.
Introduction
Rivers are bodies of freshwater flowing from higher elevations to lower elevations due to the force of gravity. One of the most important physical characteristics of a stream or river is discharge, the volume of water moving through the river or stream over a given amount of time. Discharge can be measured directly by measuring the velocity of flow at several points in a stream and multiplying the flow velocity by the cross-sectional area of the stream. However, this method is effort-intensive. This exercise will demonstrate how to approximate discharge by developing a rating curve for a stream at a given sampling point. You will also learn to query metadata and compare discharge patterns across climatically different regions of the United States.
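For example, site metadata and daily discharge for a gage can be queried with the dataRetrieval package (a sketch; the site number and dates are placeholders):

```r
library(dataRetrieval)
site_info <- readNWISsite("02085070")            # site metadata
discharge <- readNWISdv(siteNumbers = "02085070",
                        parameterCd = "00060",   # discharge (cfs)
                        startDate = "2010-01-01",
                        endDate = "2019-12-31")
```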
Learning Objectives
After successfully completing this exercise, you will be able to:
---
title: "BellaBeat Fitbit"
author: "C Romero"
date: "`r Sys.Date()`"
output:
  html_document:
    number_sections: true
---
```{r}
## Install the packages used for this analysis
## (the base package ships with R and does not need to be installed)
install.packages("ggplot2")   # data visualisation
install.packages("lubridate") # makes it easier to work with dates and times
install.packages("tidyverse") # core collection of data-analysis packages
install.packages("dplyr")     # a grammar of data manipulation
install.packages("readr")     # read rectangular text data
install.packages("tidyr")     # tidy data tools
```

```{r}
library(lubridate) # make dealing with dates a little easier
library(ggplot2)   # create elegant data visualisations using the grammar of graphics
library(dplyr)     # a grammar of data manipulation
library(readr)     # read rectangular text data
library(tidyr)     # tidy data tools
```
## Running code
In a notebook, you can run a single code cell by clicking in the cell and then hitting
the blue arrow to the left, or by clicking in the cell and pressing Shift+Enter. In a script,
you can run code by highlighting the code you want to run and then clicking the blue arrow
at the bottom of this window.
## Reading in files
```{r}
list.files(path = "../input")
```
```{r}
# Load the activity and sleep datasets
dailyActivity <- read_csv("../input/wellness/dailyActivity_merged.csv")
sleepDay <- read_csv("../input/wellness/sleepDay_merged.csv")
# Check for duplicate rows and missing values
sum(duplicated(dailyActivity))
sum(duplicated(sleepDay))
sum(is.na(dailyActivity))
sum(is.na(sleepDay))
# Drop duplicate sleep records and preview both datasets
sleepy <- sleepDay %>% distinct()
head(sleepy)
head(dailyActivity)
# Count unique members in each dataset
n_distinct(dailyActivity$Id)
n_distinct(sleepy$Id)
# Total steps and total distance per member
dailyActivity %>% group_by(Id) %>% summarise(freq = sum(TotalSteps)) %>% arrange(-freq)
Tot_dist <- dailyActivity %>% mutate(Id = as.character(Id)) %>% group_by(Id) %>% summarise(dizzy = sum(TotalDistance)) %>% arrange(-dizzy)
# Total minutes asleep and total time in bed per member
sleepy %>% group_by(Id) %>% summarise(Msleep = sum(TotalMinutesAsleep)) %>% arrange(Msleep)
sleepy %>% group_by(Id) %>% summarise(inBed = sum(TotalTimeInBed)) %>% arrange(inBed)
# Plot total distance per member
ggplot(Tot_dist) +
  geom_count(mapping = aes(y = dizzy, x = Id, color = Id, fill = Id), size = 2) +
  labs(x = "member IDs", title = "Distance (miles)") +
  theme(axis.text.x = element_text(angle = 90))
```
https://spdx.org/licenses/CC0-1.0.html
What is the relationship between environment and democracy? The framework of cultural evolution suggests that societal development is an adaptation to ecological threats. Pertinent theories assume that democracy emerges as societies adapt to ecological factors such as higher economic wealth, lower pathogen threats, less demanding climates, and fewer natural disasters. However, previous research confused within-country processes with between-country processes and erroneously interpreted between-country findings as if they generalize to within-country mechanisms. In this article, we analyze a time-series cross-sectional dataset to study the dynamic relationship between environment and democracy (1949-2016), accounting for previous misconceptions in levels of analysis. By separating within-country processes from between-country processes, we find that the relationship between environment and democracy not only differs by countries but also depends on the level of analysis. Economic wealth predicts increasing levels of democracy in between-country comparisons, but within-country comparisons show that democracy declines as countries become wealthier over time. This relationship is only prevalent among historically wealthy countries but not among historically poor countries, whose wealth also increased over time. By contrast, pathogen prevalence predicts lower levels of democracy in both between-country and within-country comparisons. Our longitudinal analyses identifying temporal precedence reveal that not only reductions in pathogen prevalence drive future democracy, but also democracy reduces future pathogen prevalence and increases future wealth. These nuanced results contrast with previous analyses using narrow, cross-sectional data. As a whole, our findings illuminate the dynamic process by which environment and democracy shape each other.
Methods Our Time-Series Cross-Sectional data combine various online databases. Country names were first identified and matched using R-package “countrycode” (Arel-Bundock, Enevoldsen, & Yetman, 2018) before all datasets were merged. Occasionally, we modified unidentified country names to be consistent across datasets. We then transformed “wide” data into “long” data and merged them using R’s Tidyverse framework (Wickham, 2014). Our analysis begins with the year 1949, which was occasioned by the fact that one of the key time-variant level-1 variables, pathogen prevalence, was only available from 1949 on. See our Supplemental Material for all data, Stata syntax, R-markdown for visualization, supplemental analyses and detailed results (available at https://osf.io/drt8j/).
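A hedged sketch of the wide-to-long and country-matching steps described above (the column names are placeholders, not the study's actual variables):

```r
library(tidyverse)
library(countrycode)
long_data <- wide_data %>%
  # match country names to a common ISO3 code before merging datasets
  mutate(iso3 = countrycode(country, "country.name", "iso3c")) %>%
  # reshape one column per year into one row per country-year
  pivot_longer(cols = starts_with("dem_"),
               names_to = "year", names_prefix = "dem_",
               values_to = "democracy")
```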
This dataset contains all data and code required to reproduce the time-to-event analysis in the associated manuscript. More detail is found in the associated README.md file.

Contents of repository:
- Yaeger_ReservoirDataset_Oct2022.csv: comma-separated data file with reservoir characteristics, construction times, and water table depths and percent saturation at 5-year intervals from 1975-2015.
- CGA_reservoir_analysis.Rmd: RMarkdown notebook with all code required to reproduce the time-to-event analysis in the manuscript and generate the associated plots.
- CGA_reservoir_analysis.html: HTML file rendered from the .Rmd notebook.
- README.md: additional details, including column descriptions from the CSV file.

Software versions used:
- R version 4.1.2 (https://cran.r-project.org/bin/windows/base/old/4.1.2)
- R packages: data.table v1.14.8 (https://rdatatable.gitlab.io/data.table/), ggplot2 v3.4.4 (https://ggplot2.tidyverse.org/), sf v1.0-14 (https://r-spatial.github.io/sf/), survival v3.5-5 (https://cran.r-project.org/package=survival), icenReg v2.0.15 (https://cran.r-project.org/package=icenReg)
I downloaded the divvy_trip_data 2021 from January to December. Since Google Sheets won't let me edit and upload these files (they are far too big, exceeding 1 GB) and Google BigQuery won't let me use DML with the free account, I will be using RStudio for all the analysis, especially:
- tidyverse for data manipulation, exploration, and visualization
- palmerpenguins (if necessary, used the same way as tidyverse)
- lubridate for dates and times
- ggplot2 for visualization
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Introduction to Primate Data Exploration and Linear Modeling with R was created with the goal of providing training to undergraduate biology students on data management and statistical analysis using authentic data of Cayo Santiago rhesus macaques. Module M.1 introduces basic functions from R, as well as from its package tidyverse, for data exploration and management.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and conda software environment file for the chapter 'Introduction to R and the Tidyverse' of the SPAAM Community's textbook: Introduction to Ancient Metagenomics (https://www.spaam-community.org/intro-to-ancient-metagenomics-book).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Teaching data for practical session: "3b1 Introduction to R and the Tidyverse" of the 2022 SPAAM Summer School: Introduction to Ancient Metagenomics (Aug. 1-5 2022).
See: https://spaam-community.github.io/wss-summer-school/#/2022/ or https://doi.org/10.5281/zenodo.6976711 for slides.
Once downloaded, run:
tar xvfz .tar.gz
to decompress the data directory for the session.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This has been copied from the README.md file
bris-lib-checkout
This provides tidied up data from the Brisbane library checkouts
Retrieving and cleaning the data
The script for retrieving and cleaning the data is made available in scrape-library.R.
The data
The data/ folder contains the tidy data
The data-raw/ folder contains the raw data
data/
This contains four tidied up dataframes:
tidy-brisbane-library-checkout.csv
metadata_branch.csv
metadata_heading.csv
metadata_item_type.csv
tidy-brisbane-library-checkout.csv contains the following columns, with the metadata file metadata_heading containing the description of these columns.
knitr::kable(readr::read_csv("data/metadata_heading.csv"))
| heading | heading_explanation |
|---|---|
| Title | Title of Item |
| Author | Author of Item |
| Call Number | Call Number of Item |
| Item id | Unique Item Identifier |
| Item Type | Type of Item (see next column) |
| Status | Current Status of Item |
| Language | Published language of item (if not English) |
| Age | Suggested audience |
| Checkout Library | Checkout branch |
| Date | Checkout date |
We also added year, month, and day columns.
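Those columns can be derived from the checkout date, for example with lubridate (a sketch of one plausible approach, not necessarily the code used):

```r
library(dplyr)
library(lubridate)
checkouts <- checkouts %>%  # `checkouts` stands in for the tidy checkout data
  mutate(year = year(Date), month = month(Date), day = day(Date))
```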
The remaining data are all metadata files that contain meta information on the columns in the checkout data:
library(tidyverse)
knitr::kable(readr::read_csv("data/metadata_branch.csv"))
| branch_code | branch_heading |
|---|---|
| ANN | Annerley |
| ASH | Ashgrove |
| BNO | Banyo |
| BRR | BrackenRidge |
| BSQ | Brisbane Square Library |
| BUL | Bulimba |
| CDA | Corinda |
| CDE | Chermside |
| CNL | Carindale |
| CPL | Coopers Plains |
| CRA | Carina |
| EPK | Everton Park |
| FAI | Fairfield |
| GCY | Garden City |
| GNG | Grange |
| HAM | Hamilton |
| HPK | Holland Park |
| INA | Inala |
| IPY | Indooroopilly |
| MBG | Mt. Coot-tha |
| MIT | Mitchelton |
| MTG | Mt. Gravatt |
| MTO | Mt. Ommaney |
| NDH | Nundah |
| NFM | New Farm |
| SBK | Sunnybank Hills |
| SCR | Stones Corner |
| SGT | Sandgate |
| VAN | Mobile Library |
| TWG | Toowong |
| WND | West End |
| WYN | Wynnum |
| ZIL | Zillmere |
knitr::kable(readr::read_csv("data/metadata_item_type.csv"))
| item_type_code | item_type_explanation |
|---|---|
| AD-FICTION | Adult Fiction |
| AD-MAGS | Adult Magazines |
| AD-PBK | Adult Paperback |
| BIOGRAPHY | Biography |
| BSQCDMUSIC | Brisbane Square CD Music |
| BSQCD-ROM | Brisbane Square CD Rom |
| BSQ-DVD | Brisbane Square DVD |
| CD-BOOK | Compact Disc Book |
| CD-MUSIC | Compact Disc Music |
| CD-ROM | CD Rom |
| DVD | DVD |
| DVD_R18+ | DVD Restricted - 18+ |
| FASTBACK | Fastback |
| GAYLESBIAN | Gay and Lesbian Collection |
| GRAPHICNOV | Graphic Novel |
| ILL | InterLibrary Loan |
| JU-FICTION | Junior Fiction |
| JU-MAGS | Junior Magazines |
| JU-PBK | Junior Paperback |
| KITS | Kits |
| LARGEPRINT | Large Print |
| LGPRINTMAG | Large Print Magazine |
| LITERACY | Literacy |
| LITERACYAV | Literacy Audio Visual |
| LOCSTUDIES | Local Studies |
| LOTE-BIO | Languages Other than English Biography |
| LOTE-BOOK | Languages Other than English Book |
| LOTE-CDMUS | Languages Other than English CD Music |
| LOTE-DVD | Languages Other than English DVD |
| LOTE-MAG | Languages Other than English Magazine |
| LOTE-TB | Languages Other than English Taped Book |
| MBG-DVD | Mt Coot-tha Botanical Gardens DVD |
| MBG-MAG | Mt Coot-tha Botanical Gardens Magazine |
| MBG-NF | Mt Coot-tha Botanical Gardens Non Fiction |
| MP3-BOOK | MP3 Audio Book |
| NONFIC-SET | Non Fiction Set |
| NONFICTION | Non Fiction |
| PICTURE-BK | Picture Book |
| PICTURE-NF | Picture Book Non Fiction |
| PLD-BOOK | Public Libraries Division Book |
| YA-FICTION | Young Adult Fiction |
| YA-MAGS | Young Adult Magazine |
| YA-PBK | Young Adult Paperback |
Example usage
Let’s explore the data
bris_libs <- readr::read_csv("data/tidy-brisbane-library-checkout.csv")
We can count the number of titles, item types, suggested age, and the library given:
library(dplyr)
count(bris_libs, title, sort = TRUE)
count(bris_libs, item_type, sort = TRUE)
count(bris_libs, age, sort = TRUE)
count(bris_libs, library, sort = TRUE)
License
This data is provided under a CC BY 4.0 license. It has been downloaded from Brisbane library checkouts and tidied up using the code in data-raw.