Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This has been copied from the README.md file
# bris-lib-checkout

This provides tidied-up data from the Brisbane library checkouts.

## Retrieving and cleaning the data
The script for retrieving and cleaning the data is made available in scrape-library.R.
## The data

### data/

This contains four tidied-up data frames.

tidy-brisbane-library-checkout.csv contains the following columns, with the metadata file metadata_heading containing the description of these columns:
knitr::kable(readr::read_csv("data/metadata_heading.csv"))
#> Parsed with column specification:
#> cols(
#> heading = col_character(),
#> heading_explanation = col_character()
#> )
| heading | heading_explanation |
|---|---|
| Title | Title of Item |
| Author | Author of Item |
| Call Number | Call Number of Item |
| Item id | Unique Item Identifier |
| Item Type | Type of Item (see next column) |
| Status | Current Status of Item |
| Language | Published language of item (if not English) |
| Age | Suggested audience |
| Checkout Library | Checkout branch |
| Date | Checkout date |
We also added year, month, and day columns.
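The year, month, and day columns can be derived from the checkout datetime. A minimal sketch of how this might be done with lubridate (the actual scrape-library.R script may differ; the column names here are assumptions based on the parsed column specification shown below):

```r
library(dplyr)
library(lubridate)

# A toy stand-in for the checkout data's datetime column.
checkouts <- tibble::tibble(
  datetime = as.POSIXct(c("2018-01-15 10:30:00", "2018-02-03 14:05:00"))
)

# Derive year, month, and day columns from the datetime.
checkouts <- checkouts %>%
  mutate(
    year  = year(datetime),
    month = month(datetime),
    day   = wday(datetime, label = TRUE)  # day of week, e.g. "Mon"
  )
```

The day column parses as character in the checkout data, which is consistent with a weekday label rather than a day-of-month number, but this is an inference, not confirmed by the source.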
The remaining data are all metadata files that contain meta information on the columns in the checkout data:
library(tidyverse)
#> ── Attaching packages ────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
#> ✔ tibble 1.4.99.9006 ✔ dplyr 0.7.8
#> ✔ tidyr 0.8.2 ✔ stringr 1.3.1
#> ✔ readr 1.3.0 ✔ forcats 0.3.0
#> ── Conflicts ───────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
knitr::kable(readr::read_csv("data/metadata_branch.csv"))
#> Parsed with column specification:
#> cols(
#> branch_code = col_character(),
#> branch_heading = col_character()
#> )
| branch_code | branch_heading |
|---|---|
| ANN | Annerley |
| ASH | Ashgrove |
| BNO | Banyo |
| BRR | BrackenRidge |
| BSQ | Brisbane Square Library |
| BUL | Bulimba |
| CDA | Corinda |
| CDE | Chermside |
| CNL | Carindale |
| CPL | Coopers Plains |
| CRA | Carina |
| EPK | Everton Park |
| FAI | Fairfield |
| GCY | Garden City |
| GNG | Grange |
| HAM | Hamilton |
| HPK | Holland Park |
| INA | Inala |
| IPY | Indooroopilly |
| MBG | Mt. Coot-tha |
| MIT | Mitchelton |
| MTG | Mt. Gravatt |
| MTO | Mt. Ommaney |
| NDH | Nundah |
| NFM | New Farm |
| SBK | Sunnybank Hills |
| SCR | Stones Corner |
| SGT | Sandgate |
| VAN | Mobile Library |
| TWG | Toowong |
| WND | West End |
| WYN | Wynnum |
| ZIL | Zillmere |
knitr::kable(readr::read_csv("data/metadata_item_type.csv"))
#> Parsed with column specification:
#> cols(
#> item_type_code = col_character(),
#> item_type_explanation = col_character()
#> )
| item_type_code | item_type_explanation |
|---|---|
| AD-FICTION | Adult Fiction |
| AD-MAGS | Adult Magazines |
| AD-PBK | Adult Paperback |
| BIOGRAPHY | Biography |
| BSQCDMUSIC | Brisbane Square CD Music |
| BSQCD-ROM | Brisbane Square CD Rom |
| BSQ-DVD | Brisbane Square DVD |
| CD-BOOK | Compact Disc Book |
| CD-MUSIC | Compact Disc Music |
| CD-ROM | CD Rom |
| DVD | DVD |
| DVD_R18+ | DVD Restricted - 18+ |
| FASTBACK | Fastback |
| GAYLESBIAN | Gay and Lesbian Collection |
| GRAPHICNOV | Graphic Novel |
| ILL | InterLibrary Loan |
| JU-FICTION | Junior Fiction |
| JU-MAGS | Junior Magazines |
| JU-PBK | Junior Paperback |
| KITS | Kits |
| LARGEPRINT | Large Print |
| LGPRINTMAG | Large Print Magazine |
| LITERACY | Literacy |
| LITERACYAV | Literacy Audio Visual |
| LOCSTUDIES | Local Studies |
| LOTE-BIO | Languages Other than English Biography |
| LOTE-BOOK | Languages Other than English Book |
| LOTE-CDMUS | Languages Other than English CD Music |
| LOTE-DVD | Languages Other than English DVD |
| LOTE-MAG | Languages Other than English Magazine |
| LOTE-TB | Languages Other than English Taped Book |
| MBG-DVD | Mt Coot-tha Botanical Gardens DVD |
| MBG-MAG | Mt Coot-tha Botanical Gardens Magazine |
| MBG-NF | Mt Coot-tha Botanical Gardens Non Fiction |
| MP3-BOOK | MP3 Audio Book |
| NONFIC-SET | Non Fiction Set |
| NONFICTION | Non Fiction |
| PICTURE-BK | Picture Book |
| PICTURE-NF | Picture Book Non Fiction |
| PLD-BOOK | Public Libraries Division Book |
| YA-FICTION | Young Adult Fiction |
| YA-MAGS | Young Adult Magazine |
| YA-PBK | Young Adult Paperback |
## Example usage

Let’s explore the data:
bris_libs <- readr::read_csv("data/bris-lib-checkout.csv")
#> Parsed with column specification:
#> cols(
#> title = col_character(),
#> author = col_character(),
#> call_number = col_character(),
#> item_id = col_double(),
#> item_type = col_character(),
#> status = col_character(),
#> language = col_character(),
#> age = col_character(),
#> library = col_character(),
#> date = col_double(),
#> datetime = col_datetime(format = ""),
#> year = col_double(),
#> month = col_double(),
#> day = col_character()
#> )
#> Warning: 20 parsing failures.
#> row col expected actual file
#> 587795 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 590579 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 590597 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 595774 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 597567 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> ...... ....... ........ ....... ............................
#> See problems(...) for more details.
We can count the number of titles, item types, suggested ages, and checkout libraries:
library(dplyr)
count(bris_libs, title, sort = TRUE)
#> # A tibble: 121,046 x 2
#> title n
#>
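The remaining counts mentioned above, and a join against the branch metadata, might look like the following sketch (it assumes the bris_libs data frame and metadata files read in earlier; column names follow the parsed column specification shown above):

```r
library(dplyr)

# Count the other columns mentioned above.
count(bris_libs, item_type, sort = TRUE)
count(bris_libs, age, sort = TRUE)
count(bris_libs, library, sort = TRUE)

# Attach human-readable branch names from the branch metadata,
# assuming the library column holds the three-letter branch codes.
branches <- readr::read_csv("data/metadata_branch.csv")
bris_libs %>%
  left_join(branches, by = c("library" = "branch_code")) %>%
  count(branch_heading, sort = TRUE)
```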
## License

This data is provided under a CC BY 4.0 license.

It has been downloaded from Brisbane library checkouts, and tidied up using the code in data-raw.
This resource uses the Human Impact on Stream Ecology data set, background, and questions, and provides students a very general introduction to using R. Students perform basic summary statistics and data visualization in R (using the tidyverse).
About this course

Machine Learning (ML) is a new way to program computers to solve real-world problems. It has gained popularity over the last few years by achieving tremendous success in tasks that we believed only humans could solve, from recognising images to self-driving cars. In this course, we will explore the fundamentals of Machine Learning from a practical perspective, with the help of the R programming language and its scientific computing packages.

Learning Outcomes

- Understand the difference between supervised and unsupervised Machine Learning.
- Understand the fundamentals of Machine Learning.
- Get a comprehensive introduction to Machine Learning models and techniques, such as Linear Regression and model training.
- Understand Machine Learning modelling workflows.
- Use R and its relevant packages to process real datasets, and to train and apply Machine Learning models.

Prerequisites

Either Learn to Program: R and Data Manipulation in R, or Learn to Program: R and Data Manipulation and Visualisation in R, is needed to attend this course. If you already have experience with programming, please check the topics covered in the Learn to Program: R, Data Manipulation in R, and Data Manipulation and Visualisation in R courses to ensure that you are familiar with the knowledge needed for this course, such as a good understanding of R syntax and basic programming concepts, and familiarity with the dplyr, tidyr, and ggplot2 packages.

Maths knowledge is not required. There are only a few maths formulas in this course; however, references to the mathematics required for learning about Machine Learning will be provided. Understanding the mathematics behind each Machine Learning algorithm will help you appreciate the behaviour of the model and know its pros and cons.

Why do this course?

- Useful for anyone who wants to learn about Machine Learning but is overwhelmed by the tremendous amount of resources available.
- It does not go in depth into mathematical concepts and formulas; however, formal intuitions and references are provided to guide participants in further learning.
- We have applications on real datasets! Machine Learning models are introduced in this course together with important feature engineering techniques that are guaranteed to be useful in your own projects.
- It gives you enough background to kickstart your own Machine Learning journey, or to transition into Deep Learning.

For a better and more complete understanding of the most popular Machine Learning models and techniques, please consider attending all three Introduction to Machine Learning using R workshops:

- Introduction to Machine Learning using R: Introduction & Linear Regression
- Introduction to Machine Learning using R: Classification
- Introduction to Machine Learning using R: SVM & Unsupervised Learning

Licence

Copyright © 2021 Intersect Australia Ltd. All rights reserved.
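As an illustration of the Linear Regression topic the course covers (this is a generic sketch on a built-in dataset, not course material):

```r
# Fit a simple linear model on the built-in mtcars dataset:
# predict fuel efficiency (mpg) from car weight (wt).
model <- lm(mpg ~ wt, data = mtcars)

# Inspect the estimated coefficients and goodness of fit.
summary(model)

# Predict mpg for a hypothetical car weighing 3,000 lbs (wt = 3.0).
predict(model, newdata = data.frame(wt = 3.0))
```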
The data this week comes from Adam Vagnar, who also blogged about this dataset. There's a LOT of data here: match-level results, player details, and match-level statistics for some matches. In this dataset, all matches are played 2 vs 2, so there are columns for two winners (one team) and two losers (one team). The data is relatively clean and ready for analysis, although there are some duplicated columns, and the data is wide because there are two players per team.
Check out the data dictionary, or Wikipedia for some longer-form details around what the various match statistics mean.
Most of the data is from the international FIVB tournaments but about 1/3 is from the US-centric AVP.
The FIVB Beach Volleyball World Tour (known between 2003 and 2012 as the FIVB Beach Volleyball Swatch World Tour for sponsorship reasons) is the worldwide professional beach volleyball tour for both men and women organized by the Fédération Internationale de Volleyball (FIVB). The World Tour was introduced for men in 1989 while the women first competed in 1992.
Winning the World Tour is considered to be one of the highest honours in international beach volleyball, being surpassed only by the World Championships, and the Beach Volleyball tournament at the Summer Olympic Games.
FiveThirtyEight examined the disadvantage of serving in beach volleyball, although they used Olympic-level data. Again, Adam Vagnar also covered this data on his blog.
TidyTuesday: a weekly data project aimed at the R ecosystem. As this project was borne out of the R4DS Online Learning Community and the R for Data Science textbook, an emphasis was placed on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem. However, any code-based methodology is welcome - just please remember to share the code used to generate the results.
Join the R4DS Online Learning Community in the weekly #TidyTuesday event! Every week we post a raw dataset, a chart or article related to that dataset, and ask you to explore the data. While the dataset will be “tamed”, it will not always be tidy!
We will have many sources of data and want to emphasize that no causation is implied. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our guidelines are to use the data provided to practice your data tidying and plotting techniques. Participants are invited to consider for themselves what nuancing factors might underlie these relationships.
The intent of Tidy Tuesday is to provide a safe and supportive forum for individuals to practice their wrangling and data visualization skills independent of drawing conclusions. While we understand that the two are related, the focus of this practice is purely on building skills with real-world data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
gedepir is an R package that simplifies the use of deconvolution tools within a complete transcriptomics analysis pipeline. It simplifies the definition of an end-to-end analysis pipeline with a set of base functions that are connected through the pipe syntax used in the magrittr, tidyr, and dplyr packages. This example dataset comprises 50 pseudo-bulk samples.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Package for the paper "Technical Debt in the Peer-Review Documentation of R Packages: a rOpenSci Case Study" (MSR '21).
# Scripts: Data Collection and Processing
These are the scripts used to extract the data from _rOpenSci_. The following steps indicate how to use them.
1. Add all attached R files into an R project.
2. Install the following R packages. The process also requires a working GitHub account in order to obtain the corresponding token.
```{r}
library(dplyr)
library(stringr)
library(stringi)
library(jsonlite)
library(httpuv)
library(httr)
library(ggplot2)
library(tidyr)
```
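The installation from step 2 can be done in one call before loading the libraries (a sketch; package names are taken from the list above):

```r
# Install the required packages from CRAN before loading them.
install.packages(c("dplyr", "stringr", "stringi", "jsonlite",
                   "httpuv", "httr", "ggplot2", "tidyr"))
```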
3. All the individual functions in the following files should be sourced into the R environment: `getToken.R`, `comments.R`, `issues.R`, and `tagging.R`.
4. Run the script in the file `process.R`. This will run all the previous functions in the corresponding order.
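Steps 3 and 4 can be sketched as follows (file names are from the steps above; paths are assumed to be relative to the project root):

```r
# Source the helper functions into the R environment.
source("getToken.R")
source("comments.R")
source("issues.R")
source("tagging.R")

# Run the main script, which calls the functions above in order.
source("process.R")
```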
# Datasets
The following files are included:
- Dataset_1-100_Author1.xlsx contains the 100 randomly selected comments that were classified according to TD types by Author 1.
- Dataset_1-100_Author2.xlsx contains the 100 randomly selected comments that were classified according to TD types by Author 2, and the combined classification (in blue) after discussion.
- Dataset_Phrases_Both.xlsx contains the 358 randomly selected comments (resulting in 602 phrases) that were classified according to TD types by both Author 1 and Author 2. Their classifications were incorporated into a single spreadsheet side by side for easy comparison. Disagreements were discussed, and the final classification is in the “Agreement” field.
- UserRoles.csv contains the user roles associated with the 600 phrases. The “comment_id” is the unique identifier for the comment from which the phrase is extracted. The phrase is represented in the “statement” field. The “agreement” field shows the final technical debt label after the analysis by two of the authors. The user roles are shown in the “user_role” column.