This dataset reflects reported incidents of crime (with the exception of murders, where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org.

Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that has not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation, and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information, and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of, this information. All data visualizations on maps should be considered approximate, and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words, or the unauthorized use of the Chicago Police Department logo, is unlawful. This web page does not, in any way, authorize such use.

Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as WordPad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
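For reference, a minimal R sketch for loading such an export follows; the file name is assumed, so use whatever name your download produces.
#Reading the exported CSV in R, which handles files too large to view in Excel
library(readr)
library(dplyr)
crimes <- read_csv("Crimes_2001_to_Present.csv") #hypothetical name of the exported file
glimpse(crimes)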
Contains data and R scripts for the JOP article, "Why Partisans Don't Sort: The Constraints on Political Segregation." When downloading tabular data files, ensure that they appear in your working directory in CSV format.
The Kitchell Data Recovery project Ceramic Rough Sort (RS_CERAMICS) Data sheet contains data from the rough sort analysis of ceramics recovered during the Kitchell data recovery project. It contains information on ceramic types, tempers and counts; it also records vessel and rim forms where applicable. The data sheet also contains rim circumference and rim diameter measurements for some ceramic specimens.
See Partial Data Recovery and Burial Removal at Pueblo Grande (AZ U:9:1 (ASM)): Unit 15, The Former Maricopa County Sheriff's Substation, Washington and 48th Streets, Phoenix, Arizona (SSI Technical Report No. 02-43) for the final report on the Kitchell Data Recovery project.
Phase 1: ASK
1. Business Task * Cyclist is looking to increase its earnings and wants to know whether a social media campaign can influence "Casual" users to become "Annual" members.
2. Key Stakeholders: * The main stakeholder from Cyclist is Lily Moreno, who is the Director of Marketing and is responsible for developing campaigns and initiatives to promote the bike-share program. The other teams involved in this project are Marketing & Analytics and the Executive Team.
3. Business Task: * Compare the two kinds of users and define how they use the platform, which variables they have in common, which variables differ, and how Casual users can be converted into Annual members.
Phase 2: PREPARE
1. Determine Data Credibility * Cyclist provided data from 2013 through March 2021, all of which is first-hand data collected by the company.
2. Sort & Filter Data: * The stakeholders want to know how current users are using the service, so I am focusing on the data from 2020-2021, since this is the most relevant period for answering the business task.
#Installing packages
install.packages("tidyverse", repos = "http://cran.us.r-project.org")
install.packages("readr", repos = "http://cran.us.r-project.org")
install.packages("janitor", repos = "http://cran.us.r-project.org")
install.packages("geosphere", repos = "http://cran.us.r-project.org")
install.packages("gridExtra", repos = "http://cran.us.r-project.org")
library(tidyverse)
library(readr)
library(janitor)
library(geosphere)
library(gridExtra)
library(lubridate) #needed for wday() later; installed with the tidyverse
#Importing data & verifying the information within the dataset
all_tripdata_clean <- read.csv("/Data Projects/cyclist/cyclist_data_cleaned.csv")
glimpse(all_tripdata_clean)
summary(all_tripdata_clean)
Phase 3: PROCESS
1. Cleaning Data & Preparing for Analysis: * Once the data has been combined into one dataset and checked for errors, we begin cleaning it. * Eliminate data that corresponds to the company servicing the bikes, and any ride with a traveled distance of zero. * New columns will be added to assist the analysis and to give an accurate picture of who is using the bikes.
#Eliminating data that represents the company performing maintenance (station "HQ QR") and trips with a negative ride length
all_tripdata_clean <- all_tripdata_clean[!(all_tripdata_clean$start_station_name == "HQ QR" | all_tripdata_clean$ride_length<0),]
#Creating columns for the individual date components (date must be created first; day_of_week is derived from it)
all_tripdata_clean$date <- as.Date(all_tripdata_clean$started_at)
all_tripdata_clean$day <- format(as.Date(all_tripdata_clean$date), "%d")
all_tripdata_clean$month <- format(as.Date(all_tripdata_clean$date), "%m")
all_tripdata_clean$year <- format(as.Date(all_tripdata_clean$date), "%Y")
all_tripdata_clean$day_of_week <- format(as.Date(all_tripdata_clean$date), "%A")
**Now I will calculate the length of each ride, the distance traveled, and the mean time and distance for each user group.**
#Calculating the ride length in minutes and the ride distance in miles
all_tripdata_clean$ride_length <- difftime(all_tripdata_clean$ended_at,all_tripdata_clean$started_at,units = "mins")
all_tripdata_clean$ride_distance <- distGeo(matrix(c(all_tripdata_clean$start_lng, all_tripdata_clean$start_lat), ncol = 2), matrix(c(all_tripdata_clean$end_lng, all_tripdata_clean$end_lat), ncol = 2))
all_tripdata_clean$ride_distance = all_tripdata_clean$ride_distance/1609.34 #converting to miles
#Calculating the mean time and distance based on the user groups
userType_means <- all_tripdata_clean %>%
  group_by(member_casual) %>%
  summarise(mean_time = mean(ride_length), mean_distance = mean(ride_distance))
Adding calculations that differentiate between bike types and show which type of user rides each bike type.
#Calculations
with_bike_type <- all_tripdata_clean %>%
  filter(rideable_type == "classic_bike" | rideable_type == "electric_bike")
#Rides per weekday, by user type and bike type
with_bike_type %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, rideable_type, weekday) %>%
  summarise(totals = n(), .groups = "drop")
#Total rides by user type and bike type
with_bike_type %>%
  group_by(member_casual, rideable_type) %>%
  summarise(totals = n(), .groups = "drop")
#Calculating the ride differential by weekday
all_tripdata_clean %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length), .groups = "drop") %>%
  arrange(member_casual, weekday)
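A minimal visualisation sketch that could follow this summary (not part of the original script) is:
#Sketch: plotting rides per weekday for each user type
all_tripdata_clean %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n(), .groups = "drop") %>%
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")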
Attribution 2.5 (CC BY 2.5) https://creativecommons.org/licenses/by/2.5/
License information was derived automatically
The Australian Charities and Not-for-profits Commission (ACNC) is Australia’s national regulator of charities.

Since 3 December 2012, charities wanting to access Commonwealth charity tax concessions (and other benefits) need to register with the ACNC. Although many charities choose to register, registration with the ACNC is voluntary.

Each year, registered charities are required to lodge an Annual Information Statement (AIS) with the ACNC. Charities are required to submit their AIS within six months of the end of their reporting period.

Registered charities can apply to the ACNC to have some or all of the information they provide withheld from the ACNC Register. However, there are only limited circumstances when the ACNC can agree to withhold information. If a charity has applied to have their data withheld, the AIS data relating to that charity has been excluded from this dataset.

This dataset can be used to find the AIS information lodged by multiple charities. It can also be used to filter and sort by different variables across all AIS information. AIS information for individual charities can be viewed via the ACNC Charity Register.

The AIS collects information about charity finances, and financial information provides a basis for understanding the charity and its activities in greater detail. We have published explanatory notes to help you understand this dataset.

When comparing charities’ financial information it is important to consider each charity's unique situation. This is particularly true for small charities, which are not compelled to provide financial reports – reports that often contain more details about their financial position and activities – as part of their AIS.

For more information on interpreting financial information, please refer to the ACNC website. The ACNC also publishes other datasets on data.gov.au as part of our commitment to open data and transparent regulation.

NOTE: It is possible that some information in this dataset might be subject to a future request from a charity to have their information withheld. If this occurs, this information will still appear in the dataset until the next update. Please consider this risk when using this dataset.
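As an illustration only, a small R sketch of this kind of filtering and sorting; the file name and column names below are assumptions for demonstration, not the dataset's documented schema.
library(readr)
library(dplyr)
ais <- read_csv("acnc-annual-information-statements.csv") #hypothetical file name
ais %>%
  filter(charity_size == "Small") %>% #hypothetical column names
  arrange(desc(total_revenue))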
This dataset contains 146 jpeg images of multicores collected during R/V Weatherbird II cruise 1305 from September 22nd to 29th 2012. Additionally, this includes a file of raw sort data for macrofauna to the family level.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset brings to you Iris Dataset in several data formats (see more details in the next sections).
You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared a Python Jupyter Notebook and an R Markdown report that read all these formats:
Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.
Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris
Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/
The file downloaded is iris.data and is formatted as a comma delimited file.
This small data collection was created to help you test your skills with ingesting various data formats.
This file was processed to convert the data in the following formats:
* csv - comma separated values format
* tsv - tab separated values format
* parquet - parquet format
* feather - feather format
* parquet.gzip - compressed parquet format
* h5 - hdf5 format
* pickle - Python binary object file - pickle format
* xlsx - Excel format
* npy - Numpy (Python library) binary format
* npz - Numpy (Python library) binary compressed format
* rds - Rds (R specific data format) binary format
I would like to acknowledge the work of the creator of the dataset, R. A. Fisher, and of the donor, Michael Marshall.
Use these data formats to test your skills in ingesting data in various formats.
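For example, several of the formats can be read from R as sketched below; the file names are assumed to follow the dataset's naming, and the arrow package is one option for the Parquet and Feather files.
```R
library(readr)
library(arrow)   # one option for reading Parquet and Feather files

iris_csv     <- read_csv("Iris.csv")          # file names assumed
iris_tsv     <- read_tsv("Iris.tsv")
iris_parquet <- read_parquet("Iris.parquet")
iris_feather <- read_feather("Iris.feather")
iris_rds     <- readRDS("Iris.rds")
```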
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Water Data Online provides free access to nationally consistent, current and historical water information. It allows you to view and download standardised data and reports.

Watercourse level and watercourse discharge time series data from approximately 3500 water monitoring stations across Australia are available.

Water Data Online displays time series data supplied by lead water agencies from each State and Territory with updates provided to the Bureau on a daily basis.

Over time, more stations and parameters will become available and linkages to Water Data Online from the Geofabric will be implemented.

Before using data please refer to licence preferences of the supplying organisations under the Copyright tab.
Crossing table of the BOAMP table (DILA) with the Sirene business database (INSEE) / First Quarter 2024.
- The BUYER's Siren number (column "SN_30_Siren") is provided for each notice (column and primary key "B_17_idweb");
- Several columns that facilitate data mining have been added;
- The names of the original columns have been prefixed, numbered and sorted alphabetically.
You will find here:
- the BSA for the first quarter of 2024 in free and open access (CSV with semicolon separator, and Parquet);
- the schema of the BSA table (CSV, comma separator);
- an excerpt from the March 30 BSA (CSV, comma separator) to quickly give you an idea of the data in the Datagouv explorer.
NB: the March 30 extract has its JSON cell columns GESTION, DONNEES and ANNONCES_ANTERIEURES purged. The deleted data can be retrieved in a nicer format by following the links in the added columns:
- B_41_GESTION_URL_JSON;
- B_43_DONNEES_URL_JSON;
- B_45_ANNONCES_ANTERIEURES_URL_JSON.
More info: daily, paid updates covering the entire BOAMP 2024 are available on our website under ► AuFilDuBoamp Downloads; further documentation can be found at ► AuFilDuBoamp Doc & TP.
Data sources: the August SIRENE database of companies and their establishments (SIREN, SIRET); the BOAMP API.
To download the first quarter of the BSA with Python, run the following. For the CSV: df = pd.read_csv("https://www.data.gouv.fr/en/datasets/r/63f0d792-148a-4c95-a0b6-9e8ea8b0b34a", dtype='string', sep=';')
For the Parquet file: df = pd.read_parquet("https://www.data.gouv.fr/en/datasets/r/f7a4a76e-ff50-4dc6-bae8-97368081add2")
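If you work in R rather than Python, a roughly equivalent sketch (assuming the readr package) is:
library(readr)
bsa <- read_delim("https://www.data.gouv.fr/en/datasets/r/63f0d792-148a-4c95-a0b6-9e8ea8b0b34a", delim = ";", col_types = cols(.default = col_character()))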
Enjoy!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R scripts used to generate the design and sort and score the data of Study 3, with annotation: intended as a template to build future BWS studies. (ZIP 15 kb)
This is the carbon monoxide data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and to be safe, do not even sort the data. One simple change in the Excel file could make the code full of bugs.
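As a sketch only (the exported file name below is hypothetical), the intended workflow is to export each sheet to CSV and read it into R unchanged before running AQ-June20.R:
library(readr)
co_data <- read_csv("carbon_monoxide.csv") #hypothetical name for the exported sheet
str(co_data) #check that column names and data codes are untouched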
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# TDMentions: A Dataset of Technical Debt Mentions in Online Posts (version 1.0)
TDMentions is a dataset that contains mentions of technical debt from Reddit, Hacker News, and Stack Exchange. It also contains a list of blog posts on Medium that were tagged as technical debt. The dataset currently contains approximately 35,000 items.
## Data collection and processing
The dataset is mainly collected from existing datasets. We used data from:
- the archive of Reddit posts by Jason Baumgartner (available at [https://pushshift.io](https://pushshift.io)),
- the archive of Hacker News available at Google's BigQuery (available at [https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news](https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news)),
- the Stack Exchange data dump (available at [https://archive.org/details/stackexchange](https://archive.org/details/stackexchange)),
- the [GHTorrent](http://ghtorrent.org) project
- the [GH Archive](https://www.gharchive.org)
The data set currently contains data from the start of each source/service until 2018-12-31. For GitHub, we currently only include data from 2015-01-01.
We use the regular expression `tech(nical)?[\s\-_]*?debt` to find mentions in all sources except for Medium. We decided to limit our matches to variations of technical debt and tech debt. Other shorter forms, such as TD, can result in too many false positives. For Medium, we used the tag `technical-debt`.
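As an illustration, the same pattern can be applied in R; the example strings below are invented for demonstration.
```R
# Applying the mention-matching regular expression to example strings
pattern  <- "tech(nical)?[\\s\\-_]*?debt"
examples <- c("We need to pay down our technical debt",
              "tech-debt is piling up",
              "TD is unclear here")
grepl(pattern, examples, ignore.case = TRUE, perl = TRUE)
#> [1]  TRUE  TRUE FALSE
```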
## Data Format
The dataset is stored as a compressed (bzip2) JSON file with one JSON object per line. Each mention is represented as a JSON object with the following keys.
- `id`: the id used in the original source. We use the URL path to identify Medium posts.
- `body`: the text that contains the mention. This is either the comment or the title of the post. For Medium posts this is the title and subtitle (which might not mention technical debt, since posts are identified by the tag).
- `created_utc`: the time the item was posted in seconds since epoch in UTC.
- `author`: the author of the item. We use the username or userid from the source.
- `source`: where the item was posted. Valid sources are:
- HackerNews Comment
- HackerNews Job
- HackerNews Submission
- Reddit Comment
- Reddit Submission
- StackExchange Answer
- StackExchange Comment
- StackExchange Question
- Medium Post
- `meta`: Additional information about the item specific to the source. This includes, e.g., the subreddit a Reddit submission or comment was posted to, the score, etc. We try to use the same names, e.g., `score` and `num_comments` for keys that have the same meaning/information across multiple sources.
This is a sample item from Reddit:
```JSON
{
"id": "ab8auf",
"body": "Technical Debt Explained (x-post r/Eve)",
"created_utc": 1546271789,
"author": "totally_100_human",
"source": "Reddit Submission",
"meta": {
"title": "Technical Debt Explained (x-post r/Eve)",
"score": 1,
"num_comments": 0,
"url": "http://jestertrek.com/eve/technical-debt-2.png",
"subreddit": "RCBRedditBot"
}
}
```
## Sample Analyses
We decided to use JSON to store the data, since it is easy to work with from multiple programming languages. In the following examples, we use [`jq`](https://stedolan.github.io/jq/) to process the JSON.
### How many items are there for each source?
```
lbzip2 -cd postscomments.json.bz2 | jq '.source' | sort | uniq -c
```
### How many submissions that mentioned technical debt were posted each month?
```
lbzip2 -cd postscomments.json.bz2 | jq 'select(.source == "Reddit Submission") | .created_utc | strftime("%Y-%m")' | sort | uniq -c
```
### What are the titles of items that link (`meta.url`) to PDF documents?
```
lbzip2 -cd postscomments.json.bz2 | jq '. as $r | select(.meta.url?) | .meta.url | select(endswith(".pdf")) | $r.body'
```
### Please, I want CSV!
```
lbzip2 -cd postscomments.json.bz2 | jq -r '[.id, .body, .author] | @csv'
```
Note that you need to specify the keys you want to include for the CSV, so it is easier to either ignore the meta information or process each source.
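If you prefer R over `jq`, a minimal sketch using the jsonlite package reads the line-delimited JSON directly from the bzip2 archive:
```R
library(jsonlite)
# stream_in() parses one JSON object per line into a data frame
mentions <- stream_in(bzfile("postscomments.json.bz2"))
# items per source, mirroring the first jq example above
table(mentions$source)
```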
Please see [https://github.com/sse-lnu/tdmentions](https://github.com/sse-lnu/tdmentions) for more analyses
# Limitations and Future updates
The current version of the dataset lacks GitHub data and Medium comments. GitHub data will be added in the next update. Medium comments (responses) will be added in a future update if we find a good way to represent these.
This dataset was created by Ashish R. Soni
This is the gravimetric data used to calibrate the real-time readings. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and to be safe, do not even sort the data. One simple change in the Excel file could make the code full of bugs.
This is the raw SO2 data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and to be safe, do not even sort the data. One simple change in the Excel file could make the code full of bugs.
This is the raw H2S data: the concentration of H2S in parts per million in the biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and to be safe, do not even sort the data. One simple change in the Excel file could make the code full of bugs.
Methane concentration of the biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and to be safe, do not even sort the data. One simple change in the Excel file could make the code full of bugs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This has been copied from the README.md file
bris-lib-checkout
This provides tidied up data from the Brisbane library checkouts
Retrieving and cleaning the data
The script for retrieving and cleaning the data is made available in scrape-library.R.
The data
data/
This contains four tidied up dataframes:
tidy-brisbane-library-checkout.csv contains the following columns, with the metadata file metadata_heading containing the description of these columns.
knitr::kable(readr::read_csv("data/metadata_heading.csv"))
#> Parsed with column specification:
#> cols(
#> heading = col_character(),
#> heading_explanation = col_character()
#> )
| heading          | heading_explanation                         |
|:-----------------|:--------------------------------------------|
| Title            | Title of Item                               |
| Author           | Author of Item                              |
| Call Number      | Call Number of Item                         |
| Item id          | Unique Item Identifier                      |
| Item Type        | Type of Item (see next column)              |
| Status           | Current Status of Item                      |
| Language         | Published language of item (if not English) |
| Age              | Suggested audience                          |
| Checkout Library | Checkout branch                             |
| Date             | Checkout date                               |
We also added year, month, and day columns.
The remaining data are all metadata files that contain meta information on the columns in the checkout data:
library(tidyverse)
#> ── Attaching packages ────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
#> ✔ tibble 1.4.99.9006 ✔ dplyr 0.7.8
#> ✔ tidyr 0.8.2 ✔ stringr 1.3.1
#> ✔ readr 1.3.0 ✔ forcats 0.3.0
#> ── Conflicts ───────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
knitr::kable(readr::read_csv("data/metadata_branch.csv"))
#> Parsed with column specification:
#> cols(
#> branch_code = col_character(),
#> branch_heading = col_character()
#> )
| branch_code | branch_heading          |
|:------------|:------------------------|
| ANN         | Annerley                |
| ASH         | Ashgrove                |
| BNO         | Banyo                   |
| BRR         | BrackenRidge            |
| BSQ         | Brisbane Square Library |
| BUL         | Bulimba                 |
| CDA         | Corinda                 |
| CDE         | Chermside               |
| CNL         | Carindale               |
| CPL         | Coopers Plains          |
| CRA         | Carina                  |
| EPK         | Everton Park            |
| FAI         | Fairfield               |
| GCY         | Garden City             |
| GNG         | Grange                  |
| HAM         | Hamilton                |
| HPK         | Holland Park            |
| INA         | Inala                   |
| IPY         | Indooroopilly           |
| MBG         | Mt. Coot-tha            |
| MIT         | Mitchelton              |
| MTG         | Mt. Gravatt             |
| MTO         | Mt. Ommaney             |
| NDH         | Nundah                  |
| NFM         | New Farm                |
| SBK         | Sunnybank Hills         |
| SCR         | Stones Corner           |
| SGT         | Sandgate                |
| VAN         | Mobile Library          |
| TWG         | Toowong                 |
| WND         | West End                |
| WYN         | Wynnum                  |
| ZIL         | Zillmere                |
knitr::kable(readr::read_csv("data/metadata_item_type.csv"))
#> Parsed with column specification:
#> cols(
#> item_type_code = col_character(),
#> item_type_explanation = col_character()
#> )
| item_type_code | item_type_explanation                     |
|:---------------|:------------------------------------------|
| AD-FICTION     | Adult Fiction                             |
| AD-MAGS        | Adult Magazines                           |
| AD-PBK         | Adult Paperback                           |
| BIOGRAPHY      | Biography                                 |
| BSQCDMUSIC     | Brisbane Square CD Music                  |
| BSQCD-ROM      | Brisbane Square CD Rom                    |
| BSQ-DVD        | Brisbane Square DVD                       |
| CD-BOOK        | Compact Disc Book                         |
| CD-MUSIC       | Compact Disc Music                        |
| CD-ROM         | CD Rom                                    |
| DVD            | DVD                                       |
| DVD_R18+       | DVD Restricted - 18+                      |
| FASTBACK       | Fastback                                  |
| GAYLESBIAN     | Gay and Lesbian Collection                |
| GRAPHICNOV     | Graphic Novel                             |
| ILL            | InterLibrary Loan                         |
| JU-FICTION     | Junior Fiction                            |
| JU-MAGS        | Junior Magazines                          |
| JU-PBK         | Junior Paperback                          |
| KITS           | Kits                                      |
| LARGEPRINT     | Large Print                               |
| LGPRINTMAG     | Large Print Magazine                      |
| LITERACY       | Literacy                                  |
| LITERACYAV     | Literacy Audio Visual                     |
| LOCSTUDIES     | Local Studies                             |
| LOTE-BIO       | Languages Other than English Biography    |
| LOTE-BOOK      | Languages Other than English Book         |
| LOTE-CDMUS     | Languages Other than English CD Music     |
| LOTE-DVD       | Languages Other than English DVD          |
| LOTE-MAG       | Languages Other than English Magazine     |
| LOTE-TB        | Languages Other than English Taped Book   |
| MBG-DVD        | Mt Coot-tha Botanical Gardens DVD         |
| MBG-MAG        | Mt Coot-tha Botanical Gardens Magazine    |
| MBG-NF         | Mt Coot-tha Botanical Gardens Non Fiction |
| MP3-BOOK       | MP3 Audio Book                            |
| NONFIC-SET     | Non Fiction Set                           |
| NONFICTION     | Non Fiction                               |
| PICTURE-BK     | Picture Book                              |
| PICTURE-NF     | Picture Book Non Fiction                  |
| PLD-BOOK       | Public Libraries Division Book            |
| YA-FICTION     | Young Adult Fiction                       |
| YA-MAGS        | Young Adult Magazine                      |
| YA-PBK         | Young Adult Paperback                     |
Example usage
Let’s explore the data
bris_libs <- readr::read_csv("data/bris-lib-checkout.csv")
#> Parsed with column specification:
#> cols(
#> title = col_character(),
#> author = col_character(),
#> call_number = col_character(),
#> item_id = col_double(),
#> item_type = col_character(),
#> status = col_character(),
#> language = col_character(),
#> age = col_character(),
#> library = col_character(),
#> date = col_double(),
#> datetime = col_datetime(format = ""),
#> year = col_double(),
#> month = col_double(),
#> day = col_character()
#> )
#> Warning: 20 parsing failures.
#> row col expected actual file
#> 587795 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 590579 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 590597 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 595774 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> 597567 item_id a double REFRESH 'data/bris-lib-checkout.csv'
#> ...... ....... ........ ....... ............................
#> See problems(...) for more details.
We can count the number of titles, item types, suggested age, and the library given:
library(dplyr)
count(bris_libs, title, sort = TRUE)
#> # A tibble: 121,046 x 2
#> title n
#>
License
This data is provided under a CC BY 4.0 license
It has been downloaded from Brisbane library checkouts, and tidied up using the code in data-raw.
The TRI of Agen comprises 20 municipalities spread across the part of the Garonne basin known as the Agenaise. Table of velocity zones (zones for which a velocity estimate is available for a flood of a given type under a given scenario). Geographic dataset produced from the flood GIS of the Agen high flood risk territory (TRI) and mapped for reporting purposes under the European Floods Directive. European Directive 2007/60/EC of 23 October 2007 on the assessment and management of flood risks (OJ L 288, 6.11.2007, p. 27) shapes the strategy for flood prevention in Europe. It requires the production of flood risk management plans to reduce the negative consequences of flooding for human health, the environment, cultural heritage and economic activity. The objectives and implementation requirements are set out in the Law of 12 July 2010 on the National Commitment to the Environment (LENE) and the Decree of 2 March 2011. In this context, the primary purpose of flood hazard and flood risk mapping for TRIs is to contribute, through the homogenisation and objectivity of knowledge about flood exposure, to the development of flood risk management plans (WRMS). This dataset is used to produce flood extent maps and flood risk maps that represent, respectively, flood hazards and the assets at stake at an appropriate scale. Their aim is to provide quantitative evidence for further assessing the vulnerability of a territory to the three levels of flood probability (high, medium, low).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Normalization
# Generate a resting state (rs) timeseries (ts)
# Install / load package to make fake fMRI ts
# install.packages("neuRosim")
library(neuRosim)
# Generate a ts
ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
# 3dDetrend -normalize
# R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
# Do for the full timeseries
ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
# Do this again for a shorter version of the same timeseries
ts.shorter.length <- length(ts.normalised.long)/4
ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2));
# By looking at the summaries, it can be seen that the median values become larger
summary(ts.normalised.long)
summary(ts.normalised.short)
# Plot results for the long and short ts
# Truncate the longer ts for plotting only
ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
# Give the plot a title
title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
# Add zero line
lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
# 3dDetrend -normalize -polort 0 for long timeseries
lines(ts.normalised.long.made.shorter, col='blue');
# 3dDetrend -normalize -polort 0 for short timeseries
lines(ts.normalised.short, col='red');
Standardization/modernization
New afni_proc.py command line
afni_proc.py \
-subj_id "$sub_id_name_1" \
-blocks despike tshift align tlrc volreg mask blur scale regress \
-radial_correlate_blocks tcat volreg \
-copy_anat anatomical_warped/anatSS.1.nii.gz \
-anat_has_skull no \
-anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \
-anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
-anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
-anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \
-anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \
-anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \
-anat_follower_erode fsvent fswm \
-dsets media_?.nii.gz \
-tcat_remove_first_trs 8 \
-tshift_opts_ts -tpattern alt+z2 \
-align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
-tlrc_base "$basedset" \
-tlrc_NL_warp \
-tlrc_NL_warped_dsets \
anatomical_warped/anatQQ.1.nii.gz \
anatomical_warped/anatQQ.1.aff12.1D \
anatomical_warped/anatQQ.1_WARP.nii.gz \
-volreg_align_to MIN_OUTLIER \
-volreg_post_vr_allin yes \
-volreg_pvra_base_index MIN_OUTLIER \
-volreg_align_e2a \
-volreg_tlrc_warp \
-mask_opts_automask -clfrac 0.10 \
-mask_epi_anat yes \
-blur_to_fwhm -blur_size $blur \
-regress_motion_per_run \
-regress_ROI_PC fsvent 3 \
-regress_ROI_PC_per_run fsvent \
-regress_make_corr_vols aeseg fsvent \
-regress_anaticor_fast \
-regress_anaticor_label fswm \
-regress_censor_motion 0.3 \
-regress_censor_outliers 0.1 \
-regress_apply_mot_types demean deriv \
-regress_est_blur_epits \
-regress_est_blur_errts \
-regress_run_clustsim no \
-regress_polort 2 \
-regress_bandpass 0.01 1 \
-html_review_style pythonic
We used similar command lines to generate the ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will make the code used to create all derivative files available on our GitHub site (https://github.com/lab-lab/nndb). We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, averaging ~40 minutes but variable in length (thus leading to the above issue with 3dDetrend's -normalize). A discussion on the AFNI message board with one of our team (starting here: https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256) led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.
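For a concrete sense of the difference, here is a quick sketch of the legacy 1 + int(D/150) rule for a run of roughly 40 minutes; this is an illustration of the arithmetic only, not part of our pipeline code.
run_length_s  <- 40 * 60 # a ~40 minute run, in seconds
legacy_polort <- 1 + floor(run_length_s / 150)
legacy_polort
#> [1] 17
# versus the fixed '-regress_polort 2' with '-regress_bandpass 0.01 1' used above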
Which timeseries file you use is up to you, but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul's own words:

* Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice).
* Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere).
* For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data).
* For censored data:
  * Performing ISC requires the users to unionize the censoring patterns during the correlation calculation.
  * If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might do for naturalistic tasks still), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data.

In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.
Effect on results