33 datasets found
  1. sort

    • data.cityofchicago.org
    application/rdfxml +5
    Updated Jul 13, 2025
    Cite
    Chicago Police Department (2025). sort [Dataset]. https://data.cityofchicago.org/Public-Safety/sort/bnsx-zzcw
    Explore at:
    Available download formats: xml, tsv, csv, json, application/rdfxml, application/rssxml
    Dataset updated
    Jul 13, 2025
    Authors
    Chicago Police Department
    Description

    This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org.

    Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use.

    Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
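    A quick way to inspect this dataset without Excel is to pull a slice of it directly into R. The sketch below is ours, not from the dataset page; it assumes the standard Socrata CSV endpoint for this dataset id (bnsx-zzcw), so adjust the URL and row limit as needed.

    library(readr)

    # Read only the first 10,000 rows via the (assumed) Socrata CSV endpoint,
    # rather than opening the very large full export in a text editor.
    crimes <- read_csv("https://data.cityofchicago.org/resource/bnsx-zzcw.csv?$limit=10000")

    str(crimes)  # inspect column names and types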

  2. Replication Data for "Why Partisans Don't Sort: The Constraints on Partisan...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Nall, Clayton; Mummolo, Jonathan (2023). Replication Data for "Why Partisans Don't Sort: The Constraints on Partisan Segregation" [Dataset]. http://doi.org/10.7910/DVN/EDGRDC
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Nall, Clayton; Mummolo, Jonathan
    Description

    Contains data and R scripts for the JOP article, "Why Partisans Don't Sort: The Constraints on Political Segregation." When downloading tabular data files, ensure that they appear in your working directory in CSV format.

  3. Data from: Pueblo Grande (AZ U:9:1(ASM)): Unit 15, Washington and 48th...

    • search.dataone.org
    Updated Jul 8, 2014
    Cite
    Abbott, David R.; Martin, Maria (2014). Pueblo Grande (AZ U:9:1(ASM)): Unit 15, Washington and 48th Streets: SSI Kitchell Data Recovery, Ceramic Rough Sort (RS_CERAMICS) Data [Dataset]. http://doi.org/10.6067/XCV8BG2MZH
    Explore at:
    Dataset updated
    Jul 8, 2014
    Dataset provided by
    the Digital Archaeological Record
    Authors
    Abbott, David R.; Martin, Maria
    Area covered
    Description

    The Kitchell Data Recovery project Ceramic Rough Sort (RS_CERAMICS) Data sheet contains data from the rough sort analysis of ceramics recovered during the Kitchell data recovery project. It contains information on ceramic types, tempers and counts; it also records vessel and rim forms where applicable. The data sheet also contains rim circumference and rim diameter measurements for some ceramic specimens.

    See Partial Data Recovery and Burial Removal at Pueblo Grande (AZ U:9:1 (ASM)): Unit 15, The Former Maricopa County Sheriff's Substation, Washington and 48th Streets, Phoenix, Arizona (SSI Technical Report No. 02-43) for the final report on the Kitchell Data Recovery project.

  4. Case Study: Cyclist

    • kaggle.com
    Updated Jul 27, 2021
    Cite
    PatrickRCampbell (2021). Case Study: Cyclist [Dataset]. https://www.kaggle.com/patrickrcampbell/case-study-cyclist/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PatrickRCampbell
    Description

    Phase 1: ASK

    Key Objectives:

    1. Business Task * Cyclist is looking to increase its earnings and wants to know whether creating a social media campaign can influence "Casual" users to become "Annual" members.

    2. Key Stakeholders: * The main stakeholder from Cyclist is Lily Moreno, who is the Director of Marketing and responsible for the development of campaigns and initiatives to promote their bike-share program. The other teams involved with this project are Marketing & Analytics and the Executive Team.

    3. Business Task: * Comparing the two kinds of users and defining how they use the platform, what variables they have in common, what variables are different, and how they can get Casual users to become Annual members.

    Phase 2: PREPARE:

    Key Objectives:

    1. Determine Data Credibility * Cyclist provided data from years 2013-2021 (through March 2021), all of which is first-hand data collected by the company.

    2. Sort & Filter Data: * The stakeholders want to know how the current users are using their service, so I am focusing on using the data from 2020-2021 since this is the most relevant period of time to answer the business task.

    #Installing packages
    install.packages("tidyverse", repos = "http://cran.us.r-project.org")
    install.packages("readr", repos = "http://cran.us.r-project.org")
    install.packages("janitor", repos = "http://cran.us.r-project.org")
    install.packages("geosphere", repos = "http://cran.us.r-project.org")
    install.packages("gridExtra", repos = "http://cran.us.r-project.org")
    
    library(tidyverse)
    library(readr)
    library(janitor)
    library(geosphere)
    library(gridExtra)
    
    #Importing data & verifying the information within the dataset
    all_tripdata_clean <- read.csv("/Data Projects/cyclist/cyclist_data_cleaned.csv")
    
    glimpse(all_tripdata_clean)
    
    summary(all_tripdata_clean)
    
    

    Phase 3: PROCESS

    Key Objectives:

    1. Cleaning Data & Preparing for Analysis: * Once the data has been placed into one dataset and checked for errors, we begin cleaning the data. * Eliminating data that corresponds to the company servicing the bikes, and any ride with a traveled distance of zero. * New columns will be added to assist in the analysis, and to provide accurate assessments of who is using the bikes.

    #Eliminating any data that represents the company performing maintenance, and trips without any measurable distance
    #(ride_length is computed in the next step, so that column must exist before this filter is applied)
    all_tripdata_clean <- all_tripdata_clean[!(all_tripdata_clean$start_station_name == "HQ QR" | all_tripdata_clean$ride_length < 0),] 
    
    #Creating columns for the individual date components (derive the date column first, then the parts that depend on it)
    all_tripdata_clean$date <- as.Date(all_tripdata_clean$started_at)
    all_tripdata_clean$day <- format(as.Date(all_tripdata_clean$date), "%d")
    all_tripdata_clean$month <- format(as.Date(all_tripdata_clean$date), "%m")
    all_tripdata_clean$year <- format(as.Date(all_tripdata_clean$date), "%Y")
    all_tripdata_clean$day_of_week <- format(as.Date(all_tripdata_clean$date), "%A")
    
    

    Now I will begin calculating the length of rides being taken, distance traveled, and the mean amount of time & distance.

    #Calculating the ride length in miles & minutes
    all_tripdata_clean$ride_length <- difftime(all_tripdata_clean$ended_at,all_tripdata_clean$started_at,units = "mins")
    
    all_tripdata_clean$ride_distance <- distGeo(matrix(c(all_tripdata_clean$start_lng, all_tripdata_clean$start_lat), ncol = 2), matrix(c(all_tripdata_clean$end_lng, all_tripdata_clean$end_lat), ncol = 2))
    all_tripdata_clean$ride_distance = all_tripdata_clean$ride_distance/1609.34 #converting to miles
    
    #Calculating the mean time and distance based on the user groups
    userType_means <- all_tripdata_clean %>% group_by(member_casual) %>% summarise(mean_time = mean(ride_length))
    
    
    userType_means <- all_tripdata_clean %>% 
     group_by(member_casual) %>% 
     summarise(mean_time = mean(ride_length),mean_distance = mean(ride_distance))
    

    Adding in calculations that will differentiate between bike types and which type of user is using each specific bike type.

    #Calculations
    
    with_bike_type <- all_tripdata_clean %>% filter(rideable_type=="classic_bike" | rideable_type=="electric_bike")
    
    #Ride totals by user type, bike type and weekday
    with_bike_type %>%
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, rideable_type, weekday) %>%
     summarise(totals = n(), .groups = "drop")
     
    #Ride totals by user type and bike type
    with_bike_type %>%
     group_by(member_casual, rideable_type) %>%
     summarise(totals = n(), .groups = "drop")
    
     #Calculating the ride differential
     
     all_tripdata_clean %>% 
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, weekday) %>% 
     summarise(number_of_rides = n(),
          average_duration = mean(ride_length), .groups = 'drop') %>% 
     arrange(member_casual, weekday)
    
  5. ACNC 2019 Annual Information Statement Data

    • researchdata.edu.au
    • data.gov.au
    Updated May 10, 2021
    Cite
    ACNC 2019 Annual Information Statement Data [Dataset]. https://researchdata.edu.au/acnc-2019-annual-statement-data/2975980
    Explore at:
    Dataset updated
    May 10, 2021
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Australian Charities and Not-for-profits Commission (ACNC)
    License

    Attribution 2.5 (CC BY 2.5): https://creativecommons.org/licenses/by/2.5/
    License information was derived automatically

    Description

    This dataset is updated weekly. Please ensure that you use the most up-to-date version.

    The Australian Charities and Not-for-profits Commission (ACNC) is Australia’s national regulator of charities.

    Since 3 December 2012, charities wanting to access Commonwealth charity tax concessions (and other benefits) need to register with the ACNC. Although many charities choose to register, registration with the ACNC is voluntary.

    Each year, registered charities are required to lodge an Annual Information Statement (AIS) with the ACNC. Charities are required to submit their AIS within six months of the end of their reporting period.

    Registered charities can apply to the ACNC to have some or all of the information they provide withheld from the ACNC Register. However, there are only limited circumstances when the ACNC can agree to withhold information. If a charity has applied to have their data withheld, the AIS data relating to that charity has been excluded from this dataset.

    This dataset can be used to find the AIS information lodged by multiple charities. It can also be used to filter and sort by different variables across all AIS information. AIS information for individual charities can be viewed via the ACNC Charity Register.

    The AIS collects information about charity finances, and financial information provides a basis for understanding the charity and its activities in greater detail. We have published explanatory notes to help you understand this dataset.

    When comparing charities’ financial information it is important to consider each charity's unique situation. This is particularly true for small charities, which are not compelled to provide financial reports – reports that often contain more details about their financial position and activities – as part of their AIS.

    For more information on interpreting financial information, please refer to the ACNC website. The ACNC also publishes other datasets on data.gov.au as part of our commitment to open data and transparent regulation.

    NOTE: It is possible that some information in this dataset might be subject to a future request from a charity to have their information withheld. If this occurs, this information will still appear in the dataset until the next update. Please consider this risk when using this dataset.

  6. Sediment macrofauna count data and images of multicores collected during R/V...

    • search.dataone.org
    • data.griidc.org
    Updated Feb 5, 2025
    Cite
    MacDonald, Ian (2025). Sediment macrofauna count data and images of multicores collected during R/V Weatherbird II cruise 1305, September 22-29, 2012 [Dataset]. http://doi.org/10.7266/N7BV7DKC
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    GRIIDC
    Authors
    MacDonald, Ian
    Description

    This dataset contains 146 jpeg images of multicores collected during R/V Weatherbird II cruise 1305 from September 22nd to 29th 2012. Additionally, this includes a file of raw sort data for macrofauna to the family level.

  7. Explore data formats and ingestion methods

    • kaggle.com
    Updated Feb 12, 2021
    Cite
    Gabriel Preda (2021). Explore data formats and ingestion methods [Dataset]. https://www.kaggle.com/datasets/gpreda/iris-dataset/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gabriel Preda
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Why this Dataset

    This dataset brings to you Iris Dataset in several data formats (see more details in the next sections).

    You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared a Python Jupyter Notebook and an R Markdown report that read in all these formats.

    Iris Dataset

    Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.

    Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris

    Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/

    The file downloaded is iris.data and is formatted as a comma delimited file.

    This small data collection was created to help you test your skills with ingesting various data formats.

    Content

    This file was processed to convert the data into the following formats (an R ingestion sketch follows this list):

    • csv - comma separated values format
    • tsv - tab separated values format
    • parquet - parquet format
    • feather - feather format
    • parquet.gzip - compressed parquet format
    • h5 - hdf5 format
    • pickle - Python binary object file - pickle format
    • xlsx - Excel format
    • npy - NumPy (Python library) binary format
    • npz - NumPy (Python library) binary compressed format
    • rds - Rds (R specific data format) binary format
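    A minimal R sketch for ingesting a few of these formats. The file names below (iris.csv, iris.tsv, iris.parquet, iris.feather, iris.rds) are assumptions for illustration; match them to the actual files in the dataset, and see the notebooks mentioned above for the Python-specific formats (pickle, npy, npz, h5).

    library(readr)   # csv / tsv
    library(arrow)   # parquet / feather

    iris_csv     <- read_csv("iris.csv")          # comma separated
    iris_tsv     <- read_tsv("iris.tsv")          # tab separated
    iris_parquet <- read_parquet("iris.parquet")  # parquet
    iris_feather <- read_feather("iris.feather")  # feather
    iris_rds     <- readRDS("iris.rds")           # base R, no extra package needed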

    Acknowledgements

    I would like to acknowledge the work of the creator of the dataset - R. A. Fisher and of the donor - Michael Marshall.

    Inspiration

    Use these data formats to test your skills in ingesting data in various formats.

  8. Water Data Online

    • researchdata.edu.au
    • data.gov.au
    • +2more
    Updated Oct 21, 2014
    Cite
    Bureau of Meteorology (2014). Water Data Online [Dataset]. https://researchdata.edu.au/water-data-online/3528495
    Explore at:
    Dataset updated
    Oct 21, 2014
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bureau of Meteorology
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Area covered
    Description

    Water Data Online provides free access to nationally consistent, current and historical water information. It allows you to view and download standardised data and reports.

    Watercourse level and watercourse discharge time series data from approximately 3500 water monitoring stations across Australia are available.

    Water Data Online displays time series data supplied by lead water agencies from each State and Territory with updates provided to the Bureau on a daily basis.

    Over time, more stations and parameters will become available and linkages to Water Data Online from the Geofabric will be implemented.

    Before using data please refer to licence preferences of the supplying organisations under the Copyright tab.

  9. First quarter 2024 / Table BOAMP-SIREN-BUYERS (BSA): a cross between the...

    • gimi9.com
    Updated Feb 13, 2025
    Cite
    (2025). First quarter 2024 / Table BOAMP-SIREN-BUYERS (BSA): a cross between the BOAMP table (DILA) and the Sirene Business Base (INSEE) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_6644ac7663969d80f6047dd8/
    Explore at:
    Dataset updated
    Feb 13, 2025
    Description

    Crossing table of the BOAMP table (DILA) with the Sirene business base (INSEE), first quarter 2024. The BUYER's Siren number (column "SN_30_Siren") is provided for each notice (column and primary key "B_17_idweb"); several columns that facilitate datamining have been added; and the names of the original columns have been prefixed, numbered and sorted alphabetically.

    You will find here:

    • the BSA for the first quarter of 2024 in free and open access (CSV with semicolon separator, and Parquet);
    • the schema of the BSA table (CSV, comma separator);
    • an excerpt from the March 30 BSA (CSV, comma separator) to quickly give you an idea of the Datagouv explorer.

    NB: the March 30 excerpt has its JSON cell columns GESTION, DONNEES, and ANNONCES_ANTERIEURES purged. The data removed there can be found in a nicer format by following the links in the added columns:

    • B_41_GESTION_URL_JSON;
    • B_43_DONNEES_URL_JSON;
    • B_45_ANNONCES_ANTERIEURES_URL_JSON.

    More info: daily and paid updates covering the entire BOAMP 2024 are available on our website under AuFilDuBoamp Downloads; further documentation can be found at AuFilDuBoamp Doc & TP.

    Data sources: the SIRENE database of companies and their establishments (SIREN, SIRET) from August, and the BOAMP API.

    To download the first quarter of the BSA with Python, run:

    For the CSV: df = pd.read_csv("https://www.data.gouv.fr/en/datasets/r/63f0d792-148a-4c95-a0b6-9e8ea8b0b34a", dtype='string', sep=';')

    For the Parquet file: df = pd.read_parquet("https://www.data.gouv.fr/en/datasets/r/f7a4a76e-ff50-4dc6-bae8-97368081add2")

    Enjoy!
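    An equivalent sketch in R (ours, not from the dataset description): the CSV uses a semicolon separator, so read_csv2 fits, and the Parquet file can be downloaded and read with the arrow package.

    library(readr)

    # Semicolon-separated CSV, read directly from the data.gouv.fr URL
    bsa_csv <- read_csv2("https://www.data.gouv.fr/en/datasets/r/63f0d792-148a-4c95-a0b6-9e8ea8b0b34a")

    # Parquet: download to a temporary file, then read it with arrow
    tmp <- tempfile(fileext = ".parquet")
    download.file("https://www.data.gouv.fr/en/datasets/r/f7a4a76e-ff50-4dc6-bae8-97368081add2", tmp, mode = "wb")
    bsa_parquet <- arrow::read_parquet(tmp)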

  10. Additional file 1: of Best-worst scaling improves measurement of first...

    • springernature.figshare.com
    zip
    Updated Jun 4, 2023
    Cite
    Nichola Burton; Michael Burton; Dan Rigby; Clare Sutherland; Gillian Rhodes (2023). Additional file 1: of Best-worst scaling improves measurement of first impressions [Dataset]. http://doi.org/10.6084/m9.figshare.9894992.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nichola Burton; Michael Burton; Dan Rigby; Clare Sutherland; Gillian Rhodes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R scripts used to generate the design and sort and score the data of Study 3, with annotation: intended as a template to build future BWS studies. (ZIP 15 kb)

  11. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - CO [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-co-b7d1e
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the carbon monoxide data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For that code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and do not sort the rows. A single change in the Excel file could break the code.
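    If you export a sheet to CSV for use with the R code, reading it back with check.names = FALSE helps keep the column names exactly as they appear in the workbook. A minimal sketch; the file name below is hypothetical, not part of the dataset.

    # Hypothetical exported sheet; keep names and codes untouched for AQ-June20.R
    co <- read.csv("co_sheet_export.csv", check.names = FALSE, stringsAsFactors = FALSE)
    str(co)  # inspect only; avoid sorting or editing so the downstream code still runs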

  12. Data from: TDMentions: A Dataset of Technical Debt Mentions in Online Posts

    • zenodo.org
    • data.niaid.nih.gov
    bin, bz2
    Updated Jan 24, 2020
    Cite
    Morgan Ericsson; Morgan Ericsson; Anna Wingkvist; Anna Wingkvist (2020). TDMentions: A Dataset of Technical Debt Mentions in Online Posts [Dataset]. http://doi.org/10.5281/zenodo.2593142
    Explore at:
    Available download formats: bin, bz2
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Morgan Ericsson; Morgan Ericsson; Anna Wingkvist; Anna Wingkvist
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # TDMentions: A Dataset of Technical Debt Mentions in Online Posts (version 1.0)

    TDMentions is a dataset that contains mentions of technical debt from Reddit, Hacker News, and Stack Exchange. It also contains a list of blog posts on Medium that were tagged as technical debt. The dataset currently contains approximately 35,000 items.

    ## Data collection and processing

    The dataset is mainly collected from existing datasets. We used data from:

    - the archive of Reddit posts by Jason Baumgartner (available at [https://pushshift.io](https://pushshift.io)),
    - the archive of Hacker News available at Google's BigQuery (available at [https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news](https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news)),
    - the Stack Exchange data dump (available at [https://archive.org/details/stackexchange](https://archive.org/details/stackexchange)),
    - the [GHTorrent](http://ghtorrent.org) project, and
    - the [GH Archive](https://www.gharchive.org).

    The data set currently contains data from the start of each source/service until 2018-12-31. For GitHub, we currently only include data from 2015-01-01.

    We use the regular expression `tech(nical)?[\s\-_]*?debt` to find mentions in all sources except for Medium. We decided to limit our matches to variations of technical debt and tech debt. Other shorter forms, such as TD, can result in too many false positives. For Medium, we used the tag `technical-debt`.
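    We can illustrate what the pattern does and does not match with a small R check (the example strings below are ours, not from the dataset):

    ```R
    # Example strings are ours; the pattern is the one quoted above.
    pattern <- "tech(nical)?[\\s\\-_]*?debt"
    samples <- c("technical debt", "tech debt", "tech-debt", "techdebt", "TD")
    grepl(pattern, samples, ignore.case = TRUE, perl = TRUE)
    #> [1]  TRUE  TRUE  TRUE  TRUE FALSE
    ```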

    ## Data Format

    The dataset is stored as a compressed (bzip2) JSON file with one JSON object per line. Each mention is represented as a JSON object with the following keys.

    - `id`: the id used in the original source. We use the URL path to identify Medium posts.
    - `body`: the text that contains the mention. This is either the comment or the title of the post. For Medium posts this is the title and subtitle (which might not mention technical debt, since posts are identified by the tag).
    - `created_utc`: the time the item was posted in seconds since epoch in UTC.
    - `author`: the author of the item. We use the username or userid from the source.
    - `source`: where the item was posted. Valid sources are:
      - HackerNews Comment
      - HackerNews Job
      - HackerNews Submission
      - Reddit Comment
      - Reddit Submission
      - StackExchange Answer
      - StackExchange Comment
      - StackExchange Question
      - Medium Post
    - `meta`: Additional information about the item specific to the source. This includes, e.g., the subreddit a Reddit submission or comment was posted to, the score, etc. We try to use the same names, e.g., `score` and `num_comments` for keys that have the same meaning/information across multiple sources.

    This is a sample item from Reddit:

    ```JSON
    {
      "id": "ab8auf",
      "body": "Technical Debt Explained (x-post r/Eve)",
      "created_utc": 1546271789,
      "author": "totally_100_human",
      "source": "Reddit Submission",
      "meta": {
        "title": "Technical Debt Explained (x-post r/Eve)",
        "score": 1,
        "num_comments": 0,
        "url": "http://jestertrek.com/eve/technical-debt-2.png",
        "subreddit": "RCBRedditBot"
      }
    }
    ```

    ## Sample Analyses

    We decided to use JSON to store the data, since it is easy to work with from multiple programming languages. In the following examples, we use [`jq`](https://stedolan.github.io/jq/) to process the JSON.

    ### How many items are there for each source?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq '.source' | sort | uniq -c
    ```

    ### How many submissions that mentioned technical debt were posted each month?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq 'select(.source == "Reddit Submission") | .created_utc | strftime("%Y-%m")' | sort | uniq -c
    ```

    ### What are the titles of items that link (`meta.url`) to PDF documents?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq '. as $r | select(.meta.url?) | .meta.url | select(endswith(".pdf")) | $r.body'
    ```

    ### Please, I want CSV!

    ```
    lbzip2 -cd postscomments.json.bz2 | jq -r '[.id, .body, .author] | @csv'
    ```

    Note that you need to specify the keys you want to include for the CSV, so it is easier to either ignore the meta information or process each source.
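    The same dump can also be loaded into R; here is a minimal sketch using jsonlite (our package choice, not something the dataset prescribes):

    ```R
    library(jsonlite)

    # Stream the bzip2-compressed JSON-lines file into a data frame
    mentions <- stream_in(bzfile("postscomments.json.bz2"))

    # Items per source, mirroring the first jq example above
    table(mentions$source)
    ```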

    Please see [https://github.com/sse-lnu/tdmentions](https://github.com/sse-lnu/tdmentions) for more analyses

    # Limitations and Future updates

    The current version of the dataset lacks GitHub data and Medium comments. GitHub data will be added in the next update. Medium comments (responses) will be added in a future update if we find a good way to represent these.

  13. User Data

    • kaggle.com
    Updated Jul 25, 2023
    Cite
    Ashish R. Soni (2023). User Data [Dataset]. https://www.kaggle.com/datasets/ashishrsoni/user-data/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ashish R. Soni
    Description

    Dataset

    This dataset was created by Ashish R. Soni

    Contents

  14. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • gimi9.com
    Updated Jun 25, 2024
    Cite
    (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - PMVa | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_integration-of-slurry-separation-technology-refrigeration-units-air-quality-pmva-87359/
    Explore at:
    Dataset updated
    Jun 25, 2024
    Description

    This is the gravimetric data used to calibrate the real-time readings. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For that code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and do not sort the rows. A single change in the Excel file could break the code.

  15. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • catalog.data.gov
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - SO2 [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-so2-a3122
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the raw SO2 data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For that code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and do not sort the rows. A single change in the Excel file could break the code.

  16. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • datasets.ai
    • catalog.data.gov
    23, 40, 55, 8
    Updated Sep 13, 2024
    Cite
    US Agency for International Development (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - H2S [Dataset]. https://datasets.ai/datasets/integration-of-slurry-separation-technology-refrigeration-units-air-quality-h2s-4af17
    Explore at:
    Available download formats: 23, 40, 8, 55
    Dataset updated
    Sep 13, 2024
    Dataset authored and provided by
    US Agency for International Development
    Description

    This is the raw H2S data: the concentration of H2S in parts per million in the biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For that code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and do not sort the rows. A single change in the Excel file could break the code.

  17. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • catalog.data.gov
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - CH4 [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-ch4-8abb6
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    Methane concentration of the biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For that code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and do not sort the rows. A single change in the Excel file could break the code.

  18. Brisbane Library Checkout Data

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, bin
    Updated Jan 24, 2020
    Cite
    Nicholas Tierney; Nicholas Tierney (2020). Brisbane Library Checkout Data [Dataset]. http://doi.org/10.5281/zenodo.2437860
    Explore at:
    Available download formats: bin, application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nicholas Tierney; Nicholas Tierney
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brisbane
    Description

    This has been copied from the README.md file

    bris-lib-checkout

    This provides tidied up data from the Brisbane library checkouts

    Retrieving and cleaning the data

    The script for retrieving and cleaning the data is made available in scrape-library.R.

    The data

    • The data/ folder contains the tidy data
    • The data-raw/ folder contains the raw data

    data/

    This contains four tidied up dataframes:

    • tidy-brisbane-library-checkout.csv
    • metadata_branch.csv
    • metadata_heading.csv
    • metadata_item_type.csv

    tidy-brisbane-library-checkout.csv contains the following columns, with the metadata file metadata_heading containing the description of these columns.

    knitr::kable(readr::read_csv("data/metadata_heading.csv"))
    #> Parsed with column specification:
    #> cols(
    #> heading = col_character(),
    #> heading_explanation = col_character()
    #> )

    | heading          | heading_explanation                         |
    |------------------|---------------------------------------------|
    | Title            | Title of Item                               |
    | Author           | Author of Item                              |
    | Call Number      | Call Number of Item                         |
    | Item id          | Unique Item Identifier                      |
    | Item Type        | Type of Item (see next column)              |
    | Status           | Current Status of Item                      |
    | Language         | Published language of item (if not English) |
    | Age              | Suggested audience                          |
    | Checkout Library | Checkout branch                             |
    | Date             | Checkout date                               |

    We also added year, month, and day columns.

    The remaining data are all metadata files that contain meta information on the columns in the checkout data:

    library(tidyverse)
    #> ── Attaching packages ────────────── tidyverse 1.2.1 ──
    #> ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
    #> ✔ tibble 1.4.99.9006 ✔ dplyr 0.7.8
    #> ✔ tidyr 0.8.2 ✔ stringr 1.3.1
    #> ✔ readr 1.3.0 ✔ forcats 0.3.0
    #> ── Conflicts ───────────────── tidyverse_conflicts() ──
    #> ✖ dplyr::filter() masks stats::filter()
    #> ✖ dplyr::lag() masks stats::lag()
    knitr::kable(readr::read_csv("data/metadata_branch.csv"))
    #> Parsed with column specification:
    #> cols(
    #> branch_code = col_character(),
    #> branch_heading = col_character()
    #> )

    | branch_code | branch_heading          |
    |-------------|-------------------------|
    | ANN         | Annerley                |
    | ASH         | Ashgrove                |
    | BNO         | Banyo                   |
    | BRR         | BrackenRidge            |
    | BSQ         | Brisbane Square Library |
    | BUL         | Bulimba                 |
    | CDA         | Corinda                 |
    | CDE         | Chermside               |
    | CNL         | Carindale               |
    | CPL         | Coopers Plains          |
    | CRA         | Carina                  |
    | EPK         | Everton Park            |
    | FAI         | Fairfield               |
    | GCY         | Garden City             |
    | GNG         | Grange                  |
    | HAM         | Hamilton                |
    | HPK         | Holland Park            |
    | INA         | Inala                   |
    | IPY         | Indooroopilly           |
    | MBG         | Mt. Coot-tha            |
    | MIT         | Mitchelton              |
    | MTG         | Mt. Gravatt             |
    | MTO         | Mt. Ommaney             |
    | NDH         | Nundah                  |
    | NFM         | New Farm                |
    | SBK         | Sunnybank Hills         |
    | SCR         | Stones Corner           |
    | SGT         | Sandgate                |
    | VAN         | Mobile Library          |
    | TWG         | Toowong                 |
    | WND         | West End                |
    | WYN         | Wynnum                  |
    | ZIL         | Zillmere                |

    knitr::kable(readr::read_csv("data/metadata_item_type.csv"))
    #> Parsed with column specification:
    #> cols(
    #> item_type_code = col_character(),
    #> item_type_explanation = col_character()
    #> )

    | item_type_code | item_type_explanation                     |
    |----------------|-------------------------------------------|
    | AD-FICTION     | Adult Fiction                             |
    | AD-MAGS        | Adult Magazines                           |
    | AD-PBK         | Adult Paperback                           |
    | BIOGRAPHY      | Biography                                 |
    | BSQCDMUSIC     | Brisbane Square CD Music                  |
    | BSQCD-ROM      | Brisbane Square CD Rom                    |
    | BSQ-DVD        | Brisbane Square DVD                       |
    | CD-BOOK        | Compact Disc Book                         |
    | CD-MUSIC       | Compact Disc Music                        |
    | CD-ROM         | CD Rom                                    |
    | DVD            | DVD                                       |
    | DVD_R18+       | DVD Restricted - 18+                      |
    | FASTBACK       | Fastback                                  |
    | GAYLESBIAN     | Gay and Lesbian Collection                |
    | GRAPHICNOV     | Graphic Novel                             |
    | ILL            | InterLibrary Loan                         |
    | JU-FICTION     | Junior Fiction                            |
    | JU-MAGS        | Junior Magazines                          |
    | JU-PBK         | Junior Paperback                          |
    | KITS           | Kits                                      |
    | LARGEPRINT     | Large Print                               |
    | LGPRINTMAG     | Large Print Magazine                      |
    | LITERACY       | Literacy                                  |
    | LITERACYAV     | Literacy Audio Visual                     |
    | LOCSTUDIES     | Local Studies                             |
    | LOTE-BIO       | Languages Other than English Biography    |
    | LOTE-BOOK      | Languages Other than English Book         |
    | LOTE-CDMUS     | Languages Other than English CD Music     |
    | LOTE-DVD       | Languages Other than English DVD          |
    | LOTE-MAG       | Languages Other than English Magazine     |
    | LOTE-TB        | Languages Other than English Taped Book   |
    | MBG-DVD        | Mt Coot-tha Botanical Gardens DVD         |
    | MBG-MAG        | Mt Coot-tha Botanical Gardens Magazine    |
    | MBG-NF         | Mt Coot-tha Botanical Gardens Non Fiction |
    | MP3-BOOK       | MP3 Audio Book                            |
    | NONFIC-SET     | Non Fiction Set                           |
    | NONFICTION     | Non Fiction                               |
    | PICTURE-BK     | Picture Book                              |
    | PICTURE-NF     | Picture Book Non Fiction                  |
    | PLD-BOOK       | Public Libraries Division Book            |
    | YA-FICTION     | Young Adult Fiction                       |
    | YA-MAGS        | Young Adult Magazine                      |
    | YA-PBK         | Young Adult Paperback                     |

    Example usage

    Let’s explore the data

    bris_libs <- readr::read_csv("data/bris-lib-checkout.csv")
    #> Parsed with column specification:
    #> cols(
    #> title = col_character(),
    #> author = col_character(),
    #> call_number = col_character(),
    #> item_id = col_double(),
    #> item_type = col_character(),
    #> status = col_character(),
    #> language = col_character(),
    #> age = col_character(),
    #> library = col_character(),
    #> date = col_double(),
    #> datetime = col_datetime(format = ""),
    #> year = col_double(),
    #> month = col_double(),
    #> day = col_character()
    #> )
    #> Warning: 20 parsing failures.
    #> row col expected actual file
    #> 587795 item_id a double REFRESH 'data/bris-lib-checkout.csv'
    #> 590579 item_id a double REFRESH 'data/bris-lib-checkout.csv'
    #> 590597 item_id a double REFRESH 'data/bris-lib-checkout.csv'
    #> 595774 item_id a double REFRESH 'data/bris-lib-checkout.csv'
    #> 597567 item_id a double REFRESH 'data/bris-lib-checkout.csv'
    #> ...... ....... ........ ....... ............................
    #> See problems(...) for more details.

    We can count the number of titles, item types, suggested age, and the library given:

    library(dplyr)
    count(bris_libs, title, sort = TRUE)
    #> # A tibble: 121,046 x 2
    #> title n
    #>
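    Branch codes can be mapped to their full names by joining the checkouts with metadata_branch; a minimal sketch, assuming the library column holds the branch codes listed above:

    # Attach branch names and count checkouts per branch
    branch_meta <- readr::read_csv("data/metadata_branch.csv")
    
    bris_libs %>%
     left_join(branch_meta, by = c("library" = "branch_code")) %>%
     count(branch_heading, sort = TRUE)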

    License

    This data is provided under a CC BY 4.0 license

    It has been downloaded from Brisbane library checkouts, and tidied up using the code in data-raw.

  19. Simple download service (Atom) for the dataset: Sort Agen — velocity field zones

    • data.europa.eu
    unknown
    Cite
    Servizz sempliċi ta’ tniżżil (Atom) tas-sett ta’ data: Sort Agen — żoni tal-qasam tal-veloċità [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-50e8b8e8-fae2-442f-86bc-46f0618f8b58?locale=mt
    Explore at:
    Available download formats: unknown
    Description

    The TRI of Agen covers 20 municipalities spread across the stretch of the Garonne basin known as the Agenais. This is a table of velocity field zones (zones for which a velocity estimate is available for a flood of a given type in a given scenario). The geographic dataset was produced from the GIS of the Floods Directive for the Agen high flood risk territory (TRI) and mapped for reporting purposes under the European Floods Directive. European Directive 2007/60/EC of 23 October 2007 on the assessment and management of flood risks (OJ L 288, 6.11.2007, p. 27) shapes the flood prevention strategy in Europe. It requires the production of flood risk management plans to reduce the negative consequences of flooding for human health, the environment, cultural heritage and economic activity. The objectives and implementation requirements are set out in the Law of 12 July 2010 on the National Commitment for the Environment (LENE) and the Decree of 2 March 2011. In this context, the primary objective of flood hazard and flood risk mapping for the TRIs is to contribute, by homogenising and making objective the knowledge of flood exposure, to the development of flood risk management plans (FRMPs). This dataset is used to produce flood extent maps and flood risk maps that represent flood hazards and stakes at an appropriate scale; their purpose is to provide quantitative evidence for further assessing the vulnerability of a territory to the three levels of flood probability (high, medium, low).

  20. Naturalistic Neuroimaging Database

    • openneuro.org
    Updated Apr 20, 2021
    Cite
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2021). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v2.0.0
    Explore at:
    Dataset updated
    Apr 20, 2021
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Overview

    • The Naturalistic Neuroimaging Database (NNDb v2.0) contains datasets from 86 human participants doing the NIH Toolbox and then watching one of 10 full-length movies during functional magnetic resonance imaging (fMRI). The participants were all right-handed, native English speakers, with no history of neurological/psychiatric illnesses, with no hearing impairments, unimpaired or corrected vision and taking no medication. Each movie was stopped in 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10 minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.
    • The NNDb V2.0 is now on Neuroscout, a platform for fast and flexible re-analysis of (naturalistic) fMRI studies. See: https://neuroscout.org/

    v2.0 Changes

    • Overview
      • We have replaced our own preprocessing pipeline with that implemented in AFNI’s afni_proc.py, thus changing only the derivative files. This introduces a fix for an issue with our normalization (i.e., scaling) step and modernizes and standardizes the preprocessing applied to the NNDb derivative files. We have done a bit of testing and have found that results in both pipelines are quite similar in terms of the resulting spatial patterns of activity but with the benefit that the afni_proc.py results are 'cleaner' and statistically more robust.
    • Normalization

      • Emily Finn and Clare Grall at Dartmouth and Rick Reynolds and Paul Taylor at AFNI, discovered and showed us that the normalization procedure we used for the derivative files was less than ideal for timeseries runs of varying lengths. Specifically, the 3dDetrend flag -normalize makes 'the sum-of-squares equal to 1'. We had not thought through that an implication of this is that the resulting normalized timeseries amplitudes will be affected by run length, increasing as run length decreases (and maybe this should go in 3dDetrend’s help text). To demonstrate this, I wrote a version of 3dDetrend’s -normalize for R so you can see for yourselves by running the following code:
      # Generate a resting state (rs) timeseries (ts)
      # Install / load package to make fake fMRI ts
      # install.packages("neuRosim")
      library(neuRosim)
      # Generate a ts
      ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
      # 3dDetrend -normalize
      # R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
      # Do for the full timeseries
      ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
      # Do this again for a shorter version of the same timeseries
      ts.shorter.length <- length(ts.normalised.long)/4
      ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2));
      # By looking at the summaries, it can be seen that the median values become  larger
      summary(ts.normalised.long)
      summary(ts.normalised.short)
      # Plot results for the long and short ts
      # Truncate the longer ts for plotting only
      ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
      # Give the plot a title
      title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
      plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
      # Add zero line
      lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
      # 3dDetrend -normalize -polort 0 for long timeseries
      lines(ts.normalised.long.made.shorter, col='blue');
      # 3dDetrend -normalize -polort 0 for short timeseries
      lines(ts.normalised.short, col='red');
      
    • Standardization/modernization

      • The above individuals also encouraged us to implement the afni_proc.py script over our own pipeline. It introduces at least three additional improvements: First, we now use Bob’s @SSwarper to align our anatomical files with an MNI template (now MNI152_2009_template_SSW.nii.gz) and this, in turn, integrates nicely into the afni_proc.py pipeline. This seems to result in a generally better or more consistent alignment, though this is only a qualitative observation. Second, all the transformations / interpolations and detrending are now done in fewer steps compared to our pipeline. This is preferable because, e.g., there is less chance of inadvertently reintroducing noise back into the timeseries (see Lindquist, Geuter, Wager, & Caffo 2019). Finally, many groups are advocating using tools like fMRIPrep or afni_proc.py to increase standardization of analysis practices in our neuroimaging community. This presumably results in less error, less heterogeneity and more interpretability of results across studies. Along these lines, the quality control (‘QC’) html pages generated by afni_proc.py are a real help in assessing data quality and almost a joy to use.
    • New afni_proc.py command line

      • The following is the afni_proc.py command line that we used to generate blurred and censored timeseries files. The afni_proc.py tool comes with extensive help and examples. As such, you can quickly understand our preprocessing decisions by scrutinising the below. Specifically, the following command is most similar to Example 11 for ‘Resting state analysis’ in the help file (see https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html):

        afni_proc.py \
          -subj_id "$sub_id_name_1" \
          -blocks despike tshift align tlrc volreg mask blur scale regress \
          -radial_correlate_blocks tcat volreg \
          -copy_anat anatomical_warped/anatSS.1.nii.gz \
          -anat_has_skull no \
          -anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \
          -anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
          -anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
          -anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \
          -anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \
          -anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \
          -anat_follower_erode fsvent fswm \
          -dsets media_?.nii.gz \
          -tcat_remove_first_trs 8 \
          -tshift_opts_ts -tpattern alt+z2 \
          -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
          -tlrc_base "$basedset" \
          -tlrc_NL_warp \
          -tlrc_NL_warped_dsets \
            anatomical_warped/anatQQ.1.nii.gz \
            anatomical_warped/anatQQ.1.aff12.1D \
            anatomical_warped/anatQQ.1_WARP.nii.gz \
          -volreg_align_to MIN_OUTLIER \
          -volreg_post_vr_allin yes \
          -volreg_pvra_base_index MIN_OUTLIER \
          -volreg_align_e2a \
          -volreg_tlrc_warp \
          -mask_opts_automask -clfrac 0.10 \
          -mask_epi_anat yes \
          -blur_to_fwhm -blur_size $blur \
          -regress_motion_per_run \
          -regress_ROI_PC fsvent 3 \
          -regress_ROI_PC_per_run fsvent \
          -regress_make_corr_vols aeseg fsvent \
          -regress_anaticor_fast \
          -regress_anaticor_label fswm \
          -regress_censor_motion 0.3 \
          -regress_censor_outliers 0.1 \
          -regress_apply_mot_types demean deriv \
          -regress_est_blur_epits \
          -regress_est_blur_errts \
          -regress_run_clustsim no \
          -regress_polort 2 \
          -regress_bandpass 0.01 1 \
          -html_review_style pythonic

        We used similar command lines to generate ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will provide the code used to make all derivative files available on our github site (https://github.com/lab-lab/nndb).

      We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, with the average being ~40 minutes but this number can be variable (thus leading to the above issue with 3dDetrend’s -normalise). A discussion on the AFNI message board with one of our team (starting here, https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256), led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.

      Which timeseries file you use is up to you but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul’s own words:

      • Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice).
      • Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere).
      • For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data).
      • For censored data:
        • Performing ISC requires the users to unionize the censoring patterns during the correlation calculation.
        • If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might do for naturalistic tasks still), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data.

      In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.

    • Effect on results

      • From numerous tests on our own analyses, we have qualitatively found that results using our old vs the new afni_proc.py preprocessing pipeline do not change all that much in terms of general spatial patterns. There is, however, an