33 datasets found
  1. sort

    • data.cityofchicago.org
    application/rdfxml +5
    Updated Jul 13, 2025
    Cite
    Chicago Police Department (2025). sort [Dataset]. https://data.cityofchicago.org/Public-Safety/sort/bnsx-zzcw
    Explore at:
    Available download formats: xml, tsv, csv, json, application/rdfxml, application/rssxml
    Dataset updated
    Jul 13, 2025
    Authors
    Chicago Police Department
    Description

    This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org.

    Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use.

    Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
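    A quick way to inspect this dataset without Excel is to pull a slice of it directly into R. The sketch below is ours, not from the dataset page; it assumes the standard Socrata CSV endpoint for this dataset id (bnsx-zzcw), so adjust the URL and row limit as needed.

    library(readr)

    # Read only the first 10,000 rows via the (assumed) Socrata CSV endpoint,
    # rather than opening the very large full export in a text editor.
    crimes <- read_csv("https://data.cityofchicago.org/resource/bnsx-zzcw.csv?$limit=10000")

    str(crimes)  # inspect column names and types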

  2. Replication Data for "Why Partisans Don't Sort: The Constraints on Partisan...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Nall, Clayton; Mummolo, Jonathan (2023). Replication Data for "Why Partisans Don't Sort: The Constraints on Partisan Segregation" [Dataset]. http://doi.org/10.7910/DVN/EDGRDC
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Nall, Clayton; Mummolo, Jonathan
    Description

    Contains data and R scripts for the JOP article, "Why Partisans Don't Sort: The Constraints on Political Segregation." When downloading tabular data files, ensure that they appear in your working directory in CSV format.

  3. Data from: Pueblo Grande (AZ U:9:1(ASM)): Unit 15, Washington and 48th...

    • search.dataone.org
    Updated Jul 8, 2014
    Cite
    Abbott, David R.; Martin, Maria (2014). Pueblo Grande (AZ U:9:1(ASM)): Unit 15, Washington and 48th Streets: SSI Kitchell Data Recovery, Ceramic Rough Sort (RS_CERAMICS) Data [Dataset]. http://doi.org/10.6067/XCV8BG2MZH
    Explore at:
    Dataset updated
    Jul 8, 2014
    Dataset provided by
    the Digital Archaeological Record
    Authors
    Abbott, David R.; Martin, Maria
    Area covered
    Description

    The Kitchell Data Recovery project Ceramic Rough Sort (RS_CERAMICS) Data sheet contains data from the rough sort analysis of ceramics recovered during the Kitchell data recovery project. It contains information on ceramic types, tempers and counts; it also records vessel and rim forms where applicable. The data sheet also contains rim circumference and rim diameter measurements for some ceramic specimens.

    See Partial Data Recovery and Burial Removal at Pueblo Grande (AZ U:9:1 (ASM)): Unit 15, The Former Maricopa County Sheriff's Substation, Washington and 48th Streets, Phoenix, Arizona (SSI Technical Report No. 02-43) for the final report on the Kitchell Data Recovery project.

  4. Case Study: Cyclist

    • kaggle.com
    Updated Jul 27, 2021
    Cite
    PatrickRCampbell (2021). Case Study: Cyclist [Dataset]. https://www.kaggle.com/patrickrcampbell/case-study-cyclist/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PatrickRCampbell
    Description

    Phase 1: ASK

    Key Objectives:

    1. Business Task * Cyclist is looking to increase its earnings and wants to know whether creating a social media campaign can influence "Casual" users to become "Annual" members.

    2. Key Stakeholders: * The main stakeholder from Cyclist is Lily Moreno, who is the Director of Marketing and responsible for the development of campaigns and initiatives to promote their bike-share program. The other teams involved with this project are Marketing & Analytics and the Executive Team.

    3. Business Task: * Comparing the two kinds of users and defining how they use the platform, what variables they have in common, what variables are different, and how they can get Casual users to become Annual members.

    Phase 2: PREPARE:

    Key Objectives:

    1. Determine Data Credibility * Cyclist provided data from years 2013-2021 (through March 2021), all of which is first-hand data collected by the company.

    2. Sort & Filter Data: * The stakeholders want to know how the current users are using their service, so I am focusing on using the data from 2020-2021 since this is the most relevant period of time to answer the business task.

    #Installing packages
    install.packages("tidyverse", repos = "http://cran.us.r-project.org")
    install.packages("readr", repos = "http://cran.us.r-project.org")
    install.packages("janitor", repos = "http://cran.us.r-project.org")
    install.packages("geosphere", repos = "http://cran.us.r-project.org")
    install.packages("gridExtra", repos = "http://cran.us.r-project.org")
    
    library(tidyverse)
    library(readr)
    library(janitor)
    library(geosphere)
    library(gridExtra)
    
    #Importing data & verifying the information within the dataset
    all_tripdata_clean <- read.csv("/Data Projects/cyclist/cyclist_data_cleaned.csv")
    
    glimpse(all_tripdata_clean)
    
    summary(all_tripdata_clean)
    
    

    Phase 3: PROCESS

    Key Objectives:

    1. Cleaning Data & Preparing for Analysis: * Once the data has been placed into one dataset and checked for errors, we begin cleaning the data. * Eliminating data that corresponds to the company servicing the bikes, and any ride with a traveled distance of zero. * New columns will be added to assist in the analysis, and to provide accurate assessments of who is using the bikes.

    #Eliminating any data that represents the company performing maintenance, and trips without any measurable distance
    #(ride_length is computed in the next step, so that column must exist before this filter is applied)
    all_tripdata_clean <- all_tripdata_clean[!(all_tripdata_clean$start_station_name == "HQ QR" | all_tripdata_clean$ride_length < 0),] 
    
    #Creating columns for the individual date components (derive the date column first, then the parts that depend on it)
    all_tripdata_clean$date <- as.Date(all_tripdata_clean$started_at)
    all_tripdata_clean$day <- format(as.Date(all_tripdata_clean$date), "%d")
    all_tripdata_clean$month <- format(as.Date(all_tripdata_clean$date), "%m")
    all_tripdata_clean$year <- format(as.Date(all_tripdata_clean$date), "%Y")
    all_tripdata_clean$day_of_week <- format(as.Date(all_tripdata_clean$date), "%A")
    
    

    Now I will begin calculating the length of rides being taken, distance traveled, and the mean amount of time & distance.

    #Calculating the ride length in miles & minutes
    all_tripdata_clean$ride_length <- difftime(all_tripdata_clean$ended_at,all_tripdata_clean$started_at,units = "mins")
    
    all_tripdata_clean$ride_distance <- distGeo(matrix(c(all_tripdata_clean$start_lng, all_tripdata_clean$start_lat), ncol = 2), matrix(c(all_tripdata_clean$end_lng, all_tripdata_clean$end_lat), ncol = 2))
    all_tripdata_clean$ride_distance = all_tripdata_clean$ride_distance/1609.34 #converting to miles
    
    #Calculating the mean time and distance based on the user groups
    userType_means <- all_tripdata_clean %>% group_by(member_casual) %>% summarise(mean_time = mean(ride_length))
    
    
    userType_means <- all_tripdata_clean %>% 
     group_by(member_casual) %>% 
     summarise(mean_time = mean(ride_length),mean_distance = mean(ride_distance))
    

    Adding in calculations that will differentiate between bike types and which type of user is using each specific bike type.

    #Calculations
    
    with_bike_type <- all_tripdata_clean %>% filter(rideable_type=="classic_bike" | rideable_type=="electric_bike")
    
    #Ride totals by user type, bike type and weekday
    with_bike_type %>%
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, rideable_type, weekday) %>%
     summarise(totals = n(), .groups = "drop")
     
    #Ride totals by user type and bike type
    with_bike_type %>%
     group_by(member_casual, rideable_type) %>%
     summarise(totals = n(), .groups = "drop")
    
     #Calculating the ride differential
     
     all_tripdata_clean %>% 
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, weekday) %>% 
     summarise(number_of_rides = n(),
          average_duration = mean(ride_length), .groups = 'drop') %>% 
     arrange(member_casual, weekday)
    
  5. ACNC 2019 Annual Information Statement Data

    • researchdata.edu.au
    • data.gov.au
    Updated May 10, 2021
    Cite
    ACNC 2019 Annual Information Statement Data [Dataset]. https://researchdata.edu.au/acnc-2019-annual-statement-data/2975980
    Explore at:
    Dataset updated
    May 10, 2021
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Australian Charities and Not-for-profits Commission (ACNC)
    License

    Attribution 2.5 (CC BY 2.5): https://creativecommons.org/licenses/by/2.5/
    License information was derived automatically

    Description

    This dataset is updated weekly. Please ensure that you use the most up-to-date version.

    The Australian Charities and Not-for-profits Commission (ACNC) is Australia’s national regulator of charities.

    Since 3 December 2012, charities wanting to access Commonwealth charity tax concessions (and other benefits) need to register with the ACNC. Although many charities choose to register, registration with the ACNC is voluntary.

    Each year, registered charities are required to lodge an Annual Information Statement (AIS) with the ACNC. Charities are required to submit their AIS within six months of the end of their reporting period.

    Registered charities can apply to the ACNC to have some or all of the information they provide withheld from the ACNC Register. However, there are only limited circumstances when the ACNC can agree to withhold information. If a charity has applied to have their data withheld, the AIS data relating to that charity has been excluded from this dataset.

    This dataset can be used to find the AIS information lodged by multiple charities. It can also be used to filter and sort by different variables across all AIS information. AIS information for individual charities can be viewed via the ACNC Charity Register.

    The AIS collects information about charity finances, and financial information provides a basis for understanding the charity and its activities in greater detail. We have published explanatory notes to help you understand this dataset.

    When comparing charities’ financial information it is important to consider each charity's unique situation. This is particularly true for small charities, which are not compelled to provide financial reports – reports that often contain more details about their financial position and activities – as part of their AIS.

    For more information on interpreting financial information, please refer to the ACNC website. The ACNC also publishes other datasets on data.gov.au as part of our commitment to open data and transparent regulation.

    NOTE: It is possible that some information in this dataset might be subject to a future request from a charity to have their information withheld. If this occurs, this information will still appear in the dataset until the next update. Please consider this risk when using this dataset.

  6. Sediment macrofauna count data and images of multicores collected during R/V...

    • search.dataone.org
    • data.griidc.org
    Updated Feb 5, 2025
    Cite
    MacDonald, Ian (2025). Sediment macrofauna count data and images of multicores collected during R/V Weatherbird II cruise 1305, September 22-29, 2012 [Dataset]. http://doi.org/10.7266/N7BV7DKC
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    GRIIDC
    Authors
    MacDonald, Ian
    Description

    This dataset contains 146 jpeg images of multicores collected during R/V Weatherbird II cruise 1305 from September 22nd to 29th 2012. Additionally, this includes a file of raw sort data for macrofauna to the family level.

  7. Explore data formats and ingestion methods

    • kaggle.com
    Updated Feb 12, 2021
    Cite
    Gabriel Preda (2021). Explore data formats and ingestion methods [Dataset]. https://www.kaggle.com/datasets/gpreda/iris-dataset/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gabriel Preda
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Why this Dataset

    This dataset brings to you Iris Dataset in several data formats (see more details in the next sections).

    You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared a Python Jupyter Notebook and an R Markdown report that read in all these formats.

    Iris Dataset

    Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.

    Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris

    Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/

    The file downloaded is iris.data and is formatted as a comma delimited file.

    This small data collection was created to help you test your skills with ingesting various data formats.

    Content

    This file was processed to convert the data into the following formats (an R ingestion sketch follows this list):

    • csv - comma separated values format
    • tsv - tab separated values format
    • parquet - parquet format
    • feather - feather format
    • parquet.gzip - compressed parquet format
    • h5 - hdf5 format
    • pickle - Python binary object file - pickle format
    • xlsx - Excel format
    • npy - NumPy (Python library) binary format
    • npz - NumPy (Python library) binary compressed format
    • rds - Rds (R specific data format) binary format
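    A minimal R sketch for ingesting a few of these formats. The file names below (iris.csv, iris.tsv, iris.parquet, iris.feather, iris.rds) are assumptions for illustration; match them to the actual files in the dataset, and see the notebooks mentioned above for the Python-specific formats (pickle, npy, npz, h5).

    library(readr)   # csv / tsv
    library(arrow)   # parquet / feather

    iris_csv     <- read_csv("iris.csv")          # comma separated
    iris_tsv     <- read_tsv("iris.tsv")          # tab separated
    iris_parquet <- read_parquet("iris.parquet")  # parquet
    iris_feather <- read_feather("iris.feather")  # feather
    iris_rds     <- readRDS("iris.rds")           # base R, no extra package needed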

    Acknowledgements

    I would like to acknowledge the work of the creator of the dataset - R. A. Fisher and of the donor - Michael Marshall.

    Inspiration

    Use these data formats to test your skills in ingesting data in various formats.

  8. Water Data Online

    • researchdata.edu.au
    • data.gov.au
    • +2more
    Updated Oct 21, 2014
    Cite
    Bureau of Meteorology (2014). Water Data Online [Dataset]. https://researchdata.edu.au/water-data-online/3528495
    Explore at:
    Dataset updated
    Oct 21, 2014
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bureau of Meteorology
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Area covered
    Description

    Water Data Online provides free access to nationally consistent, current and historical water information. It allows you to view and download standardised data and reports.

    Watercourse level and watercourse discharge time series data from approximately 3500 water monitoring stations across Australia are available.

    Water Data Online displays time series data supplied by lead water agencies from each State and Territory with updates provided to the Bureau on a daily basis.

    Over time, more stations and parameters will become available and linkages to Water Data Online from the Geofabric will be implemented.

    Before using data please refer to licence preferences of the supplying organisations under the Copyright tab.

  9. First quarter 2024 / Table BOAMP-SIREN-BUYERS (BSA): a cross between the...

    • gimi9.com
    Updated Feb 13, 2025
    Cite
    (2025). First quarter 2024 / Table BOAMP-SIREN-BUYERS (BSA): a cross between the BOAMP table (DILA) and the Sirene Business Base (INSEE) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_6644ac7663969d80f6047dd8/
    Explore at:
    Dataset updated
    Feb 13, 2025
    Description

    Crossing table of the BOAMP table (DILA) with the Sirene business base (INSEE), first quarter 2024. The BUYER's Siren number (column "SN_30_Siren") is provided for each notice (column and primary key "B_17_idweb"); several columns that facilitate datamining have been added; and the names of the original columns have been prefixed, numbered and sorted alphabetically.

    You will find here:

    • the BSA for the first quarter of 2024 in free and open access (CSV with semicolon separator, and Parquet);
    • the schema of the BSA table (CSV, comma separator);
    • an excerpt from the March 30 BSA (CSV, comma separator) to quickly give you an idea of the Datagouv explorer.

    NB: the March 30 excerpt has its JSON cell columns GESTION, DONNEES, and ANNONCES_ANTERIEURES purged. The data removed there can be found in a nicer format by following the links in the added columns:

    • B_41_GESTION_URL_JSON;
    • B_43_DONNEES_URL_JSON;
    • B_45_ANNONCES_ANTERIEURES_URL_JSON.

    More info: daily and paid updates covering the entire BOAMP 2024 are available on our website under AuFilDuBoamp Downloads; further documentation can be found at AuFilDuBoamp Doc & TP.

    Data sources: the SIRENE database of companies and their establishments (SIREN, SIRET) from August, and the BOAMP API.

    To download the first quarter of the BSA with Python, run:

    For the CSV: df = pd.read_csv("https://www.data.gouv.fr/en/datasets/r/63f0d792-148a-4c95-a0b6-9e8ea8b0b34a", dtype='string', sep=';')

    For the Parquet file: df = pd.read_parquet("https://www.data.gouv.fr/en/datasets/r/f7a4a76e-ff50-4dc6-bae8-97368081add2")

    Enjoy!
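    An equivalent sketch in R (ours, not from the dataset description): the CSV uses a semicolon separator, so read_csv2 fits, and the Parquet file can be downloaded and read with the arrow package.

    library(readr)

    # Semicolon-separated CSV, read directly from the data.gouv.fr URL
    bsa_csv <- read_csv2("https://www.data.gouv.fr/en/datasets/r/63f0d792-148a-4c95-a0b6-9e8ea8b0b34a")

    # Parquet: download to a temporary file, then read it with arrow
    tmp <- tempfile(fileext = ".parquet")
    download.file("https://www.data.gouv.fr/en/datasets/r/f7a4a76e-ff50-4dc6-bae8-97368081add2", tmp, mode = "wb")
    bsa_parquet <- arrow::read_parquet(tmp)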

  10. Additional file 1: of Best-worst scaling improves measurement of first...

    • springernature.figshare.com
    zip
    Updated Jun 4, 2023
    Cite
    Nichola Burton; Michael Burton; Dan Rigby; Clare Sutherland; Gillian Rhodes (2023). Additional file 1: of Best-worst scaling improves measurement of first impressions [Dataset]. http://doi.org/10.6084/m9.figshare.9894992.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nichola Burton; Michael Burton; Dan Rigby; Clare Sutherland; Gillian Rhodes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R scripts used to generate the design and sort and score the data of Study 3, with annotation: intended as a template to build future BWS studies. (ZIP 15 kb)

  11. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - CO [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-co-b7d1e
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the carbon monoxide data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For that code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and do not sort the rows. A single change in the Excel file could break the code.
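    If you export a sheet to CSV for use with the R code, reading it back with check.names = FALSE helps keep the column names exactly as they appear in the workbook. A minimal sketch; the file name below is hypothetical, not part of the dataset.

    # Hypothetical exported sheet; keep names and codes untouched for AQ-June20.R
    co <- read.csv("co_sheet_export.csv", check.names = FALSE, stringsAsFactors = FALSE)
    str(co)  # inspect only; avoid sorting or editing so the downstream code still runs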

  12. Data from: TDMentions: A Dataset of Technical Debt Mentions in Online Posts

    • zenodo.org
    • data.niaid.nih.gov
    bin, bz2
    Updated Jan 24, 2020
    Cite
    Morgan Ericsson; Morgan Ericsson; Anna Wingkvist; Anna Wingkvist (2020). TDMentions: A Dataset of Technical Debt Mentions in Online Posts [Dataset]. http://doi.org/10.5281/zenodo.2593142
    Explore at:
    Available download formats: bin, bz2
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Morgan Ericsson; Morgan Ericsson; Anna Wingkvist; Anna Wingkvist
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # TDMentions: A Dataset of Technical Debt Mentions in Online Posts (version 1.0)

    TDMentions is a dataset that contains mentions of technical debt from Reddit, Hacker News, and Stack Exchange. It also contains a list of blog posts on Medium that were tagged as technical debt. The dataset currently contains approximately 35,000 items.

    ## Data collection and processing

    The dataset is mainly collected from existing datasets. We used data from:

    - the archive of Reddit posts by Jason Baumgartner (available at [https://pushshift.io](https://pushshift.io)),
    - the archive of Hacker News available at Google's BigQuery (available at [https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news](https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news)),
    - the Stack Exchange data dump (available at [https://archive.org/details/stackexchange](https://archive.org/details/stackexchange)),
    - the [GHTorrent](http://ghtorrent.org) project, and
    - the [GH Archive](https://www.gharchive.org).

    The data set currently contains data from the start of each source/service until 2018-12-31. For GitHub, we currently only include data from 2015-01-01.

    We use the regular expression `tech(nical)?[\s\-_]*?debt` to find mentions in all sources except for Medium. We decided to limit our matches to variations of technical debt and tech debt. Other shorter forms, such as TD, can result in too many false positives. For Medium, we used the tag `technical-debt`.
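    We can illustrate what the pattern does and does not match with a small R check (the example strings below are ours, not from the dataset):

    ```R
    # Example strings are ours; the pattern is the one quoted above.
    pattern <- "tech(nical)?[\\s\\-_]*?debt"
    samples <- c("technical debt", "tech debt", "tech-debt", "techdebt", "TD")
    grepl(pattern, samples, ignore.case = TRUE, perl = TRUE)
    #> [1]  TRUE  TRUE  TRUE  TRUE FALSE
    ```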

    ## Data Format

    The dataset is stored as a compressed (bzip2) JSON file with one JSON object per line. Each mention is represented as a JSON object with the following keys.

    - `id`: the id used in the original source. We use the URL path to identify Medium posts.
    - `body`: the text that contains the mention. This is either the comment or the title of the post. For Medium posts this is the title and subtitle (which might not mention technical debt, since posts are identified by the tag).
    - `created_utc`: the time the item was posted in seconds since epoch in UTC.
    - `author`: the author of the item. We use the username or userid from the source.
    - `source`: where the item was posted. Valid sources are:
      - HackerNews Comment
      - HackerNews Job
      - HackerNews Submission
      - Reddit Comment
      - Reddit Submission
      - StackExchange Answer
      - StackExchange Comment
      - StackExchange Question
      - Medium Post
    - `meta`: Additional information about the item specific to the source. This includes, e.g., the subreddit a Reddit submission or comment was posted to, the score, etc. We try to use the same names, e.g., `score` and `num_comments` for keys that have the same meaning/information across multiple sources.

    This is a sample item from Reddit:

    ```JSON
    {
      "id": "ab8auf",
      "body": "Technical Debt Explained (x-post r/Eve)",
      "created_utc": 1546271789,
      "author": "totally_100_human",
      "source": "Reddit Submission",
      "meta": {
        "title": "Technical Debt Explained (x-post r/Eve)",
        "score": 1,
        "num_comments": 0,
        "url": "http://jestertrek.com/eve/technical-debt-2.png",
        "subreddit": "RCBRedditBot"
      }
    }
    ```

    ## Sample Analyses

    We decided to use JSON to store the data, since it is easy to work with from multiple programming languages. In the following examples, we use [`jq`](https://stedolan.github.io/jq/) to process the JSON.

    ### How many items are there for each source?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq '.source' | sort | uniq -c
    ```

    ### How many submissions that mentioned technical debt were posted each month?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq 'select(.source == "Reddit Submission") | .created_utc | strftime("%Y-%m")' | sort | uniq -c
    ```

    ### What are the titles of items that link (`meta.url`) to PDF documents?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq '. as $r | select(.meta.url?) | .meta.url | select(endswith(".pdf")) | $r.body'
    ```

    ### Please, I want CSV!

    ```
    lbzip2 -cd postscomments.json.bz2 | jq -r '[.id, .body, .author] | @csv'
    ```

    Note that you need to specify the keys you want to include for the CSV, so it is easier to either ignore the meta information or process each source.
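    The same dump can also be loaded into R; here is a minimal sketch using jsonlite (our package choice, not something the dataset prescribes):

    ```R
    library(jsonlite)

    # Stream the bzip2-compressed JSON-lines file into a data frame
    mentions <- stream_in(bzfile("postscomments.json.bz2"))

    # Items per source, mirroring the first jq example above
    table(mentions$source)
    ```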

    Please see [https://github.com/sse-lnu/tdmentions](https://github.com/sse-lnu/tdmentions) for more analyses

    # Limitations and Future updates

    The current version of the dataset lacks GitHub data and Medium comments. GitHub data will be added in the next update. Medium comments (responses) will be added in a future update if we find a good way to represent these.

  13. User Data

    • kaggle.com
    Updated Jul 25, 2023
    Cite
    Ashish R. Soni (2023). User Data [Dataset]. https://www.kaggle.com/datasets/ashishrsoni/user-data/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ashish R. Soni
    Description

    Dataset

    This dataset was created by Ashish R. Soni

    Contents

  14. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • gimi9.com
    Updated Jun 25, 2024
    Cite
    (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - PMVa | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_integration-of-slurry-separation-technology-refrigeration-units-air-quality-pmva-87359/
    Explore at:
    Dataset updated
    Jun 25, 2024
    Description

    This is the gravimetric data used to calibrate the real-time readings. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For that code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and do not sort the rows. A single change in the Excel file could break the code.

  15. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • catalog.data.gov
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - SO2 [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-so2-a3122
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the raw SO2 data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For that code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and do not sort the rows. A single change in the Excel file could break the code.

  16. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • datasets.ai
    • catalog.data.gov
    23, 40, 55, 8
    Updated Sep 13, 2024
    Cite
    US Agency for International Development (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - H2S [Dataset]. https://datasets.ai/datasets/integration-of-slurry-separation-technology-refrigeration-units-air-quality-h2s-4af17
    Explore at:
    Available download formats: 23, 40, 8, 55
    Dataset updated
    Sep 13, 2024
    Dataset authored and provided by
    US Agency for International Development
    Description

    This is the raw H2S data: the concentration of H2S in parts per million in the biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For that code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and do not sort the rows. A single change in the Excel file could break the code.

  17. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • catalog.data.gov
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - CH4 [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-ch4-8abb6
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    Methane concentration of the biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For that code to work properly, it is important that this file remain intact: do not change the column names or the codes used for the data, and do not sort the rows. A single change in the Excel file could break the code.

  18. Brisbane Library Checkout Data

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, bin
    Updated Jan 24, 2020
    Cite
    Nicholas Tierney; Nicholas Tierney (2020). Brisbane Library Checkout Data [Dataset]. http://doi.org/10.5281/zenodo.2437860
    Explore at:
    Available download formats: bin, application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nicholas Tierney; Nicholas Tierney
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brisbane
    Description

    This has been copied from the README.md file

    bris-lib-checkout

    This provides tidied up data from the Brisbane library checkouts

    Retrieving and cleaning the data

    The script for retrieving and cleaning the data is made available in scrape-library.R.

    The data

    • The data/ folder contains the tidy data
    • The data-raw/ folder contains the raw data

    data/

    This contains four tidied up dataframes:

    • tidy-brisbane-library-checkout.csv
    • metadata_branch.csv
    • metadata_heading.csv
    • metadata_item_type.csv

    tidy-brisbane-library-checkout.csv contains the following columns, with the metadata file metadata_heading containing the description of these columns.

    knitr::kable(readr::read_csv("data/metadata_heading.csv"))
    #> Parsed with column specification:
    #> cols(
    #> heading = col_character(),
    #> heading_explanation = col_character()
    #> )

    | heading          | heading_explanation                         |
    |------------------|---------------------------------------------|
    | Title            | Title of Item                               |
    | Author           | Author of Item                              |
    | Call Number      | Call Number of Item                         |
    | Item id          | Unique Item Identifier                      |
    | Item Type        | Type of Item (see next column)              |
    | Status           | Current Status of Item                      |
    | Language         | Published language of item (if not English) |
    | Age              | Suggested audience                          |
    | Checkout Library | Checkout branch                             |
    | Date             | Checkout date                               |

    We also added year, month, and day columns.

    The remaining data are all metadata files that contain meta information on the columns in the checkout data:

    library(tidyverse)
    #> ── Attaching packages ────────────── tidyverse 1.2.1 ──
    #> ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
    #> ✔ tibble 1.4.99.9006 ✔ dplyr 0.7.8
    #> ✔ tidyr 0.8.2 ✔ stringr 1.3.1
    #> ✔ readr 1.3.0 ✔ forcats 0.3.0
    #> ── Conflicts ───────────────── tidyverse_conflicts() ──
    #> ✖ dplyr::filter() masks stats::filter()
    #> ✖ dplyr::lag() masks stats::lag()
    knitr::kable(readr::read_csv("data/metadata_branch.csv"))
    #> Parsed with column specification:
    #> cols(
    #> branch_code = col_character(),
    #> branch_heading = col_character()
    #> )

    | branch_code | branch_heading          |
    |-------------|-------------------------|
    | ANN         | Annerley                |
    | ASH         | Ashgrove                |
    | BNO         | Banyo                   |
    | BRR         | BrackenRidge            |
    | BSQ         | Brisbane Square Library |
    | BUL         | Bulimba                 |
    | CDA         | Corinda                 |
    | CDE         | Chermside               |
    | CNL         | Carindale               |
    | CPL         | Coopers Plains          |
    | CRA         | Carina                  |
    | EPK         | Everton Park            |
    | FAI         | Fairfield               |
    | GCY         | Garden City             |
    | GNG         | Grange                  |
    | HAM         | Hamilton                |
    | HPK         | Holland Park            |
    | INA         | Inala                   |
    | IPY         | Indooroopilly           |
    | MBG         | Mt. Coot-tha            |
    | MIT         | Mitchelton              |
    | MTG         | Mt. Gravatt             |
    | MTO         | Mt. Ommaney             |
    | NDH         | Nundah                  |
    | NFM         | New Farm                |
    | SBK         | Sunnybank Hills         |
    | SCR         | Stones Corner           |
    | SGT         | Sandgate                |
    | VAN         | Mobile Library          |
    | TWG         | Toowong                 |
    | WND         | West End                |
    | WYN         | Wynnum                  |
    | ZIL         | Zillmere                |

    knitr::kable(readr::read_csv("data/metadata_item_type.csv"))
    #> Parsed with column specification:
    #> cols(
    #> item_type_code = col_character(),
    #> item_type_explanation = col_character()
    #> )

    | item_type_code | item_type_explanation                     |
    |----------------|-------------------------------------------|
    | AD-FICTION     | Adult Fiction                             |
    | AD-MAGS        | Adult Magazines                           |
    | AD-PBK         | Adult Paperback                           |
    | BIOGRAPHY      | Biography                                 |
    | BSQCDMUSIC     | Brisbane Square CD Music                  |
    | BSQCD-ROM      | Brisbane Square CD Rom                    |
    | BSQ-DVD        | Brisbane Square DVD                       |
    | CD-BOOK        | Compact Disc Book                         |
    | CD-MUSIC       | Compact Disc Music                        |
    | CD-ROM         | CD Rom                                    |
    | DVD            | DVD                                       |
    | DVD_R18+       | DVD Restricted - 18+                      |
    | FASTBACK       | Fastback                                  |
    | GAYLESBIAN     | Gay and Lesbian Collection                |
    | GRAPHICNOV     | Graphic Novel                             |
    | ILL            | InterLibrary Loan                         |
    | JU-FICTION     | Junior Fiction                            |
    | JU-MAGS        | Junior Magazines                          |
    | JU-PBK         | Junior Paperback                          |
    | KITS           | Kits                                      |
    | LARGEPRINT     | Large Print                               |
    | LGPRINTMAG     | Large Print Magazine                      |
    | LITERACY       | Literacy                                  |
    | LITERACYAV     | Literacy Audio Visual                     |
    | LOCSTUDIES     | Local Studies                             |
    | LOTE-BIO       | Languages Other than English Biography    |
    | LOTE-BOOK      | Languages Other than English Book         |
    | LOTE-CDMUS     | Languages Other than English CD Music     |
    | LOTE-DVD       | Languages Other than English DVD          |
    | LOTE-MAG       | Languages Other than English Magazine     |
    | LOTE-TB        | Languages Other than English Taped Book   |
    | MBG-DVD        | Mt Coot-tha Botanical Gardens DVD         |
    | MBG-MAG        | Mt Coot-tha Botanical Gardens Magazine    |
    | MBG-NF         | Mt Coot-tha Botanical Gardens Non Fiction |
    | MP3-BOOK       | MP3 Audio Book                            |
    | NONFIC-SET     | Non Fiction Set                           |
    | NONFICTION     | Non Fiction                               |
    | PICTURE-BK     | Picture Book                              |
    | PICTURE-NF     | Picture Book Non Fiction                  |
    | PLD-BOOK       | Public Libraries Division Book            |
    | YA-FICTION     | Young Adult Fiction                       |
    | YA-MAGS        | Young Adult Magazine                      |
    | YA-PBK         | Young Adult Paperback                     |

    Example usage

    Let’s explore the data

    bris_libs <- readr::read_csv("data/bris-lib-checkout.csv")
    #> Parsed with column specification:
    #> cols(
    #> title = col_character(),
    #> author = col_character(),
    #> call_number = col_character(),
    #> item_id = col_double(),
    #> item_type = col_character(),
    #> status = col_character(),
    #> language = col_character(),
    #> age = col_character(),
    #> library = col_character(),
    #> date = col_double(),
    #> datetime = col_datetime(format = ""),
    #> year = col_double(),
    #> month = col_double(),
    #> day = col_character()
    #> )
    #> Warning: 20 parsing failures.
    #> row col expected actual file
    #> 587795 item_id a double REFRESH 'data/bris-lib-checkout.csv'
    #> 590579 item_id a double REFRESH 'data/bris-lib-checkout.csv'
    #> 590597 item_id a double REFRESH 'data/bris-lib-checkout.csv'
    #> 595774 item_id a double REFRESH 'data/bris-lib-checkout.csv'
    #> 597567 item_id a double REFRESH 'data/bris-lib-checkout.csv'
    #> ...... ....... ........ ....... ............................
    #> See problems(...) for more details.

    We can count the number of titles, item types, suggested age, and the library given:

    library(dplyr)
    count(bris_libs, title, sort = TRUE)
    #> # A tibble: 121,046 x 2
    #> title n
    #>
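    Branch codes can be mapped to their full names by joining the checkouts with metadata_branch; a minimal sketch, assuming the library column holds the branch codes listed above:

    # Attach branch names and count checkouts per branch
    branch_meta <- readr::read_csv("data/metadata_branch.csv")
    
    bris_libs %>%
     left_join(branch_meta, by = c("library" = "branch_code")) %>%
     count(branch_heading, sort = TRUE)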

    License

    This data is provided under a CC BY 4.0 license

    It has been downloaded from Brisbane library checkouts, and tidied up using the code in data-raw.

  19. Simple download service (Atom) for the dataset: Sort Agen — velocity field zones

    • data.europa.eu
    unknown
    Cite
    Servizz sempliċi ta’ tniżżil (Atom) tas-sett ta’ data: Sort Agen — żoni tal-qasam tal-veloċità [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-50e8b8e8-fae2-442f-86bc-46f0618f8b58?locale=mt
    Explore at:
    Available download formats: unknown
    Description

    The TRI of Agen covers 20 municipalities spread across the stretch of the Garonne basin known as the Agenais. This is a table of velocity field zones (zones for which a velocity estimate is available for a flood of a given type in a given scenario). The geographic dataset was produced from the GIS of the Floods Directive for the Agen high flood risk territory (TRI) and mapped for reporting purposes under the European Floods Directive. European Directive 2007/60/EC of 23 October 2007 on the assessment and management of flood risks (OJ L 288, 6.11.2007, p. 27) shapes the flood prevention strategy in Europe. It requires the production of flood risk management plans to reduce the negative consequences of flooding for human health, the environment, cultural heritage and economic activity. The objectives and implementation requirements are set out in the Law of 12 July 2010 on the National Commitment for the Environment (LENE) and the Decree of 2 March 2011. In this context, the primary objective of flood hazard and flood risk mapping for the TRIs is to contribute, by homogenising and making objective the knowledge of flood exposure, to the development of flood risk management plans (FRMPs). This dataset is used to produce flood extent maps and flood risk maps that represent flood hazards and stakes at an appropriate scale; their purpose is to provide quantitative evidence for further assessing the vulnerability of a territory to the three levels of flood probability (high, medium, low).

  20. Naturalistic Neuroimaging Database

    • openneuro.org
    Updated Apr 20, 2021
    Cite
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2021). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v2.0.0
    Explore at:
    Dataset updated
    Apr 20, 2021
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Overview

    • The Naturalistic Neuroimaging Database (NNDb v2.0) contains datasets from 86 human participants doing the NIH Toolbox and then watching one of 10 full-length movies during functional magnetic resonance imaging (fMRI). The participants were all right-handed, native English speakers, with no history of neurological/psychiatric illnesses, with no hearing impairments, unimpaired or corrected vision and taking no medication. Each movie was stopped in 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10 minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.
    • The NNDb V2.0 is now on Neuroscout, a platform for fast and flexible re-analysis of (naturalistic) fMRI studies. See: https://neuroscout.org/

    v2.0 Changes

    • Overview
      • We have replaced our own preprocessing pipeline with that implemented in AFNI’s afni_proc.py, thus changing only the derivative files. This introduces a fix for an issue with our normalization (i.e., scaling) step and modernizes and standardizes the preprocessing applied to the NNDb derivative files. We have done a bit of testing and have found that results in both pipelines are quite similar in terms of the resulting spatial patterns of activity but with the benefit that the afni_proc.py results are 'cleaner' and statistically more robust.
    • Normalization

      • Emily Finn and Clare Grall at Dartmouth and Rick Reynolds and Paul Taylor at AFNI, discovered and showed us that the normalization procedure we used for the derivative files was less than ideal for timeseries runs of varying lengths. Specifically, the 3dDetrend flag -normalize makes 'the sum-of-squares equal to 1'. We had not thought through that an implication of this is that the resulting normalized timeseries amplitudes will be affected by run length, increasing as run length decreases (and maybe this should go in 3dDetrend’s help text). To demonstrate this, I wrote a version of 3dDetrend’s -normalize for R so you can see for yourselves by running the following code:
      # Generate a resting state (rs) timeseries (ts)
      # Install / load package to make fake fMRI ts
      # install.packages("neuRosim")
      library(neuRosim)
      # Generate a ts
      ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
      # 3dDetrend -normalize
      # R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
      # Do for the full timeseries
      ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
      # Do this again for a shorter version of the same timeseries
      ts.shorter.length <- length(ts.normalised.long)/4
      ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2));
      # By looking at the summaries, it can be seen that the median values become  larger
      summary(ts.normalised.long)
      summary(ts.normalised.short)
      # Plot results for the long and short ts
      # Truncate the longer ts for plotting only
      ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
      # Give the plot a title
      title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
      plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
      # Add zero line
      lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
      # 3dDetrend -normalize -polort 0 for long timeseries
      lines(ts.normalised.long.made.shorter, col='blue');
      # 3dDetrend -normalize -polort 0 for short timeseries
      lines(ts.normalised.short, col='red');
      
    • Standardization/modernization

      • The above individuals also encouraged us to implement the afni_proc.py script over our own pipeline. It introduces at least three additional improvements: First, we now use Bob’s @SSwarper to align our anatomical files with an MNI template (now MNI152_2009_template_SSW.nii.gz) and this, in turn, integrates nicely into the afni_proc.py pipeline. This seems to result in a generally better or more consistent alignment, though this is only a qualitative observation. Second, all the transformations / interpolations and detrending are now done in fewer steps compared to our pipeline. This is preferable because, e.g., there is less chance of inadvertently reintroducing noise back into the timeseries (see Lindquist, Geuter, Wager, & Caffo 2019). Finally, many groups are advocating using tools like fMRIPrep or afni_proc.py to increase standardization of analysis practices in our neuroimaging community. This presumably results in less error, less heterogeneity and more interpretability of results across studies. Along these lines, the quality control (‘QC’) html pages generated by afni_proc.py are a real help in assessing data quality and almost a joy to use.
    • New afni_proc.py command line

      • The following is the afni_proc.py command line that we used to generate blurred and censored timeseries files. The afni_proc.py tool comes with extensive help and examples. As such, you can quickly understand our preprocessing decisions by scrutinising the below. Specifically, the following command is most similar to Example 11 for ‘Resting state analysis’ in the help file (see https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html):

        afni_proc.py \
          -subj_id "$sub_id_name_1" \
          -blocks despike tshift align tlrc volreg mask blur scale regress \
          -radial_correlate_blocks tcat volreg \
          -copy_anat anatomical_warped/anatSS.1.nii.gz \
          -anat_has_skull no \
          -anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \
          -anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
          -anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
          -anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \
          -anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \
          -anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \
          -anat_follower_erode fsvent fswm \
          -dsets media_?.nii.gz \
          -tcat_remove_first_trs 8 \
          -tshift_opts_ts -tpattern alt+z2 \
          -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
          -tlrc_base "$basedset" \
          -tlrc_NL_warp \
          -tlrc_NL_warped_dsets \
            anatomical_warped/anatQQ.1.nii.gz \
            anatomical_warped/anatQQ.1.aff12.1D \
            anatomical_warped/anatQQ.1_WARP.nii.gz \
          -volreg_align_to MIN_OUTLIER \
          -volreg_post_vr_allin yes \
          -volreg_pvra_base_index MIN_OUTLIER \
          -volreg_align_e2a \
          -volreg_tlrc_warp \
          -mask_opts_automask -clfrac 0.10 \
          -mask_epi_anat yes \
          -blur_to_fwhm -blur_size $blur \
          -regress_motion_per_run \
          -regress_ROI_PC fsvent 3 \
          -regress_ROI_PC_per_run fsvent \
          -regress_make_corr_vols aeseg fsvent \
          -regress_anaticor_fast \
          -regress_anaticor_label fswm \
          -regress_censor_motion 0.3 \
          -regress_censor_outliers 0.1 \
          -regress_apply_mot_types demean deriv \
          -regress_est_blur_epits \
          -regress_est_blur_errts \
          -regress_run_clustsim no \
          -regress_polort 2 \
          -regress_bandpass 0.01 1 \
          -html_review_style pythonic

        We used similar command lines to generate ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will provide the code used to make all derivative files available on our github site (https://github.com/lab-lab/nndb).

      We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, with the average being ~40 minutes but this number can be variable (thus leading to the above issue with 3dDetrend’s -normalise). A discussion on the AFNI message board with one of our team (starting here, https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256), led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.

      Which timeseries file you use is up to you but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul’s own words:

      • Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice).
      • Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere).
      • For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data).
      • For censored data:
        • Performing ISC requires the users to unionize the censoring patterns during the correlation calculation.
        • If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might do for naturalistic tasks still), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data.

      In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.

    • Effect on results

      • From numerous tests on our own analyses, we have qualitatively found that results using our old vs the new afni_proc.py preprocessing pipeline do not change all that much in terms of general spatial patterns. There is, however, an