41 datasets found
  1. sort

    • data.cityofchicago.org
    csv, xlsx, xml
    Updated Aug 9, 2025
    + more versions
    Cite
    Chicago Police Department (2025). sort [Dataset]. https://data.cityofchicago.org/Public-Safety/sort/bnsx-zzcw
    Explore at:
    xml, xlsx, csv (available download formats)
    Dataset updated
    Aug 9, 2025
    Authors
    Chicago Police Department
    Description

    This dataset reflects reported incidents of crime (with the exception of murders, where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org.

    Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that has not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation, and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information, and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of, this information. All data visualizations on maps should be considered approximate, and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties, and that the risk of injury from the foregoing rests entirely with the user.
The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words, or the unauthorized use of the Chicago Police Department logo, is unlawful. This web page does not, in any way, authorize such use. Data is updated daily, Tuesday through Sunday.

The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as WordPad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
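    The exported CSV can also be read programmatically instead of in a text editor. A minimal pandas sketch, assuming a downloaded CSV; the two-row sample and its column names here are illustrative, not the real export:

    ```python
    import io

    import pandas as pd

    # Illustrative two-row sample; in practice, pass the path of the CSV
    # downloaded from the Export menu to pd.read_csv instead.
    sample = io.StringIO(
        "ID,Date,Primary Type,Block\n"
        "1,01/01/2020,THEFT,001XX N STATE ST\n"
        "2,01/02/2020,BATTERY,002XX W MADISON ST\n"
    )

    # chunksize streams the file piece by piece, so exports far beyond
    # Excel's row limit never have to fit in a spreadsheet all at once.
    total_rows = 0
    for chunk in pd.read_csv(sample, chunksize=1):
        total_rows += len(chunk)

    print(total_rows)  # 2
    ```

    Each chunk is an ordinary DataFrame, so filtering or aggregating per chunk works the same as on the full table.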

  2. Replication Data for "Why Partisans Don't Sort: The Constraints on Partisan...

    • dataverse.harvard.edu
    Updated May 2, 2016
    Cite
    Clayton Nall; Jonathan Mummolo (2016). Replication Data for "Why Partisans Don't Sort: The Constraints on Partisan Segregation" [Dataset]. http://doi.org/10.7910/DVN/EDGRDC
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 2, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Clayton Nall; Jonathan Mummolo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Contains data and R scripts for the JOP article, "Why Partisans Don't Sort: The Constraints on Political Segregation." When downloading tabular data files, ensure that they appear in your working directory in CSV format.

  3. Replication Data for: Why Partisans Don't Sort

    • search.dataone.org
    Updated Nov 21, 2023
    Cite
    Nall, Clayton; Mummolo, Jonathan (2023). Replication Data for: Why Partisans Don't Sort [Dataset]. http://doi.org/10.7910/DVN/EHVYNN
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Nall, Clayton; Mummolo, Jonathan
    Description

    Contains R scripts and data needed to reproduce the analyses found in Mummolo and Nall, "Why Partisans Don't Sort: The Constraints on Political Segregation." Read READ ME FIRST.rtf or READ ME FIRST.pdf for instructions on executing replication archive contents.

  4. Case Study: Cyclist

    • kaggle.com
    Updated Jul 27, 2021
    Cite
    PatrickRCampbell (2021). Case Study: Cyclist [Dataset]. https://www.kaggle.com/patrickrcampbell/case-study-cyclist/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PatrickRCampbell
    Description

    Phase 1: ASK

    Key Objectives:

    1. Business Task: Cyclist is looking to increase its earnings and wants to know whether a social media campaign can influence "Casual" users to become "Annual" members.

    2. Key Stakeholders: The main stakeholder from Cyclist is Lily Moreno, who is the Director of Marketing and responsible for the development of campaigns and initiatives to promote the bike-share program. The other teams involved with this project are Marketing & Analytics and the Executive Team.

    3. Business Task: Comparing the two kinds of users and defining how they use the platform, which variables they have in common, which variables differ, and how to convert Casual users into Annual members.

    Phase 2: PREPARE:

    Key Objectives:

    1. Determine Data Credibility: Cyclist provided data from 2013 through March 2021, all of which is first-hand data collected by the company.

    2. Sort & Filter Data: The stakeholders want to know how current users are using the service, so I am focusing on the data from 2020-2021, the most relevant period for answering the business task.

    #Installing packages
    install.packages("tidyverse", repos = "http://cran.us.r-project.org")
    install.packages("readr", repos = "http://cran.us.r-project.org")
    install.packages("janitor", repos = "http://cran.us.r-project.org")
    install.packages("geosphere", repos = "http://cran.us.r-project.org")
    install.packages("gridExtra", repos = "http://cran.us.r-project.org")
    
    library(tidyverse)
    library(readr)
    library(janitor)
    library(geosphere)
    library(gridExtra)
    
    #Importing data & verifying the information within the dataset
    all_tripdata_clean <- read.csv("/Data Projects/cyclist/cyclist_data_cleaned.csv")
    
    glimpse(all_tripdata_clean)
    
    summary(all_tripdata_clean)
    
    

    Phase 3: PROCESS

    Key Objectives:

    1. Cleaning Data & Preparing for Analysis: Once the data has been combined into one dataset and checked for errors, cleaning begins: eliminating data that corresponds to the company servicing the bikes, and any ride with a traveled distance of zero. New columns will be added to assist the analysis and to provide an accurate picture of who is using the bikes.

    #Eliminating any data that represents the company performing maintenance, and trips without any measurable distance
    #(ride_length must already exist; it is computed from started_at/ended_at below)
    all_tripdata_clean <- all_tripdata_clean[!(all_tripdata_clean$start_station_name == "HQ QR" | all_tripdata_clean$ride_length<0),] 
    
    #Creating columns for the individual date components (day_of_week must be created after date)
    all_tripdata_clean$date <- as.Date(all_tripdata_clean$started_at)
    all_tripdata_clean$day <- format(as.Date(all_tripdata_clean$date), "%d")
    all_tripdata_clean$month <- format(as.Date(all_tripdata_clean$date), "%m")
    all_tripdata_clean$year <- format(as.Date(all_tripdata_clean$date), "%Y")
    all_tripdata_clean$day_of_week <- format(as.Date(all_tripdata_clean$date), "%A")
    
    

    **Now I will calculate the length of each ride, the distance traveled, and the mean time and distance for each user group.**

    #Calculating the ride length in miles & minutes
    all_tripdata_clean$ride_length <- difftime(all_tripdata_clean$ended_at,all_tripdata_clean$started_at,units = "mins")
    
    all_tripdata_clean$ride_distance <- distGeo(matrix(c(all_tripdata_clean$start_lng, all_tripdata_clean$start_lat), ncol = 2), matrix(c(all_tripdata_clean$end_lng, all_tripdata_clean$end_lat), ncol = 2))
    all_tripdata_clean$ride_distance = all_tripdata_clean$ride_distance/1609.34 #converting to miles
    
    #Calculating the mean time and distance based on the user groups
    userType_means <- all_tripdata_clean %>% 
     group_by(member_casual) %>% 
     summarise(mean_time = mean(ride_length), mean_distance = mean(ride_distance))
    

    Adding calculations that differentiate between bike types and show which type of user rides each bike type.

    #Calculations
    
    with_bike_type <- all_tripdata_clean %>% filter(rideable_type=="classic_bike" | rideable_type=="electric_bike")
    
    #Totals per user type, bike type, and weekday
    with_bike_type %>%
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, rideable_type, weekday) %>%
     summarise(totals = n(), .groups = "drop")
    
    #Totals per user type and bike type
    with_bike_type %>%
     group_by(member_casual, rideable_type) %>%
     summarise(totals = n(), .groups = "drop")
    
    #Calculating the ride differential
    all_tripdata_clean %>% 
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, weekday) %>% 
     summarise(number_of_rides = n(),
          average_duration = mean(ride_length), .groups = "drop") %>% 
     arrange(member_casual, weekday)
    
  5. ACNC 2019 Annual Information Statement Data

    • researchdata.edu.au
    • data.gov.au
    Updated May 10, 2021
    + more versions
    Cite
    Australian Charities and Not-for-profits Commission (ACNC) (2021). ACNC 2019 Annual Information Statement Data [Dataset]. https://researchdata.edu.au/acnc-2019-annual-statement-data/2975980
    Explore at:
    Dataset updated
    May 10, 2021
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Australian Charities and Not-for-profits Commission (ACNC)
    License

    Attribution 2.5 (CC BY 2.5, https://creativecommons.org/licenses/by/2.5/)
    License information was derived automatically

    Description

    This dataset is updated weekly. Please ensure that you use the most up-to-date version.

    The Australian Charities and Not-for-profits Commission (ACNC) is Australia's national regulator of charities.

    Since 3 December 2012, charities wanting to access Commonwealth charity tax concessions (and other benefits) need to register with the ACNC. Although many charities choose to register, registration with the ACNC is voluntary.

    Each year, registered charities are required to lodge an Annual Information Statement (AIS) with the ACNC within six months of the end of their reporting period.

    Registered charities can apply to the ACNC to have some or all of the information they provide withheld from the ACNC Register. However, there are only limited circumstances in which the ACNC can agree to withhold information. If a charity has applied to have its data withheld, the AIS data relating to that charity has been excluded from this dataset.

    This dataset can be used to find the AIS information lodged by multiple charities, and to filter and sort by different variables across all AIS information. AIS information for individual charities can be viewed via the ACNC Charity Register.

    The AIS collects information about charity finances, and financial information provides a basis for understanding a charity and its activities in greater detail. We have published explanatory notes to help you understand this dataset.

    When comparing charities' financial information it is important to consider each charity's unique situation. This is particularly true for small charities, which are not compelled to provide financial reports – reports that often contain more details about their financial position and activities – as part of their AIS.

    For more information on interpreting financial information, please refer to the ACNC website. The ACNC also publishes other datasets on data.gov.au as part of our commitment to open data and transparent regulation.

    NOTE: It is possible that some information in this dataset might be subject to a future request from a charity to have their information withheld. If this occurs, the information will still appear in the dataset until the next update. Please consider this risk when using this dataset.

  6. Data from: Pueblo Grande (AZ U:9:1(ASM)): Unit 15, Washington and 48th...

    • search.dataone.org
    Updated Jul 8, 2014
    Cite
    Abbott, David R.; Martin, Maria (2014). Pueblo Grande (AZ U:9:1(ASM)): Unit 15, Washington and 48th Streets: SSI Kitchell Data Recovery, Ceramic Rough Sort (RS_CERAMICS) Data [Dataset]. http://doi.org/10.6067/XCV8BG2MZH
    Explore at:
    Dataset updated
    Jul 8, 2014
    Dataset provided by
    the Digital Archaeological Record
    Authors
    Abbott, David R.; Martin, Maria
    Area covered
    Description

    The Kitchell Data Recovery project Ceramic Rough Sort (RS_CERAMICS) Data sheet contains data from the rough sort analysis of ceramics recovered during the Kitchell data recovery project. It contains information on ceramic types, tempers and counts; it also records vessel and rim forms where applicable. The data sheet also contains rim circumference and rim diameter measurements for some ceramic specimens.

    See Partial Data Recovery and Burial Removal at Pueblo Grande (AZ U:9:1 (ASM)): Unit 15, The Former Maricopa County Sheriff's Substation, Washington and 48th Streets, Phoenix, Arizona (SSI Technical Report No. 02-43) for the final report on the Kitchell Data Recovery project.

  7. Sediment macrofauna count data and images of multicores collected during R/V...

    • search.dataone.org
    • data.griidc.org
    Updated Feb 5, 2025
    Cite
    MacDonald, Ian (2025). Sediment macrofauna count data and images of multicores collected during R/V Weatherbird II cruise 1305, September 22-29, 2012 [Dataset]. http://doi.org/10.7266/N7BV7DKC
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    GRIIDC
    Authors
    MacDonald, Ian
    Description

    This dataset contains 146 jpeg images of multicores collected during R/V Weatherbird II cruise 1305 from September 22nd to 29th 2012. Additionally, this includes a file of raw sort data for macrofauna to the family level.

  8. Explore data formats and ingestion methods

    • kaggle.com
    Updated Feb 12, 2021
    Cite
    Gabriel Preda (2021). Explore data formats and ingestion methods [Dataset]. https://www.kaggle.com/datasets/gpreda/iris-dataset/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gabriel Preda
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Why this Dataset

    This dataset brings to you Iris Dataset in several data formats (see more details in the next sections).

    You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared Python Jupyter Notebook and R Markdown report that input all these formats:

    Iris Dataset

    Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.

    Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris

    Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/

    The file downloaded is iris.data and is formatted as a comma-delimited file.

    This small data collection was created to help you test your skills with ingesting various data formats.

    Content

    This file was processed to convert the data into the following formats:
    * csv - comma separated values format
    * tsv - tab separated values format
    * parquet - parquet format
    * feather - feather format
    * parquet.gzip - compressed parquet format
    * h5 - hdf5 format
    * pickle - Python binary object file (pickle format)
    * xlsx - Excel format
    * npy - NumPy (Python library) binary format
    * npz - NumPy (Python library) binary compressed format
    * rds - Rds (R-specific data format) binary format
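    A minimal sketch of such an ingestion test, using only the standard library and an Iris-style record I made up for illustration; the pandas readers for parquet, feather, h5, and the rest follow the same round-trip idea:

    ```python
    import csv
    import io
    import pickle

    # One Iris-style record (values are illustrative, not from the file).
    row = ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"]

    # csv round trip via an in-memory buffer
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    buf.seek(0)
    from_csv = next(csv.reader(buf))

    # pickle round trip (the "Python binary object file" format above)
    from_pickle = pickle.loads(pickle.dumps(row))

    print(from_csv == row and from_pickle == row)  # True
    ```

    Swapping the writer/reader pair per format is all an ingestion test of the other listed formats needs.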

    Acknowledgements

    I would like to acknowledge the work of the creator of the dataset - R. A. Fisher and of the donor - Michael Marshall.

    Inspiration

    Use these data formats to test your skills in ingesting data in various formats.

  9. Water Data Online

    • researchdata.edu.au
    • data.gov.au
    • +1 more
    Updated Oct 21, 2014
    + more versions
    Cite
    Bureau of Meteorology (2014). Water Data Online [Dataset]. https://researchdata.edu.au/water-data-online/3528495
    Explore at:
    Dataset updated
    Oct 21, 2014
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bureau of Meteorology
    License

    Attribution 3.0 (CC BY 3.0, https://creativecommons.org/licenses/by/3.0/)
    License information was derived automatically

    Area covered
    Description

    Water Data Online provides free access to nationally consistent, current and historical water information. It allows you to view and download standardised data and reports.

    Watercourse level and watercourse discharge time series data from approximately 3500 water monitoring stations across Australia are available.

    Water Data Online displays time series data supplied by lead water agencies from each State and Territory, with updates provided to the Bureau on a daily basis.

    Over time, more stations and parameters will become available, and linkages to Water Data Online from the Geofabric will be implemented.

    Before using data, please refer to the licence preferences of the supplying organisations under the Copyright tab.

  10. First quarter 2024 / Table BOAMP-SIREN-BUYERS (BSA): a cross between the...

    • gimi9.com
    Cite
    First quarter 2024 / Table BOAMP-SIREN-BUYERS (BSA): a cross between the BOAMP table (DILA) and the Sirene Business Base (INSEE) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_6644ac7663969d80f6047dd8/
    Explore at:
    Description

    Crossing table of the BOAMP table (DILA) with the Sirene Business Base (INSEE) / First Quarter 2024.

    - The BUYER's Siren number (column "SN_30_Siren") is added for each notice (column and primary key "B_17_idweb");
    - Several columns facilitating datamining have been added;
    - The names of the original columns have been prefixed, numbered and sorted alphabetically.

    You will find here:
    - The BSA for the first quarter of 2024 in free and open access (csv with semicolon separator, and Parquet formats);
    - The schema of the BSA table (csv, comma separator);
    - An excerpt from the March 30 BSA (csv, comma separator) to quickly give you an idea in the Datagouv explorer.

    NB: The March 30 excerpt has its json cell columns GESTION, DONNEES, and ANNONCES_ANTERIEURES purged. The deleted data can be found in a nicer format by following the links in the added columns:
    - B_41_GESTION_URL_JSON;
    - B_43_DONNEES_URL_JSON;
    - B_45_ANNONCES_ANTERIEURES_URL_JSON.

    More info: daily and paid updates on the entire BOAMP 2024 are available on our website under AuFilDuBoamp Downloads; further documentation can be found at AuFilDuBoamp Doc & TP.

    Data sources: the SIRENE database of companies and their establishments (SIREN, SIRET) of August, and the BOAMP API.

    To download the first quarter of the BSA with Python, run:

    For the CSV: df = pd.read_csv("https://www.data.gouv.fr/en/datasets/r/63f0d792-148a-4c95-a0b6-9e8ea8b0b34a", dtype='string', sep=';')

    For the Parquet: df = pd.read_parquet("https://www.data.gouv.fr/en/datasets/r/f7a4a76e-ff50-4dc6-bae8-97368081add2")

    Enjoy!

  11. Additional file 1: of Best-worst scaling improves measurement of first...

    • springernature.figshare.com
    zip
    Updated Jun 4, 2023
    Cite
    Nichola Burton; Michael Burton; Dan Rigby; Clare Sutherland; Gillian Rhodes (2023). Additional file 1: of Best-worst scaling improves measurement of first impressions [Dataset]. http://doi.org/10.6084/m9.figshare.9894992.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nichola Burton; Michael Burton; Dan Rigby; Clare Sutherland; Gillian Rhodes
    License

    Attribution 4.0 (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    R scripts used to generate the design and sort and score the data of Study 3, with annotation: intended as a template to build future BWS studies. (ZIP 15 kb)

  12. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Jun 25, 2024
    + more versions
    Cite
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - CO [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-co-b7d1e
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the carbon monoxide data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for data, and to be safe, do not even sort. One simple change in the Excel file could break the code.

  13. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • gimi9.com
    Updated Jun 25, 2024
    + more versions
    Cite
    (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - PMVa | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_integration-of-slurry-separation-technology-refrigeration-units-air-quality-pmva-87359/
    Explore at:
    Dataset updated
    Jun 25, 2024
    Description

    This is the gravimetric data used to calibrate the real-time readings. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for data, and to be safe, do not even sort. One simple change in the Excel file could break the code.

  14. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • catalog.data.gov
    Updated Jun 25, 2024
    + more versions
    Cite
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - SO2 [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-so2-a3122
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the raw SO2 data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for data, and to be safe, do not even sort. One simple change in the Excel file could break the code.

  15. Data from: TDMentions: A Dataset of Technical Debt Mentions in Online Posts

    • zenodo.org
    bin, bz2
    Updated Jan 24, 2020
    Cite
    Morgan Ericsson; Morgan Ericsson; Anna Wingkvist; Anna Wingkvist (2020). TDMentions: A Dataset of Technical Debt Mentions in Online Posts [Dataset]. http://doi.org/10.5281/zenodo.2593142
    Explore at:
    bin, bz2 (available download formats)
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Morgan Ericsson; Morgan Ericsson; Anna Wingkvist; Anna Wingkvist
    License

    Attribution 4.0 (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    # TDMentions: A Dataset of Technical Debt Mentions in Online Posts (version 1.0)

    TDMentions is a dataset that contains mentions of technical debt from Reddit, Hacker News, and Stack Exchange. It also contains a list of blog posts on Medium that were tagged as technical debt. The dataset currently contains approximately 35,000 items.

    ## Data collection and processing

    The dataset is mainly collected from existing datasets. We used data from:

    - the archive of Reddit posts by Jason Baumgartner (available at [https://pushshift.io](https://pushshift.io)),
    - the archive of Hacker News at Google's BigQuery (available at [https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news](https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news)),
    - the Stack Exchange data dump (available at [https://archive.org/details/stackexchange](https://archive.org/details/stackexchange)),
    - the [GHTorrent](http://ghtorrent.org) project
    - the [GH Archive](https://www.gharchive.org)

    The data set currently contains data from the start of each source/service until 2018-12-31. For GitHub, we currently only include data from 2015-01-01.

    We use the regular expression `tech(nical)?[\s\-_]*?debt` to find mentions in all sources except for Medium. We decided to limit our matches to variations of technical debt and tech debt. Other shorter forms, such as TD, can result in too many false positives. For Medium, we used the tag `technical-debt`.
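    The mention pattern can be tried directly. A small Python sketch of that regex; the case-insensitive flag and the sample strings are my assumptions, since the text does not state which flags were used:

    ```python
    import re

    # The dataset's mention regex; IGNORECASE is an assumption here so that
    # "Technical Debt" and "tech debt" both match.
    pattern = re.compile(r"tech(nical)?[\s\-_]*?debt", re.IGNORECASE)

    texts = [
        "We accumulated technical debt fast.",  # long form
        "tech-debt cleanup sprint",             # short hyphenated form
        "TD is too ambiguous to match.",        # excluded, as the text explains
    ]
    hits = [bool(pattern.search(t)) for t in texts]
    print(hits)  # [True, True, False]
    ```

    The lazy `[\s\-_]*?` lets "techdebt", "tech debt", and "tech-debt" all match while the optional `(nical)?` covers the long form.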

    ## Data Format

    The dataset is stored as a compressed (bzip2) JSON file with one JSON object per line. Each mention is represented as a JSON object with the following keys.

    - `id`: the id used in the original source. We use the URL path to identify Medium posts.
    - `body`: the text that contains the mention. This is either the comment or the title of the post. For Medium posts this is the title and subtitle (which might not mention technical debt, since posts are identified by the tag).
    - `created_utc`: the time the item was posted in seconds since epoch in UTC.
    - `author`: the author of the item. We use the username or userid from the source.
    - `source`: where the item was posted. Valid sources are:
    - HackerNews Comment
    - HackerNews Job
    - HackerNews Submission
    - Reddit Comment
    - Reddit Submission
    - StackExchange Answer
    - StackExchange Comment
    - StackExchange Question
    - Medium Post
    - `meta`: Additional information about the item specific to the source. This includes, e.g., the subreddit a Reddit submission or comment was posted to, the score, etc. We try to use the same names, e.g., `score` and `num_comments` for keys that have the same meaning/information across multiple sources.

    This is a sample item from Reddit:

    ```JSON
    {
      "id": "ab8auf",
      "body": "Technical Debt Explained (x-post r/Eve)",
      "created_utc": 1546271789,
      "author": "totally_100_human",
      "source": "Reddit Submission",
      "meta": {
        "title": "Technical Debt Explained (x-post r/Eve)",
        "score": 1,
        "num_comments": 0,
        "url": "http://jestertrek.com/eve/technical-debt-2.png",
        "subreddit": "RCBRedditBot"
      }
    }
    ```
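    Since the file holds one JSON object per line, each line parses independently. A small Python sketch using the sample item, with `meta` trimmed to two keys for brevity:

    ```python
    import json

    # The sample Reddit item as it would appear as a single line of the
    # bzip2-compressed file (meta shortened here for illustration).
    line = (
        '{"id": "ab8auf", '
        '"body": "Technical Debt Explained (x-post r/Eve)", '
        '"created_utc": 1546271789, '
        '"author": "totally_100_human", '
        '"source": "Reddit Submission", '
        '"meta": {"score": 1, "num_comments": 0}}'
    )
    item = json.loads(line)
    print(item["source"], item["meta"]["score"])  # Reddit Submission 1
    ```

    Reading the real file is the same loop over `bz2.open(...)` with `json.loads` per line, which is why per-source tools like `jq` work so naturally on it.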

    ## Sample Analyses

    We decided to use JSON to store the data, since it is easy to work with from multiple programming languages. In the following examples, we use [`jq`](https://stedolan.github.io/jq/) to process the JSON.

    ### How many items are there for each source?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq '.source' | sort | uniq -c
    ```

    ### How many submissions that mentioned technical debt were posted each month?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq 'select(.source == "Reddit Submission") | .created_utc | strftime("%Y-%m")' | sort | uniq -c
    ```

    ### What are the titles of items that link (`meta.url`) to PDF documents?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq '. as $r | select(.meta.url?) | .meta.url | select(endswith(".pdf")) | $r.body'
    ```

    ### Please, I want CSV!

    ```
    lbzip2 -cd postscomments.json.bz2 | jq -r '[.id, .body, .author] | @csv'
    ```

    Note that you need to specify the keys you want to include for the CSV, so it is easier to either ignore the meta information or process each source.

    Please see [https://github.com/sse-lnu/tdmentions](https://github.com/sse-lnu/tdmentions) for more analyses.

    # Limitations and Future updates

    The current version of the dataset lacks GitHub data and Medium comments. GitHub data will be added in the next update. Medium comments (responses) will be added in a future update if we find a good way to represent these.

  16. Envestnet | Yodlee's USA Consumer Spending Data (De-Identified) |...

    • datarade.ai
    .sql, .txt
    + more versions
    Envestnet | Yodlee, Envestnet | Yodlee's USA Consumer Spending Data (De-Identified) | Row/Aggregate Level | Consumer Data covering 3600+ public and private corporations [Dataset]. https://datarade.ai/data-products/envestnet-yodlee-s-de-identified-consumer-spending-data-r-envestnet-yodlee
    Explore at:
    .sql, .txt (available download formats)
    Dataset provided by
    Yodlee
    Envestnet (http://envestnet.com/)
    Authors
    Envestnet | Yodlee
    Area covered
    United States of America
    Description

    Envestnet®| Yodlee®'s Consumer Spending Data (Aggregate/Row) Panels consist of de-identified, near-real time (T+1) USA credit/debit/ACH transaction level data – offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet®| Yodlee®'s financial technology platform.

    Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver key KPIs daily that are focused, relevant, and ready to put into production.

    We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.

    Investors, corporate researchers, and corporates can use our data to answer some key business questions such as: - How much are consumers spending with specific merchants/brands and how is that changing over time? - Is the share of consumer spend at a specific merchant increasing or decreasing? - How are consumers reacting to new products or services launched by merchants? - For loyal customers, how is the share of spend changing over time? - What is the company’s market share in a region for similar customers? - Is the company’s loyal user base increasing or decreasing? - Is the lifetime customer value increasing or decreasing?

    Use Case Categories (our data supports countless use cases, and we look forward to working with new ones): 1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking

    2. Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)

    3. Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence

    4. Market Data: Analytics B2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis.

    Additional Use Cases: - Use spending data to analyze sales/revenue broadly (sector-wide) or granularly (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level. - Reveal cohort consumer behavior to decipher long-term behavioral consumer spending shifts. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more. - Study the effects of inflation rates via such metrics as increased total spend, ticket size, and number of transactions. - Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.

  17. Diabetes data

    • kaggle.com
    Updated Jul 9, 2020
    Veronica Zheng (2020). Diabetes data [Dataset]. https://www.kaggle.com/datasets/veronicazheng/diabetes-data/discussion?sort=undefined
    Explore at:
    Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Veronica Zheng
    Description

    Dataset

    This dataset was created by Veronica Zheng

    Released under Other (specified in description)

    Contents

  18. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • catalog.data.gov
    Updated Jun 25, 2024
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - Particulate Matter [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-particulate-ma-26bf1
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the raw particulate matter data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For the code to work properly, it is important that this file remain intact: do not change the column names or data codes, and, to be safe, do not even sort. A single change in the Excel file could break the code.

  19. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • datasets.ai
    • +1more
    Updated Jun 25, 2024
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - H2S [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-h2s-4af17
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the raw H2S data: the concentration of H2S in parts per million in the biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For the code to work properly, it is important that this file remain intact: do not change the column names or data codes, and, to be safe, do not even sort. A single change in the Excel file could break the code.

  20. Data from: Genetic diversity and spatial genetic structure of the grassland...

    • datadryad.org
    • data.niaid.nih.gov
    • +2more
    zip
    Updated Jun 11, 2016
    Sascha van der Meer; Hans Jacquemyn (2016). Genetic diversity and spatial genetic structure of the grassland perennial Saxifraga granulata along two river systems [Dataset]. http://doi.org/10.5061/dryad.q3d2m
    Explore at:
    zip (available download formats)
    Dataset updated
    Jun 11, 2016
    Dataset provided by
    Dryad
    Authors
    Sascha van der Meer; Hans Jacquemyn
    Time period covered
    May 27, 2015
    Area covered
    Europe, Belgium
    Description

    GeneMapper data of 560 individuals of Saxifraga granulata collected along two rivers in Belgium. Raw GeneMapper data with four extra columns with information about the samples (i.e. Sort, River, Population, Individual). Without the first four columns, the data can be easily read via the R function read.GeneMapper() from the R package 'polysat'. File: Saxifraga_Rivers_GeneMapper.xlsx

The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
