41 datasets found
  1. sort

    • data.cityofchicago.org
    csv, xlsx, xml
    Updated Aug 9, 2025
    + more versions
    Cite
    Chicago Police Department (2025). sort [Dataset]. https://data.cityofchicago.org/Public-Safety/sort/bnsx-zzcw
    Explore at:
    xml, xlsx, csv (available download formats)
    Dataset updated
    Aug 9, 2025
    Authors
    Chicago Police Department
    Description

    This dataset reflects reported incidents of crime (with the exception of murders, where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org.

    Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that has not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation, and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information, and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of, this information. All data visualizations on maps should be considered approximate, and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties, and that the risk of injury from the foregoing rests entirely with the user.
The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words, or the unauthorized use of the Chicago Police Department logo, is unlawful. This web page does not, in any way, authorize such use. Data is updated daily, Tuesday through Sunday.

The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as WordPad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
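    The exported CSV can also be read programmatically instead of in a text editor. A minimal pandas sketch, assuming a downloaded CSV; the two-row sample and its column names here are illustrative, not the real export:

    ```python
    import io

    import pandas as pd

    # Illustrative two-row sample; in practice, pass the path of the CSV
    # downloaded from the Export menu to pd.read_csv instead.
    sample = io.StringIO(
        "ID,Date,Primary Type,Block\n"
        "1,01/01/2020,THEFT,001XX N STATE ST\n"
        "2,01/02/2020,BATTERY,002XX W MADISON ST\n"
    )

    # chunksize streams the file piece by piece, so exports far beyond
    # Excel's row limit never have to fit in a spreadsheet all at once.
    total_rows = 0
    for chunk in pd.read_csv(sample, chunksize=1):
        total_rows += len(chunk)

    print(total_rows)  # 2
    ```

    Each chunk is an ordinary DataFrame, so filtering or aggregating per chunk works the same as on the full table.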

  2. Replication Data for "Why Partisans Don't Sort: The Constraints on Partisan...

    • dataverse.harvard.edu
    Updated May 2, 2016
    Cite
    Clayton Nall; Jonathan Mummolo (2016). Replication Data for "Why Partisans Don't Sort: The Constraints on Partisan Segregation" [Dataset]. http://doi.org/10.7910/DVN/EDGRDC
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 2, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Clayton Nall; Jonathan Mummolo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Contains data and R scripts for the JOP article, "Why Partisans Don't Sort: The Constraints on Political Segregation." When downloading tabular data files, ensure that they appear in your working directory in CSV format.

  3. Replication Data for: Why Partisans Don't Sort

    • search.dataone.org
    Updated Nov 21, 2023
    Cite
    Nall, Clayton; Mummolo, Jonathan (2023). Replication Data for: Why Partisans Don't Sort [Dataset]. http://doi.org/10.7910/DVN/EHVYNN
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Nall, Clayton; Mummolo, Jonathan
    Description

    Contains R scripts and data needed to reproduce the analyses found in Mummolo and Nall, "Why Partisans Don't Sort: The Constraints on Political Segregation." Read READ ME FIRST.rtf or READ ME FIRST.pdf for instructions on executing replication archive contents.

  4. Case Study: Cyclist

    • kaggle.com
    Updated Jul 27, 2021
    Cite
    PatrickRCampbell (2021). Case Study: Cyclist [Dataset]. https://www.kaggle.com/patrickrcampbell/case-study-cyclist/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PatrickRCampbell
    Description

    Phase 1: ASK

    Key Objectives:

    1. Business Task: Cyclist is looking to increase its earnings and wants to know whether a social media campaign can influence "Casual" users to become "Annual" members.

    2. Key Stakeholders: The main stakeholder from Cyclist is Lily Moreno, who is the Director of Marketing and responsible for the development of campaigns and initiatives to promote the bike-share program. The other teams involved with this project are Marketing & Analytics and the Executive Team.

    3. Business Task: Comparing the two kinds of users and defining how they use the platform, which variables they have in common, which variables differ, and how to convert Casual users into Annual members.

    Phase 2: PREPARE:

    Key Objectives:

    1. Determine Data Credibility: Cyclist provided data from 2013 through March 2021, all of which is first-hand data collected by the company.

    2. Sort & Filter Data: The stakeholders want to know how current users are using the service, so I am focusing on the data from 2020-2021, the most relevant period for answering the business task.

    #Installing packages
    install.packages("tidyverse", repos = "http://cran.us.r-project.org")
    install.packages("readr", repos = "http://cran.us.r-project.org")
    install.packages("janitor", repos = "http://cran.us.r-project.org")
    install.packages("geosphere", repos = "http://cran.us.r-project.org")
    install.packages("gridExtra", repos = "http://cran.us.r-project.org")
    
    library(tidyverse)
    library(readr)
    library(janitor)
    library(geosphere)
    library(gridExtra)
    
    #Importing data & verifying the information within the dataset
    all_tripdata_clean <- read.csv("/Data Projects/cyclist/cyclist_data_cleaned.csv")
    
    glimpse(all_tripdata_clean)
    
    summary(all_tripdata_clean)
    
    

    Phase 3: PROCESS

    Key Objectives:

    1. Cleaning Data & Preparing for Analysis: Once the data has been combined into one dataset and checked for errors, cleaning begins: eliminating data that corresponds to the company servicing the bikes, and any ride with a traveled distance of zero. New columns will be added to assist the analysis and to provide an accurate picture of who is using the bikes.

    #Eliminating any data that represents the company performing maintenance, and trips without any measurable distance
    #(ride_length must already exist; it is computed from started_at/ended_at below)
    all_tripdata_clean <- all_tripdata_clean[!(all_tripdata_clean$start_station_name == "HQ QR" | all_tripdata_clean$ride_length<0),] 
    
    #Creating columns for the individual date components (day_of_week must be created after date)
    all_tripdata_clean$date <- as.Date(all_tripdata_clean$started_at)
    all_tripdata_clean$day <- format(as.Date(all_tripdata_clean$date), "%d")
    all_tripdata_clean$month <- format(as.Date(all_tripdata_clean$date), "%m")
    all_tripdata_clean$year <- format(as.Date(all_tripdata_clean$date), "%Y")
    all_tripdata_clean$day_of_week <- format(as.Date(all_tripdata_clean$date), "%A")
    
    

    **Now I will calculate the length of each ride, the distance traveled, and the mean time and distance for each user group.**

    #Calculating the ride length in miles & minutes
    all_tripdata_clean$ride_length <- difftime(all_tripdata_clean$ended_at,all_tripdata_clean$started_at,units = "mins")
    
    all_tripdata_clean$ride_distance <- distGeo(matrix(c(all_tripdata_clean$start_lng, all_tripdata_clean$start_lat), ncol = 2), matrix(c(all_tripdata_clean$end_lng, all_tripdata_clean$end_lat), ncol = 2))
    all_tripdata_clean$ride_distance = all_tripdata_clean$ride_distance/1609.34 #converting to miles
    
    #Calculating the mean time and distance based on the user groups
    userType_means <- all_tripdata_clean %>% 
     group_by(member_casual) %>% 
     summarise(mean_time = mean(ride_length), mean_distance = mean(ride_distance))
    

    Adding calculations that differentiate between bike types and show which type of user rides each bike type.

    #Calculations
    
    with_bike_type <- all_tripdata_clean %>% filter(rideable_type=="classic_bike" | rideable_type=="electric_bike")
    
    #Totals per user type, bike type, and weekday
    with_bike_type %>%
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, rideable_type, weekday) %>%
     summarise(totals = n(), .groups = "drop")
    
    #Totals per user type and bike type
    with_bike_type %>%
     group_by(member_casual, rideable_type) %>%
     summarise(totals = n(), .groups = "drop")
    
    #Calculating the ride differential
    all_tripdata_clean %>% 
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, weekday) %>% 
     summarise(number_of_rides = n(),
          average_duration = mean(ride_length), .groups = "drop") %>% 
     arrange(member_casual, weekday)
    
  5. ACNC 2019 Annual Information Statement Data

    • researchdata.edu.au
    • data.gov.au
    Updated May 10, 2021
    + more versions
    Cite
    Australian Charities and Not-for-profits Commission (ACNC) (2021). ACNC 2019 Annual Information Statement Data [Dataset]. https://researchdata.edu.au/acnc-2019-annual-statement-data/2975980
    Explore at:
    Dataset updated
    May 10, 2021
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Australian Charities and Not-for-profits Commission (ACNC)
    License

    Attribution 2.5 (CC BY 2.5, https://creativecommons.org/licenses/by/2.5/)
    License information was derived automatically

    Description

    This dataset is updated weekly. Please ensure that you use the most up-to-date version.

    The Australian Charities and Not-for-profits Commission (ACNC) is Australia's national regulator of charities.

    Since 3 December 2012, charities wanting to access Commonwealth charity tax concessions (and other benefits) need to register with the ACNC. Although many charities choose to register, registration with the ACNC is voluntary.

    Each year, registered charities are required to lodge an Annual Information Statement (AIS) with the ACNC within six months of the end of their reporting period.

    Registered charities can apply to the ACNC to have some or all of the information they provide withheld from the ACNC Register. However, there are only limited circumstances in which the ACNC can agree to withhold information. If a charity has applied to have its data withheld, the AIS data relating to that charity has been excluded from this dataset.

    This dataset can be used to find the AIS information lodged by multiple charities, and to filter and sort by different variables across all AIS information. AIS information for individual charities can be viewed via the ACNC Charity Register.

    The AIS collects information about charity finances, and financial information provides a basis for understanding a charity and its activities in greater detail. We have published explanatory notes to help you understand this dataset.

    When comparing charities' financial information it is important to consider each charity's unique situation. This is particularly true for small charities, which are not compelled to provide financial reports – reports that often contain more details about their financial position and activities – as part of their AIS.

    For more information on interpreting financial information, please refer to the ACNC website. The ACNC also publishes other datasets on data.gov.au as part of our commitment to open data and transparent regulation.

    NOTE: It is possible that some information in this dataset might be subject to a future request from a charity to have their information withheld. If this occurs, the information will still appear in the dataset until the next update. Please consider this risk when using this dataset.

  6. Data from: Pueblo Grande (AZ U:9:1(ASM)): Unit 15, Washington and 48th...

    • search.dataone.org
    Updated Jul 8, 2014
    Cite
    Abbott, David R.; Martin, Maria (2014). Pueblo Grande (AZ U:9:1(ASM)): Unit 15, Washington and 48th Streets: SSI Kitchell Data Recovery, Ceramic Rough Sort (RS_CERAMICS) Data [Dataset]. http://doi.org/10.6067/XCV8BG2MZH
    Explore at:
    Dataset updated
    Jul 8, 2014
    Dataset provided by
    the Digital Archaeological Record
    Authors
    Abbott, David R.; Martin, Maria
    Area covered
    Description

    The Kitchell Data Recovery project Ceramic Rough Sort (RS_CERAMICS) Data sheet contains data from the rough sort analysis of ceramics recovered during the Kitchell data recovery project. It contains information on ceramic types, tempers and counts; it also records vessel and rim forms where applicable. The data sheet also contains rim circumference and rim diameter measurements for some ceramic specimens.

    See Partial Data Recovery and Burial Removal at Pueblo Grande (AZ U:9:1 (ASM)): Unit 15, The Former Maricopa County Sheriff's Substation, Washington and 48th Streets, Phoenix, Arizona (SSI Technical Report No. 02-43) for the final report on the Kitchell Data Recovery project.

  7. Sediment macrofauna count data and images of multicores collected during R/V...

    • search.dataone.org
    • data.griidc.org
    Updated Feb 5, 2025
    Cite
    MacDonald, Ian (2025). Sediment macrofauna count data and images of multicores collected during R/V Weatherbird II cruise 1305, September 22-29, 2012 [Dataset]. http://doi.org/10.7266/N7BV7DKC
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    GRIIDC
    Authors
    MacDonald, Ian
    Description

    This dataset contains 146 jpeg images of multicores collected during R/V Weatherbird II cruise 1305 from September 22nd to 29th 2012. Additionally, this includes a file of raw sort data for macrofauna to the family level.

  8. Explore data formats and ingestion methods

    • kaggle.com
    Updated Feb 12, 2021
    Cite
    Gabriel Preda (2021). Explore data formats and ingestion methods [Dataset]. https://www.kaggle.com/datasets/gpreda/iris-dataset/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gabriel Preda
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Why this Dataset

    This dataset brings to you Iris Dataset in several data formats (see more details in the next sections).

    You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared Python Jupyter Notebook and R Markdown report that input all these formats:

    Iris Dataset

    Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.

    Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris

    Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/

    The file downloaded is iris.data and is formatted as a comma-delimited file.

    This small data collection was created to help you test your skills with ingesting various data formats.

    Content

    This file was processed to convert the data into the following formats:
    * csv - comma separated values format
    * tsv - tab separated values format
    * parquet - parquet format
    * feather - feather format
    * parquet.gzip - compressed parquet format
    * h5 - hdf5 format
    * pickle - Python binary object file (pickle format)
    * xlsx - Excel format
    * npy - NumPy (Python library) binary format
    * npz - NumPy (Python library) binary compressed format
    * rds - Rds (R-specific data format) binary format
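    A minimal sketch of such an ingestion test, using only the standard library and an Iris-style record I made up for illustration; the pandas readers for parquet, feather, h5, and the rest follow the same round-trip idea:

    ```python
    import csv
    import io
    import pickle

    # One Iris-style record (values are illustrative, not from the file).
    row = ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"]

    # csv round trip via an in-memory buffer
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    buf.seek(0)
    from_csv = next(csv.reader(buf))

    # pickle round trip (the "Python binary object file" format above)
    from_pickle = pickle.loads(pickle.dumps(row))

    print(from_csv == row and from_pickle == row)  # True
    ```

    Swapping the writer/reader pair per format is all an ingestion test of the other listed formats needs.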

    Acknowledgements

    I would like to acknowledge the work of the creator of the dataset - R. A. Fisher and of the donor - Michael Marshall.

    Inspiration

    Use these data formats to test your skills in ingesting data in various formats.

  9. Water Data Online

    • researchdata.edu.au
    • data.gov.au
    • +1 more
    Updated Oct 21, 2014
    + more versions
    Cite
    Bureau of Meteorology (2014). Water Data Online [Dataset]. https://researchdata.edu.au/water-data-online/3528495
    Explore at:
    Dataset updated
    Oct 21, 2014
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Bureau of Meteorology
    License

    Attribution 3.0 (CC BY 3.0, https://creativecommons.org/licenses/by/3.0/)
    License information was derived automatically

    Area covered
    Description

    Water Data Online provides free access to nationally consistent, current and historical water information. It allows you to view and download standardised data and reports.

    Watercourse level and watercourse discharge time series data from approximately 3500 water monitoring stations across Australia are available.

    Water Data Online displays time series data supplied by lead water agencies from each State and Territory, with updates provided to the Bureau on a daily basis.

    Over time, more stations and parameters will become available, and linkages to Water Data Online from the Geofabric will be implemented.

    Before using data, please refer to the licence preferences of the supplying organisations under the Copyright tab.

  10. First quarter 2024 / Table BOAMP-SIREN-BUYERS (BSA): a cross between the...

    • gimi9.com
    Cite
    First quarter 2024 / Table BOAMP-SIREN-BUYERS (BSA): a cross between the BOAMP table (DILA) and the Sirene Business Base (INSEE) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_6644ac7663969d80f6047dd8/
    Explore at:
    Description

    Crossing table of the BOAMP table (DILA) with the Sirene Business Base (INSEE) / First Quarter 2024.

    - The BUYER's Siren number (column "SN_30_Siren") is added for each notice (column and primary key "B_17_idweb");
    - Several columns facilitating datamining have been added;
    - The names of the original columns have been prefixed, numbered and sorted alphabetically.

    You will find here:
    - The BSA for the first quarter of 2024 in free and open access (csv with semicolon separator, and Parquet formats);
    - The schema of the BSA table (csv, comma separator);
    - An excerpt from the March 30 BSA (csv, comma separator) to quickly give you an idea in the Datagouv explorer.

    NB: The March 30 excerpt has its json cell columns GESTION, DONNEES, and ANNONCES_ANTERIEURES purged. The deleted data can be found in a nicer format by following the links in the added columns:
    - B_41_GESTION_URL_JSON;
    - B_43_DONNEES_URL_JSON;
    - B_45_ANNONCES_ANTERIEURES_URL_JSON.

    More info: daily and paid updates on the entire BOAMP 2024 are available on our website under AuFilDuBoamp Downloads; further documentation can be found at AuFilDuBoamp Doc & TP.

    Data sources: the SIRENE database of companies and their establishments (SIREN, SIRET) of August, and the BOAMP API.

    To download the first quarter of the BSA with Python, run:

    For the CSV: df = pd.read_csv("https://www.data.gouv.fr/en/datasets/r/63f0d792-148a-4c95-a0b6-9e8ea8b0b34a", dtype='string', sep=';')

    For the Parquet: df = pd.read_parquet("https://www.data.gouv.fr/en/datasets/r/f7a4a76e-ff50-4dc6-bae8-97368081add2")

    Enjoy!

  11. Additional file 1: of Best-worst scaling improves measurement of first...

    • springernature.figshare.com
    zip
    Updated Jun 4, 2023
    Cite
    Nichola Burton; Michael Burton; Dan Rigby; Clare Sutherland; Gillian Rhodes (2023). Additional file 1: of Best-worst scaling improves measurement of first impressions [Dataset]. http://doi.org/10.6084/m9.figshare.9894992.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nichola Burton; Michael Burton; Dan Rigby; Clare Sutherland; Gillian Rhodes
    License

    Attribution 4.0 (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    R scripts used to generate the design and sort and score the data of Study 3, with annotation: intended as a template to build future BWS studies. (ZIP 15 kb)

  12. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Jun 25, 2024
    + more versions
    Cite
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - CO [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-co-b7d1e
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the carbon monoxide data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for data, and to be safe, do not even sort. One simple change in the Excel file could break the code.

  13. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • gimi9.com
    Updated Jun 25, 2024
    + more versions
    Cite
    (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - PMVa | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_integration-of-slurry-separation-technology-refrigeration-units-air-quality-pmva-87359/
    Explore at:
    Dataset updated
    Jun 25, 2024
    Description

    This is the gravimetric data used to calibrate the real-time readings. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for data, and to be safe, do not even sort. One simple change in the Excel file could break the code.

  14. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • catalog.data.gov
    Updated Jun 25, 2024
    + more versions
    Cite
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - SO2 [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-so2-a3122
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the raw SO2 data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact: do not change the column names or the codes used for data, and to be safe, do not even sort. One simple change in the Excel file could break the code.

  15. Data from: TDMentions: A Dataset of Technical Debt Mentions in Online Posts

    • zenodo.org
    bin, bz2
    Updated Jan 24, 2020
    Cite
    Morgan Ericsson; Morgan Ericsson; Anna Wingkvist; Anna Wingkvist (2020). TDMentions: A Dataset of Technical Debt Mentions in Online Posts [Dataset]. http://doi.org/10.5281/zenodo.2593142
    Explore at:
    bin, bz2 (available download formats)
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Morgan Ericsson; Morgan Ericsson; Anna Wingkvist; Anna Wingkvist
    License

    Attribution 4.0 (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    # TDMentions: A Dataset of Technical Debt Mentions in Online Posts (version 1.0)

    TDMentions is a dataset that contains mentions of technical debt from Reddit, Hacker News, and Stack Exchange. It also contains a list of blog posts on Medium that were tagged as technical debt. The dataset currently contains approximately 35,000 items.

    ## Data collection and processing

    The dataset is mainly collected from existing datasets. We used data from:

    - the archive of Reddit posts by Jason Baumgartner (available at [https://pushshift.io](https://pushshift.io)),
    - the archive of Hacker News at Google's BigQuery (available at [https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news](https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news)),
    - the Stack Exchange data dump (available at [https://archive.org/details/stackexchange](https://archive.org/details/stackexchange)),
    - the [GHTorrent](http://ghtorrent.org) project
    - the [GH Archive](https://www.gharchive.org)

    The data set currently contains data from the start of each source/service until 2018-12-31. For GitHub, we currently only include data from 2015-01-01.

    We use the regular expression `tech(nical)?[\s\-_]*?debt` to find mentions in all sources except for Medium. We decided to limit our matches to variations of technical debt and tech debt. Other shorter forms, such as TD, can result in too many false positives. For Medium, we used the tag `technical-debt`.
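    The mention pattern can be tried directly. A small Python sketch of that regex; the case-insensitive flag and the sample strings are my assumptions, since the text does not state which flags were used:

    ```python
    import re

    # The dataset's mention regex; IGNORECASE is an assumption here so that
    # "Technical Debt" and "tech debt" both match.
    pattern = re.compile(r"tech(nical)?[\s\-_]*?debt", re.IGNORECASE)

    texts = [
        "We accumulated technical debt fast.",  # long form
        "tech-debt cleanup sprint",             # short hyphenated form
        "TD is too ambiguous to match.",        # excluded, as the text explains
    ]
    hits = [bool(pattern.search(t)) for t in texts]
    print(hits)  # [True, True, False]
    ```

    The lazy `[\s\-_]*?` lets "techdebt", "tech debt", and "tech-debt" all match while the optional `(nical)?` covers the long form.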

    ## Data Format

    The dataset is stored as a compressed (bzip2) JSON file with one JSON object per line. Each mention is represented as a JSON object with the following keys.

    - `id`: the id used in the original source. We use the URL path to identify Medium posts.
    - `body`: the text that contains the mention. This is either the comment or the title of the post. For Medium posts this is the title and subtitle (which might not mention technical debt, since posts are identified by the tag).
    - `created_utc`: the time the item was posted in seconds since epoch in UTC.
    - `author`: the author of the item. We use the username or userid from the source.
    - `source`: where the item was posted. Valid sources are:
    - HackerNews Comment
    - HackerNews Job
    - HackerNews Submission
    - Reddit Comment
    - Reddit Submission
    - StackExchange Answer
    - StackExchange Comment
    - StackExchange Question
    - Medium Post
    - `meta`: Additional information about the item specific to the source. This includes, e.g., the subreddit a Reddit submission or comment was posted to, the score, etc. We try to use the same names, e.g., `score` and `num_comments` for keys that have the same meaning/information across multiple sources.

    This is a sample item from Reddit:

    ```JSON
    {
      "id": "ab8auf",
      "body": "Technical Debt Explained (x-post r/Eve)",
      "created_utc": 1546271789,
      "author": "totally_100_human",
      "source": "Reddit Submission",
      "meta": {
        "title": "Technical Debt Explained (x-post r/Eve)",
        "score": 1,
        "num_comments": 0,
        "url": "http://jestertrek.com/eve/technical-debt-2.png",
        "subreddit": "RCBRedditBot"
      }
    }
    ```
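    Since the file holds one JSON object per line, each line parses independently. A small Python sketch using the sample item, with `meta` trimmed to two keys for brevity:

    ```python
    import json

    # The sample Reddit item as it would appear as a single line of the
    # bzip2-compressed file (meta shortened here for illustration).
    line = (
        '{"id": "ab8auf", '
        '"body": "Technical Debt Explained (x-post r/Eve)", '
        '"created_utc": 1546271789, '
        '"author": "totally_100_human", '
        '"source": "Reddit Submission", '
        '"meta": {"score": 1, "num_comments": 0}}'
    )
    item = json.loads(line)
    print(item["source"], item["meta"]["score"])  # Reddit Submission 1
    ```

    Reading the real file is the same loop over `bz2.open(...)` with `json.loads` per line, which is why per-source tools like `jq` work so naturally on it.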

    ## Sample Analyses

    We decided to use JSON to store the data, since it is easy to work with from multiple programming languages. In the following examples, we use [`jq`](https://stedolan.github.io/jq/) to process the JSON.

    ### How many items are there for each source?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq '.source' | sort | uniq -c
    ```

    ### How many submissions that mentioned technical debt were posted each month?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq 'select(.source == "Reddit Submission") | .created_utc | strftime("%Y-%m")' | sort | uniq -c
    ```

    ### What are the titles of items that link (`meta.url`) to PDF documents?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq '. as $r | select(.meta.url?) | .meta.url | select(endswith(".pdf")) | $r.body'
    ```

    ### Please, I want CSV!

    ```
    lbzip2 -cd postscomments.json.bz2 | jq -r '[.id, .body, .author] | @csv'
    ```

    Note that you need to specify the keys you want to include for the CSV, so it is easier to either ignore the meta information or process each source.

    Please see [https://github.com/sse-lnu/tdmentions](https://github.com/sse-lnu/tdmentions) for more analyses.

    # Limitations and Future updates

    The current version of the dataset lacks GitHub data and Medium comments. GitHub data will be added in the next update. Medium comments (responses) will be added in a future update if we find a good way to represent these.

  16. Envestnet | Yodlee's USA Consumer Spending Data (De-Identified) |...

    • datarade.ai
    .sql, .txt
    + more versions
    Envestnet | Yodlee, Envestnet | Yodlee's USA Consumer Spending Data (De-Identified) | Row/Aggregate Level | Consumer Data covering 3600+ public and private corporations [Dataset]. https://datarade.ai/data-products/envestnet-yodlee-s-de-identified-consumer-spending-data-r-envestnet-yodlee
    Explore at:
    .sql, .txt (available download formats)
    Dataset provided by
    Yodlee
    Envestnet (http://envestnet.com/)
    Authors
    Envestnet | Yodlee
    Area covered
    United States of America
    Description

    Envestnet®| Yodlee®'s Consumer Spending Data (Aggregate/Row) Panels consist of de-identified, near-real time (T+1) USA credit/debit/ACH transaction level data – offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet®| Yodlee®'s financial technology platform.

    Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver key KPIs daily that are focused, relevant, and ready to put into production.

    We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.

    Investors, corporate researchers, and corporates can use our data to answer some key business questions such as: - How much are consumers spending with specific merchants/brands and how is that changing over time? - Is the share of consumer spend at a specific merchant increasing or decreasing? - How are consumers reacting to new products or services launched by merchants? - For loyal customers, how is the share of spend changing over time? - What is the company’s market share in a region for similar customers? - Is the company’s loyal user base increasing or decreasing? - Is the lifetime customer value increasing or decreasing?

    Use Case Categories (our data supports countless use cases, and we look forward to working with new ones): 1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking

    2. Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)

    3. Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence

    4. Market Data: Analytics B2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis.

    Additional Use Cases: - Use spending data to analyze sales/revenue broadly (sector-wide) or granularly (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level. - Reveal cohort consumer behavior to decipher long-term behavioral consumer spending shifts. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more. - Study the effects of inflation rates via such metrics as increased total spend, ticket size, and number of transactions. - Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.

  17. Diabetes data

    • kaggle.com
    Updated Jul 9, 2020
    Veronica Zheng (2020). Diabetes data [Dataset]. https://www.kaggle.com/datasets/veronicazheng/diabetes-data/discussion?sort=undefined
    Explore at:
    Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Veronica Zheng
    Description

    Dataset

    This dataset was created by Veronica Zheng

    Released under Other (specified in description)

    Contents

  18. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • catalog.data.gov
    Updated Jun 25, 2024
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - Particulate Matter [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-particulate-ma-26bf1
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the raw particulate matter data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For the code to work properly, it is important that this file remain intact: do not change the column names or data codes, and, to be safe, do not even sort. A single change in the Excel file could break the code.

  19. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • datasets.ai
    • +1more
    Updated Jun 25, 2024
    data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - H2S [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-h2s-4af17
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    This is the raw H2S data: the concentration of H2S in parts per million in the biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For the code to work properly, it is important that this file remain intact: do not change the column names or data codes, and, to be safe, do not even sort. A single change in the Excel file could break the code.

  20. Data from: Genetic diversity and spatial genetic structure of the grassland...

    • datadryad.org
    • data.niaid.nih.gov
    • +2more
    zip
    Updated Jun 11, 2016
    Sascha van der Meer; Hans Jacquemyn (2016). Genetic diversity and spatial genetic structure of the grassland perennial Saxifraga granulata along two river systems [Dataset]. http://doi.org/10.5061/dryad.q3d2m
    Explore at:
    zip (available download formats)
    Dataset updated
    Jun 11, 2016
    Dataset provided by
    Dryad
    Authors
    Sascha van der Meer; Hans Jacquemyn
    Time period covered
    May 27, 2015
    Area covered
    Europe, Belgium
    Description

    GeneMapper data of 560 individuals of Saxifraga granulata collected along two rivers in Belgium. Raw GeneMapper data with four extra columns with information about the samples (i.e. Sort, River, Population, Individual). Without the first four columns, the data can be easily read via the R function read.GeneMapper() from the R package 'polysat'. File: Saxifraga_Rivers_GeneMapper.xlsx

The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
