13 datasets found
  1. Bank Loan Analysis Project in Excel

    • kaggle.com
    Updated May 4, 2024
    Cite
    Sanjana Murthy (2024). Bank Loan Analysis Project in Excel [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/bank-loan-analysis-project/data
    Dataset updated
    May 4, 2024
    Dataset provided by
    Kaggle
    Authors
    Sanjana Murthy
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    About Datasets:
    - Domain: Finance
    - Project: Bank loan of customers
    - Datasets: Finance_1.xlsx & Finance_2.xlsx
    - Dataset Type: Excel Data
    - Dataset Size: Each Excel file has 39k+ records

    KPIs:
    1. Year-wise loan amount stats
    2. Grade and sub-grade wise revol_bal
    3. Total payment for Verified status vs. total payment for Non-Verified status
    4. State-wise loan status
    5. Month-wise loan status
    6. Get more insights based on your understanding of the data

    Process:
    1. Understanding the problem
    2. Data collection
    3. Data cleaning
    4. Exploring and analyzing the data
    5. Interpreting the results

    This project makes use of Power Query, Power Pivot, Merge data, Clustered Bar Chart, Clustered Column Chart, Line Chart, 3D Pie chart, Dashboard, slicers, timeline, and formatting techniques.

  2. Data from: Skepticism in science and punitive attitudes

    • openicpsr.org
    delimited
    Updated May 4, 2025
    Cite
    Jason Rydberg; Luke DeZago (2025). Skepticism in science and punitive attitudes [Dataset]. http://doi.org/10.3886/E228541V1
    Available download formats: delimited
    Dataset updated
    May 4, 2025
    Dataset provided by
    University of Massachusetts Lowell
    Authors
    Jason Rydberg; Luke DeZago
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Replication materials for the manuscript "Skepticism in Science and Punitive Attitudes", published in the Journal of Criminal Justice.

    Note that the GSS repeated cross sections for 1972 to 2018 are too large to upload here, but they can be accessed from https://gss.norc.org/content/dam/gss/get-the-data/documents/spss/GSS_spss.zip

    Included here are:
    • A link to the repeated cross-sections data
    • Each of the 3 wave panels (2006-2010; 2008-2012; 2010-2014)
    • Replication R script for the repeated cross sections cleaning and analysis
    • Replication R script for the panel data cleaning and analysis
    • An Excel spreadsheet with Uniform Crime Report data to merge to the cross sections

  3. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Available download formats: txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    figshare
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, these data must be processed before new insights can be derived through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey:
    • demographics (281 variables),
    • dietary consumption (324 variables),
    • physiological functions (1,040 variables),
    • occupation (61 variables),
    • questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood),
    • medications (29 variables),
    • mortality information linked from the National Death Index (15 variables),
    • survey weights (857 variables),
    • environmental exposure biomarker measurements (598 variables), and
    • chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

    csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file.
    • The curated NHANES datasets involve 20 .csv formatted files, two for each module, with one as the uncleaned version and the other as the cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments.
    • "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES.
    • "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.
    • "dictionary_drug_codes.csv" contains the dictionary for descriptors of the drug codes.
    • "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded, as a .zip file which includes an .RData file and an .R file.
    • "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data.
    • "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e. our pipeline) to curate the original NHANES data.

    Example starter codes: The set of starter code to help users conduct exposome analysis consists of four R markdown files (.Rmd). We recommend going through the tutorials in order.
    • "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
    • "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.
    • "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.
    • "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.

  4. What students answer when discussing about citation practices

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, zip
    Updated Sep 21, 2021
    Cite
    Caroline Salamin; Noémi Cobolet; Raphaël Grolimund; Pascale Bouton (2021). What students answer when discussing about citation practices [Dataset]. http://doi.org/10.5281/zenodo.290155
    Available download formats: zip, bin, csv
    Dataset updated
    Sep 21, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Caroline Salamin; Noémi Cobolet; Raphaël Grolimund; Pascale Bouton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This document explains how the data were generated and how to interpret them.

    LICENSE: CC0
    But if you want to combine data with other datasets, feel free to use them as if they were published under CC0 license.
    Data were published in February 2017. At that time, Zenodo only provided CC BY, CC BY-SA, CC BY-NC, CC BY-ND and CC BY-NC-ND. No CC0 option was available.

    HOW DATA WERE COLLECTED
    The 21 recorded sessions took place between February 2013 and December 2016.
    Data were collected using Turning Technologies' remote controls (called *clickers*) and TurningPoint software.

    The 4 versions of the quiz used during these 4 years are provided in the 'quizzes' folder for information purposes (in PDF and PowerPoint formats).

    Turning Technologies records data in a closed format (.tpzx) that can be exported and converted into the 3 formats provided here (these 3 files contain the same data):

    * Excel (.xlsx)
    * Comma-separated values (.csv)
    * SQLite (.sqlite)

    The first one was directly exported from TurningPoint and is provided for Excel users who can't read CSV correctly.
    CSV was converted from Excel and is provided for non-Excel users.
    Finally, SQLite is provided in order to apply different sorting and filters to the data. It can be read using SQLite manager for Firefox (https://addons.mozilla.org/en-US/firefox/addon/sqlite-manager/).

    CODEBOOK
    Here is the name, the meaning and the possible values of the columns (name - meaning [possible values]). If students didn't answer the question, the value is '-'.

    Session - session number (chronological) [1 to 21]
    AcademicYear - academic year [12-13, 13-14, 14-15, 15-16, 16-17]
    Year - calendar year [2013, 2014, 2015, 2016]
    Month - month (number) [1 to 12]
    Day - day (number) [1 to 31]
    Section - section abbreviation [CH, ESC, GM, IF, SIE, SV]
    Level - students' level [BA2, BA3, MA]
    Language - course's language [FR or EN]
    DeviceID - clicker's ID [(unique ID within a session)]
    Q1 - answers to question 1 [A, B, C, D, E]
    Q2 - answers to question 2 [A, B, C, D]
    Q3 - answers to question 3 [A or B]
    Q4 - answers to question 4 [A or B]
    Q5 - answers to question 5 [A or B]
    Q6 - answers to question 6 [A or B]
    Q7 - answers to question 7 [A or B]
    Q8 - answers to question 8 [A or B]
    Q9 - answers to question 9 [A or B]
    Q8-9 - answers to the question 8-9 (merge) [A or B]
    Q10 - answers to question 10 [1, 2]
    Q11 - answers to question 11 [A or B]
    Q12 - answers to question 12 [A, B]
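
The codebook above can be turned into a small loader. The following is a minimal sketch, not the authors' own tooling: it assumes the CSV header uses the column names listed in the codebook, and the three sample rows are invented for illustration. It uses only the Python standard library and converts the '-' placeholder into `None` so that unanswered questions are not counted as answers.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; the real file has more columns (see the codebook).
SAMPLE = """Session,Level,Language,Q3,Q4
1,BA2,FR,A,B
1,BA2,FR,-,A
2,MA,EN,B,-
"""

def load_answers(text):
    """Read rows and turn the '-' placeholder into None (no answer given)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({key: (None if value == "-" else value)
                     for key, value in row.items()})
    return rows

def tally(rows, question):
    """Count the answers to one question, ignoring unanswered entries."""
    return Counter(r[question] for r in rows if r[question] is not None)

rows = load_answers(SAMPLE)
q3_counts = tally(rows, "Q3")
print(q3_counts)
```

With the real file, `io.StringIO(text)` would be replaced by an open file handle; how multiple-answer questions are encoded in a single cell is not stated in the codebook, so this sketch does not try to split them.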

    Section abbreviation meaning
    * CH: chemistry
    * ESC: school of criminal justice (Unil)
    * GM: mechanical engineering
    * IF: financial engineering
    * SIE: environmental engineering
    * SV: life sciences

    Level meaning
    * BA2: 2nd year of Bachelor
    * BA3: 3rd year of Bachelor
    * MA: Master level

    Question types
    For some questions, multiple answers were allowed: Q1, Q2, Q10 & Q12.
    Half of the questions have only one correct answer, true or false: Q3, Q5, Q6, Q7, Q8, Q9 & Q8-9.
    Finally, for 2 questions only one answer was accepted, but there is not only one correct answer: Q4 & Q11.

    INFORMATION ABOUT THE SESSIONS
    Unless otherwise stated below, all sessions were conducted like the original one: Q1 to Q12 (no Q8-9).
    The original French version of the quiz was translated into English for a few sessions with Master students.
    For sessions 14 and 20, Q5 was removed and Q8 & Q9 were merged into Q8-9.
    Session 18 was a short one with only 7 questions: Q1, Q2, Q3, Q4, Q6, Q7 & Q9.

    CONTACT INFORMATION
    If you have any questions about these data, contact formations.bib@epfl.ch.

  5. Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race,...

    • openicpsr.org
    • search.datacite.org
    Updated Aug 16, 2018
    Cite
    Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016 [Dataset]. http://doi.org/10.3886/E102263V5
    Dataset updated
    Aug 16, 2018
    Dataset provided by
    University of Pennsylvania
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1980 - 2016
    Area covered
    United States
    Description
    Version 5 release notes:
    • Removes support for SPSS and Excel data.
    • Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
    • Adds in agencies that report 0 months of the year.
    • Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
    • Removes data on runaways.
    Version 4 release notes:
    • Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
    Version 3 release notes:
    • Add data for 2016.
    • Order rows by year (descending) and ORI.
    Version 2 release notes:
    • Fix bug where Philadelphia Police Department had incorrect FIPS county code.

    The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.

    All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.

    I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests, which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
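
The two cleaning rules described above can be sketched in a few lines of Python. This is an illustration only; the author's actual cleaning was done in R (see the linked repository), and the function name `clean_arrest_count` is hypothetical. The list of suspect values is copied from the description.

```python
# Arrest counts that the description lists as known reporting errors.
SUSPECT_VALUES = {
    10000, 20000, 30000, 40000, 50000, 60000,
    70000, 80000, 90000, 100000, 99999, 99998,
}

def clean_arrest_count(raw):
    """Apply the two rules from the description to one raw cell.

    "None/not reported" becomes 0 (the possibly incorrect assumption
    that it means zero arrests); a suspect value becomes None (NA);
    anything else is returned as an integer.
    """
    if raw == "None/not reported":
        return 0
    value = int(raw)
    return None if value in SUSPECT_VALUES else value

cleaned = [clean_arrest_count(cell)
           for cell in ["17", "None/not reported", "20000", "99998", "3"]]
print(cleaned)
```

Because zeros and missing values are merged by the first rule, any analysis that depends on true zeros should treat those cells with caution.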

    To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.

    To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.
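
The leading-zero problem can be avoided by keeping FIPS codes as text, or repaired afterwards by padding. The sketch below assumes hypothetical column names (`ori`, `fips_state_code`, `fips_county_code`) and invented rows; check the actual file for the real headers. Python's `csv` module reads every cell as a string, so nothing is stripped on load, and `str.zfill` restores zeros that a spreadsheet already removed.

```python
import csv
import io

# Hypothetical rows as they might look after a round trip through Excel:
# the leading zeros of the state and county codes have been stripped.
SAMPLE = """ori,fips_state_code,fips_county_code
XX00001,1,73
XX00002,6,37
"""

def read_with_fips(text):
    """Read rows as strings and rebuild the 5-digit state+county FIPS code."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        state = row["fips_state_code"].zfill(2)    # state codes have 2 digits
        county = row["fips_county_code"].zfill(3)  # county codes have 3 digits
        row["fips_state_county"] = state + county
    return rows

fips_rows = read_with_fips(SAMPLE)
print([r["fips_state_county"] for r in fips_rows])
```

Padding to two and three digits follows the standard FIPS state and county code widths.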

    I created 9 arrest categories myself. The categories are:
    • Total Male Juvenile
    • Total Female Juvenile
    • Total Male Adult
    • Total Female Adult
    • Total Ma

  6. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...

    • datasearch.gesis.org
    • openicpsr.org
    Updated Feb 19, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kaplan, Jacob (2020). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Property Stolen and Recovered (Supplement to Return A) 1960-2018 [Dataset]. http://doi.org/10.3886/E105403
    Dataset updated
    Feb 19, 2020
    Dataset provided by
    da|ra (Registration agency for social science and economic data)
    Authors
    Kaplan, Jacob
    Description

    For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.

    Version 4 release notes:
    • Adds data for 2018.
    Version 3 release notes:
    • Adds data in the following formats: Excel.
    • Changes project name to avoid confusing this data for the ones done by NACJD.
    Version 2 release notes:
    • Adds data for 2017.
    • Adds a "number_of_months_reported" variable which says how many months of the year the agency reported data.

    Property Stolen and Recovered is a Uniform Crime Reporting (UCR) Program data set with information on the number of offenses (crimes included are murder, rape, robbery, burglary, theft/larceny, and motor vehicle theft), the value of the offense, and subcategories of the offense (e.g. for robbery it is broken down into subcategories including highway robbery, bank robbery, gas station robbery). The majority of the data relates to theft. Theft is divided into subcategories of theft such as shoplifting, theft of bicycle, theft from building, and purse snatching. For a number of items stolen (e.g. money, jewelry and precious metals, guns), the value of property stolen and the value of property recovered are provided. This data set is also referred to as the Supplement to Return A (Offenses Known and Reported).

    All the data was received directly from the FBI as text or .DTA files. I created a setup file based on the documentation provided by the FBI and read the data into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. The Word document file available for download is the guidebook the FBI provided with the raw data, which I used to create the setup file to read in data.

    There may be inaccuracies in the data, particularly in the group of columns starting with "auto." To reduce (but certainly not eliminate) data errors, I replaced the following values with NA for the group of columns beginning with "offenses" or "auto" as they are common data entry error values (e.g. are larger than the agency's population, are much larger than other crimes or months in same agency): 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99942. This cleaning was NOT done on the columns starting with "value."

    For every numeric column I replaced negative indicator values (e.g. "j" for -1) with the negative number they are supposed to be. These negative number indicators are not included in the FBI's codebook for this data but are present in the data. I used the values in the FBI's codebook for the Offenses Known and Clearances by Arrest data.

    To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. If an agency has used a different FIPS code in the past, check to make sure the FIPS code is the same as in this data.

  7. Reporting behavior from WHO COVID-19 public data

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Nov 29, 2023
    Cite
    Auss Abbood (2023). Reporting behavior from WHO COVID-19 public data [Dataset]. http://doi.org/10.5061/dryad.9s4mw6mmb
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Auss Abbood
    Time period covered
    Dec 16, 2022
    Description

    Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for ad hoc political decisions, including travel restrictions. Data reported by countries, however, is heterogeneous, and metrics to evaluate its quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.

    Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.

    Results: Our final analysis included 222 countries and regions....

    Data collection: COVID-19 data was downloaded from WHO. Using a public repository, we added the countries' full names to the WHO data set, using the two-letter abbreviations for each country to merge both data sets. The provided COVID-19 data covers January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.

    Data processing: We processed data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml).

    Any text editor, as well as Microsoft Excel and its free alternatives, can open the uploaded CSV file. Any web browser and some code editors (like the freely available Visual Studio Code) can show the uploaded Jupyter Notebook if the required Python environment is set up correctly.

  8. County FIPS Matching Tool

    • dataverse.harvard.edu
    Updated Jan 20, 2019
    + more versions
    Cite
    Carl Klarner (2019). County FIPS Matching Tool [Dataset]. http://doi.org/10.7910/DVN/OSLU4G
    Dataset updated
    Jan 20, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Carl Klarner
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This tool--a simple csv or Stata file for merging--gives you a fast way to assign Census county FIPS codes to variously presented county names. This is useful for dealing with county names collected from official sources, such as election returns, which inconsistently present county names and often have misspellings. It will likely take less than ten minutes the first time, and about one minute thereafter--assuming all versions of your county names are in this file. There are about 3,142 counties in the U.S., and there are 77,613 different permutations of county names in this file (ave=25 per county, max=382). Counties with more likely permutations have more versions. Misspellings were added as I came across them over time. I DON'T expect people to cite the use of this tool. DO feel free to suggest the addition of other county name permutations.
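
The matching workflow described above amounts to a dictionary lookup keyed on state and county-name variant. The sketch below is an assumption-laden illustration: the column names (`state`, `county_name`, `fips`) and the sample rows are hypothetical and are not taken from the tool's files. It also reports unmatched names, which are the candidates the author invites users to suggest as new permutations.

```python
# Hypothetical excerpt of the lookup file: several spellings map to one FIPS code.
LOOKUP_ROWS = [
    {"state": "IL", "county_name": "Cook", "fips": "17031"},
    {"state": "IL", "county_name": "Cook County", "fips": "17031"},
    {"state": "IL", "county_name": "Du Page", "fips": "17043"},
    {"state": "IL", "county_name": "DuPage", "fips": "17043"},
]

def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())

def build_index(rows):
    """Map (state, normalized county name) to its FIPS code."""
    return {(r["state"], normalize(r["county_name"])): r["fips"] for r in rows}

def assign_fips(records, index):
    """Split records into those with a FIPS match and those without."""
    matched, unmatched = [], []
    for rec in records:
        key = (rec["state"], normalize(rec["county"]))
        if key in index:
            matched.append({**rec, "fips": index[key]})
        else:
            unmatched.append(rec)
    return matched, unmatched

index = build_index(LOOKUP_ROWS)
returns = [
    {"state": "IL", "county": "COOK COUNTY", "votes": 100},
    {"state": "IL", "county": "Dupage", "votes": 50},
    {"state": "IL", "county": "Kook", "votes": 5},
]
matched, unmatched = assign_fips(returns, index)
print(len(matched), "matched;", len(unmatched), "unmatched")
```

Keying on the state as well as the name matters because many county names repeat across states.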

  9. Palatine Adopt-A-Hydrant Experience

    • mgp-solutions-mgp.hub.arcgis.com
    Updated May 2, 2024
    Cite
    Admin_Palatine_IL (2024). Palatine Adopt-A-Hydrant Experience [Dataset]. https://mgp-solutions-mgp.hub.arcgis.com/items/2afc6e9932d14e71a61b1167fbbf66a4
    Dataset updated
    May 2, 2024
    Dataset authored and provided by
    Admin_Palatine_IL
    Description

    The Adopt-A-Hydrant Application lets residents search for their address and select a hydrant near them. They are then directed to a Survey 123 Connect form that prompts them to provide their name, email and phone. After the form is submitted, Power Automate triggers two actions. The first sends an email to both the resident and the village employee managing the application. The second adds the adoptee's information to an Excel file, which can be used to generate a mail merge for mass notifications. The goal is to provide a clear line of communication from the village to the resident during weather events such as flooding or snow storms, in which debris or obstructions may prevent access to hydrants during emergencies. Residents can also un-adopt their hydrant. There are step-by-step instructions on how to adopt and un-adopt hydrants.

  10. Digitisation of Weather Records of Seungjeongwon Ilgi: A Historical Weather...

    • zenodo.org
    bin, csv, json, txt
    Updated Sep 27, 2023
    + more versions
    Cite
    Zeyu Lyu; Kohei Ichikawa; Yongchao Cheng; Hisashi Hayakawa; Yukiko Kawamoto (2023). Digitisation of Weather Records of Seungjeongwon Ilgi: A Historical Weather Dynamics Dataset of the Korean Peninsula (1623-1910) [Dataset]. http://doi.org/10.5281/zenodo.7453644
    Available download formats: csv, json, bin, txt
    Dataset updated
    Sep 27, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zeyu Lyu; Kohei Ichikawa; Yongchao Cheng; Hisashi Hayakawa; Yukiko Kawamoto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Korea
    Description

    Introduction

    This study uses the daily weather records of Seungjeongwon Ilgi from the NIKH database. Seungjeongwon Ilgi (http://sjw.history.go.kr/main.do) is a daily record of the Seungjeongwon, the Royal Secretariat of the Joseon Dynasty of Korea. These diaries span from 1623 to 1910 and generally include daily weather records in the entry header. Their observational site would be located in Seoul (N37°35′, E126°59′). We extracted the weather records from the NIKH database and classified the daily weather using a text mining method. We also converted the report dates from the traditional lunisolar calendar to the Gregorian calendar, to better place our data in the context of contemporary daily measurements.

    Data

    We provide different formats (csv, xlsx, json) to facilitate use of the data. The main contents of the data are listed below.

    • ID: The unique identifier of a specific record in the metadata, which can also serve as the identifier to merge with external data in the NIKH digital database.
    • Traditional calendar: The original lunar dates in the NIKH digital database, which are listed in the date format "YYYY-MM-DD". More specifically, "L0" indicates the leap year and "L1" indicates the common year.
    • Leap: The identifier of a leap year.
    • Gregorian calendar: The Gregorian calendar date converted from the traditional calendar date.
    • Weather Text: The text that describes the weather conditions. Specifically, multiple weather descriptions of the same day have been put together.
    • Flag: The computed value that indicates different combinations of weather conditions.
    • Volume: The volume of text in the original record.
    • Herbal Volume: The volume of text in the herbal record.
    • Sunny: A dummy variable that represents whether the weather description contains the expression of sunny.
    • Cloudy: A dummy variable that represents whether the weather description contains the expression of cloudy.
    • Rainy: A dummy variable that represents whether the weather description contains the expression of rainy.
    • Snow: A dummy variable that represents whether the weather description contains the expression of snow.
    • Wind: A dummy variable that represents whether the weather description contains the expression of wind.

    Import Data

    # Python
    import pandas as pd

    # CSV file
    data = pd.read_csv('~/SJWilgi_Seoul_Weather_YR1623_1910.csv', encoding="utf-8")
    # JSON file
    data = pd.read_json('~/SJWilgi_Seoul_Weather_YR1623_1910.json', encoding="utf-8")
    # Excel file (reading .xlsx requires the openpyxl package)
    data = pd.read_excel('~/SJWilgi_Seoul_Weather_YR1623_1910.xlsx')

    # R
    # CSV file
    library(readr)
    data <- read_csv("~/SJWilgi_Seoul_Weather_YR1623_1910.csv")
    # Excel file
    library(readxl)
    data <- read_excel("~/SJWilgi_Seoul_Weather_YR1623_1910.xlsx")
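
Once loaded, the dummy variables lend themselves to simple yearly summaries. The following is a minimal sketch using only the Python standard library. It assumes the CSV headers match the field names listed above ("Gregorian calendar", "Rainy") and that dates are ISO-formatted; both are assumptions to verify against the actual file, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical rows; header names are assumed to match the field list.
SAMPLE = """Gregorian calendar,Rainy
1623-04-12,1
1623-04-13,0
1624-01-02,1
1624-01-03,1
"""

def rainy_share_by_year(text):
    """Return {year: share of recorded days whose description mentions rain}."""
    totals = defaultdict(int)
    rainy = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        year = date.fromisoformat(row["Gregorian calendar"]).year
        totals[year] += 1
        rainy[year] += int(row["Rainy"])
    return {year: rainy[year] / totals[year] for year in sorted(totals)}

shares = rainy_share_by_year(SAMPLE)
print(shares)
```

The standard-library `date` type is used here because it handles 17th-century dates directly.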

  11. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...

    • datasearch.gesis.org
    • openicpsr.org
    Updated Feb 19, 2020
    Cite
    Kaplan, Jacob (2020). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Property Stolen and Recovered (Supplement to Return A) 1960-2017 [Dataset]. http://doi.org/10.3886/E105403V3
    Dataset updated
    Feb 19, 2020
    Dataset provided by
    da|ra (Registration agency for social science and economic data)
    Authors
    Kaplan, Jacob
    Description

    For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.

    Version 3 release notes: Adds data in the following formats: Excel. Changes project name to avoid confusing this data for the ones done by NACJD.

    Version 2 release notes: Adds data for 2017. Adds a "number_of_months_reported" variable which says how many months of the year the agency reported data.

    Property Stolen and Recovered is a Uniform Crime Reporting (UCR) Program data set with information on the number of offenses (crimes included are murder, rape, robbery, burglary, theft/larceny, and motor vehicle theft), the value of the offense, and subcategories of the offense (e.g. robbery is broken down into subcategories including highway robbery, bank robbery, and gas station robbery). The majority of the data relates to theft. Theft is divided into subcategories such as shoplifting, theft of bicycle, theft from building, and purse snatching. For a number of items stolen (e.g. money, jewelry and precious metals, guns), the value of property stolen and the value of property recovered are provided. This data set is also referred to as the Supplement to Return A (Offenses Known and Reported).

    All the data was received directly from the FBI as text or .DTA files. I created a setup file based on the documentation provided by the FBI and read the data into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. The Word document file available for download is the guidebook the FBI provided with the raw data, which I used to create the setup file to read in the data.

    There may be inaccuracies in the data, particularly in the group of columns starting with "auto." To reduce (but certainly not eliminate) data errors, I replaced the following values with NA for the group of columns beginning with "offenses" or "auto" as they are common data entry error values (e.g. are larger than the agency's population, are much larger than other crimes or months in the same agency): 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99942. This cleaning was NOT done on the columns starting with "value."

    For every numeric column I replaced negative indicator values (e.g. "j" for -1) with the negative number they are supposed to be. These negative number indicators are not included in the FBI's codebook for this data but are present in the data. I used the values in the FBI's codebook for the Offenses Known and Clearances by Arrest data.

    To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. If an agency has used a different FIPS code in the past, check to make sure the FIPS code is the same as in this data.
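The NA replacement described above can be pictured with a small sketch. This is not the author's code (the original cleaning was done in R, linked above); it is a plain-Python illustration in which the function name `clean_row` and the example column names are hypothetical, and `None` stands in for NA. Only the list of suspect values and the "offenses"/"auto" prefixes come from the description.

```python
# Values the description lists as common data-entry errors:
# 1000..10000 in steps of 1000, 20000..100000 in steps of 10000, and 99942.
SUSPECT_VALUES = (
    set(range(1000, 10001, 1000))
    | set(range(20000, 100001, 10000))
    | {99942}
)

# Only columns starting with these prefixes are cleaned;
# columns starting with "value" are deliberately left alone.
CLEANED_PREFIXES = ("offenses", "auto")


def clean_row(row):
    """Return a copy of `row` with suspect values replaced by None (NA)."""
    cleaned = {}
    for column, value in row.items():
        if column.startswith(CLEANED_PREFIXES) and value in SUSPECT_VALUES:
            cleaned[column] = None
        else:
            cleaned[column] = value
    return cleaned


# Hypothetical column names, used only for illustration.
example = {
    "offenses_robbery_total": 5000,   # suspect value in a cleaned column
    "auto_stolen": 12,                # plausible value, kept as-is
    "value_robbery_total": 5000,      # "value" columns are never cleaned
}
cleaned_example = clean_row(example)
```

In this sketch only the first column is set to `None`; the same number in the "value" column is left untouched, matching the note that the cleaning was not applied there.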

  12. Excel spreadsheet containing, in separate sheets, underlying numerical data...

    • plos.figshare.com
    xlsx
    Updated Sep 15, 2023
    Cite
    Hoi Tong Wong; Adeline M. Luperchio; Sean Riley; Daniel J. Salamango (2023). Excel spreadsheet containing, in separate sheets, underlying numerical data used to generate the indicated figure panels. [Dataset]. http://doi.org/10.1371/journal.ppat.1011634.s001
    Available download formats: xlsx
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    PLOS Pathogens
    Authors
    Hoi Tong Wong; Adeline M. Luperchio; Sean Riley; Daniel J. Salamango
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel spreadsheet containing, in separate sheets, underlying numerical data used to generate the indicated figure panels.

  13. Scottish Census 2011 Population by Council Area

    • find.data.gov.scot
    • dtechtive.com
    xml, zip
    Updated Feb 21, 2017
    Cite
    University of Edinburgh (2017). Scottish Census 2011 Population by Council Area [Dataset]. http://doi.org/10.7488/ds/1908
    Available download formats: zip (8.036 MB), xml (0.0038 MB)
    Dataset updated
    Feb 21, 2017
    Dataset provided by
    University of Edinburgh
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Scotland
    Description

    This data is sourced from the 2011 Census and shows the population and population density by council area. The raw data came from http://www.scotlandscensus.gov.uk/en/censusresults/downloadablefiles.html and was then manipulated in Excel to merge a number of tables. The resulting data was joined to a shapefile of Scottish Council areas from ShareGeo (http://www.sharegeo.ac.uk/handle/10672/305). Both should be attributed as the sources of the base data. GIS vector data. This dataset was first accessioned in the EDINA ShareGeo Open repository on 2012-12-19 and migrated to Edinburgh DataShare on 2017-02-21.


