16 datasets found
  1. Merge number of excel file,convert into csv file

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    Aashirvad pandey (2024). Merge number of excel file,convert into csv file [Dataset]. https://www.kaggle.com/datasets/aashirvadpandey/merge-number-of-excel-fileconvert-into-csv-file
    Explore at:
    zip (6731 bytes). Available download formats.
    Dataset updated
    Mar 30, 2024
    Authors
    Aashirvad pandey
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Project Description:

    Title: Pandas Data Manipulation and File Conversion

    Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.

    Key Objectives:

    1. DataFrame Creation: Utilize Pandas to create a DataFrame with sample data.
    2. Data Manipulation: Perform basic data manipulation tasks such as adding columns, filtering data, and performing calculations.
    3. File Conversion: Convert the DataFrame into Excel (.xlsx) and CSV (.csv) file formats.

    Tools and Libraries Used:

    • Python
    • Pandas

    Project Implementation:

    1. DataFrame Creation:

      • Import the Pandas library.
      • Create a DataFrame using either a dictionary, a list of dictionaries, or by reading data from an external source like a CSV file.
      • Populate the DataFrame with sample data representing various data types (e.g., integer, float, string, datetime).
    2. Data Manipulation:

      • Add new columns to the DataFrame representing derived data or computations based on existing columns.
      • Filter the DataFrame to include only specific rows based on certain conditions.
      • Perform basic calculations or transformations on the data, such as aggregation functions or arithmetic operations.
    3. File Conversion:

      • Utilize Pandas to convert the DataFrame into an Excel (.xlsx) file using the to_excel() function.
      • Convert the DataFrame into a CSV (.csv) file using the to_csv() function.
      • Save the generated files to the local file system for further analysis or sharing.

    Expected Outcome:

    Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.

    Conclusion:

    The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines. In this dataset, we take a number of tables, turn each into a DataFrame, save them into a single Excel file as differently named sheets, and then convert that Excel file into CSV files.
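
    A minimal pandas sketch of that workflow, assuming invented sample tables and hypothetical file names (workbook.xlsx and the per-sheet CSVs are not part of the dataset):

      import pandas as pd

      # Two sample DataFrames standing in for the tables described above (invented data)
      students = pd.DataFrame({"id": [1, 2], "name": ["Asha", "Ravi"], "score": [88.5, 92.0]})
      courses = pd.DataFrame({"code": ["CS101", "MA201"], "title": ["Python", "Linear Algebra"]})

      # Save both DataFrames into a single Excel workbook as differently named sheets
      with pd.ExcelWriter("workbook.xlsx") as writer:
          students.to_excel(writer, sheet_name="students", index=False)
          courses.to_excel(writer, sheet_name="courses", index=False)

      # Read every sheet back and convert each one to its own CSV file
      sheets = pd.read_excel("workbook.xlsx", sheet_name=None)  # dict: sheet name -> DataFrame
      for name, df in sheets.items():
          df.to_csv(f"{name}.csv", index=False)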

  2. Excel file containing additional data too large to fit in a PDF,...

    • plos.figshare.com
    xlsx
    Updated Dec 26, 2024
    Cite
    Odette Verdejo-Torres; David C. Klein; Lorena Novoa-Aponte; Jaime Carrazco-Carrillo; Denzel Bonilla-Pinto; Antonio Rivera; Arpie Bakhshian; Fa’alataitaua M. Fitisemanu; Martha L. Jiménez-González; Lyra Flinn; Aidan T. Pezacki; Antonio Lanzirotti; Luis Antonio Ortiz Frade; Christopher J. Chang; Juan G. Navea; Crysten E. Blaby-Haas; Sarah J. Hainer; Teresita Padilla-Benavides (2024). Excel file containing additional data too large to fit in a PDF, CUT&RUN–RNAseq merge analyses. [Dataset]. http://doi.org/10.1371/journal.pgen.1011495.s018
    Explore at:
    xlsx. Available download formats.
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Odette Verdejo-Torres; David C. Klein; Lorena Novoa-Aponte; Jaime Carrazco-Carrillo; Denzel Bonilla-Pinto; Antonio Rivera; Arpie Bakhshian; Fa’alataitaua M. Fitisemanu; Martha L. Jiménez-González; Lyra Flinn; Aidan T. Pezacki; Antonio Lanzirotti; Luis Antonio Ortiz Frade; Christopher J. Chang; Juan G. Navea; Crysten E. Blaby-Haas; Sarah J. Hainer; Teresita Padilla-Benavides
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel file containing additional data too large to fit in a PDF, CUT&RUN–RNAseq merge analyses.

  3. Bank Loan Analysis Project in Excel

    • kaggle.com
    zip
    Updated May 4, 2024
    + more versions
    Cite
    Sanjana Murthy (2024). Bank Loan Analysis Project in Excel [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/bank-loan-analysis-project/code
    Explore at:
    zip (38976902 bytes). Available download formats.
    Dataset updated
    May 4, 2024
    Authors
    Sanjana Murthy
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    About the datasets:
    • Domain: Finance
    • Project: Bank loan of customers
    • Datasets: Finance_1.xlsx & Finance_2.xlsx
    • Dataset type: Excel data
    • Dataset size: each Excel file has 39k+ records

    KPIs:
    1. Year-wise loan amount stats
    2. Grade- and sub-grade-wise revol_bal
    3. Total payment for verified status vs. total payment for non-verified status
    4. State-wise loan status
    5. Month-wise loan status
    6. Get more insights based on your understanding of the data

    Process:
    1. Understanding the problem
    2. Data collection
    3. Data cleaning
    4. Exploring and analyzing the data
    5. Interpreting the results

    The workbook makes use of Power Query, Power Pivot, merged data, clustered bar charts, clustered column charts, line charts, a 3D pie chart, a dashboard, slicers, a timeline, and formatting techniques.
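
    The project itself is built in Excel, but as a rough illustration KPI 1 could be computed in pandas along these lines (the issue_d and loan_amnt column names are assumptions about the workbook, not confirmed by the description):

      import pandas as pd

      # Stack both workbooks into one table (assumes the first sheet holds the records)
      loans = pd.concat([pd.read_excel("Finance_1.xlsx"), pd.read_excel("Finance_2.xlsx")],
                        ignore_index=True)

      # KPI 1: year-wise loan amount stats, assuming an 'issue_d' date column
      loans["issue_d"] = pd.to_datetime(loans["issue_d"], errors="coerce")
      print(loans.groupby(loans["issue_d"].dt.year)["loan_amnt"].agg(["count", "sum", "mean"]))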

  4. FIRE1120: previous data tables

    • gov.uk
    Updated Oct 18, 2018
    + more versions
    Cite
    Home Office (2018). FIRE1120: previous data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/fire1120-previous-data-tables
    Explore at:
    Dataset updated
    Oct 18, 2018
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Home Office
    Description

    FIRE1120: Staff joining fire authorities (headcount), by fire and rescue authority, gender and role (17 October 2024)

    • FIRE1120: Staff joining fire authorities (headcount), by fire and rescue authority, gender and role (19 October 2023) (MS Excel Spreadsheet, 194 KB): https://assets.publishing.service.gov.uk/media/6707823292bb81fcdbe7b5ff/fire-statistics-data-tables-fire1120-191023.xlsx
    • FIRE1120: Staff joining fire authorities (headcount), by fire and rescue authority, gender and role (20 October 2022) (MS Excel Spreadsheet, 293 KB): https://assets.publishing.service.gov.uk/media/652d3a7f6b6fbf0014b756d9/fire-statistics-data-tables-fire1120-201022.xlsx
    • FIRE1120: Staff joining fire authorities (headcount), by fire and rescue authority, gender and role (05 November 2021) (MS Excel Spreadsheet, 220 KB): https://assets.publishing.service.gov.uk/media/634e7f238fa8f5346ba7099b/fire-statistics-data-tables-fire1120-051121.xlsx
    • FIRE1120: Staff joining fire authorities (headcount), by fire and rescue authority, gender and role (21 October 2021) (MS Excel Spreadsheet, 210 KB): https://assets.publishing.service.gov.uk/media/61853a37e90e07198018fb0b/fire-statistics-data-tables-fire1120-211021.xlsx
    • FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (22 October 2020) (MS Excel Spreadsheet, 157 KB): https://assets.publishing.service.gov.uk/media/616d7d218fa8f5298406229e/fire-statistics-data-tables-fire1120-221020.xlsx
    • FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (14 November 2019) (MS Excel Spreadsheet, 116 KB): https://assets.publishing.service.gov.uk/media/5f86b42b8fa8f517090ab0e4/fire-statistics-data-tables-fire1120-141119.xlsx
    • FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (31 October 2019) (MS Excel Spreadsheet, 116 KB): https://assets.publishing.service.gov.uk/media/5dc9869ee5274a5c51437e43/fire-statistics-data-tables-fire1120-311019.xlsx
    • FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (17 January 2019) (MS Excel Spreadsheet, 74.5 KB): https://assets.publishing.service.gov.uk/media/5db7098040f0b6379a7acbc4/fire-statistics-data-tables-fire1120-170119.xlsx
    • FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (18 October 2018) (MS Excel Spreadsheet, 74.3 KB): https://assets.publishing.service.gov.uk/media/5c34bd7ee5274a65ab281de8/fire-statistics-data-tables-fire1120-18oct2018.xlsx
    • FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (26 October 2017) (MS Excel Spreadsheet, 24.3 KB): https://assets.publishing.service.gov.uk/media/5bbcc352e5274a3611919f80/fire-statistics-data-tables-fire1120.xlsx


  5. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Explore at:
    txt. Available download formats.
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential to study the health and environmental exposures of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey:

    • demographics (281 variables)
    • dietary consumption (324 variables)
    • physiological functions (1,040 variables)
    • occupation (61 variables)
    • questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood)
    • medications (29 variables)
    • mortality information linked from the National Death Index (15 variables)
    • survey weights (857 variables)
    • environmental exposure biomarker measurements (598 variables)
    • chemical comments indicating which measurements are below or above the lower limit of detection (505 variables)

    CSV Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets involve 20 .csv formatted files, two for each module, with one as the uncleaned version and the other as the cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.

    Example starter code: The set of starter code to help users conduct exposome analyses consists of four R markdown files (.Rmd). We recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together. "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables, with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
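
    A hedged sketch of example_0's merge step, written in Python rather than the dataset's R tutorials (the file names and the SEQN participant identifier are assumptions; check the data dictionary for the actual key column):

      import pandas as pd

      # Hypothetical cleaned module files; substitute the actual .csv names from the record
      demographics = pd.read_csv("demographics_clean.csv")
      mortality = pd.read_csv("mortality_clean.csv")

      # Join two modules on the shared participant identifier (SEQN in raw NHANES)
      merged = demographics.merge(mortality, on="SEQN", how="left")
      print(merged.shape)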

  6. SMARTDEST DATASET WP3 v1.0

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). SMARTDEST DATASET WP3 v1.0 [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-6787378?locale=el
    Explore at:
    unknown (9913124). Available download formats.
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SMARTDEST DATASET WP3 v1.0 includes data at sub-city level for 7 cities: Amsterdam, Barcelona, Edinburgh, Lisbon, Ljubljana, Turin, and Venice. It is made up of information extracted from public sources at the local level (mostly city council open data portals) or volunteered geographic information, that is, geospatial content generated by non-professionals using mapping systems available on the Internet (e.g., Geofabrik). Details on data sources and variables are included in a ‘metadata’ spreadsheet in the Excel file. The same Excel file contains 5 additional spreadsheets. The first one, labelled #1, was used to perform the analysis on the determinants of the geographical spread of tourism supply in the SMARTDEST case-study cities (in the main document D3.3, section 4.1). The second one (labelled #2) offers information that allows replication of the analysis on tourism-led population decline reported in section 4.3. As for the spreadsheets named #3-AMS, #4-BCN, and #5-EDI, they refer to data sources and variables used to run follow-up analyses discussed in section 5.1, with the objective of digging into the causes of depopulation in Amsterdam, Barcelona, and Edinburgh, respectively. The column ‘row’ can be used to merge the Excel file with the shapefile ‘db_task3.3_SmartDest’. Data are available at the buurt level in Amsterdam (an administrative unit roughly corresponding to a neighbourhood), at the census tract level in Barcelona and Ljubljana, for data zones in Edinburgh, statistical zones in Turin, and località in Venice.
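
    A sketch of that join in Python, assuming geopandas, a .shp extension on the named shapefile, and a placeholder name for the Excel workbook (the '#1' sheet label is taken from the description):

      import pandas as pd
      import geopandas as gpd

      gdf = gpd.read_file("db_task3.3_SmartDest.shp")             # shapefile named in the description
      tab = pd.read_excel("smartdest_wp3.xlsx", sheet_name="#1")  # workbook name is a placeholder

      # 'row' is the documented key between the spreadsheet and the polygons
      joined = gdf.merge(tab, on="row", how="left")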

  7. Superstore Sales Analysis

    • kaggle.com
    zip
    Updated Oct 21, 2023
    Cite
    Ali Reda Elblgihy (2023). Superstore Sales Analysis [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/superstore-sales-analysis/versions/1
    Explore at:
    zip (3009057 bytes). Available download formats.
    Dataset updated
    Oct 21, 2023
    Authors
    Ali Reda Elblgihy
    Description

    Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:

    1- Data Import and Transformation:

    • Gather and import relevant sales data from various sources into Excel.
    • Utilize Power Query to clean, transform, and structure the data for analysis.
    • Merge and link different data sheets to create a cohesive dataset, ensuring that all data fields are connected logically.

    2- Data Quality Assessment:

    • Perform data quality checks to identify and address issues like missing values, duplicates, outliers, and data inconsistencies.
    • Standardize data formats and ensure that all data is in a consistent, usable state.

    3- Calculating COGS:

    • Determine the Cost of Goods Sold (COGS) for each product sold by considering factors like purchase price, shipping costs, and any additional expenses.
    • Apply appropriate formulas and calculations to determine COGS accurately.

    4- Discount Analysis:

    • Analyze the discount values offered on products to understand their impact on sales and profitability.
    • Calculate the average discount percentage, identify trends, and visualize the data using charts or graphs.

    5- Sales Metrics:

    • Calculate and analyze various sales metrics, such as total revenue, profit margins, and sales growth.
    • Utilize Excel functions to compute these metrics and create visuals for better insights.

    6- Visualization:

    • Create visualizations, such as charts, graphs, and pivot tables, to present the data in an understandable and actionable format.
    • Visual representations can help identify trends, outliers, and patterns in the data.

    7- Report Generation:

    • Compile the findings and insights into a well-structured report or dashboard, making it easy for stakeholders to understand and make informed decisions.

    Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
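
    The project itself lives in Excel and Power Query, but the COGS and discount calculations translate directly to pandas; in this sketch the workbook, sheet, and column names are all hypothetical:

      import pandas as pd

      orders = pd.read_excel("superstore.xlsx", sheet_name="Orders")
      products = pd.read_excel("superstore.xlsx", sheet_name="Products")

      # Step 1: merge the sheets on a shared key; steps 3-4: derive COGS and discount value
      df = orders.merge(products, on="product_id", how="left")
      df["cogs"] = df["quantity"] * df["unit_cost"]
      df["discount_value"] = df["sales"] * df["discount"]
      print(df[["cogs", "discount_value"]].describe())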

  8. Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race,...

    • search.datacite.org
    • doi.org
    • +1 more
    Updated 2018
    Cite
    Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016 [Dataset]. http://doi.org/10.3886/e102263v5-10021
    Explore at:
    Dataset updated
    2018
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    DataCite (https://www.datacite.org/)
    Authors
    Jacob Kaplan
    Description

    Version 5 release notes:
    • Removes support for SPSS and Excel data.
    • Changes the crimes that are stored in each file. There are more files now, with fewer crimes per file. The files and their included crimes have been updated below.
    • Adds in agencies that report 0 months of the year.
    • Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime; they may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
    • Removes data on runaways.

    Version 4 release notes:
    • Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.

    Version 3 release notes:
    • Adds data for 2016.
    • Orders rows by year (descending) and ORI.

    Version 2 release notes:
    • Fixes a bug where the Philadelphia Police Department had an incorrect FIPS county code.
    The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.
    All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions, please contact me at jkkaplan6@gmail.com.

    I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests, which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.

    To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units, such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.

    To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.
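
    For example, in pandas the FIPS columns can be forced to strings at read time so the leading zeros survive (the file and column names below are placeholders, not the dataset's actual names):

      import pandas as pd

      # dtype=str keeps codes like "04013" intact, unlike Excel's default numeric parsing
      df = pd.read_csv("ucr_arrests.csv",
                       dtype={"fips_state": str, "fips_county": str, "fips_place": str})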

    I created 9 arrest categories myself. The categories are:
    • Total Male Juvenile
    • Total Female Juvenile
    • Total Male Adult
    • Total Female Adult
    • Total Male
    • Total Female
    • Total Juvenile
    • Total Adult
    • Total Arrests

    All of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than the provided age-race categories (e.g. adult Black, juvenile Asian). As not all agencies report the race data, my method is more accurate. These categories also make up the data in the "simple" version of the data. The "simple" file only includes the above 9 columns as the arrest data (all other columns in the data are just agency identifier columns). Because this "simple" data set needs fewer columns, I include all offenses.

    As the arrest data is very granular, and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files: eight that contain different crimes, and the "simple" file. Each file contains the data for all years. The eight categories each cover a major crime category and do not overlap in crimes other than the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Because Stata limits column names to a maximum of 32 characters, I have abbreviated the crime names in the data. The files and their included crimes are:

    Index Crimes: Murder, Rape, Robbery, Aggravated Assault, Burglary, Theft, Motor Vehicle Theft, Arson
    Alcohol Crimes: DUI, Drunkenness, Liquor
    Drug Crimes: Total Drug, Total Drug Sales, Total Drug Possession, Cannabis Possession, Cannabis Sales, Heroin or Cocaine Possession, Heroin or Cocaine Sales, Other Drug Possession, Other Drug Sales, Synthetic Narcotic Possession, Synthetic Narcotic Sales
    Grey Collar and Property Crimes: Forgery, Fraud, Stolen Property, Financial Crimes, Embezzlement, Total Gambling, Other Gambling, Bookmaking, Numbers Lottery
    Sex or Family Crimes: Offenses Against the Family and Children, Other Sex Offenses, Prostitution, Rape
    Violent Crimes: Aggravated Assault, Murder, Negligent Manslaughter, Robbery, Weapon Offenses
    Other Crimes: Curfew, Disorderly Conduct, Other Non-traffic, Suspicion, Vandalism, Vagrancy
    Simple: every crime, but only the arrest categories that I created (see above)
    If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.

  9. Digitisation of Weather Records of Seungjeongwon Ilgi: A Historical Weather...

    • zenodo.org
    bin, csv, json, txt
    Updated Sep 27, 2023
    + more versions
    Cite
    Zeyu Lyu; Kohei Ichikawa; Yongchao Cheng; Hisashi Hayakawa; Yukiko Kawamoto (2023). Digitisation of Weather Records of Seungjeongwon Ilgi: A Historical Weather Dynamics Dataset of the Korean Peninsula (1623-1910) [Dataset]. http://doi.org/10.5281/zenodo.7453644
    Explore at:
    csv, json, bin, txt. Available download formats.
    Dataset updated
    Sep 27, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zeyu Lyu; Kohei Ichikawa; Yongchao Cheng; Hisashi Hayakawa; Yukiko Kawamoto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Korea
    Description

    Introduction

    This study has exploited the daily weather records of Seungjeongwon Ilgi from the NIKH database. Seungjeongwon Ilgi (http://sjw.history.go.kr/main.do) is a daily record of the Seungjeongwon, the Royal Secretariat of the Joseon Dynasty of Korea. These diaries span from 1623 to 1910 and generally include daily weather records in the entry header. Their observational site would have been located in Seoul (N37°35′, E126°59′). We extracted the weather records from the NIKH database and classified the daily weather using a text-mining method. We also converted the report dates from the traditional lunisolar calendar to the Gregorian calendar, to better contextualise our data against contemporary daily measurements.

    Data

    We provide different formats (csv, xlsx, json) to facilitate the usage of data. The main contents of data are listed as below.

    • ID: The unique identifier of a specific record in the metadata, which can also serve as the identifier to merge with external data in the NIKH digital database.
    • Traditional calendar: The original lunar dates in the NIKH digital database, listed in the format "YYYY-MM-DD". More specifically, "L0" indicates a leap year and "L1" a common year.
    • Leap: The identifier of a leap year.
    • Gregorian calendar: The Gregorian calendar date that converted by the traditional calendar date.
    • Weather Text: The text that describes the weather conditions. Multiple weather descriptions of the same day have been put together.
    • Flag: The computed value that indicates different combinations of weather conditions.
    • Volume: The volume of text in the original record.
    • Herbal Volume: The volume of text in the herbal record.
    • Sunny: A dummy variable that represents whether the weather description contains the expression of sunny.
    • Cloudy: A dummy variable that represents whether the weather description contains the expression of cloudy.
    • Rainy: A dummy variable that represents whether the weather description contains the expression of rainy.
    • Snow: A dummy variable that represents whether the weather description contains the expression of snow.
    • Wind: A dummy variable that represents whether the weather description contains the expression of wind.

    Import Data

     # Python
     import pandas as pd
     # CSV file
     data = pd.read_csv('~/SJWilgi_Seoul_Weather_YR1623_1910.csv', encoding="utf-8")
     # JSON file
     data = pd.read_json('~/SJWilgi_Seoul_Weather_YR1623_1910.json', encoding="utf-8")
     # Excel file
     data = pd.read_excel('~/SJWilgi_Seoul_Weather_YR1623_1910.xlsx')

     # R
     # CSV file
     library(readr)
     data <- read_csv("~/SJWilgi_Seoul_Weather_YR1623_1910.csv")
     # Excel file
     library(readxl)
     data <- read_excel("~/SJWilgi_Seoul_Weather_YR1623_1910.xlsx")
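
    As a small follow-up in Python, the dummy variables can be aggregated by year; the exact column headers ("Gregorian calendar", "Rainy") are assumptions based on the field list above:

     # Continuing from the pandas DataFrame loaded above:
     # share of rainy days per Gregorian year
     data["date"] = pd.to_datetime(data["Gregorian calendar"], errors="coerce")
     print(data.groupby(data["date"].dt.year)["Rainy"].mean().head())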

  10. What students answer when discussing about citation practices

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1 more
    Updated Sep 21, 2021
    Cite
    Salamin, Caroline; Cobolet, Noémi; Grolimund, Raphaël; Bouton, Pascale (2021). What students answer when discussing about citation practices [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_290155
    Explore at:
    Dataset updated
    Sep 21, 2021
    Dataset provided by
    Bibliothèque de l'EPFL
    Authors
    Salamin, Caroline; Cobolet, Noémi; Grolimund, Raphaël; Bouton, Pascale
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This document explains how the data were generated and how to interpret them.

    LICENSE: CC BY. But if you want to combine these data with other datasets, feel free to use them as if they were published under a CC0 license.
    Data were published in February 2017. At that time, Zenodo only provided CC BY, CC BY-SA, CC BY-NC, CC BY-ND and CC BY-NC-ND; no CC0 option was available.

    HOW DATA WERE COLLECTED
    The 21 recorded sessions took place between February 2013 and December 2016. Data were collected using Turning Technologies' remote controls (called clickers) and TurningPoint software.

    The 4 versions of the quiz used during these 4 years are provided in the 'quizzes' folder for information purpose (in PDF and Powerpoint formats).

    Turning Technologies records data in a closed format (.tpzx) that can be exported and converted into the 3 formats provided here (these 3 files contain the same data):

    • Excel (.xlsx)
    • Comma-separated values (.csv)
    • SQLite (.sqlite)

    The first one was directly exported from TurningPoint and is provided for users whose Excel can't read CSV correctly.
    The CSV was converted from Excel and is provided for non-Excel users.
    Finally, SQLite is provided in order to apply different sorting and filters to the data. It can be read using SQLite Manager for Firefox (https://addons.mozilla.org/en-US/firefox/addon/sqlite-manager/).
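
    The SQLite file can also be read directly into Python without a GUI; this is a sketch with a placeholder file name (the dataset's actual .sqlite file name is not given here):

      import sqlite3
      import pandas as pd

      con = sqlite3.connect("clicker_data.sqlite")   # placeholder name
      # Discover the table name first, then load the answers
      tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", con)
      df = pd.read_sql(f"SELECT * FROM {tables['name'].iloc[0]}", con)
      con.close()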

    CODEBOOK
    Here is the name, the meaning, and the possible values of each column (name - meaning [possible values]). If students didn't answer a question, the value is '-'.

    Session - session number (chronological) [1 to 21]
    AcademicYear - academic year [12-13, 13-14, 14-15, 15-16, 16-17]
    Year - calendar year [2013, 2014, 2015, 2016]
    Month - month (number) [1 to 12]
    Day - day (number) [1 to 31]
    Section - section abbreviation [CH, ESC, GM, IF, SIE, SV]
    Level - students' level [BA2, BA3, MA]
    Language - course's language [FR or EN]
    DeviceID - clicker's ID [(unique ID within a session)]
    Q1 - answers to question 1 [A, B, C, D, E]
    Q2 - answers to question 2 [A, B, C, D]
    Q3 - answers to question 3 [A or B]
    Q4 - answers to question 4 [A or B]
    Q5 - answers to question 5 [A or B]
    Q6 - answers to question 6 [A or B]
    Q7 - answers to question 7 [A or B]
    Q8 - answers to question 8 [A or B]
    Q9 - answers to question 9 [A or B]
    Q8-9 - answers to question 8-9 (merged) [A or B]
    Q10 - answers to question 10 [1, 2]
    Q11 - answers to question 11 [A or B]
    Q12 - answers to question 12 [A, B]

    Section abbreviation meaning
    • CH: chemistry
    • ESC: school of criminal justice (Unil)
    • GM: mechanical engineering
    • IF: financial engineering
    • SIE: environmental engineering
    • SV: life sciences

    Level meaning
    • BA2: 2nd year of Bachelor
    • BA3: 3rd year of Bachelor
    • MA: Master level

    Question types
    For some questions, multiple answers were allowed: Q1, Q2, Q10 & Q12.
    Half of the questions have only one correct answer, true or false: Q3, Q5, Q6, Q7, Q8, Q9 & Q8-9.
    Finally, for 2 questions only one answer was accepted, but there is not only one correct answer: Q4 & Q11.

    INFORMATION ABOUT THE SESSIONS
    Except where otherwise stated below, all sessions were conducted like the original one: Q1 to Q12 (no Q8-9). The original French version of the quiz was translated into English for a few sessions with Master students. For sessions 14 and 20, Q5 was removed and Q8 & Q9 were merged into Q8-9. Session 18 was a short one with only 7 questions: Q1, Q2, Q3, Q4, Q6, Q7 & Q9.

    CONTACT INFORMATION
    If you have any questions about these data, contact formations.bib@epfl.ch.

  11. Israel Census

    • kaggle.com
    zip
    Updated Jul 31, 2018
    Cite
    Dan Ofer (2018). Israel Census [Dataset]. https://www.kaggle.com/danofer/israel-census
    Explore at:
    zip (4275033 bytes). Available download formats.
    Dataset updated
    Jul 31, 2018
    Authors
    Dan Ofer
    License

    Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Israel
    Description

    Context

    2008 Population & demographic census data for Israel, at the level of settlements and lower.

    Content

    Data are provided at the sub-settlement level (i.e., neighborhoods). Variable names (in Hebrew and English) and a data dictionary are provided in XLS files. 2008 statistical area names are provided (along with top roads/neighborhoods per settlement). The Excel data needs cleaning/merging from multiple sub-pages.

    Ideas:

    • Combine with voting datasets
    • Correlate population or economic growth over time with demographics
    • Geospatial analysis
    • Merge and clean the data from the sub tables (see the sketch after this list).
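
    A minimal sketch of that merge in pandas, assuming the sub-pages are sheets of one workbook with identical columns (the file name is a placeholder):

      import pandas as pd

      # sheet_name=None returns a dict of sheet name -> DataFrame, one per sub-page
      sheets = pd.read_excel("census_2008.xls", sheet_name=None)
      combined = pd.concat(sheets.values(), ignore_index=True)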

    Acknowledgements

    Data from Israel Central Bureau of Statistics (CBS): http://www.cbs.gov.il/census/census/pnimi_page.html?id_topic=12

    Photo by Me (Dan Ofer).

  12. Cyclistic_Divvy_data

    • kaggle.com
    zip
    Updated Jun 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rami Ghaith (2023). Cyclistic_Divvy_data [Dataset]. https://www.kaggle.com/datasets/ramighaith/cyclistic-divvy-data
    Explore at:
    zip (21440758 bytes). Available download formats.
    Dataset updated
    Jun 11, 2023
    Authors
    Rami Ghaith
    Description

    The following data shows riding information for members vs. casual riders at the company Cyclistic (a made-up name). This is a dataset used as a case study for the Google Data Analytics certificate.

    The Changes Done to the Data in Excel:
    • Removed all duplicates (none were found).
    • Added a ride_length column by subtracting started_at from ended_at with the formula "=C2-B2", then formatted the result as a Time (37:30:55).
    • Added a day_of_week column with the formula "=WEEKDAY(B2,1)" to record the day each ride took place, 1 = Sunday through 7 = Saturday.
    • Some cells display as ########. That data was left unchanged; it simply represents negative values and should be read as 0.

    Processing the Data in RStudio:
    • Installed required packages: tidyverse for data import and wrangling, lubridate for date functions, and ggplot2 for visualization.
    • Step 1: Read the csv files into R to collect the data.
    • Step 2: Made sure the files all contained the same column names, because I want to merge them into one.
    • Step 3: Renamed the columns so they align, then merged the files into one combined data set.
    • Step 4: More data cleaning and analyzing.
    • Step 5: Once my data was cleaned and clearly telling a story, I began to visualize it. The visualizations can be seen below.
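
    For reference, a pandas equivalent of those RStudio steps might look like this (the file pattern is a placeholder; column names follow the Divvy-style files described above):

      import glob
      import pandas as pd

      # Steps 1-3: read the csv files and merge them once the columns align
      rides = pd.concat([pd.read_csv(f) for f in glob.glob("divvy_*.csv")], ignore_index=True)

      # Recreate the Excel helper columns
      rides["started_at"] = pd.to_datetime(rides["started_at"])
      rides["ended_at"] = pd.to_datetime(rides["ended_at"])
      rides["ride_length"] = rides["ended_at"] - rides["started_at"]  # the "=C2-B2" formula
      rides["day_of_week"] = rides["started_at"].dt.dayofweek  # note: 0=Monday, unlike WEEKDAY's 1=Sunday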

  13. WISE provisional reference GIS Water Framework Directive (WFD) dataset on...

    • sdi.eea.europa.eu
    www:url
    Updated Oct 17, 2012
    + more versions
    Cite
    European Environment Agency (2012). WISE provisional reference GIS Water Framework Directive (WFD) dataset on Groundwater Bodies - INTERNAL VERSION, Oct. 2012 [Dataset]. https://sdi.eea.europa.eu/catalogue/srv/api/records/caca3b89-d60b-4949-a556-e15c198b8faf
    Explore at:
    www:url. Available download formats.
    Dataset updated
    Oct 17, 2012
    Dataset provided by
    European Environment Agency (http://www.eea.europa.eu/)
    Time period covered
    Jan 1, 2009 - Dec 31, 2011
    Area covered
    Description

    A Groundwater Body (GWB) under the Water Framework Directive (WFD) Art. 2 is defined as a distinct volume of groundwater within an aquifer or aquifers, whereas an aquifer is defined as a geological layer with significant groundwater flow. This definition of a GWB allows a wide scope of interpretations. EU Member States (MS) are under obligation to report the GWBs including the results of the GWB survey periodically according to the schedule of the WFD. Reportnet is used for the submission of GWB data to the EEA by MS and includes spatial data as GIS polygons and GWB characteristics in an XML schema.

    The WISE provisional reference GIS WFD Dataset on GWBs combines spatial data consisting of several shape files and certain GWB attributes in a single table submitted by the MS according to Art. 13. The GWBs are divided into horizons, which represent distinct vertical layers of groundwater resources. All GWBs assigned to a certain horizon from one to five are merged into one shape file. GWBs assigned to horizons six or seven are combined in a single further shape file. Another two shape files comprise the GWBs of Reunion Island in the southern hemisphere and the GWBs from Switzerland as a non EU MS, all of which assigned to horizon 1.

    The dbf tables of the shape files include the columns “EU_CD_GW” as the GWB identifier and “Horizon” describing the vertical positioning. The polygon identifier “Polygon_ID” was added subsequently, because some GWBs consist of several polygons with an identical “EU_CD_GW”, even in the same horizon. Some further GWB characteristics are provided with the Microsoft Excel file “GWB_attributes_2012June.xls”, including the column “EU_CD_GW”, which serves as a key for joining spatial and attribute data. There is no corresponding spatial data for GWBs in the Microsoft Excel table without an entry in column “EU_CD_GW”. The spatial resolution is given for about half of the GWBs in the column “Scale” of the xls file; it varies between the MS from 1:10,000 to 1:1,000,000 and is mostly in the range from 1:50,000 to 1:250,000. The processing of some of the GWB shape files by GIS routines such as clip or intersect in combination with a test polygon resulted in errors. Therefore, a correction of erroneous topological features causing routine failures was carried out. However, the GWB layer includes a multitude of partly very tiny, distinct areas, resulting in a highly detailed or fragmented pattern. In certain parts topological inconsistencies appear quite frequently, and delineation methodologies currently vary between the MS in terms of size and three-dimensional positioning of GWBs. This version of the dataset has to be considered a first step towards a consistent GWB picture throughout Europe, but it is not yet of sufficient quality to support spatial analyses, i.e. it is not a fully developed reference GIS dataset. Therefore, the layer is published as a preliminary version, and use of this data is subject to certain restrictions outlined in the explanatory notes.
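
    A sketch of that join in Python, assuming geopandas and a placeholder shapefile name (only the attribute workbook name comes from the text):

      import pandas as pd
      import geopandas as gpd

      gwb = gpd.read_file("gwb_horizon1.shp")                # placeholder shapefile name
      attrs = pd.read_excel("GWB_attributes_2012June.xls")   # named in the description

      # "EU_CD_GW" keys the join; "Polygon_ID" stays to tell apart multi-polygon bodies
      gwb = gwb.merge(attrs, on="EU_CD_GW", how="left")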

    It should be underlined that the methodology used is still under discussion (Working Group C - Groundwater) and is not fully harmonised across the EU MS.

    For the external publication, the whole United Kingdom had to be removed due to licensing restrictions.

  14. Scottish Census 2011 Population by Council Area

    • dtechtive.com
    xml, zip
    Updated Feb 21, 2017
    Cite
    University of Edinburgh (2017). Scottish Census 2011 Population by Council Area [Dataset]. http://doi.org/10.7488/ds/1908
    Explore at:
    zip (8.036 MB), xml (0.0038 MB). Available download formats.
    Dataset updated
    Feb 21, 2017
    Dataset provided by
    University of Edinburgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Scotland
    Description

    This data is sourced from the Census 2011 and shows the population and population density by council area. Raw data was sourced from http://www.scotlandscensus.gov.uk/en/censusresults/downloadablefiles.html and then manipulated in Excel to merge a number of tables. The resulting data was joined to a shapefile of Scottish council areas from ShareGeo (http://www.sharegeo.ac.uk/handle/10672/305). Both sources should be attributed as the sources of the base data. GIS vector data. This dataset was first accessioned in the EDINA ShareGeo Open repository on 2012-12-19 and migrated to Edinburgh DataShare on 2017-02-21.

  15. Raw Data for Figs 2–6 Excel file, tabulated according to figure.

    • plos.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Andrew Hammond; Xenia Karlsson; Ioanna Morianou; Kyros Kyrou; Andrea Beaghton; Matthew Gribble; Nace Kranjc; Roberto Galizi; Austin Burt; Andrea Crisanti; Tony Nolan (2023). Raw Data for Figs 2–6 Excel file, tabulated according to figure. [Dataset]. http://doi.org/10.1371/journal.pgen.1009321.s004
    Explore at:
    xlsx. Available download formats.
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Andrew Hammond; Xenia Karlsson; Ioanna Morianou; Kyros Kyrou; Andrea Beaghton; Matthew Gribble; Nace Kranjc; Roberto Galizi; Austin Burt; Andrea Crisanti; Tony Nolan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Includes all raw data points, lists of test statistics and sample sizes. (XLSX)

  16. Sprocket Central Mock Data

    • kaggle.com
    zip
    Updated Oct 30, 2022
    Cite
    Adrian Diaz (2022). Sprocket Central Mock Data [Dataset]. https://www.kaggle.com/datasets/adriandiazny/sprocket-central-mock-data
    Explore at:
    zip (2727074 bytes). Available download formats.
    Dataset updated
    Oct 30, 2022
    Authors
    Adrian Diaz
    Description

    This dataset comes from the KPMG virtual data analytics internship; please refer to the link if you wish to sign up! Link: https://www.theforage.com/virtual-internships/theme/m7W4GMqeT3bh9Nb2c/KPMG-Data-Analytics-Virtual-Internship

    The goal of this dataset is to uncover insights from a company's sales. The Excel sheet must be cleaned and updated to complete a proper analysis, so please clean and prepare the data before analysing it. Refer to the "Title" sheet in the workbook for more information and tips. I have posted my Jupyter notebook (https://www.kaggle.com/code/adriandiazny/sprocket-central-exploratory-analysis) if you wish to see an example of the analysis and presentation. Best of luck!

    You will need to merge some of the Excel sheets together to match up transactions with customer demographics. That is the only way to draw useful insights on profit.
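
    A hedged pandas sketch of that merge (the workbook name, sheet names, and the header=1 offset for a title row are assumptions about the KPMG file):

      import pandas as pd

      tx = pd.read_excel("KPMG_raw_data.xlsx", sheet_name="Transactions", header=1)
      cust = pd.read_excel("KPMG_raw_data.xlsx", sheet_name="CustomerDemographic", header=1)

      # Match each transaction to a customer profile on customer_id
      merged = tx.merge(cust, on="customer_id", how="inner")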
