Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Project Description:
Title: Pandas Data Manipulation and File Conversion
Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.
Key Objectives:
Tools and Libraries Used:
Project Implementation:
DataFrame Creation:
Data Manipulation:
File Conversion:
- Convert the DataFrame to an Excel file using the to_excel() function.
- Convert the DataFrame to a CSV file using the to_csv() function.
Expected Outcome:
Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.
Conclusion:
The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines. In this project we add a number of records, turn them into DataFrames, save them in a single Excel file as differently named sheets, and then convert that Excel file into CSV files.
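The workflow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the DataFrames, file names, and sheet names are invented for the example.

```python
# Minimal sketch: build two small DataFrames, save them as separate
# sheets in one Excel workbook, then re-export each sheet as its own
# CSV file. All names here are illustrative.
import pandas as pd

employees = pd.DataFrame({"name": ["Asha", "Ben"], "dept": ["HR", "IT"]})
salaries = pd.DataFrame({"name": ["Asha", "Ben"], "salary": [50000, 60000]})

# Write both DataFrames into a single workbook as separate sheets.
with pd.ExcelWriter("company.xlsx") as writer:
    employees.to_excel(writer, sheet_name="employees", index=False)
    salaries.to_excel(writer, sheet_name="salaries", index=False)

# Read every sheet back (sheet_name=None returns a dict of DataFrames)
# and convert each one to its own CSV file.
sheets = pd.read_excel("company.xlsx", sheet_name=None)
for name, df in sheets.items():
    df.to_csv(f"{name}.csv", index=False)
```

Writing Excel files requires an engine such as openpyxl to be installed alongside pandas.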
- FIRE1120: Staff joining fire authorities (headcount), by fire and rescue authority, gender and role (19 October 2023) (MS Excel Spreadsheet, 194 KB): https://assets.publishing.service.gov.uk/media/6707823292bb81fcdbe7b5ff/fire-statistics-data-tables-fire1120-191023.xlsx
- FIRE1120: Staff joining fire authorities (headcount), by fire and rescue authority, gender and role (20 October 2022) (MS Excel Spreadsheet, 293 KB): https://assets.publishing.service.gov.uk/media/652d3a7f6b6fbf0014b756d9/fire-statistics-data-tables-fire1120-201022.xlsx
- FIRE1120: Staff joining fire authorities (headcount), by fire and rescue authority, gender and role (05 November 2021) (MS Excel Spreadsheet, 220 KB): https://assets.publishing.service.gov.uk/media/634e7f238fa8f5346ba7099b/fire-statistics-data-tables-fire1120-051121.xlsx
- FIRE1120: Staff joining fire authorities (headcount), by fire and rescue authority, gender and role (21 October 2021) (MS Excel Spreadsheet, 210 KB): https://assets.publishing.service.gov.uk/media/61853a37e90e07198018fb0b/fire-statistics-data-tables-fire1120-211021.xlsx
- FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (22 October 2020) (MS Excel Spreadsheet, 157 KB): https://assets.publishing.service.gov.uk/media/616d7d218fa8f5298406229e/fire-statistics-data-tables-fire1120-221020.xlsx
- FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (14 November 2019) (MS Excel Spreadsheet, 116 KB): https://assets.publishing.service.gov.uk/media/5f86b42b8fa8f517090ab0e4/fire-statistics-data-tables-fire1120-141119.xlsx
- FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (31 October 2019) (MS Excel Spreadsheet, 116 KB): https://assets.publishing.service.gov.uk/media/5dc9869ee5274a5c51437e43/fire-statistics-data-tables-fire1120-311019.xlsx
- FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (17 January 2019) (MS Excel Spreadsheet, 74.5 KB): https://assets.publishing.service.gov.uk/media/5db7098040f0b6379a7acbc4/fire-statistics-data-tables-fire1120-170119.xlsx
- FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (18 October 2018) (MS Excel Spreadsheet, 74.3 KB): https://assets.publishing.service.gov.uk/media/5c34bd7ee5274a65ab281de8/fire-statistics-data-tables-fire1120-18oct2018.xlsx
- FIRE1120: Staff joining fire authorities, by fire and rescue authority, gender and role (26 October 2017) (MS Excel Spreadsheet, 24.3 KB): https://assets.publishing.service.gov.uk/media/5bbcc352e5274a3611919f80/fire-statistics-data-tables-fire1120.xlsx
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel file containing additional data too large to fit in a PDF: CUT&RUN–RNA-seq merge analyses.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
About the datasets:
- Domain: Finance
- Project: Bank loan of customers
- Datasets: Finance_1.xlsx & Finance_2.xlsx
- Dataset type: Excel data
- Dataset size: each Excel file has 39k+ records

KPIs:
1. Year-wise loan amount stats
2. Grade- and sub-grade-wise revol_bal
3. Total payment for verified status vs. total payment for non-verified status
4. State-wise loan status
5. Month-wise loan status
6. Get more insights based on your understanding of the data

Process:
1. Understanding the problem
2. Data collection
3. Data cleaning
4. Exploring and analyzing the data
5. Interpreting the results
This workbook uses Power Query, Power Pivot, merged data, clustered bar charts, clustered column charts, line charts, a 3D pie chart, a dashboard, slicers, a timeline, and various formatting techniques.
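The first two KPIs above could be reproduced outside Excel with pandas group-bys. This is a hedged sketch only: the column names (issue_year, loan_amnt, grade, sub_grade, revol_bal) are assumptions about the Finance_*.xlsx layout, and the rows are invented.

```python
# Sketch of KPI 1 (year-wise loan amount stats) and KPI 2 (grade/
# sub-grade wise revol_bal) using pandas groupby. Column names and
# values are illustrative assumptions, not the real dataset schema.
import pandas as pd

loans = pd.DataFrame({
    "issue_year": [2019, 2019, 2020, 2020],
    "loan_amnt":  [5000, 7000, 6000, 9000],
    "grade":      ["A", "A", "B", "B"],
    "sub_grade":  ["A1", "A2", "B1", "B1"],
    "revol_bal":  [1200, 800, 1500, 2500],
})

# KPI 1: per-year sum, mean, and count of the loan amount.
yearly = loans.groupby("issue_year")["loan_amnt"].agg(["sum", "mean", "count"])

# KPI 2: total revolving balance per grade and sub-grade.
revol = loans.groupby(["grade", "sub_grade"])["revol_bal"].sum()

print(yearly)
print(revol)
```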
Original file: https://www.kaggle.com/datasets/redlineracer/nfl-combine-performance-data-2009-2019
Using NFL Combine data from 2009-2019, the information was cleaned and adjusted to conform to standard measurements in Excel. PivotTables were utilized to analyze the relationship between variables such as BMI, Draft Round, Teams, Schools, Players, Positions, and more. Additionally, a dashboard was created to present the findings in a clear and concise manner.
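The PivotTable analysis described above can be approximated in pandas with `pivot_table`. This is only an illustration: the column names (Position, Round, BMI) are assumptions about the cleaned Combine sheet, and the numbers are made up.

```python
# pandas analogue of an Excel PivotTable: average BMI broken down by
# position (rows) and draft round (columns). Data here is invented.
import pandas as pd

combine = pd.DataFrame({
    "Position": ["QB", "QB", "OT", "OT"],
    "Round":    [1, 2, 1, 1],
    "BMI":      [27.1, 26.8, 37.0, 36.4],
})

pivot = combine.pivot_table(values="BMI", index="Position",
                            columns="Round", aggfunc="mean")
print(pivot)
```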
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential to study the health and environmental exposures of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, these data must be processed before new insights can be derived through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey:
- demographics (281 variables),
- dietary consumption (324 variables),
- physiological functions (1,040 variables),
- occupation (61 variables),
- questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood),
- medications (29 variables),
- mortality information linked from the National Death Index (15 variables),
- survey weights (857 variables),
- environmental exposure biomarker measurements (598 variables), and
- chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

CSV data record: The curated NHANES datasets and the data dictionaries comprise 23 .csv files and 1 Excel file. The curated NHANES datasets involve 20 .csv-formatted files, two for each module, with one as the uncleaned version and the other as the cleaned version.
The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments.
- "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES.
- "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.
- "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes.
- "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

R data record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded, as a .zip file that includes an .RData file and an .R file.
- "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts for the customized functions that were written to curate the data.
- "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.

Example starter code: The set of starter code to help users conduct exposome analyses consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order.
- "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
- "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.
- "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and for multiple variables, with and without accounting for the NHANES sampling design.
- "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
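The merge step that the first tutorial demonstrates in R has a straightforward pandas analogue: join two curated module files on the shared participant identifier. The sketch below is illustrative only; it assumes SEQN is the shared key and invents two tiny module tables.

```python
# Rough pandas analogue of merging two curated NHANES modules on the
# participant identifier. SEQN as the key is an assumption; the
# tables and values are invented for illustration.
import pandas as pd

demographics = pd.DataFrame({"SEQN": [1, 2, 3], "age": [34, 51, 28]})
chemicals = pd.DataFrame({"SEQN": [1, 3], "blood_lead": [1.2, 0.8]})

# A left join keeps every participant; unmeasured biomarkers become NaN,
# mirroring how not all participants have all measurements.
merged = demographics.merge(chemicals, on="SEQN", how="left")
print(merged)
```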
Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:
1- Data Import and Transformation:
2- Data Quality Assessment:
3- Calculating COGS:
4- Discount Analysis:
5- Sales Metrics:
6- Visualization:
7- Report Generation:
Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
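Steps 3 and 4 above can be sketched outside Power Query as well. This is a hedged illustration, not the project's formulas: it assumes COGS is defined as Sales minus Profit and the discount value as Sales times the discount rate, and the rows are invented.

```python
# Sketch of the COGS and discount-value calculations in pandas.
# Both definitions (COGS = Sales - Profit, DiscountValue = Sales *
# Discount) are assumptions about the Superstore workbook.
import pandas as pd

orders = pd.DataFrame({
    "Sales":    [100.0, 250.0, 80.0],
    "Profit":   [20.0, 50.0, -10.0],
    "Discount": [0.0, 0.2, 0.1],
})

orders["COGS"] = orders["Sales"] - orders["Profit"]
orders["DiscountValue"] = orders["Sales"] * orders["Discount"]
print(orders[["COGS", "DiscountValue"]])
```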
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SMARTDEST DATASET WP3 v1.0 includes data at the sub-city level for 7 cities: Amsterdam, Barcelona, Edinburgh, Lisbon, Ljubljana, Turin, and Venice. It is made up of information extracted from public sources at the local level (mostly city council open data portals) or volunteered geographic information, that is, geospatial content generated by non-professionals using mapping systems available on the Internet (e.g., Geofabrik). Details on data sources and variables are included in a ‘metadata’ spreadsheet in the Excel file. The same Excel file contains 5 additional spreadsheets. The first one, labelled #1, was used to perform the analysis on the determinants of the geographical spread of tourism supply in the SMARTDEST case study cities (in the main document D3.3, section 4.1). The second one (labelled #2) offers information that allows one to replicate the analysis of tourism-led population decline reported in section 4.3. The spreadsheets named #3-AMS, #4-BCN, and #5-EDI refer to data sources and variables used to run the follow-up analyses discussed in section 5.1, with the objective of digging into the causes of depopulation in Amsterdam, Barcelona, and Edinburgh, respectively. The column ‘row’ can be used to merge the Excel file with the shapefile ‘db_task3.3_SmartDest’. Data are available at the buurt level in Amsterdam (an administrative unit roughly corresponding to a neighbourhood), at the census tract level in Barcelona and Ljubljana, for data zones in Edinburgh, statistical zones in Turin, and località in Venice.
A Groundwater Body (GWB) under the Water Framework Directive (WFD) Art. 2 is defined as a distinct volume of groundwater within an aquifer or aquifers, while an aquifer is defined as a geological layer with significant groundwater flow. This definition of a GWB allows a wide scope of interpretations. EU Member States (MS) are obliged to report the GWBs, including the results of the GWB survey, periodically according to the schedule of the WFD. Reportnet is used by the MS for the submission of GWB data to the EEA and includes spatial data as GIS polygons and GWB characteristics in an XML schema.
The WISE provisional reference GIS WFD Dataset on GWBs combines spatial data consisting of several shape files and certain GWB attributes in a single table submitted by the MS according to Art. 13. The GWBs are divided into horizons, which represent distinct vertical layers of groundwater resources. All GWBs assigned to a horizon from one to five are merged into one shape file. GWBs assigned to horizons six or seven are combined in a single further shape file. Another two shape files comprise the GWBs of Reunion Island in the southern hemisphere and the GWBs from Switzerland as a non-EU MS, all of which are assigned to horizon 1.
The dbf tables of the shape files include the columns “EU_CD_GW” as the GWB identifier and “Horizon” describing the vertical positioning. The polygon identifier “Polygon_ID” was added subsequently, because some GWBs consist of several polygons with identical “EU_CD_GW” even in the same horizon. Some further GWB characteristics are provided in the Microsoft Excel file “GWB_attributes_2012June.xls”, including the column “EU_CD_GW”, which serves as a key for joining spatial and attribute data. GWBs in the Microsoft Excel table without an entry in the column “EU_CD_GW” have no corresponding spatial data. The spatial resolution is given for about half of the GWBs in the column “Scale” of the xls file; it varies between the MS from 1 : 10,000 to 1 : 1,000,000 and is mostly in the range from 1 : 50,000 to 1 : 250,000. The processing of some of the GWB shape files by GIS routines such as clip or intersect in combination with a test polygon resulted in errors. Therefore, a correction of erroneous topological features causing routine failures was carried out. However, the GWB layer includes a multitude of in part very tiny, distinct areas, resulting in a highly detailed or fragmented pattern. In certain parts topological inconsistencies appear quite frequently, and delineation methodologies currently vary between the MS in terms of size and three-dimensional positioning of GWBs. This version of the dataset has to be considered a first step towards a consistent GWB picture throughout Europe, but it is not yet of sufficient quality to support spatial analyses, i.e. it is not a fully developed reference GIS dataset. Therefore, the layer is published as a preliminary version, and use of this data is subject to certain restrictions outlined in the explanatory notes.
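The join on “EU_CD_GW” described above can be sketched as follows. In practice the shapefile's attribute table would come from a spatial library such as geopandas (`geopandas.read_file`); here a plain DataFrame stands in for it, and all identifiers and values are invented.

```python
# Illustrative join of (stand-in) shapefile attributes with the Excel
# attribute table on the "EU_CD_GW" key. Rows and values are invented.
import pandas as pd

# Stand-in for the dbf attribute table of the GWB shapefile.
polygons = pd.DataFrame({
    "Polygon_ID": [1, 2, 3],
    "EU_CD_GW":   ["DE_GB_01", "DE_GB_01", "FR_GB_07"],
    "Horizon":    [1, 1, 2],
})

# Stand-in for the attributes from GWB_attributes_2012June.xls.
attributes = pd.DataFrame({
    "EU_CD_GW": ["DE_GB_01", "FR_GB_07"],
    "Scale":    ["1 : 50,000", "1 : 250,000"],
})

# A left join keeps every polygon; attribute rows with no matching
# spatial data simply never appear in the result.
joined = polygons.merge(attributes, on="EU_CD_GW", how="left")
print(joined)
```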
It should be underlined that the methodology used is still under discussion (Working Group C -Groundwater) and is not fully harmonised throughout the EU MS.
For the external publication, the whole United Kingdom had to be removed due to licensing restrictions.
The following data shows riding information for members vs. casual riders at the company Cyclistic (a fictional company). This dataset is used as a case study for the Google Data Analytics certificate.
Changes made to the data in Excel:
- Removed all duplicates (none were found).
- Added a ride_length column by subtracting started_at from ended_at using the formula "=C2-B2", then formatted the column as a time (37:30:55).
- Added a day_of_week column using the formula "=WEEKDAY(B2,1)" to display the day the ride took place, where 1 = Sunday through 7 = Saturday.
- Some cells display as ########; these were left unchanged. They represent negative values and should simply be treated as 0.
Processing the data in RStudio:
- Installed the required packages: tidyverse for data import and wrangling, lubridate for date functions, and ggplot2 for visualization.
- Step 1: Read the CSV files into R to collect the data.
- Step 2: Checked that the files all contained the same column names so they could be merged into one.
- Step 3: Renamed columns so they align, then merged the files into one combined dataset.
- Step 4: Further data cleaning and analysis.
- Step 5: Once the data was cleaned and clearly telling a story, visualized it. The visualizations can be seen below.
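The two Excel formulas above have direct equivalents in pandas (shown here instead of R only for illustration). The column names started_at and ended_at follow the description; the timestamps are invented. Note the remapping needed because pandas counts Monday = 0 while Excel's WEEKDAY(x, 1) counts Sunday = 1 through Saturday = 7.

```python
# pandas equivalents of "=C2-B2" (ride_length) and "=WEEKDAY(B2,1)"
# (day_of_week, 1 = Sunday .. 7 = Saturday). Timestamps are invented.
import pandas as pd

rides = pd.DataFrame({
    "started_at": pd.to_datetime(["2023-06-01 08:00", "2023-06-03 17:15"]),
    "ended_at":   pd.to_datetime(["2023-06-01 08:25", "2023-06-03 17:45"]),
})

# "=C2-B2" -> a timedelta column.
rides["ride_length"] = rides["ended_at"] - rides["started_at"]

# "=WEEKDAY(B2,1)": pandas dayofweek has Monday = 0, Sunday = 6, so
# shift it into Excel's Sunday = 1 .. Saturday = 7 numbering.
rides["day_of_week"] = (rides["started_at"].dt.dayofweek + 1) % 7 + 1
print(rides)
```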
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data is sourced from the Census 2011 and shows the population and population density by council area. Raw data was sourced from http://www.scotlandscensus.gov.uk/en/censusresults/downloadablefiles.html and then manipulated in Excel to merge a number of tables. The resulting data was joined to a shapefile of Scottish council areas from ShareGeo (http://www.sharegeo.ac.uk/handle/10672/305). Both sources should be attributed as the sources of the base data. GIS vector data. This dataset was first accessioned in the EDINA ShareGeo Open repository on 2012-12-19 and migrated to Edinburgh DataShare on 2017-02-21.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
2008 Population & demographic census data for Israel, at the level of settlements and lower.
Data is provided at the sub-settlement level (i.e., neighborhoods). Variable names (in Hebrew and English) and a data dictionary are provided in XLS files. 2008 statistical area names are provided (along with top roads/neighborhoods per settlement). The Excel data needs cleaning/merging from multiple sub-pages.
Data from Israel Central Bureau of Statistics (CBS): http://www.cbs.gov.il/census/census/pnimi_page.html?id_topic=12
Photo by Me (Dan Ofer).
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This dataset includes the results of the 2019 NFL Combine.
The data was extracted from nflcombineresults.com using Excel.