62 datasets found
  1. Merge number of excel file,convert into csv file

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    Aashirvad pandey (2024). Merge number of excel file,convert into csv file [Dataset]. https://www.kaggle.com/datasets/aashirvadpandey/merge-number-of-excel-fileconvert-into-csv-file
    Explore at:
    Available download formats: zip (6731 bytes)
    Dataset updated
    Mar 30, 2024
    Authors
    Aashirvad pandey
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Project Description:

    Title: Pandas Data Manipulation and File Conversion

    Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.

    Key Objectives:

    1. DataFrame Creation: Utilize Pandas to create a DataFrame with sample data.
    2. Data Manipulation: Perform basic data manipulation tasks such as adding columns, filtering data, and performing calculations.
    3. File Conversion: Convert the DataFrame into Excel (.xlsx) and CSV (.csv) file formats.

    Tools and Libraries Used:

    • Python
    • Pandas

    Project Implementation:

    1. DataFrame Creation:

      • Import the Pandas library.
      • Create a DataFrame using either a dictionary, a list of dictionaries, or by reading data from an external source like a CSV file.
      • Populate the DataFrame with sample data representing various data types (e.g., integer, float, string, datetime).
    2. Data Manipulation:

      • Add new columns to the DataFrame representing derived data or computations based on existing columns.
      • Filter the DataFrame to include only specific rows based on certain conditions.
      • Perform basic calculations or transformations on the data, such as aggregation functions or arithmetic operations.
    3. File Conversion:

      • Utilize Pandas to convert the DataFrame into an Excel (.xlsx) file using the to_excel() function.
      • Convert the DataFrame into a CSV (.csv) file using the to_csv() function.
      • Save the generated files to the local file system for further analysis or sharing.

    Expected Outcome:

    Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.

    Conclusion:

    The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines. In this dataset, we take several tables of data, turn each into a DataFrame, save them in a single Excel file as separate, named sheets, and then convert that Excel file into CSV files.
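
    A minimal sketch of that workflow (the sample data and file names are illustrative, not taken from the dataset):

    import pandas as pd

    # 1. DataFrame creation: two small sample tables
    sales = pd.DataFrame({"item": ["pen", "book"], "price": [1.5, 4.0]})
    staff = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [28, 34]})

    # 2. Data manipulation: add a derived column and filter rows
    sales["price_with_tax"] = sales["price"] * 1.18
    staff = staff[staff["age"] > 25]

    # 3. File conversion: write both frames to one workbook as separate sheets
    with pd.ExcelWriter("merged.xlsx") as writer:
        sales.to_excel(writer, sheet_name="sales", index=False)
        staff.to_excel(writer, sheet_name="staff", index=False)

    # Convert every sheet of the Excel file into its own CSV file
    for name, frame in pd.read_excel("merged.xlsx", sheet_name=None).items():
        frame.to_csv(f"{name}.csv", index=False)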

  2. Excel Converting Llc Export Import Data | Eximpedia

    • eximpedia.app
    Updated Oct 15, 2025
    Cite
    (2025). Excel Converting Llc Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/companies/excel-converting-llc/25030592
    Explore at:
    Dataset updated
    Oct 15, 2025
    Description

    Excel Converting Llc Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  3. Data providers package for reporting Chemical Contaminants (official data...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1 more
    Updated Feb 3, 2020
    Cite
    European Food Safety Authority (2020). Data providers package for reporting Chemical Contaminants (official data reporting phase) SSD1 [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_1256019
    Explore at:
    Dataset updated
    Feb 3, 2020
    Authors
    European Food Safety Authority
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the framework of Articles 23 and 33 of Regulation (EC) No 178/2002 EFSA has received from the European Commission a mandate (M-2010-0374) to collect all available data on the occurrence of chemical contaminants in food and feed. These data are used in EFSA’s scientific opinions and reports on contaminants in food and feed.

    This data providers package provides the data collection configuration and supporting materials for reporting Chemical Contaminants in SSD1. These are to be used for the official data reporting phase.

    The package includes:

    • The Standard Sample Description Version 2 XSD schema definition for CONTAMINANTS reporting.
    • The general and CONTAMINANTS SSD1-specific business rules applied for the automatic validation of the submitted datasets.
    • An Excel mapping tool to convert Excel files, after mapping, into an XML document.

    Please follow the instructions below for the correct use of the mapping tool, to avoid compromising its functionality:

    1. Download and save the MS Excel® Standard Sample Description file to your computer (do not open the file before saving, and do not change the file name).
    2. Download and save the MS Excel® Simplified Reporting Format file (do not open the file before saving).
    3. Keep both Excel files in the same folder.
    4. Open both Excel files and enable the macros.
    5. Keep both files open in the same Excel instance when filling in the data.

    The package also includes guidance on how to run the validation report after submitting data to the DCF.
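
    The mapping tool itself is a macro-enabled Excel workbook, but the Excel-to-XML step it performs can be sketched in outline. The snippet below is purely illustrative (the sheet, column, and element names are invented; the real element structure is fixed by the SSD1 XSD schema in this package):

    import xml.etree.ElementTree as ET
    import pandas as pd

    # Read the mapped sample records from a (hypothetical) Excel sheet
    rows = pd.read_excel("mapped_samples.xlsx", sheet_name="SSD1")

    dataset = ET.Element("dataset")
    for _, row in rows.iterrows():
        result = ET.SubElement(dataset, "result")
        for column, value in row.items():
            # One child element per mapped column, e.g. <sampCountry>IT</sampCountry>
            ET.SubElement(result, column).text = str(value)

    ET.ElementTree(dataset).write("submission.xml", encoding="utf-8", xml_declaration=True)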

  4. Business Activity Survey 2009 - Samoa

    • microdata.pacificdata.org
    Updated Jul 2, 2019
    + more versions
    Cite
    Samoa Bureau of Statistics (2019). Business Activity Survey 2009 - Samoa [Dataset]. https://microdata.pacificdata.org/index.php/catalog/253
    Explore at:
    Dataset updated
    Jul 2, 2019
    Dataset authored and provided by
    Samoa Bureau of Statistics
    Time period covered
    2009
    Area covered
    Samoa
    Description

    Abstract

    The intention is to collect data for the calendar year 2009 (or the nearest year for which each business keeps its accounts). The survey is considered a one-off survey, although for accurate national accounts (NAs), such a survey should be conducted at least every five years to enable regular updating of the ratios, etc., needed to adjust the ongoing indicator data (mainly VAGST) to NA concepts. The questionnaire will be drafted by FSD, largely following the previous BAS, updated to current accounting terminology where necessary. The questionnaire will be pilot tested, using some accountants who are likely to complete a number of the forms on behalf of their business clients, and a small sample of businesses. Consultations will also include Ministry of Finance, Ministry of Commerce, Industry and Labour, Central Bank of Samoa (CBS), Samoa Tourism Authority, Chamber of Commerce, and other business associations (hotels, retail, etc.).

    The questionnaire will collect a number of items of information about the business ownership, locations at which it operates and each establishment for which detailed data can be provided (in the case of complex businesses), contact information, and other general information needed to clearly identify each unique business. The main body of the questionnaire will collect data on income and expenses, to enable value added to be derived accurately. The questionnaire will also collect data on capital formation, and will contain supplementary pages for relevant industries to collect volume of production data for selected commodities and to collect information to enable an estimate of value added generated by key tourism activities.

    The principal user of the data will be FSD which will incorporate the survey data into benchmarks for the NA, mainly on the current published production measure of GDP. The information on capital formation and other relevant data will also be incorporated into the experimental estimates of expenditure on GDP. The supplementary data on volumes of production will be used by FSD to redevelop the industrial production index which has recently been transferred under the SBS from the CBS. The general information about the business ownership, etc., will be used to update the Business Register.

    Outputs will be produced in a number of formats, including a printed report containing descriptive information of the survey design, data tables, and analysis of the results. The report will also be made available on the SBS website in “.pdf” format, and the tables will be available on the SBS website as Excel tables. Data by region may also be produced, although at a higher level of aggregation than the national data. All data will be fully confidentialised, to protect the anonymity of all respondents. Consideration may also be made to provide, for selected analytical users, confidentialised unit record files (CURFs).

    A high level of accuracy is needed because the principal purpose of the survey is to develop revised benchmarks for the NA. The initial plan was that the survey will be conducted as a stratified sample survey, with full enumeration of large establishments and a sample of the remainder.

    Geographic coverage

    National Coverage

    Analysis unit

    The main statistical unit to be used for the survey is the establishment. For simple businesses that undertake a single activity at a single location there is a one-to-one relationship between the establishment and the enterprise. For large and complex enterprises, however, it is desirable to separate each activity of an enterprise into establishments to provide the most detailed information possible for industrial analysis. The business register will need to be developed in such a way that records the links between establishments and their parent enterprises. The business register will be created from administrative records and may not have enough information to recognize all establishments of complex enterprises. Large businesses will be contacted prior to the survey post-out to determine if they have separate establishments. If so, the extended structure of the enterprise will be recorded on the business register and a questionnaire will be sent to the enterprise to be completed for each establishment.

    SBS has decided to follow the New Zealand simplified version of its statistical units model for the 2009 BAS. Future surveys may consider location units and enterprise groups if they are found to be useful for statistical collections.

    It should be noted that while establishment data may enable the derivation of detailed benchmark accounts, it may be necessary to aggregate up to enterprise level data for the benchmarks if the ongoing data used to extrapolate the benchmark forward (mainly VAGST) are only available at the enterprise level.

    Universe

    The BAS covered all employing units, and excluded small non-employing units such as market sellers. The surveys also excluded central government agencies engaged in public administration (ministries, public education and health, etc.). They only cover businesses that pay VAGST (threshold SAT $75,000 and upwards).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    • Total sample size was 1,240.
    • Of those, 902 successfully completed the questionnaire.
    • The remaining 338 either never responded or were omitted (some businesses were omitted from the sample as they did not meet the requirement to be surveyed).
    • Selection was all employing units paying VAGST (threshold SAT $75,000 upwards).


    Mode of data collection

    Mail Questionnaire [mail]

    Research instrument

    1. General instructions, authority for the survey, etc;
    2. Business demography information on ownership, contact details, structure, etc.;
    3. Employment;
    4. Income;
    5. Expenses;
    6. Inventories;
    7. Profit or loss and reconciliation to business accounts' profit and loss;
    8. Fixed assets - purchases, disposals, book values
    9. Thank you and signature of respondent.

    Supplementary Pages

    Additional pages have been prepared to collect data for a limited range of industries.

    1. Production data. To rebase and redevelop the Industrial Production Index (IPI), it is intended to collect volume of production information from a selection of large manufacturing businesses. The selection of businesses and products is critical to the usefulness of the IPI. The products must be homogeneous, and be of enough importance to the economy to justify collecting the data. Significance criteria should be established for the selection of products to include in the IPI, and the 2009 BAS provides an opportunity to collect benchmark data for a range of products known to be significant (based on information in the existing IPI, CPI weights, export data, etc.) as well as open questions for respondents to provide information on other significant products.

    2. Tourism. There is a strong demand for estimates of tourism value added. To estimate tourism value added using the international standard Tourism Satellite Account methodology requires the use of an input-output table, which is beyond the capacity of SBS at present. However, some indicative estimates of the main parts of the economy influenced by tourism can be derived if the necessary data are collected. Tourism is a demand concept, based on defining tourists (the international standard includes both international and domestic tourists), what products are characteristically purchased by tourists, and which industries supply those products. Some questions targeted at those industries that have significant involvement with tourists (hotels, restaurants, transport and tour operators, vehicle hire, etc.), on how much of their income is sourced from tourism, would provide valuable indicators of the size of the direct impact of tourism.

    Cleaning operations

    Partial imputation was done at the time of receipt of questionnaires, after follow-up procedures to obtain fully completed questionnaires had been followed. Imputation applied ratios from responding units in the imputation cell to the partial data supplied. Procedures were established during the editing stage (a) to preserve the integrity of the questionnaires as supplied by respondents, and (b) to record all changes made to the questionnaires during editing. If SBS staff write on a form, for example, this should only be done in red pen, to distinguish the alterations from the original information.

    Additional edit checks were developed, including checking against external data at enterprise/establishment level. External data to be checked against include VAGST and SNPF for turnover and purchases, and salaries and wages and employment data respectively. Editing and imputation processes were undertaken by FSD using Excel.

    Sampling error estimates

    Not applicable.

  5. Factor for converting parts per million (ppm) of U, Th and K into Bq kg of...

    • narcis.nl
    • data.mendeley.com
    Updated Feb 18, 2019
    Cite
    SUAREZ-NAVARRO, J (via Mendeley Data) (2019). Factor for converting parts per million (ppm) of U, Th and K into Bq kg of U-238, Th-232 and K-40 [Dataset]. http://doi.org/10.17632/ggmczjxk5d.1
    Explore at:
    Dataset updated
    Feb 18, 2019
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    SUAREZ-NAVARRO, J (via Mendeley Data)
    Description

    This dataset describes the calculation of the factors to transform ppm of K, U and Th into Bq/kg of K-40, U-238 and Th-232 through their nuclear data. The Excel spreadsheet shows the different operations, with the expressions described in the PDF file.
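
    The factors follow directly from the nuclear data. Below is a short sketch (not part of the dataset) that reproduces them from standard half-lives, isotopic abundances, and molar masses, using A = λN:

    import math

    AVOGADRO = 6.022e23   # atoms per mole
    YEAR_S = 3.156e7      # seconds per year

    def bq_per_kg_per_ppm(half_life_y, abundance, molar_mass_g):
        """Activity (Bq per kg of sample) contributed by 1 ppm of the element."""
        # 1 ppm = 1 mg of the element per kg of sample
        atoms = (1e-3 / molar_mass_g) * abundance * AVOGADRO
        decay_const = math.log(2) / (half_life_y * YEAR_S)  # lambda, in 1/s
        return decay_const * atoms                          # A = lambda * N

    print(bq_per_kg_per_ppm(4.468e9, 0.9927, 238.03))    # U-238:  ~12.3 Bq/kg per ppm U
    print(bq_per_kg_per_ppm(1.405e10, 1.0000, 232.04))   # Th-232: ~4.06 Bq/kg per ppm Th
    print(bq_per_kg_per_ppm(1.248e9, 1.17e-4, 39.10))    # K-40:   ~0.032 Bq/kg per ppm K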

  6. GHS Safety Fingerprints

    • figshare.com
    xlsx
    Updated Oct 25, 2018
    Cite
    Brian Murphy (2018). GHS Safety Fingerprints [Dataset]. http://doi.org/10.6084/m9.figshare.7210019.v3
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Oct 25, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Brian Murphy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Spreadsheets targeted at the analysis of GHS safety fingerprints.

    Abstract

    Over a 20-year period, the UN developed the Globally Harmonized System (GHS) to address international variation in chemical safety information standards. By 2014, the GHS had become widely accepted internationally and is now the cornerstone of OSHA's Hazard Communication Standard. Despite this progress, today we observe inconsistent results when different sources apply the GHS to specific chemicals, in terms of the GHS pictograms, hazard statements, precautionary statements, and signal words assigned to those chemicals. To assess the magnitude of this problem, this research extends the "chemical fingerprints" used in 2D chemical structure similarity analysis to GHS classifications. By generating a chemical safety fingerprint, the consistency of the GHS information for specific chemicals can be assessed. The problem is that sources of GHS information can differ. For example, the SDS for sodium hydroxide pellets found on Fisher Scientific's website displays two pictograms, while the GHS information for sodium hydroxide pellets on Sigma-Aldrich's website has only one pictogram. A chemical information tool that identifies such discrepancies within a specific chemical inventory can assist in maintaining the quality of the safety information needed to support safe work in the laboratory. The tools for this analysis will be scaled to the size of a moderately large research lab or a small chemistry department as a whole (between 1000 and 3000 chemical entities), so that labelling expectations within these universes can be established as consistently as possible.

    Most chemists are familiar with spreadsheet programs such as Excel and Google Sheets, which many chemists use daily. Through a monadal programming approach with these tools, the analysis of GHS information can be made possible for non-programmers. This monadal approach employs single spreadsheet functions to analyze the collected data, rather than long programs, which can be difficult to debug and maintain. Another advantage of this approach is that the single monadal functions can be mixed and matched to meet new goals as information needs about the chemical inventory evolve over time. These monadal functions are used to convert GHS information into binary strings of data called "bitstrings"; the same approach is used when comparing chemical structures. The binary approach makes data analysis more manageable, as GHS information comes in a variety of formats, such as pictures or alphanumeric strings, which are difficult to compare on their face. Bitstrings generated from the GHS information can be compared using an operator such as the Tanimoto coefficient to yield values from 0, for strings that have no similarity, to 1, for strings that are the same. Once a particular set of information is analyzed, the hope is that the same techniques can be extended to more information. For example, if GHS hazard statements are analyzed through a spreadsheet approach, the same techniques, with minor modifications, could be used to tackle more GHS information such as pictograms.

    Intellectual Merit. This research indicates that the cheminformatic technique of structural fingerprints can be used to create safety fingerprints. Structural fingerprints are binary bit strings obtained from the non-numeric entity of 2D structure; they allow comparison of 2D structures through the Tanimoto coefficient. The same idea can be extended to safety fingerprints, created by converting a non-numeric entity such as GHS information into a binary bit string and comparing the data through the Tanimoto coefficient.

    Broader Impact. Extensions of this research can be applied to many aspects of GHS information. This research focused on comparing GHS hazard statements, but it could be further applied to other pieces of GHS information such as pictograms and GHS precautionary statements. Another facet of this research is allowing the chemist who uses the data to compare large datasets using spreadsheet programs such as Excel, without needing a strong programming background. Development of this technique will also benefit the Chemical Health and Safety and Chemical Information communities by better defining the quality of available GHS information and by providing a scalable, transferable tool to manipulate this information to meet a variety of other organizational needs.
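
    For illustration, the comparison step reduces to a few lines of code. The sketch below (hypothetical bit assignments, not taken from the dataset) encodes two sources' hazard statements as bitstrings and scores their agreement with the Tanimoto coefficient:

    def tanimoto(a, b):
        """Tanimoto coefficient of two equal-length bitstrings."""
        both = sum(x == "1" and y == "1" for x, y in zip(a, b))
        either = sum(x == "1" or y == "1" for x, y in zip(a, b))
        return both / either if either else 1.0

    # Bit order (hypothetical): [H290, H301, H311, H314, H331, H400]
    source_a = "010101"  # fingerprint built from one SDS
    source_b = "010100"  # fingerprint built from another vendor's SDS

    print(tanimoto(source_a, source_b))  # 0.67 -> the sources disagree on one hazard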

  7. Data from: Climate Change and Educational Attainment in the Global Tropics

    • openicpsr.org
    Updated Mar 31, 2019
    + more versions
    Cite
    Heather Randell; Clark Gray (2019). Climate Change and Educational Attainment in the Global Tropics [Dataset]. http://doi.org/10.3886/E109141V2
    Explore at:
    Dataset updated
    Mar 31, 2019
    Dataset provided by
    University of North Carolina-Chapel Hill
    University of Maryland, College Park
    Authors
    Heather Randell; Clark Gray
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project contains the Stata code as well as additional information used for the following paper: Randell, H & C Gray (Forthcoming). Climate Change and Educational Attainment in the Global Tropics. Proceedings of the National Academy of Sciences.

    The data are publicly available and can be accessed freely. The census data were obtained from IPUMS-International (https://international.ipums.org/international/) and the climate data were obtained from the CRU Time Series Version 4.00 (http://data.ceda.ac.uk//badc/cru/data/cru_ts/cru_ts_4.00/).

    We include three do-files in this project:

    • "Climate_-1_to_5.do" - used to convert the climate data into z-scores of climatic conditions experienced during ages -1 to 5 years among children in the sample.
    • "ClimEducation_PNAS_FINAL.do" - used to process the census data downloaded from IPUMS-International, link it to the climate data, and perform all of the analyses in the study.
    • "Climate_6-10_and_11-current.do" - used to convert the climate data into z-scores of climatic conditions experienced during ages 6-10 and 11-current age among children in the sample.

    In addition, we include a shapefile (as well as related GIS files) for the final sample of analysis countries. The attribute "birthplace" is used to link the climate data to the census data. We include Python scripts for extracting monthly climate data from each 10-year temperature and precipitation file downloaded from CRU: "py0_60" extracts data for years one through five, and "py61_120" extracts data for years six through ten. Lastly, we include an Excel file with inclusion/exclusion criteria for the countries and censuses available from IPUMS.
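
    The z-score construction in the first and third do-files has a simple form: each child's climate exposure is standardized against the long-run mean and standard deviation for their birthplace. A rough pandas equivalent is sketched below (column and file names are hypothetical; the project's actual code is the Stata in these do-files):

    import pandas as pd

    clim = pd.read_csv("climate_by_birthplace_year.csv")  # hypothetical CRU extract

    # Long-run baseline statistics per birthplace
    base = clim.groupby("birthplace")["temperature"].agg(["mean", "std"])

    # Standardize each observation against its birthplace baseline
    clim = clim.join(base, on="birthplace")
    clim["temp_z"] = (clim["temperature"] - clim["mean"]) / clim["std"]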

  8. Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Oct 22, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-vector-analysis-and-summary-stati
    Explore at:
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation.

    The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and supporting user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements).

    The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip"), an associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ), was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"), and Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ).

    In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ).

    Note, the PAD-US inventory is now considered functionally complete with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
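
    The de-overlap idea can be sketched compactly: sort polygons by priority, then clip each one by everything already kept, so no area is counted twice. The snippet below is illustrative only (file and column names are hypothetical; the authoritative workflow is the scripted process in "PADUS3_0_CreateVectorAnalysisFileScript.zip"):

    import geopandas as gpd
    from shapely.geometry import Polygon

    # Hypothetical extract with a GAP status column; lower code = higher priority
    pad = gpd.read_file("padus_subset.gpkg").sort_values("GAP_Sts")

    kept, claimed = [], Polygon()  # 'claimed' = area already assigned to a higher priority
    for _, row in pad.iterrows():
        remainder = row.geometry.difference(claimed)
        if not remainder.is_empty:
            attrs = row.drop("geometry").to_dict()
            attrs["geometry"] = remainder
            kept.append(attrs)
            claimed = claimed.union(remainder)

    flat = gpd.GeoDataFrame(kept, geometry="geometry", crs=pad.crs)  # overlap-free layer
    flat.to_file("padus_vector_analysis.gpkg")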

  9. AQI Dashboard Analysis Using Excel

    • kaggle.com
    zip
    Updated Aug 7, 2025
    Cite
    Aman Kumar Sisodiya (2025). AQI Dashboard Analysis Using Excel [Dataset]. https://www.kaggle.com/datasets/amankumarsisodiya/aqi-dashboard-analysis-using-excel/versions/1
    Explore at:
    Available download formats: zip (945625 bytes)
    Dataset updated
    Aug 7, 2025
    Authors
    Aman Kumar Sisodiya
    License

    Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    AQI-Dashboard-Project

    This project is an Excel-based analysis and visualization of Air Quality Index (AQI) across different cities. The goal is to understand pollution patterns and make data-driven observations using Excel tools like Power Query, Pivot Tables, Slicers, Charts, and DAX formulas.

    Project Highlights

    • Cleaned and structured AQI data in Excel
    • Built interactive dashboards using slicers and pivot tables
    • Categorized AQI levels into meaningful groups using DAX formulas (see the sketch after this list)
    • Derived key KPIs like average AQI, highest polluted city, and monthly trends
    • Made a user-friendly, visual dashboard to analyze AQI city-wise and over time
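
    The grouping logic itself is simple. As a rough pandas equivalent of the workbook's DAX categorization (breakpoints follow the common 0-500 AQI bands; the workbook's exact formulas are not reproduced here):

    import pandas as pd

    aqi = pd.DataFrame({"city": ["Delhi", "Shimla"], "AQI": [310, 42]})

    bins = [0, 50, 100, 200, 300, 400, 500]
    labels = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]
    aqi["category"] = pd.cut(aqi["AQI"], bins=bins, labels=labels)

    print(aqi)  # Delhi -> Very Poor, Shimla -> Good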

    Dashboard Preview

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F27389866%2Fde757d7ec55787577a9f8a2710a9e06a%2Faqi_analysis_dashboard.png?generation=1754590765375999&alt=media

    Files Included

    • AQI ANALYSIS BY AMAN SISODIYA.xlsx – Complete AQI dataset with dashboard
    • aqi_analysis_dashboard.png – Image preview of the Excel dashboard

    Tools Used

    • Microsoft Excel
    • Power Query
    • Pivot Tables & Charts
    • DAX (for AQI categorization logic)

    Purpose

    This project helped me understand how to turn raw environmental data into meaningful visual insights using Excel. It’s a demonstration of beginner-level data analytics and dashboarding skills.

    Created by Aman Sisodiya

  10. Winter Olympics Prediction - Fantasy Draft Picks

    • kaggle.com
    zip
    Updated Jan 19, 2022
    Cite
    EricSBrown (2022). Winter Olympics Prediction - Fantasy Draft Picks [Dataset]. https://www.kaggle.com/datasets/ericsbrown/winter-olympics-prediction-fantasy-draft-picks
    Explore at:
    Available download formats: zip (4928 bytes)
    Dataset updated
    Jan 19, 2022
    Authors
    EricSBrown
    Description

    Olympic Draft Predictive Model

    Our family runs an Olympic Draft - similar to fantasy football or baseball - for each Olympic cycle. The purpose of this case study is to identify trends in medal count / point value to create a predictive analysis of which teams should be selected in which order.

    There are a few assumptions that will impact the final analysis. Point value - each medal is worth the following:

    • Gold - 6 points
    • Silver - 4 points
    • Bronze - 3 points

    The analysis reviews the last 10 Olympic cycles, Winter Olympics only.

    All GDP numbers are in USD

    My initial hypothesis is that larger GDP per capita and contingent size are correlated with better point values in the Olympic draft.

    All Data pulled from the following Datasets:

    • Winter Olympics Medal Count - https://www.kaggle.com/ramontanoeiro/winter-olympic-medals-1924-2018
    • Worldwide GDP History - https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?end=2020&start=1984&view=chart

    GDP data was a wide format when downloaded from the World Bank. Opened file in Excel, removed irrelevant years, and saved as .csv.

    Process

    In RStudio, used the following code to convert the wide data to long format:

    install.packages("tidyverse")
    library(tidyverse)
    library(tidyr)

    # Converting wide data to long
    long <- newgdpdata %>%
      gather(year, value, -c("Country Name", "Country Code"))

    Completed these same steps for GDP per capita.

    Primary Key Creation

    The two datasets store differing types of data, and there is no good primary key to join them on. Used CONCAT to create a new key column in both, combining the year and country code into a unique identifier that matches between the datasets.

    SELECT *, CONCAT(year,country_code) AS "Primary" FROM medal_count

    Saved as new table "medals_w_primary"

    Utilized Excel to concatenate the primary key for GDP and GDP per capita utilizing:

    =CONCAT()

    Saved as new csv files.

    Uploaded all to SSMS.

    Contingent Size

    Next need to add contingent size.

    No existing database had this information. Pulled data from Wikipedia.

    2018 - no problem, pulled the existing table. 2014 - the table was not created; pulled the information into Excel and needed to convert the country NAMES into country CODES.

    Created an Excel document with all ISO country codes. Codes were listed in both formats, either 2 or 3 letters. Example:

    AF/AFG

    Used =RIGHT(C1,3) to extract only the country codes.

    For the country participants list in 2014, copied source data from Wikipedia and pasted as plain text (not HTML).

    Items then showed as: Albania (2)

    Broke cells using "(" as the delimiter to separate country names and numbers, then find and replace to remove all parenthesis from this data.

    We were left with: Albania 2

    Used VLOOKUP to create correct country code: =VLOOKUP(A1,'Country Codes'!A:D,4,FALSE)

    This worked for almost all items, with a few exceptions that didn't match. Based on the nature and size of the items, manually checked which items were incorrect.

    Chinese Taipei 3 #N/A
    Great Britain 56 #N/A
    Virgin Islands 1 #N/A

    This was relatively easy to fix by adding corresponding line items to the Country Codes sheet to account for future variability in the country code names.

    Copied over to main sheet.

    Repeated this process for additional years.

    Once complete, created a sheet with all 10 cycles of data. In total there are 731 items.

    Data Cleaning

    Filtered by Country Code since this was an issue early on.

    Found a number of N/A Country Codes:

    Serbia and Montenegro
    FR Yugoslavia
    FR Yugoslavia
    Czechoslovakia
    Unified Team
    Yugoslavia
    Czechoslovakia
    East Germany
    West Germany
    Soviet Union
    Yugoslavia
    Czechoslovakia
    East Germany
    West Germany
    Soviet Union
    Yugoslavia

    These appear to be issues with older codes, Soviet-bloc countries especially. Referred to historical data and filled in these country codes manually; codes found on iso.org.

    Filled all in. One issue that was more difficult is the Unified Team of 1992 and the Soviet Union. For simplicity, used the code for Russia, since the GDP data does not recognize the Soviet Union and instead breaks the union down into its constituent countries. Using Russia gives a reasonable approximation for finding trends.

    From here created a filter and scanned through the country names to ensure there were no obvious outliers. Found the following:

    Olympic Athletes from Russia[b] -- This is a one-off due to the recent PED controversy for Russia. Amended the Country Code to RUS to more accurately reflect the trends.

    Korea[a] and South Korea -- both were listed in 2018. This is due to the unified Korean team that competed. This is an outlier and does not warrant standing on its own as the 2022 Olympics will not have this team (as of this writing on 01/14/2022). Removed the COR country code item.

    Confirmed Primary Key was created for all entries.

    Ran minimum and maximum years, no...

  11. [Superseded] Intellectual Property Government Open Data 2019

    • data.gov.au
    • researchdata.edu.au
    csv-geo-au, pdf
    Updated Jan 26, 2022
    + more versions
    Cite
    IP Australia (2022). [Superseded] Intellectual Property Government Open Data 2019 [Dataset]. https://data.gov.au/data/dataset/activity/intellectual-property-government-open-data-2019
    Explore at:
    Available download formats: csv-geo-au(59281977), csv-geo-au(680030), csv-geo-au(39873883), csv-geo-au(37247273), csv-geo-au(25433945), csv-geo-au(92768371), pdf(702054), csv-geo-au(208449), csv-geo-au(166844), csv-geo-au(517357734), csv-geo-au(32100526), csv-geo-au(33981694), csv-geo-au(21315), csv-geo-au(6828919), csv-geo-au(86824299), csv-geo-au(359763), csv-geo-au(567412), csv-geo-au(153175), csv-geo-au(165051861), csv-geo-au(115749297), csv-geo-au(79743393), csv-geo-au(55504675), csv-geo-au(221026), csv-geo-au(50760305), csv-geo-au(2867571), csv-geo-au(212907250), csv-geo-au(4352457), csv-geo-au(4843670), csv-geo-au(1032589), csv-geo-au(1163830), csv-geo-au(278689420), csv-geo-au(28585330), csv-geo-au(130674), csv-geo-au(13968748), csv-geo-au(11926959), csv-geo-au(4802733), csv-geo-au(243729054), csv-geo-au(64511181), csv-geo-au(592774239), csv-geo-au(149948862)
    Dataset updated
    Jan 26, 2022
    Dataset authored and provided by
    IP Australia (http://ipaustralia.gov.au/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is IPGOD?

    The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD?

    IPGOD is large, with millions of data points across up to 40 tables, making it too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which calls for specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools, with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.

    IP Data Platform

    IP Australia is also providing free trials of a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software.

    References

    The following pages can help you gain an understanding of intellectual property administration and processes in Australia, to support your analysis of the dataset.

    Updates

    Tables and columns

    Due to the changes in our systems, some tables have been affected.

    • We have added IPGOD 225 and IPGOD 325 to the dataset!
    • The IPGOD 206 table is not available this year.
    • Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements

    Data quality has been improved across all tables; a loading sketch follows the list below.

    • Null values are simply empty rather than '31/12/9999'.
    • All date columns are now in ISO format 'yyyy-mm-dd'.
    • All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    • All tables are encoded in UTF-8.
    • All tables use the backslash \ as the escape character.
    • The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.
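
    A loading sketch reflecting those conventions (the file and date column names are hypothetical; check each table's data dictionary):

    import pandas as pd

    table = pd.read_csv(
        "ipgod225.csv",
        encoding="utf-8",              # all tables are UTF-8
        escapechar="\\",               # backslash is the escape character
        parse_dates=["filing_date"],   # ISO 'yyyy-mm-dd' dates parse directly
    )
    # Empty fields load as NaN; True/False indicator columns are inferred as
    # boolean where the column is complete.
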
  12. Excel mapping tools for 2018 zoonoses data reporting

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    • +1 more
    Updated Feb 7, 2020
    Cite
    European Food Safety Authority (2020). Excel mapping tools for 2018 zoonoses data reporting [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_2549662
    Explore at:
    Dataset updated
    Feb 7, 2020
    Dataset provided by
    The European Food Safety Authority (http://www.efsa.europa.eu/)
    Authors
    European Food Safety Authority
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The main objective of the mapping tool is to provide a simple and usable platform for MSs to map their country-specific standard terminology to that used by EFSA, and to enable the production of an XML file for the submission of sample-based or aggregated zoonoses monitoring data via the DCF.

    The catalogues and the specific hierarchy of each data model (AMR, ESBL, PRV, FBO, POP and DST) are already inserted into each specific mapping tool; a dedicated Excel mapping tool is available for each of the six data models.

    You can choose between the dynamic and the manual version of the tool.

  13. Car Connection Picture dataset

    • kaggle.com
    zip
    Updated Jan 26, 2023
    Cite
    Usman Basharat (2023). Car Connection Picture dataset [Dataset]. https://www.kaggle.com/datasets/usmanbasharat/predicting-a-car-price-of-car-connection-picture/code
    Explore at:
    Available download formats: zip (1849886 bytes)
    Dataset updated
    Jan 26, 2023
    Authors
    Usman Basharat
    Description

    This dataset contains numerous images of the interiors and exteriors of different cars across different models and years, which will be explained in detail later in this report. As with any dataset, the data requires several cleansing steps. Unnecessary data will be removed: any row that contains more than four empty columns will be dropped. Data will be extracted from all the image labels, gathered, and converted into an Excel file. Once the data has been cleansed, I will test it using three different machine learning models; different learning models and techniques make it possible to explore different topics.
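
    The row-dropping rule described above has a direct pandas equivalent (the frame and file names are hypothetical):

    import pandas as pd

    df = pd.read_csv("car_labels.csv")   # hypothetical table extracted from image labels

    # Keep only rows with at most four empty columns
    df = df[df.isna().sum(axis=1) <= 4]

    df.to_excel("car_labels_clean.xlsx", index=False)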

  14. Cyclistic

    • kaggle.com
    zip
    Updated May 12, 2022
    + more versions
    Cite
    Salam Ibrahim (2022). Cyclistic [Dataset]. https://www.kaggle.com/datasets/salamibrahim/cyclistic
    Explore at:
    Available download formats: zip (209748131 bytes)
    Dataset updated
    May 12, 2022
    Authors
    Salam Ibrahim
    License

    Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    This case study is based on Cyclistic, a bike sharing company in Chicago. I will perform the tasks of a junior data analyst to answer business questions, following a process with the phases ask, prepare, process, analyze, share and act.

    Background

    Cyclistic is a bike sharing company that operates 5,828 bikes across 692 docking stations. The company has been around since 2016 and separates itself from the competition by offering a variety of bike services, including assistive options. Lily Moreno is the director of the marketing team and will receive the insights from this analysis.

    Case study and business task

    Lily Moreno's view on how to generate more income by marketing Cyclistic's services is to convert casual riders (one-day passes and/or pay-per-ride customers) into annual riders with a membership. Annual riders are more profitable than casual riders, according to the finance analysts. She would rather see a campaign targeting casual riders for conversion into annual riders than campaigns targeting new customers. So her strategy as the manager of the marketing team is simply to maximize the number of annual riders by converting casual riders.

    In order to make a data-driven decision, Moreno needs the following insights:

    • A better understanding of how casual riders and annual riders differ
    • Why a casual rider would become an annual one
    • How digital media can affect marketing tactics

    Moreno has directed me to the first question - how do casual riders and annual riders differ?

    Stakeholders

    • Lily Moreno, manager of the marketing team
    • Cyclistic marketing team
    • Executive team

    Data sources and organization

    Data used in this report is made available and licensed by Motivate International Inc. Personal data is hidden to protect personal information. The data covers the past 12 months (01/04/2021 – 31/03/2022) of the bike share dataset.

    Merging all 12 months of the bike share data produced an extensive dataset of some 5,400,000 rows, which is included in this analysis.

    Data security and limitations: Personal information is secured and hidden to prevent unlawful use. Original files are backed up in folders and subfolders.

    Tools and documentation of the cleaning process

    The tools used for data verification and cleaning are Microsoft Excel and R. The original files made accessible by Motivate International Inc. are backed up in their original format in separate files.

    Microsoft Excel was used to look through the dataset and get an overview of the content. I performed simple checks of the data by filtering, sorting, formatting and standardizing it to make it easily mergeable. In Excel, I also changed data types to the right format, removed unnecessary data where it was incomplete or incorrect, created new columns that subtract from and reformat existing columns, and deleted empty cells. These tasks are easily done in spreadsheets and provide an initial cleaning pass over the data.

    R will be used to perform queries of bigger datasets such as this one. R will also be used to create visualizations to answer the question at hand.

    Limitations

    Microsoft Excel has a limit of 1,048,576 rows, while the 12 months of data combined exceed 5,500,000 rows. When combining the 12 months of data into one table/sheet, Excel is no longer efficient, so I switched over to R.
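
    The merge itself is a plain concatenation once the monthly files are standardized. The case study used R for this step; as an illustration, the equivalent in Python/pandas (the file layout is hypothetical):

    import glob

    import pandas as pd

    # One CSV per month of trip data (hypothetical folder layout)
    files = sorted(glob.glob("tripdata/2021-04_to_2022-03/*.csv"))

    # Concatenate all 12 months into one frame -- millions of rows, beyond Excel's limit
    trips = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)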

  15. Data from: FlowCal: A User-Friendly, Open Source Software Tool for...

    • acs.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Sebastian M. Castillo-Hair; John T. Sexton; Brian P. Landry; Evan J. Olson; Oleg A. Igoshin; Jeffrey J. Tabor (2023). FlowCal: A User-Friendly, Open Source Software Tool for Automatically Converting Flow Cytometry Data from Arbitrary to Calibrated Units [Dataset]. http://doi.org/10.1021/acssynbio.5b00284.s002
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Sebastian M. Castillo-Hair; John T. Sexton; Brian P. Landry; Evan J. Olson; Oleg A. Igoshin; Jeffrey J. Tabor
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Flow cytometry is widely used to measure gene expression and other molecular biological processes with single cell resolution via fluorescent probes. Flow cytometers output data in arbitrary units (a.u.) that vary with the probe, instrument, and settings. Arbitrary units can be converted to the calibrated unit molecules of equivalent fluorophore (MEF) using commercially available calibration particles. However, there is no convenient, nonproprietary tool available to perform this calibration. Consequently, most researchers report data in a.u., limiting interpretation. Here, we report a software tool named FlowCal to overcome current limitations. FlowCal can be run using an intuitive Microsoft Excel interface, or customizable Python scripts. The software accepts Flow Cytometry Standard (FCS) files as inputs and is compatible with different calibration particles, fluorescent probes, and cell types. Additionally, FlowCal automatically gates data, calculates common statistics, and produces publication quality plots. We validate FlowCal by calibrating a.u. measurements of E. coli expressing superfolder GFP (sfGFP) collected at 10 different detector sensitivity (gain) settings to a single MEF value. Additionally, we reduce day-to-day variability in replicate E. coli sfGFP expression measurements due to instrument drift by 33%, and calibrate S. cerevisiae Venus expression data to MEF units. Finally, we demonstrate a simple method for using FlowCal to calibrate fluorescence units across different cytometers. FlowCal should ease the quantitative analysis of flow cytometry data within and across laboratories and facilitate the adoption of standard fluorescence units in synthetic biology and beyond.
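
    As an outline of the workflow described above, a short script in the style of FlowCal's published examples is shown below. File names, channel names, and bead MEF values are placeholders, and the exact function signatures should be checked against the FlowCal documentation:

    import FlowCal

    # Load calibration-bead and cell-sample FCS files (hypothetical names)
    beads = FlowCal.io.FCSData("beads.fcs")
    sample = FlowCal.io.FCSData("sample.fcs")

    # Keep the densest 30% of bead events to discard debris and aggregates
    beads = FlowCal.gate.density2d(beads, channels=["FSC", "SSC"], gate_fraction=0.3)

    # Build an a.u. -> MEF transformation from the beads' known fluorophore values
    to_mef = FlowCal.mef.get_transform_fxn(
        beads,
        mef_values=[0, 646, 1704, 4827, 15991, 47609, 135896, 273006],  # placeholder MEFL values
        mef_channels="FL1",
    )

    # Calibrate the sample and summarize expression in MEF units
    sample = to_mef(sample, channels="FL1")
    print(FlowCal.stats.median(sample, channels="FL1"))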

  16. Data from: New Data Reduction Tools and their Application to The Geysers...

    • data.wu.ac.at
    pdf
    Updated Dec 5, 2017
    + more versions
    Cite
    (2017). New Data Reduction Tools and their Application to The Geysers Geothermal Field [Dataset]. https://data.wu.ac.at/schema/geothermaldata_org/ZWE5ZDJlZWUtNWFkNC00ZGQzLWI1MTMtMDNiNDMzZDIwMDg5
    Explore at:
    Available download formats: pdf
    Dataset updated
    Dec 5, 2017
    Area covered
    The Geysers
    Description

    Microsoft Excel-based (using Visual Basic for Applications) data-reduction and visualization tools have been developed that enable the user to numerically reduce large sets of geothermal data to any size. The data can be quickly sifted and graphed to allow their study. The ability to analyze large data sets can reveal responses to field management procedures that would otherwise be undetectable. Field-wide trends such as decline rates, response to injection, evolution of superheat, recording instrumentation problems and data inconsistencies can be quickly queried and graphed. Here we demonstrate the application of these tools to data from The Geysers geothermal field. We believe these data-reduction tools will also be useful in other applications, such as oil and gas field data and well log data. A copy of these tools may be requested by contacting the authors.

  17. Ecommerce Market Data | South-east Asia E-commerce Contacts | 170M Profiles...

    • datarade.ai
    + more versions
    Cite
    Success.ai, Ecommerce Market Data | South-east Asia E-commerce Contacts | 170M Profiles | Verified Accuracy | Best Price Guarantee [Dataset]. https://datarade.ai/data-products/ecommerce-market-data-south-east-asia-e-commerce-contacts-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset provided by
    Success.ai
    Area covered
    South East Asia, Yemen, Timor-Leste, Iraq, Qatar, Israel, Sri Lanka, Nepal, Philippines, Syrian Arab Republic, Lebanon
    Description

    Success.ai’s Ecommerce Market Data for South-east Asia E-commerce Contacts provides a robust and accurate dataset tailored for businesses and organizations looking to connect with professionals in the fast-growing e-commerce industry across South-east Asia. Covering roles such as e-commerce managers, digital strategists, logistics experts, and online marketplace leaders, this dataset offers verified contact details, professional insights, and actionable market data.

    With access to over 170 million verified profiles globally, Success.ai ensures your outreach, marketing, and research strategies are powered by accurate, continuously updated, and AI-validated data. Backed by our Best Price Guarantee, this solution empowers you to excel in one of the world’s most dynamic e-commerce regions.

    Why Choose Success.ai’s Ecommerce Market Data?

    1. Verified Contact Data for Precision Outreach

      • Access verified work emails, phone numbers, and LinkedIn profiles of e-commerce professionals across South-east Asia.
      • AI-driven validation ensures 99% accuracy, reducing communication inefficiencies and enhancing engagement rates.
    2. Comprehensive Coverage of South-east Asia’s E-commerce Market

      • Includes professionals from key e-commerce hubs such as Singapore, Indonesia, Thailand, Vietnam, Malaysia, and the Philippines.
      • Gain insights into regional consumer trends, logistics challenges, and online marketplace dynamics.
    3. Continuously Updated Datasets

      • Real-time updates capture changes in professional roles, company expansions, and market conditions.
      • Stay aligned with industry trends and emerging opportunities in South-east Asia’s e-commerce sector.
    4. Ethical and Compliant

      • Fully adheres to GDPR, CCPA, and other global data privacy regulations, ensuring responsible and lawful data usage.

    Data Highlights:

    • 170M+ Verified Global Profiles: Engage with e-commerce professionals and decision-makers across South-east Asia.
    • Verified Contact Details: Gain work emails, phone numbers, and LinkedIn profiles for precision targeting.
    • Regional Insights: Understand key trends in e-commerce, logistics, and consumer preferences in South-east Asia.
    • Leadership Insights: Connect with online marketplace leaders, logistics managers, and digital marketing professionals driving innovation in the sector.

    Key Features of the Dataset:

    1. Comprehensive Professional Profiles in E-commerce

      • Identify and connect with professionals managing e-commerce platforms, online marketplaces, and logistics operations.
      • Target individuals responsible for digital marketing, supply chain management, and e-commerce strategies.
    2. Advanced Filters for Precision Campaigns

      • Filter professionals by industry focus (apparel, electronics, food delivery), geographic location, or job function.
      • Tailor campaigns to align with specific business goals, such as logistics optimization, consumer engagement, or market entry.
    3. Regional and Market-specific Insights

      • Leverage data on e-commerce trends, regional consumer behaviors, and logistics challenges unique to South-east Asia.
      • Refine marketing strategies and business plans based on actionable insights from the region.
    4. AI-Driven Enrichment

      • Profiles enriched with actionable data enable personalized messaging, highlight unique value propositions, and improve engagement outcomes.

    Strategic Use Cases:

    1. Marketing Campaigns and Digital Outreach

      • Promote e-commerce solutions, logistics services, or online marketing tools to professionals in South-east Asia’s e-commerce industry.
      • Use verified contact data for multi-channel outreach, including email, phone, and digital campaigns.
    2. Market Research and Competitive Analysis

      • Analyze e-commerce trends and consumer preferences across South-east Asia to refine product offerings and marketing strategies.
      • Benchmark against competitors to identify growth opportunities and high-demand solutions.
    3. Partnership Development and Vendor Collaboration

      • Build relationships with e-commerce platforms, logistics providers, and digital marketing agencies exploring strategic partnerships.
      • Foster collaborations that enhance consumer experiences, improve delivery efficiency, or expand market reach.
    4. Recruitment and Talent Acquisition

      • Target HR professionals and hiring managers in the e-commerce industry seeking candidates for logistics, digital marketing, and platform management roles.
      • Provide workforce optimization platforms or training solutions tailored to the sector.

    Why Choose Success.ai?

    1. Best Price Guarantee

      • Access premium-quality e-commerce market data at competitive prices, ensuring strong ROI for your marketing, sales, and business development initiatives.
    2. Seamless Integration

      • Integrate verified e-commerce data into CRM systems, analytics ...
  18. e

    Excel Converting Group Llc Export Import Data | Eximpedia

    • eximpedia.app
    Updated Sep 2, 2025
    + more versions
    (2025). Excel Converting Group Llc Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/companies/excel-converting-group-llc/10986666
    Explore at:
    Dataset updated
    Sep 2, 2025
    Description

    Excel Converting Group Llc Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  19. Google Data Analytics Capstone Project

    • kaggle.com
    Updated Oct 1, 2022
    Data Rookie (2022). Google Data Analytics Capstone Project [Dataset]. https://www.kaggle.com/datasets/rookieaj1234/google-data-analytics-capstone-project
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 1, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data Rookie
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Project Name: Divvy Bikeshare Trip Data_Year2020
    Date Range: April 2020 to December 2020
    Analyst: Ajith
    Software: R, Microsoft Excel
    IDE: RStudio

    The following are the basic system requirements for the project:
    Processor: Intel i3 / AMD Ryzen 3 or higher
    RAM: 8 GB or higher
    Operating System: Windows 7 or above, macOS

    Data Usage License: https://ride.divvybikes.com/data-license-agreement

    Introduction:

    In this case study, we aim to use different data analysis techniques and tools to understand the rental patterns of the Divvy bike-sharing company and identify key business improvement suggestions. This case study is a mandatory project for the Google Data Analytics Certification. The data used in this case study is licensed under the data usage agreement above. Trips taken between April 2020 and December 2020 are analysed.

    Scenario: The marketing team needs to design marketing strategies aimed at converting casual riders into annual members. To do that, however, the marketing analyst team first needs to understand how annual members and casual riders differ.

    Objective: The main objective of this case study is to understand customer usage patterns and the breakdown of customers by subscription status, along with the average duration of their bike rentals.

    Introduction to Data: The data provided for this project adheres to the data usage license laid down by the source company. The source data is provided as CSV files, broken down by month and quarter. Each CSV file contains 13 columns.

    The following columns were initially observed across the datasets:

    Ride_id, Ride_type, Start_station_name, Start_station_id, End_station_name, End_station_id, Usertype, Start_time, End_time, Start_lat, Start_lng, End_lat, End_lng
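
    The capstone itself is carried out in R, but the core computation behind the stated objective (average ride duration by subscription status) can be sketched in a few lines of pandas, in keeping with this page's lead dataset. This is an illustration only: the file name is hypothetical, and the column names are taken from the list above.

    ```python
    import pandas as pd

    # Minimal sketch: average ride duration by user type.
    # File name is hypothetical; columns come from the list above.
    trips = pd.read_csv(
        "divvy_trips_202004.csv",
        parse_dates=["Start_time", "End_time"],
    )

    # Derive ride duration in minutes from the start/end timestamps.
    trips["duration_min"] = (
        trips["End_time"] - trips["Start_time"]
    ).dt.total_seconds() / 60

    # Discard non-positive durations, which indicate bad records.
    trips = trips[trips["duration_min"] > 0]

    # Compare casual riders and annual members.
    print(trips.groupby("Usertype")["duration_min"].mean())
    ```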

    Documentation, Cleaning and Preparing Data for Analysis: The datasets for the year 2020 total approximately 450 MB, which makes uploading them to a SQL database and visualizing them with BI tools tedious. I wanted to improve my skills in the R environment, and this project was a good opportunity to use R for the data analysis.
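
    Reading and combining the monthly files is a single step in pandas as well. A sketch, assuming the 2020 CSVs have been downloaded into one folder (the path and file pattern are hypothetical):

    ```python
    import glob

    import pandas as pd

    # Minimal sketch: combine all 2020 Divvy CSVs into one DataFrame.
    # The folder and file pattern are hypothetical.
    files = sorted(glob.glob("divvy_2020/*.csv"))
    trips = pd.concat(
        (pd.read_csv(f, parse_dates=["Start_time", "End_time"]) for f in files),
        ignore_index=True,
    )
    print(f"{len(files)} files, {len(trips):,} trips")
    ```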

    For installation procedures and additional information about R and RStudio, refer to the following URLs.

    R Projects Document: https://www.r-project.org/other-docs.html
    RStudio Download: https://www.rstudio.com/products/rstudio/
    Installation Guide: https://www.youtube.com/watch?v=TFGYlKvQEQ4

  20. g

    IP Australia - [Superseded] Intellectual Property Government Open Data 2019...

    • gimi9.com
    Updated Jul 20, 2018
    (2018). IP Australia - [Superseded] Intellectual Property Government Open Data 2019 | gimi9.com [Dataset]. https://gimi9.com/dataset/au_intellectual-property-government-open-data-2019
    Explore at:
    Dataset updated
    Jul 20, 2018
    Area covered
    Australia
    Description

    What is IPGOD?

    The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD?

    IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.

    IP Data Platform

    IP Australia is also providing free trials of a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as IPGOD, through the web browser, without any installation of software.

    References

    The following pages can help you gain an understanding of intellectual property administration and processes in Australia to support your analysis of the dataset.

    • Patents
    • Trade Marks
    • Designs
    • Plant Breeder’s Rights

    Updates

    Tables and columns: Due to changes in our systems, some tables have been affected.

    • We have added IPGOD 225 and IPGOD 325 to the dataset!
    • The IPGOD 206 table is not available this year.
    • Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements: Data quality has been improved across all tables.

    • Null values are simply empty rather than '31/12/9999'.
    • All date columns are now in ISO format 'yyyy-mm-dd'.
    • All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    • All tables are encoded in UTF-8.
    • All tables use the backslash \ as the escape character.
    • The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.
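
    As a concrete illustration of those conventions, here is a minimal pandas sketch that loads two IPGOD tables and joins them. The file names and the date column are hypothetical; the UTF-8 encoding, backslash escape character, ISO dates, and the "ipa_id" key come from the notes above.

    ```python
    import pandas as pd

    # Minimal sketch: load two IPGOD tables using the release-note
    # conventions (UTF-8, backslash escape character, ISO dates).
    # File and column names are hypothetical.
    applicants = pd.read_csv(
        "ipgod_applicants.csv",   # hypothetical applicants table
        encoding="utf-8",
        escapechar="\\",
    )
    applications = pd.read_csv(
        "ipgod_applications.csv",  # hypothetical applications table
        encoding="utf-8",
        escapechar="\\",
        parse_dates=["application_date"],  # ISO 'yyyy-mm-dd' parses directly
    )

    # Tables are linked through identifiers such as "ipa_id", so
    # cross-table analysis is a merge rather than a spreadsheet operation.
    merged = applications.merge(applicants, on="ipa_id", how="left")
    print(merged.head())
    ```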

