15 datasets found
  1. Merge number of excel file,convert into csv file

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    Aashirvad pandey (2024). Merge number of excel file,convert into csv file [Dataset]. https://www.kaggle.com/datasets/aashirvadpandey/merge-number-of-excel-fileconvert-into-csv-file
    Explore at:
    zip (6731 bytes)
    Dataset updated
    Mar 30, 2024
    Authors
    Aashirvad pandey
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Project Description:

    Title: Pandas Data Manipulation and File Conversion

    Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.

    Key Objectives:

    1. DataFrame Creation: Utilize Pandas to create a DataFrame with sample data.
    2. Data Manipulation: Perform basic data manipulation tasks such as adding columns, filtering data, and performing calculations.
    3. File Conversion: Convert the DataFrame into Excel (.xlsx) and CSV (.csv) file formats.

    Tools and Libraries Used:

    • Python
    • Pandas

    Project Implementation:

    1. DataFrame Creation:

      • Import the Pandas library.
      • Create a DataFrame from a dictionary, a list of dictionaries, or data read from an external source such as a CSV file.
      • Populate the DataFrame with sample data representing various data types (e.g., integer, float, string, datetime).
    2. Data Manipulation:

      • Add new columns to the DataFrame representing derived data or computations based on existing columns.
      • Filter the DataFrame to include only specific rows based on certain conditions.
      • Perform basic calculations or transformations on the data, such as aggregation functions or arithmetic operations.
    3. File Conversion:

      • Utilize Pandas to convert the DataFrame into an Excel (.xlsx) file using the to_excel() function.
      • Convert the DataFrame into a CSV (.csv) file using the to_csv() function.
      • Save the generated files to the local file system for further analysis or sharing (see the sketch below).
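
    A minimal pandas sketch of this workflow (the sample data and file names are illustrative, not part of the dataset):

      import pandas as pd

      # 1. DataFrame creation: sample data with mixed types
      df = pd.DataFrame({
          "name": ["Asha", "Ben", "Chen"],
          "score": [81.5, 64.0, 92.25],
          "joined": pd.to_datetime(["2024-01-05", "2024-02-11", "2024-03-02"]),
      })

      # 2. Data manipulation: a derived column and a filter
      df["passed"] = df["score"] >= 70
      high = df[df["score"] > 80]

      # 3. File conversion: one workbook with two sheets, then one CSV per sheet
      with pd.ExcelWriter("results.xlsx") as writer:  # needs openpyxl installed
          df.to_excel(writer, sheet_name="all", index=False)
          high.to_excel(writer, sheet_name="high_scores", index=False)

      for sheet, frame in pd.read_excel("results.xlsx", sheet_name=None).items():
          frame.to_csv(f"{sheet}.csv", index=False)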

    Expected Outcome:

    Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.

    Conclusion:

    The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionality or by integrating it into larger data processing pipelines. In this dataset, we take a number of records, build a DataFrame from them, save it to a single Excel file as separate named sheets, and then convert that Excel file into CSV files.

  2. Excel Add-ins for Data Integration and Analysis

    • blog.devart.com
    html
    Updated Dec 27, 2024
    Cite
    Devart (2024). Excel Add-ins for Data Integration and Analysis [Dataset]. https://blog.devart.com/how-to-consolidate-customer-data-into-excel-using-powerful-add-ins.html
    Explore at:
    html
    Dataset updated
    Dec 27, 2024
    Dataset authored and provided by
    Devart
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Add-in, Description
    Description

    A reference table of popular Excel add-ins for consolidating, managing, and analyzing customer data.

  3. Replication Data for Exploring an extinct society through the lens of...

    • dataone.org
    Updated Dec 16, 2023
    Cite
    Wieczorek, Oliver; Malzahn, Melanie (2023). Replication Data for Exploring an extinct society through the lens of Habitus-Field theory and the Tocharian text corpus [Dataset]. http://doi.org/10.7910/DVN/UF8DHK
    Explore at:
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Wieczorek, Oliver; Malzahn, Melanie
    Description

    The files and workflow allow you to replicate the study titled "Exploring an extinct society through the lens of Habitus-Field theory and the Tocharian text corpus". This study aimed at utilizing the CEToM corpus (https://cetom.univie.ac.at/) (Tocharian) to analyze the life-world of the elites of an extinct society situated in the Tarim Basin in modern western China. To acquire the raw data needed for steps 1 & 2, please contact Melanie Malzahn (melanie.malzahn@univie.ac.at). We conducted a mixed-methods study consisting of close reading, content analysis, and multiple correspondence analysis (MCA). The Excel file titled "fragments_architecture_combined.xlsx" allows for replication of the MCA and equates to the third step of the workflow outlined below. We used the following programming languages and packages to prepare and analyze the data. Data preparation and merging were done in Python (version 3.9.10) with the packages pandas (1.5.3), os (3.12.0), re (3.12.0), numpy (1.24.3), gensim (4.3.1), BeautifulSoup4 (4.12.2), pyasn1 (0.4.8), and langdetect (1.0.9). Multiple correspondence analyses were conducted in R (version 4.3.2) with the packages FactoMineR (2.9), factoextra (1.0.7), readxl (1.4.3), tidyverse (2.0.0), ggplot2 (3.4.4), and psych (2.3.9). After requesting the necessary files, please open the scripts in the order outlined below and execute the code files to replicate the analysis:

    Preparatory step: Create a folder for the Python and R scripts downloadable in this repository. Open the file 0_create folders.py and declare a root folder in line 19. This first script generates the following folders: "tarim-brahmi_database" (Tocharian dictionaries and text fragments); "dictionaries" (Tocharian A and Tocharian B vocabularies, including linguistic features such as translations, meanings, part-of-speech tags, etc.; a full overview of the words is provided at https://cetom.univie.ac.at/?words); "fragments" (Tocharian text fragments as XML files); "word_corpus_data" (will contain Excel files of the corpus data after the first step); "Architectural_terms" (data on the architectural terms used in the dataset, e.g. dwelling, house); "regional_data" (data on the findspots, Tocharian and modern Chinese equivalents, e.g. Duldur-Akhur & Kucha); and "mca_ready_data" (the folder in which the Excel file with the merged data will be saved). Note that the prepared file named "fragments_architecture_combined.xlsx" can be saved into this directory; this allows you to skip steps 1 & 2 and reproduce the MCA of the content analysis based on the third step of our workflow (R script 3_conduct_MCA.R).

    First step - run 1_read_xml-files.py: loops over the XML files in the dictionaries folder and identifies word metadata, including language (Tocharian A or B), keywords, part of speech, lemmata, word etymology, and loan sources. It then loops over the XML text files and extracts a text ID number, language (Tocharian A or B), text title, text genre, text subgenre, prose type, verse type, material on which the text is written, medium, findspot, the source text in Tocharian, and the translation where available. After successful feature extraction, the resulting pandas DataFrame object is exported to the word_corpus_data folder.

    Second step - run 2_merge_excel_files.py: merges all Excel files (corpus, data on findspots, word data) and reproduces the content analysis, which was based on close reading in the first place.

    Third step - run 3_conduct_MCA.R: recodes, prepares, and selects the variables necessary to conduct the MCA, then produces the descriptive values before conducting the MCA, identifying typical texts per dimension, and exporting the PNG files uploaded to this repository.
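
    A hedged sketch of the kind of merge the second step performs (the real file names, folders, and join keys are defined in 2_merge_excel_files.py; the ones below are assumptions):

      import pandas as pd

      # Hypothetical inputs produced by the first step and the supporting folders
      corpus = pd.read_excel("word_corpus_data/corpus.xlsx")
      findspots = pd.read_excel("regional_data/findspots.xlsx")
      terms = pd.read_excel("Architectural_terms/terms.xlsx")

      merged = (
          corpus
          .merge(findspots, on="findspot", how="left")  # assumed key
          .merge(terms, on="text_id", how="left")       # assumed key
      )

      # Output file name taken from the repository description
      merged.to_excel("mca_ready_data/fragments_architecture_combined.xlsx", index=False)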

  4. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Explore at:
    txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, these data must be processed before deriving new insights through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey: demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

    CSV Data Record: The curated NHANES datasets and the data dictionaries comprise 23 .csv files and 1 Excel file. The curated datasets involve 20 .csv files, two for each module: one uncleaned version and one cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file containing the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file that includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.

    Example starter code: The set of starter code to help users conduct exposome analyses consists of four R Markdown files (.Rmd); we recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together. "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazards model, and a survey-weighted Cox proportional hazards model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one or multiple variables, with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
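
    The starter tutorials are written in R; purely for orientation, here is a minimal pandas sketch of merging two cleaned modules on the participant identifier (the file names are hypothetical, and SEQN as the key is an assumption based on NHANES conventions):

      import pandas as pd

      # Hypothetical names for two of the cleaned module files
      demographics = pd.read_csv("demographics_clean.csv")
      chemicals = pd.read_csv("chemicals_clean.csv")

      # NHANES records are conventionally keyed by the SEQN participant ID (assumed here)
      merged = demographics.merge(chemicals, on="SEQN", how="inner")
      print(merged.shape)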

  5. Cyclistic

    • kaggle.com
    zip
    Updated May 12, 2022
    + more versions
    Cite
    Salam Ibrahim (2022). Cyclistic [Dataset]. https://www.kaggle.com/datasets/salamibrahim/cyclistic
    Explore at:
    zip (209748131 bytes)
    Dataset updated
    May 12, 2022
    Authors
    Salam Ibrahim
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction: This case study is based on Cyclistic, a bike-sharing company in Chicago. I will perform the tasks of a junior data analyst to answer business questions, following a process with six phases: ask, prepare, process, analyze, share, and act.

    Background: Cyclistic is a bike-sharing company that operates 5,828 bikes across 692 docking stations. The company has been around since 2016 and separates itself from the competition by offering a variety of bike services, including assistive options. Lily Moreno, the director of the marketing team, will receive the insights from this analysis.

    Case study and business task: Lily Moreno's view on how to generate more income by marketing Cyclistic's services is to convert casual riders (single-day passes and/or pay-per-ride customers) into annual members. According to the finance analysts, annual riders are more profitable than casual riders. She would rather see a campaign converting casual riders into annual riders than campaigns targeting new customers, so her strategy as head of the marketing team is simply to maximize the number of annual riders by converting casual ones.

    In order to make a data-driven decision, Moreno needs the following insights:

    • A better understanding of how casual riders and annual riders differ
    • Why a casual rider would become an annual one
    • How digital media can affect the marketing tactics

    Moreno has directed me to the first question - how do casual riders and annual riders differ?

    Stakeholders: Lily Moreno (director of the marketing team), the Cyclistic marketing team, and the executive team.

    Data sources and organization: The data used in this report is made available and licensed by Motivate International Inc. Personal data is hidden to protect personal information. The data covers the past 12 months (01/04/2021 – 31/03/2022) of bike share trips.

    By merging all 12 monthly bike share files provided, an extensive dataset of some 5,400,000 rows was returned and included in this analysis.

    Data security and limitations: Personal information is secured and hidden to prevent unlawful use. Original files are backed up in folders and subfolders.

    Tools and documentation of the cleaning process: The tools used for data verification and cleaning are Microsoft Excel and R. The original files made accessible by Motivate International Inc. are backed up in their original format and in separate files.

    Microsoft Excel is used to look through the dataset and get an overview of the content. I performed simple checks of the data by filtering, sorting, formatting, and standardizing it to make it easily mergeable. In Excel, I also cast columns to the right data types, removed unnecessary data where it was incomplete or incorrect, created new columns derived and reformatted from existing ones, and deleted empty cells. These tasks are easily done in spreadsheets and provide an initial cleaning pass over the data.

    R is used to query larger datasets such as this one and to create the visualizations that answer the question at hand.

    Limitations: Microsoft Excel has a limit of 1,048,576 rows per sheet, while the 12 months of data combined exceed 5,500,000 rows. When combining the 12 months of data into one table, Excel is no longer efficient, so I switched over to R.
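
    The case study performs this combine step in R; an equivalent pandas sketch (folder and file names hypothetical):

      import glob
      import pandas as pd

      # Stack the 12 monthly extracts into one table. The result is far past
      # Excel's 1,048,576-row sheet limit, so the merge happens entirely in code.
      files = sorted(glob.glob("tripdata_2021-04_to_2022-03/*.csv"))
      rides = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
      print(len(rides))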

  6. Data from: DATASET FOR: A multimodal spectroscopic approach combining...

    • zenodo.org
    • producciocientifica.uv.es
    bin, csv, zip
    Updated Aug 2, 2024
    Cite
    David Perez Guaita; David Perez Guaita (2024). DATASET FOR: A multimodal spectroscopic approach combining mid-infrared and near-infrared for discriminating Gram-positive and Gram-negative bacteria [Dataset]. http://doi.org/10.5281/zenodo.10523185
    Explore at:
    bin, zip, csv
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Perez Guaita; David Perez Guaita
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description:

    This dataset comprises a comprehensive set of files designed for the analysis and 2D correlation of spectral data, specifically focusing on ATR and NIR spectra. It includes MATLAB scripts and supporting functions necessary to replicate the analysis, as well as the raw datasets used in the study. Below is a detailed description of the included files:

    1. Data Analysis:

      • File Name: Data_Analysis.mlx
      • Description: This MATLAB Live Script file contains the main script used for the classification analysis of the spectral data. It includes steps for preprocessing, analysis, and visualization of the ATR and NIR spectra.
    2. 2D Correlation Data Analysis:

      • File Name: Data_Analysis_2Dcorr.mlx
      • Description: This MATLAB Live Script file is similar to the primary analysis script but is specifically tailored for performing 2D correlation analysis on the spectral data. It includes detailed steps and code for executing the 2D correlation.
    3. Functions:

      • Folder Name: Functions
      • Description: This folder contains all the necessary MATLAB function files required to replicate the analyses presented in the scripts. These functions handle various preprocessing steps, calculations, and visualizations.
    4. Datasets:

      • File Names: ATR_dataset.xlsx, NIR_dataset.xlsx, Reference_data.csv
      • Description: These files (two Excel workbooks and one CSV file) contain the raw spectral data for the ATR and NIR analyses, as well as reference data. The workbooks include multiple sheets with detailed measurements and metadata.

    Usage Notes:

    • Software Requirements:
      • MATLAB is required to run the .mlx files and utilize the functions.
      • PLS_Toolbox: Necessary for certain preprocessing and analysis steps.
      • MIDAS 2010: required for the 2D correlation analysis.
    • Replication: Users can replicate the analyses by running the Data_Analysis.mlx and Data_Analysis_2Dcorr.mlx scripts in MATLAB, ensuring that the Functions folder is in the MATLAB path.
    • Data Handling: The datasets are provided in .xlsx format, which can be easily imported into MATLAB or other data analysis software (see the sketch below).
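
    The analyses themselves run in MATLAB; purely as an illustration of the last point, a pandas sketch of importing the files elsewhere (sheet layout assumed):

      import pandas as pd

      # Each workbook holds multiple sheets of measurements and metadata (layout assumed)
      atr_sheets = pd.read_excel("ATR_dataset.xlsx", sheet_name=None)  # dict of DataFrames
      reference = pd.read_csv("Reference_data.csv")

      print(list(atr_sheets))            # sheet names
      print(reference.columns.tolist())  # reference variables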
  7. 2000 Population Census Data Assembly, FUJIAN Province; 福建省2000人口普查统计资料汇编

    • dataverse.harvard.edu
    Updated Oct 30, 2008
    Cite
    Harvard Dataverse (2008). 2000 Population Census Data Assembly, FUJIAN Province; 福建省2000人口普查统计资料汇编 [Dataset]. http://doi.org/10.7910/DVN/2CQ8H1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 30, 2008
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Fujian
    Description

    The data files in this study have been created by merging individual per-county census files available as Microsoft Excel spreadsheets. Thus the study contains 111 data files, 1 file per census category. Each of these files contains merged census data for the following counties: GULOU 鼓楼区 (350102), TAIJIANG 台江区 (350103), CANGSHAN 仓山区 (350104), MAWEI 马尾区 (350105), JINAN 晋安区 (350111), MINHOU 闽侯县 (350121), LIANJIANG 连江县 (350122), LUOYUAN 罗源县 (350123), MINQING 闽清县 (350124), YONGTAI 永泰县 (350125), PINGTAN 平潭县 (350128), FUQING 福清市 (350181), CHANGLE 长乐市 (350182), GULANGYU 鼓浪屿区 (350202), SIMING 思明区 (350203), KAIYUAN 开元区 (350204), XINGLIN 杏林区 (350205), HULI 湖里区 (350206), JIMEI 集美区 (350211), TONGAN 同安区 (350212), CHENGXIANG 城厢区 (350302), HANJIANG 涵江区 (350303), PUTIAN 莆田县 (350321), XIANYOU 仙游县 (350322), MEILIE 梅列区 (350402), SANYUAN 三元区 (350403), MINGXI 明溪县 (350421), QINGLIU 清流县 (350423), NINGHUA 宁化县 (350424), DATIAN 大田县 (350425), YOUXI 尤溪县 (350426), SHAXIAN 沙县 (350427), JIANGLE 将乐县 (350428), TAINING 泰宁县 (350429), JIANNING 建宁县 (350430), YONGAN 永安市 (350481), LICHENG 鲤城区 (350502), FENGZE 丰泽区 (350503), LUOJIANG 洛江区 (350504), QUANGANG 泉港区 (350505), HUIAN 惠安县 (350521), ANXI 安溪县 (350524), YONGCHUN 永春县 (350525), DEHUA 德化县 (350526), SHISHI 石狮市 (350581), JINJIANG 晋江市 (350582), NANAN 南安市 (350583), XIANGCHENG 芗城区 (350602), LONGWEN 龙文区 (350603), YUNXIAO 云霄县 (350622), ZHANGPU 漳浦县 (350623), ZHAOAN 诏安县 (350624), CHANGTAI 长泰县 (350625), DONGSHAN 东山县 (350626), NANJING 南靖县 (350627), PINGHE 平和县 (350628), HUAAN 华安县 (350629), LONGHAI 龙海市 (350681), YANPING 延平区 (350702), SHUNCHANG 顺昌县 (350721), PUCHENG 浦城县 (350722), GUANGZE 光泽县 (350723), SONGXI 松溪县 (350724), ZHENGHE 政和县 (350725), SHAOWU 邵武市 (350781), WUYISHAN 武夷山市 (350782), JIANOU 建瓯市 (350783), JIANYANG 建阳市 (350784), XINLUO 新罗区 (350802), CHANGTING 长汀县 (350821), YONGDING 永定县 (350822), SHANGHANG 上杭县 (350823), WUPING 武平县 (350824), LIANCHENG 连城县 (350825), ZHANGPING 漳平市 (350881), JIAOCHENG 蕉城区 (350902), XIAPU 霞浦县 (350921), GUTIAN 古田县 (350922), PINGNAN 屏南县 (350923), SHOUNING 寿宁县 (350924), ZHOUNING 周宁县 (350925), ZHERONG 柘荣县 (350926), FUAN 福安市 (350981), FUDING 福鼎市 (350982).
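
    A hedged sketch of the kind of merge that produces one such file, concatenating per-county spreadsheets for a single census category (paths and layout assumed):

      import glob
      import pandas as pd

      # One Excel file per county for a given census category (paths hypothetical)
      frames = []
      for path in sorted(glob.glob("fujian_2000/category_001/*.xls")):
          county = pd.read_excel(path)
          county["source_file"] = path  # keep per-county provenance
          frames.append(county)

      pd.concat(frames, ignore_index=True).to_excel("category_001_merged.xlsx", index=False)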

  8. Image stitching data set

    • ckan.mobidatalab.eu
    txt, xlsx, zip
    Updated Jun 14, 2023
    Cite
    Bundesanstalt für Wasserbau (2023). Image stitching data set [Dataset]. https://ckan.mobidatalab.eu/sq/dataset/imagestitchingrecord
    Explore at:
    zip, xlsx, txt
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    Bundesanstalt für Wasserbau
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the data set for the article "Automatic merging of separated construction plans of hydraulic structures" submitted for Bautechnik 5/22. The data set is structured as follows:

    • The ZIP file "01 Original Data" contains 233 folders (named after the TU IDs) with the associated partial recordings in TIF format. The TIFs are binary compressed in CCITT Group 4 (fax) format. 219 TUs are divided into two parts and 14 into three parts; the original data therefore consists of 480 partial recordings.
    • The ZIP file "02 Interim Results" contains 233 folders (named after the TU IDs) with relevant intermediate results generated during stitching. This includes the input images scaled to 10 MP, the visualization of the feature assignment(s), and the result in downscaled resolution with visualized seam lines.
    • The ZIP file "03_Results" contains the 170 successfully merged plans in high resolution in TIF format.
    • The Excel file "Dataset" contains metadata on the 233 examined TUs, including the DOT graph of the assignment described in the work, the correctness rating of the results, and the assignment to the presented sources of error.

    The data set was generated with the following metadata query in the IT system Digital Management of Technical Documents (DVtU):
    • Microfilm metadata - TA (partial recording) - Number: "> 1"
    • Document metadata - Object part: "130 (Wehrwangen, weir pillars)" - Object ID no.: "213 (Weir systems)" - Detail: "*[Bb]wehrung*" - Version: "01.00.00"

  9. Israel Census

    • kaggle.com
    zip
    Updated Jul 31, 2018
    Cite
    Dan Ofer (2018). Israel Census [Dataset]. https://www.kaggle.com/danofer/israel-census
    Explore at:
    zip (4275033 bytes)
    Dataset updated
    Jul 31, 2018
    Authors
    Dan Ofer
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Israel
    Description

    Context

    2008 population & demographic census data for Israel, at the level of settlements and lower.

    Content

    Data is provided at the sub-settlement level (i.e., neighborhoods). Variable names (in Hebrew and English) and a data dictionary are provided in XLS files. 2008 statistical area names are provided (along with top roads/neighborhoods per settlement). The Excel data needs cleaning/merging from multiple sub-pages, as sketched below.
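
    A minimal pandas sketch of pulling the sub-pages (worksheets) out of one workbook and stacking them for cleaning (file name and sheet structure assumed):

      import pandas as pd

      # Read every worksheet of one census workbook at once (hypothetical file name)
      sheets = pd.read_excel("census_2008_settlements.xlsx", sheet_name=None)

      # Stack the sub-tables, tagging each row with its source sheet
      combined = pd.concat(sheets, names=["sheet", "row"]).reset_index(level="sheet")
      combined.to_csv("census_2008_combined.csv", index=False)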

    Ideas:

    • Combine with voting datasets
    • Correlate population or economic growth over time with demographics
    • Geospatial analysis
    • Merge and clean the data from the sub tables.

    Acknowledgements

    Data from Israel Central Bureau of Statistics (CBS): http://www.cbs.gov.il/census/census/pnimi_page.html?id_topic=12

    Photo by Me (Dan Ofer).

  10. [Superseded] Intellectual Property Government Open Data 2019

    • researchdata.edu.au
    • data.gov.au
    Updated Jun 6, 2019
    + more versions
    Cite
    IP Australia (2019). [Superseded] Intellectual Property Government Open Data 2019 [Dataset]. https://researchdata.edu.au/superseded-intellectual-property-data-2019/2994670
    Explore at:
    Dataset updated
    Jun 6, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    IP Australia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is IPGOD?

    The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD?

    IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.
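
    The "specialised software" can be as light as a dataframe library; a hedged sketch of joining two IPGOD tables on the applicant identifier (table file names are hypothetical; ipa_id is the identifier mentioned under the data quality notes below):

      import pandas as pd

      # Hypothetical extracts of two IPGOD tables
      applicants = pd.read_csv("ipgod_applicants.csv")
      patents = pd.read_csv("ipgod_patent_applications.csv")

      # Link filings to applicant-level information
      linked = patents.merge(applicants, on="ipa_id", how="left")
      print(linked.head())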

    IP Data Platform

    IP Australia is also providing free trials of a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software.

    References

    The following pages can help you gain an understanding of intellectual property administration and processes in Australia to help your analysis of the dataset:
    • Patents
    • Trade Marks
    • Designs
    • Plant Breeder's Rights

    Updates

    Tables and columns

    Due to the changes in our systems, some tables have been affected.
    • We have added IPGOD 225 and IPGOD 325 to the dataset!
    • The IPGOD 206 table is not available this year.
    • Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements

    Data quality has been improved across all tables.
    • Null values are simply empty rather than '31/12/9999'.
    • All date columns are now in ISO format 'yyyy-mm-dd'.
    • All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    • All tables are encoded in UTF-8.
    • All tables use the backslash \ as the escape character.
    • The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match those in previous releases of IPGOD.

  11. Community Database

    • fisheries.noaa.gov
    • s.cnmilf.com
    • +3more
    Updated Aug 9, 2022
    + more versions
    Cite
    Barbara P Rountree (2022). Community Database [Dataset]. https://www.fisheries.noaa.gov/inport/item/26854
    Explore at:
    Dataset updated
    Aug 9, 2022
    Dataset provided by
    Northeast Fisheries Science Center
    Authors
    Barbara P Rountree
    Time period covered
    1997 - Dec 2, 2125
    Area covered
    Northeast
    Description

    This Excel spreadsheet is the result of merging, at the port level, several of the in-house fisheries databases in combination with other demographic databases such as the U.S. census. The fisheries databases used include port listings, weighout (dealer) landings, permit information on homeports and owner cities of residence, dealer permit information, and logbook records. The database consoli...

  12. IP Australia - [Superseded] Intellectual Property Government Open Data 2019...

    • gimi9.com
    Updated Jul 20, 2018
    Cite
    (2018). IP Australia - [Superseded] Intellectual Property Government Open Data 2019 | gimi9.com [Dataset]. https://gimi9.com/dataset/au_intellectual-property-government-open-data-2019
    Explore at:
    Dataset updated
    Jul 20, 2018
    Area covered
    Australia
    Description

    What is IPGOD? The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD? IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.

    IP Data Platform: IP Australia is also providing free trials of a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software.

    References: The following pages can help you gain an understanding of intellectual property administration and processes in Australia to help your analysis of the dataset: Patents, Trade Marks, Designs, Plant Breeder's Rights.

    Updates - Tables and columns: Due to the changes in our systems, some tables have been affected. We have added IPGOD 225 and IPGOD 325 to the dataset! The IPGOD 206 table is not available this year. Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Updates - Data quality improvements: Data quality has been improved across all tables. Null values are simply empty rather than '31/12/9999'. All date columns are now in ISO format 'yyyy-mm-dd'. All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0. All tables are encoded in UTF-8. All tables use the backslash \ as the escape character. The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match those in previous releases of IPGOD.

  13. Excel spreadsheet containing the underlying numerical data for Figs 1C, 2C,...

    • figshare.com
    xlsx
    Updated Jun 6, 2023
    + more versions
    Cite
    Bihuan Chen; Xiaonan Liu; Yina Wang; Jie Bai; Xiangyu Liu; Guisheng Xiang; Wei Liu; Xiaoxi Zhu; Jian Cheng; Lina Lu; Guanghui Zhang; Ge Zhang; Zongjie Dai; Shuhui Zi; Shengchao Yang; Huifeng Jiang (2023). Excel spreadsheet containing the underlying numerical data for Figs 1C, 2C, 2D, 4B, 4C, 5A, 5B, S11, S12 and S14. [Dataset]. http://doi.org/10.1371/journal.pbio.3002131.s021
    Explore at:
    xlsx
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS Biology
    Authors
    Bihuan Chen; Xiaonan Liu; Yina Wang; Jie Bai; Xiangyu Liu; Guisheng Xiang; Wei Liu; Xiaoxi Zhu; Jian Cheng; Lina Lu; Guanghui Zhang; Ge Zhang; Zongjie Dai; Shuhui Zi; Shengchao Yang; Huifeng Jiang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel spreadsheet containing the underlying numerical data for Figs 1C, 2C, 2D, 4B, 4C, 5A, 5B, S11, S12 and S14.

  14. The excel sheet contains all data that were used to generate Figs 2F, 3C–3J,...

    • plos.figshare.com
    xlsx
    Updated Nov 30, 2023
    Cite
    Amrita Das Gupta; Livia Asan; Jennifer John; Carlo Beretta; Thomas Kuner; Johannes Knabbe (2023). The excel sheet contains all data that were used to generate Figs 2F, 3C–3J, and S1I, S2C–S2F, S3A–S3F, S4D, S4F and S5. [Dataset]. http://doi.org/10.1371/journal.pbio.3002357.s013
    Explore at:
    xlsx
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Amrita Das Gupta; Livia Asan; Jennifer John; Carlo Beretta; Thomas Kuner; Johannes Knabbe
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data can be found on sheets within the excel document named after the corresponding figure and panel. (XLSX)

  15. Data from: Electricity Load Forecasting

    • kaggle.com
    zip
    Updated Apr 30, 2021
    + more versions
    Cite
    Saurabh Shahane (2021). Electricity Load Forecasting [Dataset]. https://www.kaggle.com/saurabhshahane/electricity-load-forecasting
    Explore at:
    zip (47792145 bytes)
    Dataset updated
    Apr 30, 2021
    Authors
    Saurabh Shahane
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    This is a useful dataset for training and testing machine learning forecasting algorithms and comparing the results with the official forecast from weekly pre-dispatch reports. The following considerations should be kept in mind when comparing results with the weekly pre-dispatch forecast: 1. Saturday is the first day of each weekly forecast, and Friday is the last. 2. A 72-hour gap of unseen records should be considered before the first day to forecast. In other words, each week's forecast should be made with records up to the last hour of the preceding Tuesday.

    Data sources provide hourly records. The data composition is the following: 1. Historical electricity load, available in daily post-dispatch reports from the grid operator (CND). 2. Historical weekly forecasts, available in weekly pre-dispatch reports, also from CND. 3. Calendar information related to school periods, from Panama's Ministry of Education. 4. Calendar information related to holidays, from the "When on Earth?" website. 5. Weather variables, such as temperature, relative humidity, precipitation, and wind speed, for three main cities in Panama, from Earthdata.

    Content

    The original data sources provide the post-dispatch electricity load in individual Excel files on a daily basis, and the weekly pre-dispatch electricity load forecast in individual Excel files on a weekly basis, both with hourly granularity. Holiday and school-period data are sparse, scattered across websites and PDF files. Weather data is available in daily NetCDF files.

    For simplicity, the published datasets are already pre-processed by merging all data sources on the date-time index: 1. A CSV file containing all records in a single continuous dataset with all variables. 2. A CSV file containing the load forecast from weekly pre-dispatch reports. 3. Two Excel files containing suggested regressors and 14 training/testing dataset pairs, as described in the PDF file.
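
    A hedged sketch of that kind of date-time merge on hypothetical inputs:

      import pandas as pd

      # Hourly sources joined on a shared datetime index (file names hypothetical)
      load = pd.read_csv("load_post_dispatch.csv", parse_dates=["datetime"], index_col="datetime")
      weather = pd.read_csv("weather_hourly.csv", parse_dates=["datetime"], index_col="datetime")
      holidays = pd.read_csv("holidays.csv", parse_dates=["date"], index_col="date")

      merged = load.join(weather, how="left")
      merged["is_holiday"] = merged.index.normalize().isin(holidays.index)
      merged.to_csv("merged_dataset.csv")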

    Acknowledgements

    Aguilar Madrid, Ernesto (2021), “Short-term electricity load forecasting (Panama case study)”, Mendeley Data, V1, doi: 10.17632/byx7sztj59.1
