100+ datasets found
  1. Merge number of excel file,convert into csv file

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    Aashirvad pandey (2024). Merge number of excel file,convert into csv file [Dataset]. https://www.kaggle.com/datasets/aashirvadpandey/merge-number-of-excel-fileconvert-into-csv-file/data
    Explore at:
    zip (6731 bytes)
    Available download formats
    Dataset updated
    Mar 30, 2024
    Authors
    Aashirvad pandey
    License

    Apache License, v2.0
    https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Project Description:

    Title: Pandas Data Manipulation and File Conversion

    Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.

    Key Objectives:

    1. DataFrame Creation: Utilize Pandas to create a DataFrame with sample data.
    2. Data Manipulation: Perform basic data manipulation tasks such as adding columns, filtering data, and performing calculations.
    3. File Conversion: Convert the DataFrame into Excel (.xlsx) and CSV (.csv) file formats.

    Tools and Libraries Used:

    • Python
    • Pandas

    Project Implementation:

    1. DataFrame Creation:

      • Import the Pandas library.
      • Create a DataFrame using either a dictionary, a list of dictionaries, or by reading data from an external source like a CSV file.
      • Populate the DataFrame with sample data representing various data types (e.g., integer, float, string, datetime).
    2. Data Manipulation:

      • Add new columns to the DataFrame representing derived data or computations based on existing columns.
      • Filter the DataFrame to include only specific rows based on certain conditions.
      • Perform basic calculations or transformations on the data, such as aggregation functions or arithmetic operations.
    3. File Conversion:

      • Utilize Pandas to convert the DataFrame into an Excel (.xlsx) file using the to_excel() function.
      • Convert the DataFrame into a CSV (.csv) file using the to_csv() function.
      • Save the generated files to the local file system for further analysis or sharing.

    Expected Outcome:

    Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.

    Conclusion:

    The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines. In this dataset, several pieces of data are combined into a DataFrame, saved to a single Excel file as separate, differently named sheets, and that Excel file is then converted into a CSV file.
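    As a rough illustration of the workflow described above, here is a minimal sketch (the sample data, sheet names, and file names are illustrative assumptions, not taken from this dataset) that builds two DataFrames, saves them into a single Excel file as separate sheets, and then converts each sheet into a CSV file:

    import pandas as pd

    # Sample data standing in for the merged Excel content.
    sales = pd.DataFrame({"item": ["pen", "book"], "price": [1.5, 12.0]})
    stock = pd.DataFrame({"item": ["pen", "book"], "quantity": [100, 40]})

    # Save both DataFrames into one Excel file as differently named sheets.
    with pd.ExcelWriter("merged.xlsx") as writer:
        sales.to_excel(writer, sheet_name="sales", index=False)
        stock.to_excel(writer, sheet_name="stock", index=False)

    # Convert every sheet of that Excel file into its own CSV file.
    sheets = pd.read_excel("merged.xlsx", sheet_name=None)  # dict: sheet name -> DataFrame
    for name, sheet_df in sheets.items():
        sheet_df.to_csv(f"{name}.csv", index=False)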

  2. TF 2.0 QA - Simplified - DataFrame

    • kaggle.com
    zip
    Updated Nov 12, 2019
    Cite
    Pavel Kovalets (2019). TF 2.0 QA - Simplified - DataFrame [Dataset]. https://www.kaggle.com/feanorpk/tf-20-qa-simplified-dataframe
    Explore at:
    zip (262240620 bytes)
    Available download formats
    Dataset updated
    Nov 12, 2019
    Authors
    Pavel Kovalets
    Description

    Content

    This dataset was created from the TensorFlow 2.0 Question Answering primary dataset using this very handy utility script. The main differences from the original one are:

    • the structure is flattened to a simple DataFrame
    • long_answer_candidates were removed
    • only the first annotation is kept for both long and short answers (for short answers this is a reasonable approximation because very few samples have multiple short answers)
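    A rough flattening sketch in Python (not the actual utility script; it assumes the simplified-NQ style JSONL layout with example_id, question_text, document_text, long_answer_candidates, and annotations fields):

    import json
    import pandas as pd

    def flatten_simplified_nq(jsonl_path):
        """One row per example: first annotation only, candidates dropped."""
        rows = []
        with open(jsonl_path) as fh:
            for line in fh:
                rec = json.loads(line)
                ann = rec["annotations"][0]            # keep only the first annotation
                short = ann["short_answers"][:1]       # at most one short answer
                rows.append({
                    "example_id": rec["example_id"],
                    "question_text": rec["question_text"],
                    "document_text": rec["document_text"],
                    "long_answer_start": ann["long_answer"]["start_token"],
                    "long_answer_end": ann["long_answer"]["end_token"],
                    "short_answer_start": short[0]["start_token"] if short else -1,
                    "short_answer_end": short[0]["end_token"] if short else -1,
                    "yes_no_answer": ann["yes_no_answer"],
                })  # long_answer_candidates are intentionally not kept
        return pd.DataFrame(rows)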

    Acknowledgements

    Thanks xhlulu for providing the utility script.

  3. AI4Code Train Dataframe

    • kaggle.com
    zip
    Updated May 12, 2022
    Cite
    Darien Schettler (2022). AI4Code Train Dataframe [Dataset]. https://www.kaggle.com/datasets/dschettler8845/ai4code-train-dataframe
    Explore at:
    zip (622120487 bytes)
    Available download formats
    Dataset updated
    May 12, 2022
    Authors
    Darien Schettler
    Description

    [EDIT/UPDATE]

    There are a few important updates.

    1. When SAVING the pd.DataFrame as a .csv, the following command should be used to avoid improper interpretation of newline character(s).
    import csv

    train_df.to_csv(
      "train.csv", index=False,
      encoding='utf-8',
      quoting=csv.QUOTE_NONNUMERIC  # <== THIS IS REQUIRED
    )
    
    2. When LOADING the .csv as a pd.DataFrame, the following command must be used to avoid misinterpretation of NaN-like strings (null, nan, ...) as pd.NaN values.
    import pandas as pd

    train_df = pd.read_csv(
      "/kaggle/input/ai4code-train-dataframe/train.csv",
      keep_default_na=False  # <== THIS IS REQUIRED
    )
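    A short self-contained round trip (a sketch only; the two-row toy DataFrame below is made up and is not part of this dataset) shows why both flags matter: without QUOTE_NONNUMERIC an embedded newline breaks the row layout, and without keep_default_na=False the literal string "null" silently becomes a missing value.

    import csv
    import pandas as pd

    toy = pd.DataFrame({
        "id": [1, 2],
        "source": ["print('a')\nprint('b')", "null"],  # embedded newline and a NaN-like string
    })

    # Quote every non-numeric field so the embedded newline survives the CSV round trip.
    toy.to_csv("toy.csv", index=False, encoding="utf-8", quoting=csv.QUOTE_NONNUMERIC)

    # Keep "null"/"nan" as plain strings instead of converting them to missing values.
    restored = pd.read_csv("toy.csv", keep_default_na=False)
    print(restored.equals(toy))  # True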
    
  4. Data from: HiPR-FISH Spatial Mapping of Cheese Rind Microbial Communities

    • data.niaid.nih.gov
    Updated Apr 13, 2023
    Cite
    Weiss, Emily CP; Dutton, Rachel J (2023). HiPR-FISH Spatial Mapping of Cheese Rind Microbial Communities [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7613702
    Explore at:
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Arcadia Science
    Authors
    Weiss, Emily CP; Dutton, Rachel J
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is associated with this HiPR-FISH Spatial Mapping of Cheese Rind Microbial Communities pub from Arcadia Science.

    HiPR-FISH spatial imaging was used to look at the distribution of microbes within five distinct microbial communities growing on the surface of aged cheeses. Probe design and imaging was performed by Kanvas Biosciences.

    This dataset includes the following:

    For each field of view (roughly 135µm x 135µm; 7 FOVs per cheese specimen):

    A fluorescence intensity image (*_spectral_max_projection.png/.tif).

    A pseudo-colored microbe-labeled image (*_identification.png/.tif).

    A data frame containing each identified microbe's identity, position, and size (*_cell_information.csv).

    A segmented mask for microbiota (*_segmentation.png/.tif)

    A spatial proximity graph for species found close to each other, showing the spatial enrichment over a random distribution (*_spatialheatmap.png).

    A corresponding data frame used to generate the spatial proximity graph (_absolute_spatial_association.csv) and a data frame for the average of 500 random shuffles of the taxa (_randomized_spatial_association_matrix.csv).

    For each cheese specimen:

    A widefield image with FOVs located on the image (*_WF_overlay.png).

    In general:

    A png showing the color legend for each species. (ARC1_taxa_color_legend.png)

    A data frame showing the environmental location of each FOV in the cheese (RIND/CURD) and the location of each FOV relative to FOV 1. (ARC1_Cheese_Map.csv).

    A vignette showing an example of each cell and its false coloring according to its taxonomic identification (ARC1_detected_species_representative_cell_vignette.png).

    Sequences used as input in probe design (16S_18S_forKanvas.fasta).

    A CSV file containing the sequences that belong to each ASV (ARC1_sequences_to_ASVs.csv).

    Plots of log-transformed counts for each microbe detected across all FOVs, and broken down for each cheese (*detected_species_absolute_abundance.png).

    CSVs containing pairwise correlation of FOVs based on spatial association (ARC1_spatial_association_FOV_correlation.csv) and microbial abundance (ARC1_abundance_FOV_correlation.csv).

    Plots of spatial association matrices, aggregated for different cheeses and different locations (RIND vs CURD) (*samples_*loc_relative_spatial_association.png).

    CSV containing the principal component coordinates for each FOV (ARC1_abundance_FOV_PCA.csv, ARC1_spatial_association_FOV_PCA.csv).

    CSV containing the mean fold-change in number of edges between each ASV and the corresponding p-value when compared to the null state (random spatial association matrices) (ARC1_spatial_enrichment_significance.csv).

  5. Longitudinal corpus of privacy policies

    • data.niaid.nih.gov
    • resodate.org
    Updated Dec 12, 2022
    + more versions
    Cite
    Wagner, Isabel (2022). Longitudinal corpus of privacy policies [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5841138
    Explore at:
    Dataset updated
    Dec 12, 2022
    Dataset provided by
    University of Basel
    Authors
    Wagner, Isabel
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a corpus of 56,416 unique privacy policy texts spanning the years 1996-2021.

    policy-texts.zip contains a directory of text files with the policy texts. File names are the hashes of the policy text.

    policy-metadata.zip contains two CSV files (can be imported into a pandas dataframe) with policy metadata including readability measures for each policy text.

    labeled-policies.zip contains CSV files with content labels for each policy. Labeling was done using a BERT classifier.

    Details on the methodology can be found in the accompanying paper.
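    As an example, here is a minimal loading sketch for the metadata CSVs (a sketch only; the member names inside policy-metadata.zip are not listed here, so they are discovered rather than hard-coded):

    import zipfile
    import pandas as pd

    # Read every CSV member of the archive into its own DataFrame.
    with zipfile.ZipFile("policy-metadata.zip") as zf:
        csv_members = [n for n in zf.namelist() if n.endswith(".csv")]
        metadata = {name: pd.read_csv(zf.open(name)) for name in csv_members}

    for name, df in metadata.items():
        print(name, df.shape)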

  6. Dataset for: Infectious disease responses to human climate change...

    • zenodo.org
    csv
    Updated Aug 16, 2024
    Cite
    Georgia Titcomb; Georgia Titcomb; Johnny Uelmen; Johnny Uelmen; Mark Janko; Mark Janko; Charles Nunn; Charles Nunn (2024). Dataset for: Infectious disease responses to human climate change adaptations [Dataset]. http://doi.org/10.5281/zenodo.13314361
    Explore at:
    csv
    Available download formats
    Dataset updated
    Aug 16, 2024
    Dataset provided by
    Zenodo
    http://zenodo.org/
    Authors
    Georgia Titcomb; Georgia Titcomb; Johnny Uelmen; Johnny Uelmen; Mark Janko; Mark Janko; Charles Nunn; Charles Nunn
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Measurement technique
    This dataset includes original data sources and data that have been extracted from other sources that are referenced in the manuscript entitled "Infectious disease responses to human climate change adaptations".

    Original data:

    Table_1_source_papers

    We conducted a Web of Science search following PRISMA guidelines (SI I). Search terms included each topic, followed by "AND (infectious disease* OR zoono* OR pathogen* OR parasit*) AND (human OR people)." Papers were assessed for any positive, negative, or neutral link between each topic (dam construction, crop shifts, rainwater harvesting, mining, migration, carbon sequestration, and public transit) and human infectious diseases. Searches on poultry and transit returned >5,000 papers, so searches were restricted to review topics only. We further restricted the 3479 results for livestock shifts to those with 'shift' in the abstract. Following screening of 3485 papers (6964 including all livestock), 108 papers met the initial review criteria of being relevant to each adaptation or mitigation and discussing a human infectious disease, of which only 14 were quantitative studies with a control or reference group.

    Extracted data:

    • change_livestock_country: Data were extracted from the Ogutu 2016 supplementary materials and include percent change calculations for different livestock in different Kenyan counties. Original data source citation: Ogutu, J. O., Piepho, H.-P., Said, M. Y., Ojwang, G. O., Njino, L. W., Kifugo, S. C., & Wargute, P. W. (2016). Extreme wildlife declines and concurrent increase in livestock numbers in Kenya: What are the causes? PloS ONE, 11(9), e0163249. https://doi.org/10.1371/journal.pone.0163249

    • country_avg_schist_wormy_world: Schistosomiasis survey data were obtained from the Global Atlas of Helminth Infection and were generated by downloading map data in CSV format. Prevalence values were calculated by taking the mean maximum prevalence. Original data source citation: London Applied & Spatial Epidemiology Research Group (LASER). (2023). Global Atlas of Helminth Infections: STH and Schistosomiasis [dataset]. London School of Hygiene and Tropical Medicine. https://lshtm.maps.arcgis.com/apps/webappviewer/index.html?id=2e1bc70731114537a8504e3260b6fbc0

    • kenya_precip_change_1951_2020: Data were extracted from the Climate Change Knowledge Portal and downloaded in CSV format. Original data source citation: World Bank Group. (2023). Climate Data & Projections - Kenya. Climate Change Knowledge Portal. https://climateknowledgeportal.worldbank.org/country/kenya/climate-data-projections
    Description

    Original and derived data products referenced in the original manuscript are provided in the data package.

    Description of the data and file structure

    Original data:

    Table_1_source_papers.csv: Papers that met review criteria and which are summarized in Table 1 of the manuscript.

    1. ID: The paper identification number
    2. Topic: The broad topic (i.e., each row of Table 1)
    3. Authors: The names of the authors of the paper
    4. Article Title: The title of the paper
    5. Source Title: The name of the journal in which the paper was published
    6. Abstract: The paper's abstract, retrieved from the Web of Science search
    7. study_type: Classification of the study methodology/approach. "A" = a designed study that shows an effect, "B" = a pre/post study, "C" = a comparison of health outcomes or pathogen risk relative to a 'control/comparison' area, "D" = some quantitative effect but no control, "E" = qualitative comments but little supporting evidence, and/or a qualitative review.
    8. pathogen_broad: Broad classification of the type of pathogen discussed in the paper.
    9. transmission_type: Categorization of indirect, direct, sexual, vector, or other transmission modes.
    10. pathogen_type: Categorization of bacteria, helminth, virus, protozoa, fungi, or other pathogen types.
    11. country: Country in which the study was performed or results discussed. When countries were not available, regions were used. NA values indicate papers in which a geographic region was not relevant to the study (i.e., a methods-based study).

    Derived data:

    change_livestock_country.csv: A dataframe containing values used to generate Figure 4a in the manuscript.

    1. County Name: The name of the county in Kenya
    2. Sheep and goats 1980: The estimated number of sheep and goats in 1980
    3. Sheep and goats 2016: The estimated number of sheep and goats in 2016
    4. pct_change_shoat: The percent change in sheep and goat numbers from 1980 to 2016
    5. Cattle 1980: The estimated number of cattle in 1980
    6. Cattle 2016: The estimated number of cattle in 2016
    7. pct_change_cattle: The percent change in cattle numbers from 1980 to 2016
    8. Camel 1980: The estimated number of camels in 1980
    9. Camel 2016: The estimated number of camels in 2016
    10. pct_change_camel: The percent change in camel numbers from 1980 to 2016
    11. human_pop 1980: The estimated human population in the county in 1980
    12. human_pop 2016: The estimated human population in the county in 2016
    13. pct_change_human: The percent change in the human population from 1980 to 2016
    14. area_sq_km: The land area of the county
    15. change_ind_per_sq_km_shoat: Absolute change in the number of sheep and goats per square kilometre from 1980 to 2016
    16. change_ind_per_sq_km_cattle: Absolute change in the number of cattle per square kilometre from 1980 to 2016
    17. change_ind_per_sq_km_camel: Absolute change in the number of camels per square kilometre from 1980 to 2016

    country_avg_schist_wormy_world.csv: A dataframe containing values used to generate Figure 3 in the manuscript.

    • Country: The country in which the schistosome prevalence studies were performed.
    • Latitude: The latitude in decimal degrees
    • Longitude: The longitude in decimal degrees
    • Maximum.prevalence: The mean maximum schistosomiasis prevalence of studies conducted within each country.

    kenya_precip_change_1951_2020.csv: A dataframe containing values used to generate Figure 4b in the manuscript.

    • Precipitation (mm): Binned annual precipitation values
    • 1951-1980: The density of observations for each annual precipitation value for the 1951-1980 period
    • 1971-2000: The density of observations for each annual precipitation value for the 1971-2000 period
    • 1991-2020: The density of observations for each annual precipitation value for the 1991-2020 period

    Sharing/Access information

    Data were derived from the following sources:

  7. descriptor_prediction

    • huggingface.co
    Cite
    Yuanhao Qu, descriptor_prediction [Dataset]. https://huggingface.co/datasets/yhqu/descriptor_prediction
    Explore at:
    Authors
    Yuanhao Qu
    Description

    Descriptor Prediction Dataset

    This dataset is part of the Deep Principle Bench collection.

      Files
    

    descriptor_prediction.csv: Main dataset file

      Usage
    

    import pandas as pd
    from datasets import load_dataset

    # Load the dataset
    dataset = load_dataset("yhqu/descriptor_prediction")

    # Or load directly as a pandas DataFrame
    df = pd.read_csv("hf://datasets/yhqu/descriptor_prediction/descriptor_prediction.csv")

      Citation
    

    Please cite this work if you use… See the full description on the dataset page: https://huggingface.co/datasets/yhqu/descriptor_prediction.

  8. National Water Model RouteLinks CSV

    • search.dataone.org
    • dataone.org
    Updated Apr 15, 2022
    + more versions
    Cite
    Jason A Regina; Austin Raney (2022). National Water Model RouteLinks CSV [Dataset]. http://doi.org/10.4211/hs.d154f19f762c4ee9b74be55f504325d3
    Explore at:
    Dataset updated
    Apr 15, 2022
    Dataset provided by
    Hydroshare
    Authors
    Jason A Regina; Austin Raney
    Time period covered
    Apr 12, 2019 - Oct 14, 2021
    Area covered
    Description

    This resource contains "RouteLink" files for version 2.1.6 of the National Water Model which are used to associate feature identifiers for computational reaches to relevant metadata. These data are important for comparing NWM feature data to USGS streamflow and lake observations. The original RouteLink files are in NetCDF format and available here: https://www.nco.ncep.noaa.gov/pmb/codes/nwprod

    This resource includes the files in a human-friendlier CSV format for easier use, and a machine-friendlier file in HDF5 format which contains a single pandas.DataFrame. The scripts and supporting utilities are also included for users that wish to rebuild these files. Source code is hosted here: https://github.com/jarq6c/NWM_RouteLinks
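    A minimal loading sketch with pandas (the file names below are placeholders; substitute the actual file names from this resource):

    import pandas as pd

    # Human-friendlier CSV version (placeholder file name).
    routelinks_csv = pd.read_csv("routelink.csv")

    # Machine-friendlier HDF5 version holding a single pandas.DataFrame (placeholder file name).
    routelinks_h5 = pd.read_hdf("routelink.h5")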

  9. Rats have higher confidence in newer memories [dataset]

    • zenodo.org
    csv
    Updated Jul 23, 2021
    Cite
    Hannah Joo; Hannah Joo (2021). Rats have higher confidence in newer memories [dataset] [Dataset]. http://doi.org/10.5281/zenodo.5123545
    Explore at:
    csv
    Available download formats
    Dataset updated
    Jul 23, 2021
    Dataset provided by
    Zenodo
    http://zenodo.org/
    Authors
    Hannah Joo; Hannah Joo
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data corresponding to the paper Rats have higher confidence in newer memories [bioRxiv]. In particular, there is one Pandas dataframe (encoded as a csv file) for each of the four rats S, T, R, D. Each dataframe has 92 columns. The first column encodes the epoch while the other columns correspond to features of the task described in detail in the paper.

  10. HPA - Processed Train Dataframe With Cell-Wise RLE

    • kaggle.com
    zip
    Updated Feb 9, 2021
    Cite
    Darien Schettler (2021). HPA - Processed Train Dataframe With Cell-Wise RLE [Dataset]. https://www.kaggle.com/dschettler8845/hpa-processed-train-dataframe-with-cellwise-rle
    Explore at:
    zip (1111131078 bytes)
    Available download formats
    Dataset updated
    Feb 9, 2021
    Authors
    Darien Schettler
    Description

    Description

    This is a CSV file after some minor preprocessing (one-hot-expansion, etc.) that also includes all the RLEs and Bounding Boxes as a list for each respective ID.

    The individual RLEs in the list correspond to cells in the given image, and the individual Bounding Boxes in the list likewise correspond to cells in the given image.

    The RLE and Bounding Box are ordered to refer to the same respective cell.
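    Because the RLEs and Bounding Boxes are stored as lists inside CSV cells, pandas reads them back as strings; a minimal parsing sketch (the file name and the column names rle_list and bbox_list are hypothetical, so check the actual CSV header):

    import ast
    import pandas as pd

    df = pd.read_csv("processed_train.csv")  # hypothetical file name

    # List-valued columns come back as strings like "[...]"; parse them into Python lists.
    for col in ["rle_list", "bbox_list"]:  # hypothetical column names
        df[col] = df[col].apply(ast.literal_eval)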

  11. The Device Activity Report with Complete Knowledge (DARCK) for NILM

    • zenodo.org
    bin, xz
    Updated Dec 16, 2025
    Cite
    Justus Breyer; Justus Breyer; Kai Gützlaff; Kai Gützlaff; Leonardo Pompe; Leonardo Pompe; Klaus Wehrle; Klaus Wehrle (2025). The Device Activity Report with Complete Knowledge (DARCK) for NILM [Dataset]. http://doi.org/10.5281/zenodo.17159850
    Explore at:
    bin, xz
    Available download formats
    Dataset updated
    Dec 16, 2025
    Dataset provided by
    Zenodo
    http://zenodo.org/
    Authors
    Justus Breyer; Justus Breyer; Kai Gützlaff; Kai Gützlaff; Leonardo Pompe; Leonardo Pompe; Klaus Wehrle; Klaus Wehrle
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. Abstract

    This dataset contains aggregated and sub-metered power consumption data from a two-person apartment in Germany. Data was collected from March 5 to September 4, 2025, spanning 6 months. It includes an aggregate reading from a main smart meter and individual readings from 40 smart plugs, smart relays, and smart power meters monitoring various appliances.

    2. Dataset Overview

    • Apartment: Two-person apartment, approx. 58m², located in Aachen, Germany.
    • Aggregate Meter: eBZ DD3
    • Sub-meters: 31 Shelly Plus Plug S, 6 Shelly Plus 1PM, 3 Shelly Plus PM Mini Gen3
    • Sampling Rate: 1 Hz
    • Measured Quantity: Active Power
    • Unit of Measurement: Watt
    • Duration: 6 months
    • Format: Single CSV file (`DARCK.csv`)
    • Structure: Timestamped rows with columns for the aggregate meter and each sub-metered appliance.
    • Completeness: The main power meter has a completeness of 99.3%. Missing values were linearly interpolated.

    3. Download and Usage

    The dataset can be downloaded here: https://doi.org/10.5281/zenodo.17159850

    Because it contains long off periods with zeros, the CSV file compresses well.


    To extract it, use: xz -d DARCK.csv.xz
    The compression leads to a 97% smaller file size (from 4 GB to 90.9 MB).


    To use the dataset in Python, you can, e.g., load the CSV file into a pandas DataFrame.

    python
    import pandas as pd

    df = pd.read_csv("DARCK.csv", parse_dates=["time"])

    4. Measurement Setup

    The main meter was monitored using an infrared reading head magnetically attached to the infrared interface of the meter. An ESP8266 flashed with Tasmota decodes the binary datagrams and forwards the Watt readings to the MQTT broker. Individual appliances were monitored using a combination of Shelly Plugs (for outlets), Shelly 1PM (for wired-in devices like ceiling lights), and Shelly PM Mini (for each of the three phases of the oven). All devices reported to a central InfluxDB database via Home Assistant running in docker on a Dell OptiPlex 3020M.

    5. File Format (DARCK.csv)

    The dataset is provided as a single comma-separated value (CSV) file.

    • The first row is a header containing the column names.
    • All power values are rounded to the first decimal place.
    • There are no missing values in the final dataset.
    • Each row represents 1 second, from start of measuring in March until the end in September.

    Column Descriptions

    time (datetime): Timestamp for the reading in YYYY-MM-DD HH:MM:SS format.

    main (float, Watt): Total aggregate power consumption for the apartment, measured at the main electrical panel.

    [appliance_name] (float, Watt): Power consumption of an individual appliance (e.g., lightbathroom, fridge, sherlockpc). See Section 8 for a full list.

    Aggregate Columns

    aggr_chargers (float, Watt): The sum of sherlockcharger, sherlocklaptop, watsoncharger, watsonlaptop, watsonipadcharger, kitchencharger.

    aggr_stoveplates (float, Watt): The sum of stoveplatel1 and stoveplatel2.

    aggr_lights (float, Watt): The sum of lightbathroom, lighthallway, lightsherlock, lightkitchen, lightlivingroom, lightwatson, lightstoreroom, fcob, sherlockalarmclocklight, sherlockfloorlamphue, sherlockledstrip, livingfloorlamphue, sherlockglobe, watsonfloorlamp, watsondesklamp and watsonledmap.

    Analysis Columns

    inaccuracy (float, Watt): As no electrical device bypasses a power meter, the true inaccuracy can be assessed. It is the absolute error between the sum of individual measurements and the mains reading. A 30 W offset is applied to the sum since the measurement devices themselves draw power, which is otherwise unaccounted for.

    6. Data Postprocessing Pipeline

    The final dataset was generated from two raw data sources (meter.csv and shellies.csv) using a comprehensive postprocessing pipeline.

    6.1. Main Meter (main) Postprocessing

    The aggregate power data required several cleaning steps to ensure accuracy.

    1. Outlier Removal: Readings below 10W or above 10,000W were removed (merely 3 occurrences).
    2. Timestamp Burst Correction: The source data contained bursts of delayed readings. A custom algorithm was used to identify these bursts (large time gap followed by rapid readings) and back-fill the timestamps to create an evenly spaced time series.
    3. Alignment & Interpolation: The smart meter pushes a new value via infrared every second. To align these readings to whole seconds, the series was resampled to a 1-second frequency by taking the mean of all readings within each second (in 99.5% of seconds there is only 1 value). Any resulting gaps (0.7% outage ratio) were filled using linear interpolation.

    6.2. Sub-metered Devices (shellies) Postprocessing

    The Shelly devices are not prone to the same burst issue as the ESP8266. They push a new reading at every change in power drawn. If no power change is observed, or the observed change is too small (less than a few watts), the reading is pushed once a minute, together with a heartbeat. When a device turns on or off, intermediate power values are published, which leads to sub-second values that need to be handled.

    1. Grouping: Data was grouped by the unique device identifier.
    2. Resampling & Filling: The data for each device was resampled to a 1-second frequency using .resample('1s').last().ffill().
  This method was chosen, first, to capture the last known state of the device within each second, handling rapid on/off events, and second, to forward-fill the last state across periods with no new data, modeling that the device's consumption remained constant until a new reading was sent. A sketch of this step is shown after this list.
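    A minimal sketch of this resampling step (assuming a raw sub-meter DataFrame with device_id, time, and power columns; these names are illustrative, not the actual raw schema):

    import pandas as pd

    def resample_devices(raw: pd.DataFrame) -> pd.DataFrame:
        """Resample each device's irregular power readings to a 1 Hz series."""
        out = {}
        for device_id, group in raw.groupby("device_id"):
            series = group.set_index("time")["power"].sort_index()
            # Last known state within each second, forward-filled across quiet periods.
            out[device_id] = series.resample("1s").last().ffill()
        return pd.DataFrame(out)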

    6.3. Merging and Finalization

    1. Merge: The cleaned main meter and all sub-metered device dataframes were merged into a single dataframe on the time index.
    2. Final Fill: Any remaining NaN values (e.g., from before a device was installed) were filled with 0.0, assuming zero consumption.

    7. Manual Corrections and Known Data Issues

    During analysis, two significant unmetered load events were identified and manually corrected to improve the accuracy of the aggregate reading. The error column (inaccuracy) was recalculated after these corrections.

    1. March 10th - Unmetered Bulb: An unmetered 107W bulb was active. It was subtracted from the main reading as if it never happened.
    2. May 31st - Unmetered Air Pump: An unmetered 101W pump for an air mattress was used directly in an outlet with no intermediary plug and hence manually added to the respective plug.

    8. Appliance Details and Multipurpose Plugs

    The following table lists the column names with an explanation where needed. As Watson moved at the beginning of June, some metering plugs changed their appliance.

  12. M1_EURUSD_candles

    • huggingface.co
    Updated Feb 6, 2022
    Cite
    dominic (2022). M1_EURUSD_candles [Dataset]. https://huggingface.co/datasets/Riot186/M1_EURUSD_candles
    Explore at:
    Croissant
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 6, 2022
    Authors
    dominic
    License

    https://choosealicense.com/licenses/afl-3.0/

    Description

    All chunks have more than 4000 rows of data in chronological order in a pandas DataFrame.

      The CSV files contain the same data in chronological order; some may have fewer than 4000 rows.
    
  13. gene_editing

    • huggingface.co
    Cite
    Yuanhao Qu, gene_editing [Dataset]. https://huggingface.co/datasets/yhqu/gene_editing
    Explore at:
    Authors
    Yuanhao Qu
    Description

    Gene Editing Dataset

    This dataset is part of the Deep Principle Bench collection.

      Files
    

    gene_editing.csv: Main dataset file

      Usage
    

    import pandas as pd
    from datasets import load_dataset

    # Load the dataset
    dataset = load_dataset("yhqu/gene_editing")

    # Or load directly as a pandas DataFrame
    df = pd.read_csv("hf://datasets/yhqu/gene_editing/gene_editing.csv")

      Citation
    

    Please cite this work if you use this dataset in your research.

  14. Cyber-Physical System power Consumption

    • data.niaid.nih.gov
    • resodate.org
    • +2more
    Updated Nov 26, 2024
    Cite
    Iuhasz, Gabriel; Fortis, Teodor-Florin (2024). Cyber-Physical System power Consumption [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14215754
    Explore at:
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    West University of Timişoara
    Authors
    Iuhasz, Gabriel; Fortis, Teodor-Florin
    License

    http://www.apache.org/licenses/LICENSE-2.0

    Description

    Files

    This dataset comprises 5 CSV files contained in the data.zip archive. Each one represents a production machine from which various sensor data were collected. The average collection cadence was 5 measurements per second. The monitored devices were used for hydroforming.

    The collection period ran from 2023-06-01 until 2023-08-05.

    Data

    These files represent a complete data dump from the data available in the time-series database, InfluxDB, used for collection. Because of this some columns have no semantic value for detecting production cycles or any other analytics.

    Each file contains a total of 14 columns. Some of the columns are artefacts of the query used to extract the data from InfluxDB and can be discarded. These columns are: results, table, _start, _stop.

    results - An artefact of the InfluxDB query, signifies postprocessing of results in this dataset. It is "mean".

    table - An artefact of the InfluxDB query, can be discarded.

    _start and _stop - Refers to ingestion related data, used in monitoring ingestion.

    _field - An artefact of the InfluxDB query, specifying what field to use for the query.

    _measurement - An artefact of the InfluxDB query, specifying what measurement to use for the query. Contains the same information as device_id.

    host - An artefact of the InfluxDB query, the unique name of the host used for the InfluxDB sink in Kubernetes.

    kafka_topic - Name of the Kafka topic used for collection.

    Pertinent columns are:

    _time - Denotes the time at which a particular event was measured; it is used as the index when creating a dataframe.

    _time.1 - Duplicate of _time for sanity check and ease of analysis when _time is set as index

    _value - Represents the value measured by each sensor type.

    device_id - Unique identifier of the manufacturing device, should be the same as the file name, i.e. B827EB8D8E0C.

    ingestion_time - Timestamp when the data has been collected and ingested by influxDB.

    sid - Unique sensor ID; the power measurements can be found at sid 1.

    Annotations

    There are two additional files which contain annotation data:

    scamp_devices.csv - Contains mapping information between the dataset device ID (defined in column "DeviceIDMonitoring") and the ground truth file ID (defined in column "DeviceID")

    scamp_report_3m.csv - Contains the ground truth, which can be used for validation of cycle detection and analysis methods. The columns are as follows:

    ReportID - Internal unique ID created during data collection. It can be discarded.

    JobID - Internal Scheduling Job unique ID.

    DeviceID - The unique ID of the device used for manufacturing; it needs to be mapped using the scamp_devices.csv data.

    StartTime - Start time of operations

    EndTime - End time of operations

    ProductID - Unique identifier of the product being manufactured.

    CycleTime - Average length of cycle in seconds, added manually by operators. It can be unreliable.

    QuantityProduced - Number of products manufactured during the timeframe given by StartTime and EndTime.

    QuantityScrap - Number of scrapped/malformed products in the given timeframe. These are part of the QuantityProduced, not in addition to it.

    IntreruptionMinuted - Minutes of production halt.

    scamp_patterns.csv - Contains the start and end timestamp for selected example production cycles. These were chosen based on input from expert users.

    Jupyter Notebook

    We have provided a sample Jupyter notebook (verify_data.ipynb), which gives examples of how the dataset can be loaded and visualised as well as examples of how the sample patterns and ground truth can be addressed and visualised.

    Note

    The Jupyter Notebook contains an example of how the data can be loaded and visualised. Please note that the data should be filtered based on sid; the power measurements are collected under sid 1. See the Notebook for an example.
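    For instance, a minimal loading-and-filtering sketch in Python (a sketch only; B827EB8D8E0C.csv stands for any of the five device files, and the exact parsing options may need adjusting):

    import pandas as pd

    # Load one production machine's dump, using the measurement time as the index.
    df = pd.read_csv("B827EB8D8E0C.csv", parse_dates=["_time"], index_col="_time")

    # Keep only the power measurements (sensor ID 1) and their measured values.
    power = df.loc[df["sid"] == 1, "_value"]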

  15. Expression vs genomics for predicting dependencies

    • figshare.com
    hdf
    Updated May 17, 2024
    Cite
    Broad DepMap (2024). Expression vs genomics for predicting dependencies [Dataset]. http://doi.org/10.6084/m9.figshare.25843450.v1
    Explore at:
    hdf
    Available download formats
    Dataset updated
    May 17, 2024
    Dataset provided by
    figshare
    Figshare
    http://figshare.com/
    Authors
    Broad DepMap
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports the "Gene expression has more power for predicting in vitro cancer cell vulnerabilities than genomics" preprint by Dempster et al. To generate the figure panels seen in the preprint using these data, use FigurePanelGeneration.ipynb. This study includes five datasets (citations and details in the manuscript):

    • Achilles: the Broad Institute's DepMap public 19Q4 CRISPR knockout screens, processed with CERES
    • Score: the Sanger Wellcome Institute's Project Score CRISPR knockout screens, processed with CERES
    • RNAi: the DEMETER2-processed combined dataset, which includes RNAi data from Achilles, DRIVE, and Marcotte breast screens
    • PRISM: the PRISM pooled in vitro repurposing primary screen of compounds
    • GDSC17: cancer drug in vitro drug screens performed by Sanger

    The file of most interest to a biologist is Summary.csv. If you are interested in trying machine learning, the files Features.hdf5 and Target.hdf5 contain the data munged into a convenient form for standard supervised machine learning algorithms.

    Some large files are in the binary HDF5 format for efficiency in space and read-in. These files each contain three named HDF5 datasets: "dim_0" holds the row/index names as an array of strings, "dim_1" holds the column names as an array of strings, and "data" holds the matrix contents as a 2D array of floats. In Python, these files can be read in with:

    import h5py
    import numpy as np
    import pandas as pd

    def read_hdf5(filename):
        src = h5py.File(filename, 'r')
        try:
            dim_0 = [x.decode('utf8') for x in src['dim_0']]
            dim_1 = [x.decode('utf8') for x in src['dim_1']]
            data = np.array(src['data'])
            return pd.DataFrame(index=dim_0, columns=dim_1, data=data)
        finally:
            src.close()

    Files (not every dataset will have every type of file listed below):

    AllFeaturePredictions.hdf5: Matrix of cell lines by perturbations, with values indicating the predicted viability using a model with all feature types.

    ENAdditionScore.csv: A matrix of perturbations by number of features. Values indicate an elastic net model's performance (Pearson correlation of concatenated out-of-sample predictions with the values given in Target.hdf5) using only the top X features, where X is the column header.

    FeatureDropScore.csv: Perturbations and predictive performance for a model using all single gene expression features EXCEPT those that had greater than 0.1 feature importance in a model trained with all single gene expression features.

    Features.hdf5: A very large matrix of all cell lines by all used CCLE cell features. Continuous features were z-scored. Cell lines missing mutation or expression data were dropped. Remaining NA values were imputed to zero. Feature types are indicated by the column suffixes: _Exp: expression; _Hot: hotspot mutation; _Dam: damaging mutation; _OtherMut: other mutation; _CN: copy number; _GSEA: ssGSEA score for an MSigDB gene set; _MethTSS: methylation of transcription start sites; _MethCpG: methylation of CpG islands; _Fusion: gene fusions; _Cell: cell tissue properties.

    NormLRT.csv: The normLRT score for the given perturbation.

    RFAdditionScore.csv: Similar to ENAdditionScore, but using a random forest model.

    Summary.csv: A dataframe containing predictive model results. Columns: model: specifies the collection of features used (Expression, Mutation, Exp+CN, etc.); gene: the perturbation (column in Target.hdf5) examined, actually a compound for the PRISM and GDSC17 datasets; overall_pearson: Pearson correlation of concatenated out-of-sample predictions with the values given in Target.hdf5; feature: the Nth most important feature, found by retraining the model with all cell lines (N = 0-9); feature_importance: the feature importance as assessed by sklearn's RandomForestRegressor.

    Target.hdf5: A matrix of cell lines by perturbations, with entries indicating post-perturbation viability scores. Note that the scales of the viability effects are different for different datasets. See manuscript methods for details.

    PerturbationInfo.csv: Additional drug annotations for the PRISM and GDSC17 datasets.

    ApproximateCFE.hdf5: A set of Cancer Functional Event cell features based on CCLE data, adapted from Iorio et al. 2016 (10.1016/j.cell.2016.06.017).

    DepMapSampleInfo.csv: Sample info from DepMap_public_19Q4 data, reproduced here as a convenience.

    GeneRelationships.csv: A list of genes and their related (partner) genes, with the type of relationship (self, protein-protein interaction, CORUM complex membership, paralog).

    OncoKB_oncogenes.csv: A list of genes that have non-expression-based alterations listed as likely oncogenic or oncogenic by OncoKB as of 9 May 2018.
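    As a usage note (a sketch, assuming the read_hdf5 helper above has been defined and the files are in the working directory):

    # Cell-line-by-feature and cell-line-by-perturbation matrices as DataFrames.
    features = read_hdf5("Features.hdf5")
    target = read_hdf5("Target.hdf5")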

  16. GENEActiv accelerometer file related to the #120 OxWearables / stepcount...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 25, 2024
    Cite
    Wattelez, Guillaume (2024). GENEActiv accelerometer file related to the #120 OxWearables / stepcount issue [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11557420
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    University of New Caledonia
    Authors
    Wattelez, Guillaume
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example of a .bin file that raises an IndexError when processed.

    Consider #120 OxWearables / stepcount issue for more details.

    The .csv files are 1-second epoch conversions from the .bin file and contain time, x, y, z columns. The conversion was done by:

    reading the .bin with the GENEAread R package.

    keeping only the time, x, y and z columns.

    saving the data.frame into a .csv file.

    The only difference between the .csv files is the column format used for the time column before saving:

    time column in XXXXXX_....csv had a string class

    time column in XXXXXT....csv had a "POSIXct" "POSIXt" class

  17. Python Time Normalized Superposed Epoch Analysis (SEAnorm) Example Data Set

    • data.niaid.nih.gov
    Updated Jul 15, 2022
    Cite
    Walton, Sam D.; Murphy, Kyle R. (2022). Python Time Normalized Superposed Epoch Analysis (SEAnorm) Example Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6835136
    Explore at:
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    Self
    Authors
    Walton, Sam D.; Murphy, Kyle R.
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Solar Wind Omni and SAMPEX (Solar Anomalous and Magnetospheric Particle Explorer) datasets used in examples for SEAnorm, a time-normalized superposed epoch analysis package in Python.

    Both data sets are stored as either an HDF5 file or a compressed CSV file (csv.bz2), each containing a Pandas DataFrame of either the Solar Wind Omni or the SAMPEX data. The data sets were written with pandas.DataFrame.to_hdf() and pandas.DataFrame.to_csv() using a compression level of 9. The DataFrames can be read using pandas.read_hdf() or pandas.read_csv(), depending on the file format.
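    A minimal reading sketch (the file names are placeholders; use the actual names of the uploaded files):

    import pandas as pd

    omni = pd.read_hdf("omni.h5")            # HDF5 version (placeholder file name)
    sampex = pd.read_csv("sampex.csv.bz2")   # compressed CSV version (placeholder file name)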

    The Solar Wind Omni data set contains solar wind velocity (V) and dynamic pressure (P), the southward interplanetary magnetic field in Geocentric Solar Ecliptic System (GSE) coordinates (B_Z_GSE), the auroral electrojet index (AE), and the Sym-H index, all at 1-minute cadence.

    The SAMPEX data set contains electron flux from the Proton/Electron Telescope (PET) at two energy channels 1.5-6.0 MeV (ELO) and 2.5-14 MeV (EHI) at an approximate 6 second cadence.

  18. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • zenodo.org
    zip
    Updated Oct 20, 2022
    + more versions
    Cite
    Sofia Yfantidou; Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Stefanos Efstathiou; Athena Vakali; Athena Vakali; Joao Palotti; Joao Palotti; Dimitrios Panteleimon Giakatos; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas; Šarūnas Girdzijauskas; Christina Karagianni; Andrei Kazlouski; Elena Ferrari (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
    Explore at:
    zip
    Available download formats
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Zenodo
    http://zenodo.org/
    Authors
    Sofia Yfantidou; Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Stefanos Efstathiou; Athena Vakali; Athena Vakali; Joao Palotti; Joao Palotti; Dimitrios Panteleimon Giakatos; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas; Šarūnas Girdzijauskas; Christina Karagianni; Andrei Kazlouski; Elena Ferrari
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
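    For instance (a sketch; the file name below is a placeholder for whichever daily or hourly CSV you downloaded):

    import pandas as pd

    daily = pd.read_csv("fitbit_daily.csv")  # placeholder file name
    print(daily.head())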

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema 

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
      _id: 
  19. Philaenus spumarius and other meadow spittlebugs in Trentino. Italy

    • data.mendeley.com
    Updated Sep 20, 2021
    + more versions
    Cite
    Sabina Avosani (2021). Philaenus spumarius and other meadow spittlebugs in Trentino. Italy [Dataset]. http://doi.org/10.17632/7rv4czkykr.1
    Explore at:
    Dataset updated
    Sep 20, 2021
    Authors
    Sabina Avosani
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Italy, Trentino-South Tyrol
    Description

    This dataset is associated with the article "Occupancy and detection of agricultural threats: the case of Philaenus spumarius, European vector of Xylella fastidiosa" by the same authors, published in JOURNAL 2021. The data about Philaenus spumarius and other co-occurring species were collected in Trentino, Italy, during the spring and summer of 2018 in olive orchards and vineyards. Here we provide the raw data, some preprocessed data, and the R code that we used for the analysis presented in the publication. Please refer to the above-mentioned article for more details.

    List of files:

    samplings.xlsx: original dataset of field sampling (sheet: survey), site coordinates and info (sheet: info site), and metadata (sheet: legenda)
    counts_per_site.csv: occupancy abundance dataframe for P. spumarius
    philaenus_occupancy_data.csv: occupancy presence dataframe for P. spumarius
    sites.cov.csv: site covariates for the occupancy model
    observation.cov.csv: observation covariates for the occupancy model
    Rcode.zip: commented code and data in R format to run occupancy models for P. spumarius

  20. Excel_file_inluding_measured_values

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jan 24, 2026
    Cite
    Elham Akbari (2026). Excel_file_inluding_measured_values [Dataset]. http://doi.org/10.7910/DVN/1BNFJ5
    Explore at:
    Croissant
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 24, 2026
    Dataset provided by
    Harvard Dataverse
    Authors
    Elham Akbari
    License

    CC0 1.0 Universal Public Domain Dedication
    https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Excel files contain the measured values (mostly particle properties from microscopic images) in .csv format. The files were exported from, and can be read back into, a pandas DataFrame. Files with intensity values contain the intensity values from the maximum z-stack projection of fluorescent micrographs taken of the particles inside the DLD device.
